Page 2 of 4
Chess-like player ratings (Was: Scraping policy)
tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.16 21:06:37
Latest Ratings
Stable ratings
Ratings Movement between Stable and Latest

Ratings movement will become more interesting over time, I think; for now it's almost a requote of the latest ratings in some ways, because stable ratings are so poorly defined.
Last edited by Tilps - 2007.01.23 03:45:08
Nis
Kwon-Tom Obsessive
Puzzles: 2129
Best Total: 22m 1s
Posted - 2007.01.17 01:11:20
Would you care to elaborate on the formula you use to calculate the rating? I have spent quite a bit of time playing with backgammon rating formulas, and would be interested in what you are doing.
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.17 02:39:36
I use the expected win formula from the Wikipedia article on chess ratings to calculate the probability that player A will rank higher than player B in a given day's puzzle. (Two players with the same rating should have equal chances of ranking higher or lower than each other on any given day.)
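A minimal sketch of that probability, assuming the formula meant is the standard logistic Elo expected score (with its usual 400-point scale). The function name is made up for illustration.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that player A ranks above player B, per the Elo expected score.

    Assumes the usual 400-point logistic scale; the thread does not spell
    out the constants.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
```

Equal ratings give exactly 0.5, and the two directions always sum to 1.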

For each player in a competition, the probability of ranking lower than each player (including themselves) is summed and 0.5 added.  This gives their 'expected rank'.  This is compared to their actual rank, and the difference between expected and actual rank is divided by the total number of contestants to normalize for the fact that some competitions have more people than others.  This normalized number is multiplied by a volatility constant, which basically just controls how fast the ratings change, and then added to the player's previous rating.

First-time players are given an estimated rating of 1500, and non-competitors keep their previous rating.

The probability formula from Wikipedia has the advantage of being easy to calculate in C#: normal distribution probability calculations would need the erf function, which is not provided.
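The update described in this post could be sketched roughly as follows. This is not the actual program (which is in C#). The function names, the data layout, and the volatility value of 30 are assumptions; 30 was picked only because it fits the later remark in this thread that a rating can move at most 30 points in a day. Ties are ignored.

```python
DEFAULT_RATING = 1500.0  # starting rating for first-time players (from the post)
VOLATILITY = 30.0        # assumed value; the post only calls it "a volatility constant"


def win_probability(rating_a, rating_b):
    """Elo expected score, assuming the standard 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, finish_order, volatility=VOLATILITY):
    """Return new ratings after one puzzle.

    ratings      -- dict of name -> rating before the puzzle
    finish_order -- names of the competitors, fastest first (no ties)
    """
    current = {name: ratings.get(name, DEFAULT_RATING) for name in finish_order}
    n = len(finish_order)
    new_ratings = dict(ratings)  # non-competitors keep their previous rating
    for actual_rank, name in enumerate(finish_order, start=1):
        # Sum, over every competitor (self included, which contributes 0.5),
        # of the probability that they rank above this player, plus 0.5.
        expected_rank = 0.5 + sum(
            win_probability(current[other], current[name])
            for other in finish_order
        )
        # Finishing better than expected (lower rank number) raises the rating.
        change = volatility * (expected_rank - actual_rank) / n
        new_ratings[name] = current[name] + change
    return new_ratings
```

With this definition the expected ranks add up to the same total as the actual ranks, so the points gained and lost in one puzzle cancel out across the competitors.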
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.17 08:01:57
Found a bug - user names are case insensitive, but my program was treating them as case sensitive.  This meant there were two tilps entries in the list; not sure if it affected anyone else.  Will have to merge the results.

Update:
Results have been updated to correct the issue - I think 6 people were affected.  Still doesn't fix the fact that my rating is only 1508.
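One way such a merge could look, as a sketch only: results are keyed by a case-folded user name so that "Tilps" and "tilps" land in the same entry. The record layout and the rule of keeping the first time seen for a given day are assumptions, not a description of the actual fix.

```python
from collections import defaultdict


def merge_case_insensitive(results):
    """Group (username, day, seconds) records under a case-folded user name.

    If the same user has two records for the same day, the first one seen
    is kept (an arbitrary choice made for this sketch).
    """
    merged = defaultdict(dict)
    for username, day, seconds in results:
        key = username.casefold()
        merged[key].setdefault(day, seconds)
    return dict(merged)
```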
Last edited by Tilps - 2007.01.17 08:13:16
Brian
Kwon-Tom Obsessive
Puzzles: 4739
Best Total: 9m 6s
Posted - 2007.01.17 18:55:30
This is interesting.

I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.

Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa.
Last edited by Brian - 2007.01.17 18:56:19
astrokath
Kwon-Tom Obsessive
Puzzles: 3258
Best Total: 13m 42s
Posted - 2007.01.17 19:12:53
Quote:
Originally Posted by brian
This is interesting.

I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.

Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa.

Mmm.  I'm not quite as competitive as I used to be for the quickest puzzles, but I seem to be holding my own for the trickier-than-average ones.  Of course, the long, slow puzzles don't get you on the BECT board!
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.17 20:36:08
Quote:
Originally Posted by brian
This is interesting.

...

Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa.

My thoughts have headed in this direction too - but it's going to be months before such ratings mean anything significant.

It's easy to tack onto my current system, but I think I will wait a month before showing them for the first time.

Edit: Files have been updated again to include the last 12 hours of data for latest/movement.
Last edited by Tilps - 2007.01.17 21:19:48
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.18 00:03:32
Quote:
Originally Posted by brian
This is interesting.

I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.

I have done some thinking on this matter too now.  I can implement another set of pages which show rating changes based on 'weekly times'; however, I have several different designs which may be worth considering.

1) Ratings update 'daily' like the current ratings.  The weekly total is calculated for each day by going back for the last 7 days.  This means each day is included in 7 different ratings calculations.
  a) A competitor is included if they have done any of the last 7 puzzles for that day, otherwise they are deemed non-competing and keep their previous rating.
  b) A competitor is included if they have done all 7 previous puzzles, otherwise they are deemed non-competing and keep their previous rating.

2) Ratings are updated once a week, based only on days which are closed from further competition. (Only a Stable-Weekly.html generated, no Latest-Weekly.html).  Each day is only included in one rating calculation.  It seems pretty obvious that competitors should only be excluded from rating calculation and change if they don't compete in any of the 7 days.

3) New ratings are generated each day like in 1), but the rating calculations are done like in 2), with the exception that maybe a Latest-Weekly.html can be generated which is useful.

The problem with 2 and 3 is that the number of rating changes between the beginning of time and now is much smaller, which means it takes longer for the ratings to become meaningful.  The problem with 1 is that each day gets used in so many ratings.

I guess I could do both 1 and 3 (given that 2 is just Sunday's Stable-Weekly.html from 3) and see how they go.
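For design 1, the rolling 7-day total and the two inclusion rules (a) and (b) might be sketched like this. The data layout (a dict of per-day solve times in seconds for each player) and all names are assumptions made for illustration.

```python
from datetime import date, timedelta


def weekly_totals(times, end_day, require_all_days):
    """Total each player's times over the 7 days ending on end_day.

    times            -- dict of name -> dict of date -> seconds
    require_all_days -- False for rule (a): any puzzle in the window counts;
                        True for rule (b): all 7 puzzles are required
    Returns name -> (days_completed, total_seconds) for competing players;
    anyone left out is treated as non-competing and keeps their rating.
    """
    window = [end_day - timedelta(days=i) for i in range(7)]
    totals = {}
    for name, by_day in times.items():
        done = [by_day[d] for d in window if d in by_day]
        if not done:
            continue  # nothing solved in the window under either rule
        if require_all_days and len(done) < 7:
            continue  # rule (b): incomplete weeks do not compete
        totals[name] = (len(done), sum(done))
    return totals
```

Calling this once per day with a moving end_day gives the behaviour where each day feeds into 7 different calculations.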
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.18 08:34:20
Okay, here we go.  More ratings than you can poke a stick at.

Kwon-Tom Ratings

We now have:
Normal, where rating is updated each day based on today's performance.

Weekly, where rating is updated weekly, based on the last week's performance. (On Tuesdays, Weekly is the week ending Tuesday; on Wednesdays, Weekly is the week ending Wednesday; and so on.)

Rolling, where rating is updated daily, based on the last week's performance.

Daily, where rating is updated weekly, based on the last day's performance.  Since people will always be interested in each of the 7 different daily ratings, all 7 are available, rather than just today's as it is for weekly.

Rolling and weekly stats are very vulnerable to missing out on a puzzle, rolling even more than weekly, I suspect.  Therefore, given the incomplete data I have at the start of my score table, these ratings may show very unexpected results for quite some time.

Daily stats simply need a lot more data (as do weekly) before they become really interesting: both need 7 times more data than rolling and normal, and I suspect at least 3 weeks of data is needed for normal before the top-rated players' ratings stop rising monotonically.

Enjoy!
Last edited by Tilps - 2007.01.23 03:45:34
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.18 08:39:29
Would anyone like 'wrong' ratings? - it would be quite easy to add them to my system if a LastWrongHour.php page was added to the site much like the LastHour page.
Stephen
Kwon-Tom Obsessive
Puzzles: 5000
Best Total: 22m 33s
Posted - 2007.01.18 10:00:30
How about a 'rank' column on the tables, i.e. simply 1-n? If you are outside the top 10 or so, it's depressing to have to count the lines each time to see where you stand!
foilman
Kwon-Tom Admin
Puzzles: 3384
Best Total: 24m 6s
Posted - 2007.01.18 10:01:46
Wrongs Solved In Last Hour
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.18 10:42:08
And to show how fast I can add the wrongs - they are done
(*will clean up the code later *)

Kwon Tom Ratings
Kwon Tom Wrong Ratings

Moved the pages to new urls, since the number of files in my download directory was getting out of hand.

And I added the rank feature too.

Edit: Scores updated again.
Last edited by Tilps - 2007.01.18 21:16:07
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.19 23:10:02
Another Daily update done.

Edit: And another.
Last edited by Tilps - 2007.01.21 00:05:46
Brian
Kwon-Tom Obsessive
Puzzles: 4739
Best Total: 9m 6s
Posted - 2007.01.21 00:39:23
Quote:
Originally Posted by tilps
Another Daily update done.

Edit: And another.

So I take it that you manually update the webpage that lists the rankings, but you have a program that has them at any moment?

It might be interesting to have not just the latest movement, but individual histories too, possibly in graph form. (You've probably already considered this.) Of course this would only be useful after a few weeks when the ratings have levelled off.

Anyway, cool stuff.
m2e
Kwon-Tom Obsessive
Puzzles: 607
Best Total: 16m 43s
Posted - 2007.01.21 04:26:57
Also what happens when a player misses a day?
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.21 06:32:19
Missing a day is quite bad for weekly ratings, very bad for rolling ratings, and not relevant for the rest.
For the rest of the ratings, if you miss a day, you keep your existing ratings.  But because weekly and rolling are based on 'weekly total time', all people with 6 completed days are ranked lower than those with 7 completed days, in line with the leader board.
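A small sketch of that ordering, assuming each player's week is summarised as (days completed, total seconds): more completed days always wins, and total time only separates players with the same number of days. The names are hypothetical.

```python
def rank_weekly(totals):
    """Order players for the weekly and rolling ratings, best first.

    totals -- dict of name -> (days_completed, total_seconds)
    """
    return sorted(totals, key=lambda name: (-totals[name][0], totals[name][1]))
```

So a player who finished all 7 puzzles slowly still ranks above someone who finished 6 quickly.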

And long-term player histories - I'll consider them, but since I'm only using static web pages, that is a very large number of static web pages to be producing at all times.  (And I'm currently spending my spare time on development for my Apple Hunt game, so there's not much time to spend learning PHP in order to generate the player histories on demand.)

Edit: Updated ratings again.
Last edited by Tilps - 2007.01.21 22:04:29
fgnn
Kwon-Tom Obsessive
Puzzles: 717
Best Total: 19m 46s
Posted - 2007.01.22 02:42:05
And how does bombing a puzzle affect your overall score? A 2-minute puzzle and a 4-minute puzzle are both good attempts, but a 2-day puzzle was clearly a screwup and is not representative of how good you actually are. Do you account for this at all? (Sorry if you already said this.)
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.22 02:49:06
Absolute score takes no part in the calculation of new ratings, only your ranking within a given puzzle.  Therefore if you do exceptionally badly on one puzzle, the worst you can do is come last.  Your rating will probably suffer, but it's only going to be a short-term thing; you will rebuild your rating based on your normal performance.  The greatest number of points your rating can change in a single day is 30.
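To illustrate both points, here is a sketch under the assumption that the volatility constant is 30 and the change is volatility times the rank difference divided by the number of contestants. Both function names are made up, and ties are ignored.

```python
def ranks_from_times(times):
    """Turn name -> seconds into name -> rank (1 = fastest).

    Only the ordering survives, so a 2-day solve costs no more than
    finishing last by a single second.
    """
    order = sorted(times, key=times.get)
    return {name: rank for rank, name in enumerate(order, start=1)}


def largest_possible_change(n_players, volatility=30.0):
    """Upper bound on one puzzle's rating change under the assumed update.

    Expected and actual rank both lie between 1 and n, so they differ by
    at most n - 1; dividing by n keeps the change below the volatility.
    """
    return volatility * (n_players - 1) / n_players
```

Under these assumptions the bound approaches 30 as the field grows but never reaches it.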

Of course, someone trying to game the system to boost their rating would just not submit at all on a day they think they have done really badly on.  However, this is only likely to help their ratings on the normal and individual day ratings - it would be a major detriment to their scores on the weekly and rolling ratings, since doing 7 puzzles is always considered better than 6 in those cases.
Tilps
Kwon-Tom Obsessive
Puzzles: 6483
Best Total: 20m 22s
Posted - 2007.01.23 03:46:52
Updated again about 5-6 hours ago - just went through and updated all the links in my posts in this thread to point to the location I moved them to last weekend.