Skewed rating results: 2/23/2011 23:50:04 
Duke
Level 5

I realize that ratings are preliminary until you've reached 10 games and are still somewhat skewed by an insufficient volume of data, but I find it really odd that WL Fanatic has a higher rating than me. I have 6 wins and 1 loss; the loss was to WL Fanatic. WL Fanatic has 1 win (against me) and 1 loss, also against me.
It seems impossible to me that his rating result could be higher. Our wins against each other should cancel out and my other 5 wins should make me worth at least a bit more.
Yet he's on the board with an 1885 and I have an 1881.
I'm pointing this out because it looks like an error, not because I particularly care about a preliminary rating; it'll all eventually work itself out.
Would someone be so kind as to explain how this result is possible?

Skewed rating results: 2/23/2011 23:59:12 
crafty35a
Level 3

Duke, with the way this particular rating system works (not that I agree with this), it basically thinks Fanatic is equal to you in strength, because the only data on Fanatic is a 50% result against you (1 win, 1 loss). I believe the 4 rating point difference is because when he beat you, he had the second pick in distribution, but when you beat him, you had the first pick. So his win gets slightly more credit.

Skewed rating results: 2/24/2011 00:05:00 
Fizzer
Level 57
WarLight Creator

The algorithm does very poorly when it has only a small amount of data. Your rating is pretty accurate, but WL Fanatic's is inaccurate.
Wins don't "cancel out"; instead, all it knows about WL Fanatic is that he's beaten and lost to you, so it would put him at the same rating as you. As yours changes, his would too. You would both be tied at 1891 rating points. The only reason he's slightly ahead is that you got first pick in both games, which gave him a very small advantage and put him just above you.
This makes me think that maybe ratings should be hidden during the provisional period.
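A minimal sketch of why the two ratings tie, assuming a plain logistic (Elo-style) win model fitted by maximum likelihood. This is not WarLight's actual code: `expected_score`, `ml_rating`, the 400-point scale, and the search bounds are illustrative assumptions, and the first-pick adjustment is left out. With one win and one loss against a single opponent, the best-fitting rating is exactly that opponent's rating:

```python
def expected_score(rating, opponent):
    """Win probability under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def ml_rating(results, lo=0.0, hi=4000.0, iterations=60):
    """Maximum-likelihood rating against opponents whose ratings are held fixed.

    results: list of (opponent_rating, score), score 1 for a win, 0 for a loss.
    The likelihood peaks where actual wins equal expected wins. That surplus
    only decreases as the rating rises, so bisection can locate the peak.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        surplus = sum(score - expected_score(mid, opp) for opp, score in results)
        if surplus > 0:
            lo = mid  # more wins than this rating predicts: rate higher
        else:
            hi = mid
    return (lo + hi) / 2.0


duke = 1891.0  # Duke's rating, held fixed for this illustration
fanatic = ml_rating([(duke, 1), (duke, 0)])  # one win and one loss, both vs Duke
print(round(fanatic))  # 1891
```

Under this model, any change to Duke's rating moves WL Fanatic's estimate by the same amount, since the only evidence about him is a 50% score against Duke.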

Skewed rating results: 2/24/2011 00:06:38 
Fizzer
Level 57
WarLight Creator

I verified this by running the current results without first pick advantage, and here are the results:
1 Fizzer 1949
2 Elucidar 1932
3 TheImpaller 1918
4 bostonfred 1910
5 Duke 1891
6 WLFanatic 1891

Skewed rating results: 2/24/2011 00:25:56 
Duke
Level 5

That's not how it's supposed to work (or how USCF rankings work). If I played my first game and somehow beat a master (2000+) I wouldn't suddenly have his ranking. I would see a significant gain, but my rating would jump from 1500 to maybe 1700. You get a huge amount of flux when you first start since you haven't settled into a baseline, but the formula doesn't assign 100% of the points of each player you beat. That just won't work.
It should be a formula that discounts the winning player's gain or loss by an amount calculated off the expected result to mitigate these crazy results. So if I'm ranked 1600 and I lose to someone ranked 1500, they would go up to 1550. If I win the game I'd go up a smaller amount (since that was the expected outcome), like 20 points, so I'd go to 1620. But the result is also supposed to be weighted by the total games played by each player. So 20 points would represent the maximum I could go up with a win and would be reduced by a percentage based on how many games I've played.
Even someone rated 2200 would go up something for beating a 1600, but it would be very, very small. If they had 200 games it would go up slightly more than if they had 1000 games, because the more games you've played, the more "established" your ranking.
But the starting formulas appear wrong. Please post the actual formulas and some examples, because something is clearly not right.
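For reference, a rough sketch of the kind of update described above, using the textbook Elo formula (new rating = old rating + K * (score - expected score)). The point values quoted in the post are loose illustrations and are not reproduced here. The `k_factor` schedule, which shrinks K as a player completes more games, is a hypothetical choice for this example and not the USCF's actual rule:

```python
def expected_score(rating, opponent):
    """Expected score (win probability) under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def k_factor(games_played, k_max=32.0, k_min=10.0, settle_after=30):
    """Hypothetical schedule: K shrinks linearly until the player is established."""
    progress = min(games_played, settle_after) / settle_after
    return k_max - (k_max - k_min) * progress


def elo_update(rating, opponent, score, games_played=0):
    """Return the new rating after one game (score 1 for a win, 0 for a loss)."""
    k = k_factor(games_played)
    return rating + k * (score - expected_score(rating, opponent))


favourite_gain = elo_update(1600, 1500, 1) - 1600  # the expected result
underdog_gain = elo_update(1500, 1600, 1) - 1500   # the upset
veteran_gain = elo_update(1600, 1500, 1, games_played=200) - 1600

print(f"favourite +{favourite_gain:.1f}, underdog +{underdog_gain:.1f}, "
      f"established favourite +{veteran_gain:.1f}")
```

The favourite gains less than the underdog would for the same win, and an established player gains less still, which matches the two discounts the post asks for.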

Skewed rating results: 2/24/2011 00:27:21 
Duke
Level 5

I wrote that response before your posts, Fizz. I still think something's wrong with giving the winner 1005 of the loser's points, but I appreciate the prompt, thoughtful response (as always).

Skewed rating results: 2/24/2011 00:35:07 
crafty35a
Level 3

Right, the Bayesian Elo formula currently used here is much, much different than the more typical system used by the USCF (which I agree would be more appropriate for WL).

Skewed rating results: 2/24/2011 00:42:35 
Fizzer
Level 57
WarLight Creator

Part of the problem here is that I presented it wrong. You don't start at 1500 and move up/down. I'm going to change it in the next release so players with 0 games have a rating of 0; having 1500 is just plain wrong.
He didn't "gain 1005". The system guessed his rating to be about 1885 based on what it knows.

Skewed rating results: 2/24/2011 00:48:41 
Duke
Level 5

http://math.bu.edu/people/mg/ratings/approx/approx.html
does a good job of explaining the old and new USCF system. Thanks for that explanation Fizz. It's not a ranking per se, just an estimated ranking. Ranking would be a different figure. The program would hold that guess until it got enough data to confirm it, but the actual ratings and the estimated ranking could be very different values.

Skewed rating results: 2/24/2011 01:01:39 
Duke
Level 5

That "1005" was a typo, it's supposed to read "100% of the loser's points".

Skewed rating results: 2/24/2011 01:54:00 
WL Fanatic
Level 8

Because you've won so many, my loss doesn't take much away and my win gets a hell of a lot. In a day or two the three games I'm doing now should finish up and things will probably balance out. Hell, if I lost to you 5 times and you had stats of 6 : 1 I'd probably gain points :P

Skewed rating results: 2/24/2011 03:47:08 
The Impaller
Level 9

This Bayesian system is purely designed to rank players in order from best to worst based on the information it sees. How many points you have isn't meant to be like an actual ranking or rating that can be viewed in isolation. It's 100% relative to the other players around you.
It's not something where one person can be like "Ooh, a 2000 rating, that's really high! You must have had to beat 30 people to get that high". Rather, the system could place you at that rating after only one game, if your one game was a win against someone who has won against everyone else. In that situation, the system may deem you to be the best player out of everyone, and so it awards you as many points as are necessary to put you on top of the point table. After 2 more games that you lose, the system would then determine that you're not the best player and you would drop down to like 5th place or something.

Skewed rating results: 2/24/2011 15:44:05 
Duke
Level 5

I just don't like that the system should deem you the best player out of everyone if you only have one win against the best player (currently you), while the best player might have 30 wins and only that one loss.
I'm not saying I don't understand the math, I'm proposing tweaking the system to take into account volume of wins, not just your win against the highest ranked player you happen to have played.
Would beating everyone on the ladder be worth less than beating you right now? Should it be?

Skewed rating results: 2/24/2011 16:04:30 
crafty35a
Level 3

Duke, I don't think there's any way around that with the current Bayesian system in place. That's why Fizzer implemented the provisional period (no ranking before 10 games completed). That said, I know there are some of us who would like to see the Bayesian system replaced with a more typical Elo rating system, which wouldn't have this weirdness. Here are a few reasons why I believe this should be done:
- Bayesian system rewards players improperly if their past opponents improve in the future
- Bayesian system is not fair to rapidly improving players, because all games have the same weight, regardless of when they occurred (it works this way because it was designed to rate chess playing AI programs, not people; AI programs don't improve, so it rightly assumes that all the games have the same meaning)
- Bayesian system is confusing in that your rating can change (sometimes drastically) even if you haven't played any games recently
I'm sure there are more things that I am forgetting, but those are my major concerns.

Skewed rating results: 2/24/2011 17:38:03 
The Impaller
Level 9

Crafty is right that the provisional period would take care of that anomaly. The system may put you at number 1 for beating someone who was currently number one, but during the provisional period that won't be shown. Then you have 9 other games to potentially lose and lose rating from, so by the time your rating is official, you won't be ahead of the 29-1 guy unless you're 10-0, and even then it would come down to quality of opponents. What jumping you to the top immediately after beating the 29-1 guy in your first game DOES do that is positive is start pairing you against the other top players, so it moves you more quickly to a place where you might be more adequately matched than getting 40 points from a normal ELO system would.
I don't know, I'm not sold on this algorithm yet, but I'm not opposed to it either without seeing how it settles out first.

Skewed rating results: 2/24/2011 17:51:34 
Perrin3088
Level 44

the thing is, after the initial players get settled into their positions, new players will join and play people closer to their level, so there won't be any massive jumps anyway.. they will beat someone at 1540 when they are 1500, and thus gain a boost..
the anomalies we are getting currently are due to the fact that new players are playing against people that are actually much higher/lower than they should be, because initially we were all 1500, and that meant the initial matchups were flawed: best player vs worst player. so the system only knows that the worst is worse than the best, which could make him look better than an average player that beat another average player. until further data is added, it assumes that the worst player is quite good, especially when the best player wins several games before the worst player gets another game completed

Skewed rating results: 2/24/2011 21:56:00 
Math Wolf
Level 62

As a statistician, I do see where the deeper underlying problem lies.
Bayesian statistics basically use a formula to calculate the posterior distribution (the scores in this case) based on the prior distribution and the data.
The data are known of course so they do not pose a problem.
However, the prior distribution needs to be arbitrarily chosen. There are infinite ways to choose this prior distribution, but most of these don't make any sense.
For example: why would you give Duke a priori a higher score than WL Fanatic if neither of them played a game? (bad example, but you get my point).
So, in most cases, an uninformative distribution is chosen so that there is minimal risk that the posterior distribution is biased.
However, in many cases, this gives problems if there is only limited data.
When there is one datapoint available, this point will be the result for your posterior. When there are two, the (weighted) mean of those two will be used, and so on.
This is basically what is happening here.
How can this be fixed? Add information and make the prior informative.
For example, when a player joins the ladder, assume this player has 2 wins and 2 losses against a fictional 1500 opponent (prior distribution).
When this player has then played his first match, this result will only count for 1/5 of his rating and thus the result will be biased towards 1500.
After several games (20+ let's say), these initial games will hardly count and the ranking will be consistent.
Of course, since the absolute result doesn't count, but rather the relative strength, this small bias doesn't pose a problem.
Therefore, I'd advise to use the method described above to 'fix' the problem. Whether to add 1 win and 1 loss or 2 wins and 2 losses can be discussed, in statistics when working with a binomial, often 2 wins, 2 losses (2 successes and 2 failures for a prior probability of 0.5) are used.
Hope this helps.
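The virtual-game prior described above can be sketched as follows. This assumes a simplified logistic (Elo-style) model fitted by maximum likelihood with opponent ratings held fixed; it is an illustration, not the ladder's actual algorithm. `ml_rating`, `PRIOR_GAMES`, the 1900-rated opponent, and the 0 to 4000 search range are all hypothetical:

```python
def expected_score(rating, opponent):
    """Win probability under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def ml_rating(results, lo=0.0, hi=4000.0, iterations=60):
    """Bisect for the rating where actual wins equal expected wins."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        surplus = sum(score - expected_score(mid, opp) for opp, score in results)
        if surplus > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Prior: 2 wins and 2 losses against a fictional 1500-rated opponent.
PRIOR_GAMES = [(1500.0, 1), (1500.0, 1), (1500.0, 0), (1500.0, 0)]

first_game = [(1900.0, 1)]  # a newcomer's only real result: a win over a 1900

no_prior = ml_rating(first_game)                  # runs off to the search ceiling
with_prior = ml_rating(PRIOR_GAMES + first_game)  # pulled back towards 1500

print(f"without prior: {no_prior:.0f}, with prior: {with_prior:.0f}")
```

Without the prior, a single win has no finite best-fit rating, so the search simply hits its upper bound. With the four virtual games, the one real result is only one of five data points, and the estimate lands between 1500 and 1900.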

Skewed rating results: 2/24/2011 22:25:35 
crafty35a
Level 3

Interesting stuff, MathWolf. I agree that what you propose would probably make things work a bit better with this particular system, at least for a player's first few games. Are you familiar with Elo rating systems at all? Because what you propose essentially will make this Bayesian system behave much more like a typical Elo system for a player's first few games (where everyone starts at 1500, and the ratings are not nearly as volatile as they are with Bayesian elo).

Skewed rating results: 2/24/2011 22:37:41 
Ruthless
Level 36

Perrin, I don't think that example works
"
b=best player
w=worst player
c,d=average players
Assume the following are true:
b>w
c=d
b>c
b>d
then
w>c
w>d
"
You would need more information to determine that W>c and d. Just because B>W, doesn't necessarily mean W is better than c or d.

Skewed rating results: 2/24/2011 23:37:34 
Perrin3088
Level 44

that's the point ruthless, it makes presumptions based purely on who you're matched up against..
it takes the fact that b is better than c/d, and assumes that since w was matched to b, but not c/d, that w must be better than c/d, and closer to b, until further evidence is provided.
as time would tell, when w plays c/d and loses, then it would place them properly...
another downfall i see is that if you manage to get a falsely high ranking, then even though you continue to lose against players with incredibly high ratings, you'll drop slowly, but it will still think you deserve a high*er* ranking because you were matched up with higher rated players.
Imho, once it stabilizes some, new players should play the first ten games as if they had 1500 ratings.. ie, against average players, *preferably average vs new, and not new vs new* so that way they will have 10 w/l's against members of the average difficulty level of the ladder, instead of incidentally getting high/low rating w/l's based mostly on whoever their first matchup was

Skewed rating results: 2/24/2011 23:43:48 
Ruthless
Level 36

Ah, I didn't see that your example was an example to point out the flaws in the system. I thought you were trying to prove the opposite which is how i got confused. That makes sense now.

Skewed rating results: 2/25/2011 00:33:08 
Fizzer
Level 57
WarLight Creator

That's an interesting idea, MathWolf. I'll have to experiment with that. Thanks!

Skewed rating results: 2/25/2011 02:26:31 
NoZone
Level 6

The 'cheat' proposed by MathWolf looks pretty compelling.
As an aside, what distribution of player ranks should be expected from this ranking system? Won't it be normal? It'd be interesting to know how the distribution differs from the distribution of ranks in a typical ELO scoring system.
NoZone

Skewed rating results: 2/25/2011 02:57:17 
crafty35a
Level 3

One thing I have yet to see in this discussion is anyone arguing that the Bayesian system is actually better/preferable than a more standard Elo system. MathWolf's hack should make things work a bit more smoothly, but taking the long view for a second here: can anyone tell me why we should prefer to use the Bayesian system? I think I've put forward a pretty strong argument that it is inappropriate for human players, and I've really yet to hear anyone argue otherwise.

Skewed rating results: 2/25/2011 03:19:52 
The Impaller
Level 9

I don't see it being necessarily worse. I want to let it run its course for a while. If that doesn't work, then I'm sure Randy will switch to something else, but I don't think people are giving it enough of a chance and are condemning it too fast because it's something they are unfamiliar with or something that doesn't necessarily make immediate intuitive sense.
I can think of a few downsides to standard ELO systems. It can be advantageous to play tons of games to inflate or push your rating really high in some situations and in other situations it can be advantageous to play as few games as you possibly can to prevent losing rating points. Standard ELO systems allow people to inflate their rating by grinding a ton of games until they can win enough in a row, and then sit on that rating by playing as few games as they can. In a competitive game I play that uses an ELO system, this is commonly done, because certain rating levels award you byes or invites to various events. So you can grind a bunch of events until you win enough to get above a certain threshold and then just sit on that rating long enough to get whatever invite or bye you are looking to get from it.
I don't think the Bayesian system has that kind of downside, because it doesn't matter when your wins come, or what order they come in. In the Bayesian system, I also don't think you will have to play a lot of games in order to reach a high enough rating, whereas in a standard ELO system, you may have to play 20+ games to even be in the situation where you have a shot of getting to the top of rating, because rating gain is a much slower process, since there are a fixed number of points you can win.
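A small sketch of the order-dependence point, assuming the textbook incremental Elo update with K = 32 and opponents whose ratings are held fixed at 1500 (both assumptions made only for this illustration; `elo_sequence` is a hypothetical helper). The same two results, one win and one loss, give different final ratings depending on which came first:

```python
def expected_score(rating, opponent):
    """Win probability under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_sequence(start, games, k=32.0):
    """Apply incremental Elo updates in order; games is [(opponent, score), ...]."""
    rating = start
    for opponent, score in games:
        rating += k * (score - expected_score(rating, opponent))
    return rating


win = (1500.0, 1)   # a win against a 1500-rated opponent
loss = (1500.0, 0)  # a loss against a 1500-rated opponent

win_then_loss = elo_sequence(1500.0, [win, loss])
loss_then_win = elo_sequence(1500.0, [loss, win])

print(f"win then loss: {win_then_loss:.2f}")
print(f"loss then win: {loss_then_win:.2f}")
```

Ending on the win leaves the player slightly above 1500 and ending on the loss slightly below, because each update depends on the rating at that moment. A system that refits every rating from the full set of results at once has no notion of game order, so this effect does not arise there.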
I think the Bayesian system has a lot of potential. I was down on it at first because it was unfamiliar and was producing really awkward results with low data, but I think with more and more data being pumped in we're going to see very smooth rankings that fairly accurately reflect the skill of the players in the ladder. Let's give it a shot.

Skewed rating results: 2/25/2011 03:50:03 
Fizzer
Level 57
WarLight Creator

I share Impaller's feelings.
The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player.
Now say Gaia rises to the top 5  does it really make sense for everyone she beat on the way there to take a big penalty to their rating? The Bayesian system would return those points to them as she went up the ladder, since it recognized that they really lost to a stronger player.
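The "points returned" effect can be sketched with a simplified model: refit one player's rating by maximum likelihood from their full game list, once with Gaia estimated at 1500 and once at 1900. All of the numbers, and the `fit_rating` helper, are hypothetical and only illustrate the direction of the change, not the ladder's actual computation:

```python
def expected_score(rating, opponent):
    """Win probability under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def fit_rating(results, lo=0.0, hi=4000.0, iterations=60):
    """Bisect for the rating at which expected wins match actual wins."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        expected_wins = sum(expected_score(mid, opp) for opp, _ in results)
        actual_wins = sum(score for _, score in results)
        if actual_wins > expected_wins:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Your other results: one win and one loss against a 1700-rated player.
other_games = [(1700.0, 1), (1700.0, 0)]

before = fit_rating(other_games + [(1500.0, 0)])  # lost to Gaia, estimated at 1500
after = fit_rating(other_games + [(1900.0, 0)])   # same loss, Gaia now at 1900

print(f"Gaia at 1500: {before:.0f}; Gaia at 1900: {after:.0f}")
```

The game list is identical in both fits. Only the estimate of Gaia's strength changed, and the loss to her costs less once she is known to be strong.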
The argument that players' skills change over time is valid, but that's solved in the long term by games expiring after 3 months. If you're better next year than you are now, the ladder will reflect what your skill is next year; today's games won't affect it at all.
I agree, however, that it's flawed if your skills change rapidly, such as over a few days. The ladder really isn't designed for complete newbies; it's designed to be the ultimate competitive arena. If you're still developing your skills, I recommend playing some practice games. The 1v1 auto games have been around for a long time and have given lots of players a ton of practice on these settings.

Skewed rating results: 2/25/2011 05:25:08 
crafty35a
Level 3

"The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player."
There are some pretty simple ways around that, though. Here is one that I like: new players have a provisional rating until they complete X number of games. While provisionally rated, the new player's rating changes, but their opponents' ratings do not change. Once they are out of the provisional period, the expectation is that their rating will be close to their "true" rating. Since they will be expected to maintain a similar rating to their first true rating, there is no inflation/deflation introduced to the system.
I believe the USCF uses something very similar to this, currently, but it's been a long time since I played so I will have to do some research to verify that.
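A rough sketch of that provisional rule on top of a textbook Elo update. The threshold `PROVISIONAL_GAMES` (the "X" above), K = 32, the `Player` class, and `record_game` are all hypothetical. One more assumption: when both players are provisional, both ratings move, since the post does not cover that case.

```python
from dataclasses import dataclass

PROVISIONAL_GAMES = 10  # hypothetical value for "X"


@dataclass
class Player:
    rating: float = 1500.0
    games: int = 0

    @property
    def provisional(self):
        return self.games < PROVISIONAL_GAMES


def expected_score(rating, opponent):
    """Win probability under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def record_game(winner, loser, k=32.0):
    """Update both players, shielding established players from provisional ones."""
    gain = k * (1.0 - expected_score(winner.rating, loser.rating))
    # Decide who is shielded before any state changes.
    winner_frozen = loser.provisional and not winner.provisional
    loser_frozen = winner.provisional and not loser.provisional
    if not winner_frozen:
        winner.rating += gain
    if not loser_frozen:
        loser.rating -= gain
    winner.games += 1
    loser.games += 1


newcomer = Player()                      # e.g. a strong player who just joined
veteran = Player(rating=1800.0, games=50)
record_game(newcomer, veteran)           # the newcomer wins

print(f"newcomer: {newcomer.rating:.1f}, veteran: {veteran.rating:.1f}")
```

The veteran's rating stays at 1800 after the upset, while the newcomer's rating moves normally.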

Skewed rating results: 2/25/2011 05:37:57 
crafty35a
Level 3

Actually, I think I just thought of a nice addition to the system I describe above:
 Player exits the provisional period (completes X number of games)
 At this point, you have a good estimate of his true rating. So now you can adjust his past opponents' ratings accordingly.


