Posts 11 - 30 of 36   <<Prev   1  2  Next >>   
Skewed rating results: 2/24/2011 01:54:00

WL Fanatic 
Level 8
Because you've won so many, my loss doesn't take much away and my win gets a hell of a lot. In a day or two the three games I'm doing now should finish up and things will probably balance out. Hell, if I lost to you 5 times and you had stats of 6 : 1 I'd probably gain points :P
Skewed rating results: 2/24/2011 03:47:08

The Impaller 
Level 9
This Bayesian system is purely designed to rank players in order from best to worst based on the information it sees. How many points you have isn't meant to be like an actual ranking or rating that can be viewed in isolation. It's 100% relative to the other players around you.

It's not something where one person can be like "Ooh, a 2000 rating, that's really high! You must have had to beat 30 people to get that high" but rather the system could place you at that rating after only one game, if your one game was a win against someone who has won against everyone else. In that situation, the system may deem you to be the best player out of everyone and so it awards you as many points as are necessary so that you are on top of the point table. After 2 more games that you lose, the system would then determine that you're not the best player and you would drop down to like 5th place or something.
Skewed rating results: 2/24/2011 15:44:05


Duke 
Level 5
I just don't like that the system should deem you the best player out of everyone if you only have one win against the best player (currently you), while the best player might have 30 wins and only that one loss.

I'm not saying I don't understand the math, I'm proposing tweaking the system to take into account the volume of wins, not just your win against the highest ranked player you happen to have played.

Would beating everyone on the ladder be worth less than beating you right now? Should it be?
Skewed rating results: 2/24/2011 16:04:30


crafty35a 
Level 3
Duke, I don't think there's any way around that with the current Bayesian system in place. That's why Fizzer implemented the provisional period (no ranking before 10 games completed). That said, I know there are some of us who would like to see the Bayesian system replaced with a more typical Elo rating system, which wouldn't have this weirdness. Here are a few reasons why I believe this should be done:

- Bayesian system rewards players improperly if their past opponents improve in the future
- Bayesian system is not fair to rapidly improving players, because all games have the same weight, regardless of when they occurred (it works this way because it was designed to rate chess playing AI programs, not people -- AI programs don't improve, so it rightly assumes that all the games have the same meaning)
- Bayesian system is confusing in that your rating can change (sometimes drastically) even if you haven't played any games recently

I'm sure there are more things that I am forgetting, but those are my major concerns.
Skewed rating results: 2/24/2011 17:38:03

The Impaller 
Level 9
Crafty is right that the provisional period would take care of that anomaly. The system may put you at number 1 for beating someone who was currently number one, but during the provisional period that won't be shown. Then you have 9 other games to potentially lose and lose rating from, so by the time your rating is official, you won't be ahead of 29-1 guy unless you're 10-0 and even then it would come down to quality of opponents. What jumping you to the top immediately after beating 29-1 guy as your first game DOES do that is positive is it will start pairing you against the other top players, so it more quickly moves you to a place where you might be more adequately matched, than getting 40 points from a normal ELO system would do.

I don't know, I'm not sold on this algorithm yet, but I'm not opposed to it either without seeing how it settles out first.
Skewed rating results: 2/24/2011 17:51:34


Perrin3088 
Level 49
the thing is, after the initial players get settled into their positions, new players will join and play people closer to their level, so there won't be any massive jumps anyways.. they will beat someone at 1540 when they are 1500, and thus gain a boost..

the anomalies we are getting currently are due to the fact that new players are playing against people that are actually much higher/lower than they should be. initially we were all 1500, and that meant the initial matchups were flawed: best player vs worst player. the system only knows that the worst is worse than the best, which could make him look better than an average player that beat another average player. so until further data is added, it assumes that the worst player is quite good, especially when the best player wins several games before the worst player gets another game completed
Skewed rating results: 2/24/2011 17:56:42


Perrin3088 
Level 49
b=best player
w=worst player

c-d=average players

Assume the following are true:
b>w
c=d
b>c
b>d

then
w>c
w>d



this is assuming the best player has beaten the 2 average players and the worst player, and the two average players have each beaten the other once, while the worst player has only 1 loss, to the best player.
Skewed rating results: 2/24/2011 21:56:00


Math Wolf 
Level 64
As a statistician, I do see where the deeper underlying problem lies.

Bayesian statistics basically use a formula to calculate the posterior distribution (the scores in this case) based on the prior distribution and the data.
The data are known of course so they do not pose a problem.

However, the prior distribution needs to be arbitrarily chosen. There are infinitely many ways to choose this prior distribution, but most of these don't make any sense.
For example: why would you give Duke a priori a higher score than WL Fanatic if neither of them played a game? (bad example, but you get my point).

So, in most cases, an uninformative distribution is chosen so that there is minimal risk that the posterior distribution is biased.
However, in many cases, this gives problems if there is only limited data.
When there is one datapoint available, this point will be the result for your posterior. When there are two, the (weighted) mean of those two will be used, and so on.
This is basically what is happening here.

How can this be fixed? Add information and make the prior informative.

For example, when a player joins the ladder, assume this player has 2 wins and 2 losses against a fictional 1500 opponent (prior distribution).
When this player has then played his first match, this result will only count for 1/5 of his rating and thus the result will be biased towards 1500.
After several games (20+ let's say), these initial games will hardly count and the ranking will be consistent.
Of course, since the absolute result doesn't count, but rather the relative strength, this small bias doesn't pose a problem.

Therefore, I'd advise using the method described above to 'fix' the problem. Whether to add 1 win and 1 loss or 2 wins and 2 losses can be discussed; in statistics, when working with a binomial, 2 wins and 2 losses (2 successes and 2 failures for a prior probability of 0.5) are often used.

Hope this helps.
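
To make the virtual-games idea above concrete, here is a minimal Python sketch. It is not the ladder's actual algorithm: it assumes the standard Elo logistic curve, treats opponents' ratings as fixed, and fits a single player's rating by maximum likelihood. The names `fit_rating` and `PRIOR_GAMES` are made up for illustration.

```python
import math

# Assumption: 2 virtual wins and 2 virtual losses against a fictional
# 1500-rated opponent, as suggested in the post above.
PRIOR_GAMES = [(1500.0, 1.0)] * 2 + [(1500.0, 0.0)] * 2


def expected_score(rating, opp_rating):
    """Win probability under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def fit_rating(games, prior=PRIOR_GAMES, lo=0.0, hi=4000.0):
    """Best-fit rating for one player.

    games: list of (opponent_rating, score); score 1.0 = win, 0.0 = loss.
    Opponent ratings are held fixed, which is a simplification.
    """
    all_games = list(prior) + list(games)

    def surplus(rating):
        # Actual minus expected total score; shrinks as rating grows.
        return sum(s - expected_score(rating, opp) for opp, s in all_games)

    # Bisection: the best fit is where the surplus crosses zero.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


one_win_vs_top = [(2000.0, 1.0)]
# With the virtual games: a finite rating between 1500 and 2000.
print(round(fit_rating(one_win_vs_top)))
# Without them: a lone win has no finite best fit, so the search
# runs up to its ceiling (4000 here).
print(round(fit_rating(one_win_vs_top, prior=[])))
```

With the four virtual games in place, the first real result is one game out of five, which is where the "1/5" above comes from. As real games pile up, the four virtual ones matter less and less.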
Skewed rating results: 2/24/2011 22:25:35


crafty35a 
Level 3
Interesting stuff, MathWolf. I agree that what you propose would probably make things work a bit better with this particular system, at least for a player's first few games. Are you familiar with Elo rating systems at all? Because what you propose essentially will make this Bayesian system behave much more like a typical Elo system for a player's first few games (where everyone starts at 1500, and the ratings are not nearly as volatile as they are with Bayesian elo).
Skewed rating results: 2/24/2011 22:37:41


Ruthless 
Level 57
Perrin -- I don't think that example works

"
b=best player
w=worst player

c-d=average players

Assume the following are true:
b>w
c=d
b>c
b>d

then
w>c
w>d
"

You would need more information to determine that W>c and d. Just because B>W, doesn't necessarily mean W is better than c or d.
Skewed rating results: 2/24/2011 23:37:34


Perrin3088 
Level 49
that's the point ruthless, it makes presumptions based purely on who you're matched up against..

it takes the fact that b is better than c/d, and assumes that since w was matched to b, but not c/d, that w must be better than c/d, and closer to b, until further evidence is provided.


as time would tell, when w plays c/d and loses, then it would place them properly...

another downside i see, is if you manage to get a falsely high ranking, then even though you continue to lose against incredibly high rating scores, you'll drop slowly, but it will still think you deserve a high*er* ranking because you were matched up with higher rating players.

Imho, once it stabilizes some, new players should play the first ten games, as if they had 1500 ratings.. ie, against average players, *preferably average vs new, and not new vs new* so that way they will have 10 w/l's against members of the average difficulty level of the ladder, instead of incidentally getting high/low rating w/l's based mostly on whoever their first match-up was
Skewed rating results: 2/24/2011 23:43:48


Ruthless 
Level 57
Ah, I didn't see that your example was meant to point out the flaws in the system. I thought you were trying to prove the opposite, which is how I got confused. That makes sense now.
Skewed rating results: 2/25/2011 00:19:58


Perrin3088 
Level 49
Mathwolf, i agree wholeheartedly with the fictional 1500 opponent for new players.. it is essentially the reason i am pushing for 1500 base rating for a player's probationary 10 games, as i couldn't figure out a way, via the program fizzer is currently using, to keep the Rating more stable initially while still letting them keep rating changes for early games before passing the probationary period.
Skewed rating results: 2/25/2011 00:33:08

Fizzer 
Level 64

Warzone Creator
That's an interesting idea, MathWolf. I'll have to experiment with that - thanks!
Skewed rating results: 2/25/2011 02:26:31


NoZone 
Level 6
The 'cheat' proposed by MathWolf looks pretty compelling.

As an aside, what distribution of player ranks should be expected from this ranking system? Won't it be normal? It'd be interesting to know how the distribution differs from the distribution of ranks in a typical ELO scoring system.

NoZone
Skewed rating results: 2/25/2011 02:57:17


crafty35a 
Level 3
One thing I have yet to see in this discussion is anyone arguing that the Bayesian system is actually better than, or preferable to, a more standard Elo system. MathWolf's hack should make things work a bit more smoothly, but taking the long view for a second here: can anyone tell me why we should prefer to use the Bayesian system? I think I've put forward a pretty strong argument that it is inappropriate for human players, and I've really yet to hear anyone argue otherwise.
Skewed rating results: 2/25/2011 03:19:52

The Impaller 
Level 9
I don't see it being necessarily worse. I want to let it run its course for a while. If that doesn't work, then I'm sure Randy will switch to something else, but I don't think people are giving it enough of a chance and are condemning it too fast because it's something they are unfamiliar with or something that doesn't necessarily make immediate intuitive sense.

I can think of a few downsides to standard ELO systems. It can be advantageous to play tons of games to inflate or push your rating really high in some situations and in other situations it can be advantageous to play as few games as you possibly can to prevent losing rating points. Standard ELO systems allow people to inflate their rating by grinding a ton of games until they can win enough in a row, and then sit on that rating by playing as few games as they can. In a competitive game I play that uses an ELO system, this is commonly done, because certain rating levels award you byes or invites to various events. So you can grind a bunch of events until you win enough to get above a certain threshold and then just sit on that rating long enough to get whatever invite or bye you are looking to get from it.

I don't think the Bayesian system has that kind of downside, because it doesn't matter when your wins come, or what order they come in. In the Bayesian system, I also don't think you will have to play a lot of games in order to reach a high enough rating, whereas in a standard ELO system, you may have to play 20+ games to even be in the situation where you have a shot of getting to the top of rating, because rating gain is a much slower process, since there are a fixed number of points you can win.
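
The point about order can be checked with a small Python sketch. It assumes the textbook Elo update with a K-factor of 32 and opponents fixed at 1500; those numbers and the helper name `run_elo` are illustrative, not taken from any particular ladder.

```python
K = 32  # assumed K-factor; real systems vary


def expected_score(rating, opp_rating):
    """Win probability under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def run_elo(start, results, opp_rating=1500.0, k=K):
    """Apply results one at a time, in order, against fixed-rating opponents."""
    rating = start
    for score in results:  # 1.0 = win, 0.0 = loss
        rating += k * (score - expected_score(rating, opp_rating))
    return rating


win_then_loss = run_elo(1500.0, [1.0, 0.0])
loss_then_win = run_elo(1500.0, [0.0, 1.0])
print(win_then_loss, loss_then_win)  # same two results, different final ratings
```

Under these assumptions the same two results leave the player slightly below 1500 in one order and slightly above in the other. A system that fits all results at once has no notion of order, so both sequences would give the same rating.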

I think the Bayesian system has a lot of potential. I was down on it at first because it was unfamiliar and was producing really awkward results with low data, but I think with more and more data being pumped in we're going to see very smooth rankings that fairly accurately reflect the skill of the players in the ladder. Let's give it a shot.
Skewed rating results: 2/25/2011 03:50:03

Fizzer 
Level 64

Warzone Creator
I share Impaller's feelings.

The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player.

Now say Gaia rises to the top 5 - does it really make sense for everyone she beat on the way there to take a big penalty to their rating? The Bayesian system would return those points to them as she went up the ladder, since it recognized that they really lost to a stronger player.
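
A rough Python sketch of the size of that penalty under the textbook Elo formula. The K-factor of 32, your rating of 1700, and Gaia's eventual 1900 are made-up numbers for illustration only.

```python
K = 32  # assumed K-factor


def expected_score(rating, opp_rating):
    """Win probability under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def points_lost(rating, opp_rating, k=K):
    """Points a player drops for one loss under standard Elo."""
    return k * expected_score(rating, opp_rating)


you = 1700.0
print(points_lost(you, 1500.0))  # loss charged at her listed starting rating
print(points_lost(you, 1900.0))  # same loss if she were already rated 1900
```

With these numbers the loss costs roughly three times as much when it is charged against her starting 1500, and standard Elo never goes back to revise it.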

The argument that players' skills change over time is valid, but that's solved in the long term by games expiring after 3 months. If you're better next year than you are now, the ladder will reflect what your skill is next year - today's games won't affect it at all.

I agree, however, that it's flawed if your skills change rapidly, such as over a few days. The ladder really isn't designed for complete newbies - it's designed to be the ultimate competitive arena. If you're still developing your skills, I recommend playing some practice games. The 1v1 auto games have been around for a long time and have given out a ton of practice to lots of players on these settings.
Skewed rating results: 2/25/2011 05:25:08


crafty35a 
Level 3
"The thing that I hate about standard ELO is how someone's current rating affects you a lot, even if they aren't properly rated yet. For example, say Gaia joins the tournament tomorrow and defeats you. Gaia is a very strong player, but her rating was only 1500 since she just joined, so you take a big hit for losing to a 1500 player."

There are some pretty simple ways around that, though. Here is one that I like: new players have a provisional rating until they complete X number of games. While provisionally rated, the new player's rating changes, but their opponents' ratings do not change. Once they are out of the provisional period, the expectation is that their rating will be close to their "true" rating. Since they will be expected to maintain a similar rating to their first true rating, there is no inflation/deflation introduced to the system.

I believe the USCF uses something very similar to this, currently, but it's been a long time since I played so I will have to do some research to verify that.
Skewed rating results: 2/25/2011 05:37:57


crafty35a 
Level 3
Actually, I think I just thought of a nice addition to the system I describe above:
- Player exits the provisional period (completes X number of games)
- At this point, you have a good estimate of his true rating. So now you can adjust his past opponents' ratings accordingly.
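
A minimal Python sketch of how the provisional period plus the retroactive adjustment could fit together. Everything here is an assumption for illustration: the K-factor of 32, X = 10 games, the rule that two provisional players both update normally, and the names `Player`, `record_game` and `settle`. It is not how the USCF or this ladder actually does it.

```python
from dataclasses import dataclass, field

K = 32                  # assumed K-factor
PROVISIONAL_GAMES = 10  # the "X" above; 10 is an assumption


def expected_score(rating, opp_rating):
    """Win probability under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


@dataclass
class Player:
    name: str
    rating: float = 1500.0
    games: int = 0
    # (opponent, opponent's score) for results held back while provisional
    pending: list = field(default_factory=list)

    @property
    def provisional(self):
        return self.games < PROVISIONAL_GAMES


def settle(newcomer):
    """Apply held-back results using the newcomer's now-settled rating."""
    for opp, score in newcomer.pending:
        opp.rating += K * (score - expected_score(opp.rating, newcomer.rating))
    newcomer.pending.clear()


def record_game(a, b, score_a):
    """score_a is 1.0 if a won, 0.0 if a lost."""
    exp_a = expected_score(a.rating, b.rating)
    a_prov, b_prov = a.provisional, b.provisional
    sides = ((a, a_prov, b, b_prov, score_a, exp_a),
             (b, b_prov, a, a_prov, 1.0 - score_a, 1.0 - exp_a))
    for player, prov, opp, opp_prov, score, exp in sides:
        if opp_prov and not prov:
            # Established player vs provisional opponent: hold the result.
            opp.pending.append((player, score))
        else:
            player.rating += K * (score - exp)
    for p in (a, b):
        p.games += 1
        if not p.provisional and p.pending:
            settle(p)  # player just left the provisional period


vet = Player("vet", 1600.0, games=20)
new = Player("new")
record_game(new, vet, 1.0)
print(vet.rating, new.rating)  # vet is untouched, new has moved up
```

The held-back results are only charged to past opponents once the newcomer finishes game X, and they are charged against the newcomer's rating at that point rather than the starting 1500.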