Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 19:54:50


crafty35a 
Level 3
First of all, we need a shorthand way to refer to this feature, that's quite a mouthful!

Now, I know I was personally quick to argue that I don't think this method makes sense when it was originally announced in a blog post. But I thought it would be good to get some discussion going, and I just want to provide one example of the odd results you can get with this method.

Currently, NoZone is ranked 4th on the ladder with a 1630 rating. His only result is a loss to Fizzer. Because Fizzer has since achieved a high rating (number 1 on the ladder, currently), this retroactive ratings adjustment seems to be giving NoZone a lot of credit for that loss.

Does this make sense to anyone? I don't think you should ever gain rating points for a loss, and in a typical ELO system you never do.

(By the way, sorry to single you out, NoZone. You may indeed be a great 1v1 player for all I know, I'm just pointing out the absurdity of the rating method with your current results)
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 20:03:12

Fizzer 
Level 64

Warzone Creator
I agree that situation looks suspect. I'm working on a system now that will increase transparency in how the ratings are calculated.

This will help us understand what's going on more, which is the first step in deciding if we want to change to something else. I'm completely open to changing to a different rating system, but I think we should give it at least a week to settle down. I'm sure I won't be on the top for long :)
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 20:31:21


Ruthless 
Level 57
I think any system that has to accommodate a huge number of people and crunch a lot of data is going to be very choppy when starting out. I think the system doesn't have enough data yet to "truly" show the correct standings. Like Randy said, let's give it a couple of weeks with a lot more games under its belt.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 21:47:58

Fizzer 
Level 64

Warzone Creator
I agree with Ruthless in that results are very choppy now and it will settle down over time. But there are a couple surprising outcomes.

Let's pick on Blue Precision and NoZone.

BP is undefeated - he has 1 win and 0 losses, but his win is against the lowest player (who is 0 for 5.) As a result, Bayeselo gave him almost nothing for the win and gave a rating of 1334.

NoZone has no wins - he's 0 for 2. But his losses are against very good players (#1 and #2 rank.) As a result, Bayeselo didn't hold these losses against him and gave him a rating of 1584.

I'm surprised that an unvictorious player can be ranked above an undefeated player. I've looked into it a bit, and as far as I can tell this is the expected result of Bayeselo's algorithm (I verified that first pick advantage is not causing this, that all the games are being accounted for, and that the correct winner is being input for each game.)

I'm going to continue to investigate more.
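
A toy sketch of how a whole-history fit can rank a winless player above an undefeated one. This is not Bayeselo itself: it is a plain Bradley-Terry fit using the standard minorization-maximization update, and it assumes a prior of one virtual draw added to every game played (`prior_draws`), which may well differ from Bayeselo's actual prior. The player names and results are hypothetical, not the real ladder games.

```python
import math

# Hypothetical toy results as (winner, loser). Not real ladder data.
GAMES = [
    ("top1", "strong"), ("top2", "strong"),
    ("strong", "mid"), ("mid", "weakest"),
    ("top1", "winless"), ("top2", "winless"),  # winless is 0-2, both against the top two
    ("undefeated", "weakest"),                 # undefeated is 1-0, against the last-place player
]

def fit_ratings(games, prior_draws=1.0, iterations=2000, mean=1500.0):
    """Fit one rating per player from all games at once (order is ignored)."""
    players = sorted({name for game in games for name in game})

    # Total score per player: 1 per real win plus half of each virtual draw.
    score = {p: 0.0 for p in players}
    for winner, loser in games:
        score[winner] += 1.0 + 0.5 * prior_draws
        score[loser] += 0.5 * prior_draws

    weight = 1.0 + prior_draws  # each real game counts as this many games
    gamma = {p: 1.0 for p in players}  # Bradley-Terry strengths
    for _ in range(iterations):
        denom = {p: 0.0 for p in players}
        for winner, loser in games:
            share = weight / (gamma[winner] + gamma[loser])
            denom[winner] += share
            denom[loser] += share
        gamma = {p: score[p] / denom[p] for p in players}
        # Rescale so the geometric mean is 1 (ratings are only defined up to a shift).
        log_mean = sum(math.log(g) for g in gamma.values()) / len(gamma)
        gamma = {p: g / math.exp(log_mean) for p, g in gamma.items()}

    # Convert strengths to the usual 400-point logistic scale, centered on `mean`.
    return {p: mean + 400.0 * math.log10(gamma[p]) for p in players}

ratings = fit_ratings(GAMES)
for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {rating:7.1f}")
```

In this particular toy data set the fit places the winner of every game 400·log10(3) (about 191) points above the loser, so `winless` lands one step below the top two while `undefeated` lands one step above last place. The ordering comes entirely from who each player met, not from their win-loss record.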
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 22:25:57


NoZone 
Level 6
Hello,
No worries about bringing that up, I noted the same thing with some amusement. 0-2 resulting in an improved rating seemed odd. I think where it comes from is that since these were some of the first games played we were all equally ranked at 1500. So a win/loss doesn't do much directly but the subsequent games have an adverse effect once there are significant differential scores in the mix. I think this is only an issue from this initial phase where everyone is on paper equally ranked with 1500. As soon as a few more games pass, I think it will settle out to something more realistic. Especially if there is the expiration on the past game effects as mentioned elsewhere.
NoZone
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:10:45

The Impaller 
Level 9
I agree that this is pretty weird. I'd say give it a chance to balance out after a few weeks or so and see how it looks then, before calling for the pitchforks and torches.

I do think it's odd that you can gain points from a loss and lose points from a win. Any of the ELO systems I've seen are generally set up such that there are a fixed number of points that can be gained or lost in a match (say 40) and it's split based on the relative rankings of the players. So if someone is 200 points higher than another player and they win, they'll get 10 points and the other player will lose 10 points. But if they are even, one player gains 20 and the other loses 20. If it's a 400 point differential and the higher ranked player wins, they may gain only 1 point and the loser only lose 1 point. But if the lower ranked player were to win, they may gain 39 points and the higher ranked player lose 39. Something along those lines is what I've experienced in any of the sites/games I've played that use a similar system.
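
For reference, a minimal sketch of the kind of update described above, using the standard logistic Elo expectation on a 400-point scale. The K-factor of 40 comes from the "say 40" in the post; the 1500 base rating is just an illustrative assumption.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expectation: probability-like score between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))

def points_for_win(winner_rating, loser_rating, k=40.0):
    """Points the winner gains; the loser loses the same amount."""
    return k * (1.0 - expected_score(winner_rating, loser_rating))

for gap in (0, 200, 400):
    favourite = points_for_win(1500 + gap, 1500)  # higher-rated player wins
    upset = points_for_win(1500, 1500 + gap)      # lower-rated player wins
    print(f"gap {gap}: favourite wins +{favourite:.1f}, underdog wins +{upset:.1f}")
# gap 0: favourite wins +20.0, underdog wins +20.0
# gap 200: favourite wins +9.6, underdog wins +30.4
# gap 400: favourite wins +3.6, underdog wins +36.4
```

The exact numbers differ a little from the round figures in the post, but the shape is the same: the winner's gain is always positive and the loser's change is always negative, so under this scheme a loss can never raise a rating.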

I did find it odd that BP was ranked so low. My first thought was "I wonder who he lost to" and then I clicked to find out that he hadn't lost to anyone.

I think that this Bayesian system isn't designed the way we normally expect, which is to award points for wins and take away points for losses, but rather designed purely to format rankings accurately based on true skill. So it may seem weird that it's doing this but it might also balance out to be a much better system in the long run once more data has been collected.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:40:45


crafty35a 
Level 3
Right on, Impaller, that's how all ELO systems I've seen in the past work. It looks like the game vs. NoZone was actually Fizzer's last completed game. So the only way his rating changed in the meantime was due to retroactive adjustments. Once those adjustments are made to Fizzer's rating, does that then also change the rating of all of his past opponents? If so, where does the chain end? It could go on forever!
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:49:05


NecessaryEagle 
Level 59
I think the problem is the way the retroactive scheme is set up. It seems that when NoZone lost, he lost points, but then his opponent's rank went up, and instead of just redoing the point change from the NoZone game, it transferred extra points to NoZone, which was a higher amount than what he initially lost. In other words, the system should be based on percentages instead of numbers.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/20/2011 23:58:03

The Impaller 
Level 9
I don't think the chain goes on forever. Rather it goes back to the beginning, which in this case will always be 3 months, since ratings are only calculated for the last 3 months of play. This means every time ratings are updated, they have to be recalculated for every player, taking into consideration every ladder game in the last 3 months. Fizzer mentioned somewhere that it could take an hour or more to run that calculation, and I'm not surprised, because that could potentially be a lot of data to run through.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 01:43:08

Fizzer 
Level 64

Warzone Creator
This post explains how to run your own ladder simulations:

http://blog.warlight.net/index.php/2011/02/running-your-own-ladder-simulations/

This is useful for understanding the ratings.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 03:22:14


Perrin3088 
Level 49
The problem in all likelihood will never be solved for middle-ground players. Any new players in the ladder would affect the current players, and be affected themselves, similarly to how NoZone and BP are currently being affected. Say you're an average player at 1500 score with a history, then someone new joins who does sub-par and loses his first 7 games. All of a sudden, with an average score, playing people only as good as you *1500 ish*, your rank drops drastically due to it being unable to correctly place the newcomer...

The retroactive ratings would probably work better for people that have history, i.e., for anyone who hasn't been in the ladder for at least a month/X games, the games against them would be rated without retroactive ratings enabled.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 03:48:52


crafty35a 
Level 3
Well I've been playing around with the rating tool for a while, and I think I've nailed down what the issue is. I believe this tool is tailored towards calculating ratings for a set of players that each have a constant, unchanging playing strength. Why do I think this? First of all, notice the names of the "players" listed in the provided examples (http://remi.coulom.free.fr/Bayesian-Elo): Comet B.68, Dragon 4.7.5, Gandalf 4.32h, etc. These are all fairly well known chess engines (essentially AI programs that play chess).

Logically, it would absolutely make sense to retroactively adjust ratings based on the future performance of opponents, if the "players" were actually specific versions of chess engines. Why? Because these engines have a constant, unchanging strength level. Say a chess engine plays one game today, and another 99 games over the next six months. If we want to calculate the strength of the chess engine at the time of the first game played, every single one of the 100 games should be considered with equal weighting, because *the strength of a chess engine does not change over time*!

But with a human player, that of course is not true. I think this is the fundamental flaw with using this method to rate human players.
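
One rough way to see the difference: a sequential ELO-style update cares about the order of results, while a fit over the whole history only sees the totals. The sketch below assumes a hypothetical player who loses 10 games and then wins 10, always against opponents fixed at 1500, with K = 40. None of these numbers come from the ladder.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expectation on a 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))

def sequential_elo(results, start=1500.0, opponent=1500.0, k=40.0):
    """Apply one Elo update per game, in the order the games were played."""
    rating = start
    for won in results:
        actual = 1.0 if won else 0.0
        rating += k * (actual - expected_score(rating, opponent))
    return rating

improving = [False] * 10 + [True] * 10   # 10 losses, then 10 wins
declining = list(reversed(improving))    # same results in the opposite order

print("improving player:", round(sequential_elo(improving)))
print("declining player:", round(sequential_elo(declining)))
# An order-blind fit only sees the overall score, identical for both:
print("overall score:", sum(improving) / len(improving))  # 0.5
```

The sequential update finishes above 1500 for the improving player and below 1500 for the declining one, because recent games move the rating last. A method that weights all 20 games equally sees a 50% score against 1500-rated opponents in both cases, which is fine for a chess engine with fixed strength but hides the trend for a human.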
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 04:51:07


NecessaryEagle 
Level 59
and why is NoZone's rank higher than FBG-Dragon's?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 04:52:37


NecessaryEagle 
Level 59
The way it looks right now is that losing to a good opponent is better than winning against a bad opponent, so if your first couple of games dropped you in the ratings, then it's harder to rise because you don't get placed with higher players.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 05:33:51

The Impaller 
Level 9
It does seem to be that way, however that may be corrected as it rediscovers who is a good player and so forth.
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 05:51:03


Perrin3088 
Level 49
it seems to me like the early games modify your rating too much.. IE, you shouldn't be able to drop to 1300/raise to 1700 in just a couple of games.. so as to keep new players more average until their real potential is proven..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:15:12


Perrin3088 
Level 49
Fizzer, why didn't you just put
offset 1500
on the page you linked us so it would automatically show the warlight rating?
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:24:21

Fizzer 
Level 64

Warzone Creator
I never noticed the offset command. I'll add that in - thanks!
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:36:36


Perrin3088 
Level 49
I also think that when we get a more established ladder, to solve my earlier fear, *3 posts up i think* we could implement a removerare X command before the elo command... it would make it so that new players would have to get at least X games before they influence the ladder, which imho could help keep the average range of players more steady... but ofc' idk, it will always be partly unsteady as long as new people come in so..
Discussion of the "retroactive rating updates based on opponents' future results" feature: 2/21/2011 07:43:52

Fizzer 
Level 64

Warzone Creator
Perrin: I was just thinking the same thing. I was thinking it would be good even now - if the rankings that are displayed now are meaningless, they shouldn't be displayed at all. It's only causing mass panic and confusion.

I think it would be good to hide ranks until you've completed a certain number of games.