Warzone

Posts 1 - 20 of 43

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 03:34:07

Fizzer

Level 64

Warzone Creator

Please vote on whether or not you would like to see WarLight switch from the current Bayesian ELO model to a traditional ELO model.

# [Click here to vote](/Poll.aspx?ID=ELO)

Since this poll pertains to the ladders, only WarLight members may vote. Anyone can view the poll; it just won't accept your vote if you're not a member.

The poll will close on March 27th. You have until then to make up your mind. The voting link above allows you to change your vote at any time before then (simply click the link and vote again and your old vote will be thrown away.)

This poll is not anonymous. If you'd like, you can leave a reply to this thread explaining what you voted and why.

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 04:43:41
Knoebber Level 55	I like having my rating fluctuate even if I haven't completed a game recently. So I say keep it.

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 05:35:55

Doushibag

Level 17

How does the current system relate to this one: http://en.wikipedia.org/wiki/TrueSkill
Same thing? Differences?
I'm curious what the alternatives are, and whether any systems are specifically designed for a higher-luck game as opposed to a zero-luck game like chess. That is, WarLight is going to have more 'upsets' than chess because of the luck factor, so I wonder whether any system can help account for that (if it's necessary... not sure it is, but it seems like a relevant difference).
I don't like the current system, but I'm not sure I'd like standard ELO either. If I didn't like either, I'm not sure which I'd prefer at this point, or whether some other alternative would work better than both. I guess I should go read again why you chose the current one over standard ELO. I'm not sure if it was because you felt standard ELO was notably flawed, or just that this one was worth testing out as a possibly better alternative.

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 06:16:11

TeddyFSB

Level 60

Here are the main differences between standard ELO, ELO as used by Warlight and TrueSkill formula:

Standard ELO: you approach mu (your true skill level) in fairly small incremental steps, so it takes some number of games until you reach mu

Bayesian ELO: mu is evaluated at every step via a maximum likelihood fit, taking all available information into account

TrueSkill: your skill level is determined as mu-n*sigma, so if you are a good player, you take a penalty if you have played a small number of games. At n=3, as used by the Xbox system, the penalty is so huge, I think it would be impractical for Warlight, however a smaller n could be considered as a solution against stalling lost games.

To demonstrate the effect of n=3, if used right now, it would result in Doushibag dropping from 1st to 10th. Shogun, who has played 12 games, would drop from 6th to 32nd. Top 5 would be (listing old ranking and number of games played):

1.TheImpaller (3,65)
2.bostonfred (7,50)
3.Duke (2,26)
4.TeddyFSB (5,40)
5.Ruthless (14,60)
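
As a rough illustration of the standard ELO and TrueSkill descriptions above, here is a minimal Python sketch. It assumes the textbook Elo update with a K-factor of 32 (WarLight's actual settings are not stated in this thread) and the mu - n*sigma rule for a TrueSkill-style displayed rating. The function names are made up for the example.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """One incremental Elo step. score is 1.0 for a win, 0.0 for a loss.

    k = 32 is an assumed K-factor, not a value taken from the ladder.
    """
    return rating + k * (score - expected_score(rating, opponent))


def conservative_rating(mu: float, sigma: float, n: float = 3.0) -> float:
    """TrueSkill-style displayed rating: the estimate minus n standard deviations."""
    return mu - n * sigma


# A win between two equally rated players moves the winner by k/2 points.
print(elo_update(1500.0, 1500.0, 1.0))   # 1516.0
# A strong but uncertain player is displayed well below the estimate.
print(conservative_rating(2000.0, 100.0))  # 1700.0
```

The small per-game step is why standard ELO needs a number of games to reach mu, while the mu - n*sigma rule shows why a player with few games (large sigma) drops in a TrueSkill-style ranking.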

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 11:19:31
bostonfred Level 7	TrueSkill: .... a solution against stalling lost games. my vote

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 12:58:35

TeddyFSB

Level 60

My apologies, I made a mistake with the TrueSkill ratings. I forgot that the errors given by the BayesELO program are actually 2*sigma and not sigma, so the ratings in my earlier post were effectively calculated as if n were equal to 6.

In fact, mu-3*sigma actually gives reasonable results, with Doushibag only dropping to 2nd place, which is exactly where he should be :) Shogun drops to 16th.

If you want to play with numbers without running the program, 1 sigma is approximately equal to 400/sqrt(Ngames). So penalty would be ~300 points at 16 games, 150 points at 64 games, and so on.
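
The rule of thumb above can be checked with a few lines of Python. This is only a sketch: the 400/sqrt(Ngames) estimate is the approximation from this post, not the output of the BayesELO program, and the function names are hypothetical.

```python
import math


def sigma_estimate(n_games: int, scale: float = 400.0) -> float:
    """Approximate 1-sigma rating uncertainty after n_games (rule of thumb)."""
    return scale / math.sqrt(n_games)


def penalty(n_games: int, n_sigma: float = 3.0) -> float:
    """Points subtracted from mu under a mu - n_sigma * sigma rule."""
    return n_sigma * sigma_estimate(n_games)


print(penalty(16))  # 300.0
print(penalty(64))  # 150.0
```

Because the penalty shrinks with the square root of the game count, quadrupling the number of games halves it.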

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 13:14:49

Math Wolf

Level 64

I'm a fan of BayesELO, but (as mentioned several times before) with a continuously decreasing function added. I do strongly suggest this change. If the code is adaptable, this change can be easily made.

To stabilise ratings when only a small number of games has been played, and to produce a ranking after just a few games, a number of fictional games against ELO 1500 can be added (as I've mentioned before too).
If you want a real penalty for a smaller number of games, similar to TrueSkill, you can add games against a fictional ELO 1000 or even lower, for example.
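
To make the fictional-games idea concrete, here is a heavily simplified Python sketch. It is not how BayesELO works internally: it fits a single player's rating by maximum likelihood while holding the opponents' ratings fixed, and it assumes each pair of fictional games is one win and one loss against the anchor rating (the post does not say which results the fictional games should have). All names are hypothetical.

```python
def expected_score(rating, opponent):
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def ml_rating(games, lo=-2000.0, hi=6000.0, iters=100):
    """Maximum-likelihood rating for one player by bisection.

    games: list of (opponent_rating, score), score 1.0 = win, 0.0 = loss.
    The likelihood peaks where total expected score equals total actual score.
    Opponent ratings are treated as fixed, which is a simplification.
    """
    total = sum(score for _, score in games)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, opp) for opp, _ in games) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def with_fictional_games(games, anchor=1500.0, pairs=1):
    """Append assumed fictional results: one win and one loss per pair vs anchor."""
    return games + [(anchor, 1.0), (anchor, 0.0)] * pairs


real = [(1500.0, 1.0)] * 3                     # three wins, no losses
print(ml_rating(real))                         # runs off to the search bound
print(round(ml_rating(with_fictional_games(real))))  # about 1741
```

With an unbeaten record the plain fit has no finite maximum, which is the instability the fictional games remove. Passing a lower `anchor`, such as 1000, pulls the result down further for players with few games.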

VOTE: Should the ladders switch to a traditional ELO model?: 3/20/2011 18:03:47
Polaris Level 55	I'd personally prefer traditional. I'd like to be rewarded or penalized for my wins/losses, and have it remain at that. No continuous effects~

VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 04:57:48

Eitz

Level 11

I like the current system for the fact that it represents overall skill as a whole and encourages you to play at a consistent level. I also like the fact that it only keeps 3 months of data as it's common knowledge that the more ladder games people take part in, the more the level of play is going to increase and it's nice not having to be penalized for a potential rough start while feeling out the ladder system early on.

VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 14:57:20
chas Level 43	As previously stated in other posts, I'm with MathWolf. I really like the BayesELO, but would like to see a continuously decreasing function added.

VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:02:17

crafty35a

Level 3

Since people don't seem to mind the quirks of the Bayesian approach (ratings that change even when you are not playing, strange looking results with few games completed), I think the obvious solution is to replace Bayeselo with a similar system that is tailored to players of time-variable strength. This would remove the need to limit rated games to the last three months only, while still not penalizing players who have improved their play.

There are two systems that I think would work best:

Whole-History Rating:

- This one has been discussed before. It is similar to Bayeselo (actually created by the same person), but is designed for human players with varying levels of strength, which makes it a perfect choice. The only downside? There is no readily available tool to calculate the ratings, so we would need to either create one ourselves, or acquire one from someone who has already done the work. The Arimaa (a chess-like game) community currently uses WHR to calculate its ratings. In the thread discussing the rating system, the user who created the tool has declined to release his software as open source, but perhaps we could contact them and ask if they would be willing to share a tool that we could use? http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=talk;action=display;num=1207699394;start=105

- I have also been in contact with someone who is working on a Java implementation of WHR for use in a chess rating competition ( http://www.kaggle.com/ChessRatings2 ). He has indicated a real interest in Warlight and a willingness to share his tool when the competition is over. I will invite him to contribute to this thread if he so desires.

TrueSkill Through Time:

- TrueSkill has been briefly discussed. TrueSkill Through Time is, in my opinion, a big improvement. Results would likely be quite similar to WHR. One big plus? The source code has been freely released by Microsoft ( http://blogs.technet.com/b/apg/archive/2008/04/05/trueskill-through-time.aspx )! The downside? The released code will only compile on older versions of F#, which I do not have access to. If anyone does, or if anyone is an F# programmer, it probably would not be too difficult to make the necessary changes to get it to compile on a current version of F#.

Ultimately, I think either of these systems would be preferable to trying to modify Bayeselo with a continuously decreasing function.

VOTE: Should the ladders switch to a traditional ELO model?: 3/21/2011 16:21:00

The Impaller

Level 9

I like trying out the TrueSkill option. It seems designed for a very similar environment as our own.

I'm not a fan of the current system without changes or improvements. I like it in theory, but in practice I'm not sure it's the best for the ladder. With the current system you almost don't work your way to the top with wins, but rather by avoiding losses, which encourages slower play and loss dodging. With a more standard ELO system (and TrueSkill seems closer to standard ELO than Bayesian) you work your way to the top by winning a lot. That could encourage people to accept losses and surrender rather than drag games out, because a stalled game takes up a slot that could be used to attempt to achieve a victory.

My vote goes to TrueSkill Through Time.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:34:43

Eitz

Level 11

well granted I haven't seen that a lot cuz I'm like Duke in the fact that when I know a game's over and I'm gonna lose, I prefer to tear off the band-aid and move on to the next game asap and the games I've had on here have generally been with pretty respectful people who tend to be cut from a similar cloth. I could definitely see that being an incredible nuisance to have to wait for players who are dodging the loss to save their rating as it's just going to happen sooner or later anyways. If this TrueSkill puts more onus on winning rather than 'not losing' then I would for sure change my vote to something to that effect.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 02:36:32
Eitz Level 11	I just also still really like the idea of only keeping a 3 month track record (but I may be asking too much now ;P)

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 07:18:27

Blue Precision

Level 32

I voted to keep the system the way it is, and here are the main reasons why:

1) I like the fact that your score fluctuates based on how the opponents you've already played perform. This places less emphasis on players' early games. In other words, if Ruthless or a player of similar skill played poorly out of the gate, say losing 6 of his first 10 matches, this is just coincidental and the standings should reflect this. Yet with TrueSkill, if I were the person who beat him for the 6th time, I would get relatively few points even though Ruthless is an exceptional player. The current system recognizes that Ruthless is an exceptional player who just happened to lose more games early than his skill level would eventually indicate. Therefore, why penalize players on the timing of their matches with other players? The rankings already reset and reflect 3-month blocks of time, so you can't use the argument that players can improve drastically over time.

2) I think the ladder, give or take a few spots in positions, is extremely accurate in its list of players' skill to this point. Not to offend anybody lower down, but I truly believe this statement. And just like in sports standings, this system makes it a tough grind to leapfrog opponents. This is a great thing in my opinion. If ladder rankings are supposed to give an accurate pecking order, then they should be more static than dynamic. The system would be flawed if players jumped around sporadically from day to day based on a few game winning/losing skids. To use myself as an example, it has taken me nearly two weeks of consistently playing well to supplant Ruthless and Troll, who held the 12th and 11th spots (I was 13th). I don't see that as a problem but rather as a praiseworthy element of the current statistical mechanism.

3) A lot of what I'm reading is players wanting to use a system where a 5 game winning streak shoots them up the standings. I think this is rubbish. Once a decent sample size has been established by all players participating in the ladder (I have played well over 50 games now), the swings should be slow; people should pass each other at a snail's pace if the system is going to be accurate. From what I can tell, that's exactly what is happening.

In my opinion the ladder is extremely enjoyable, has pushed people to get better, and it is very accurate. Let's not change it for the sake of change.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 11:39:00

TeddyFSB

Level 60

I am undecided on what I prefer, but I think I would like the system tweaked so that some reasonable number of games is needed to get into top 10. Right now, getting into top 10 is too easy for an above average player -- you don't even have to employ stalling although it can help greatly. You just need a lucky streak to begin with, and all of a sudden there is a new unproven player in top 10, who can stay there a while if they don't play much. This will happen with regularity given the current system.

I don't like the top of the ladder diluted this way, so I don't think top 10 should be reachable before playing 30-40 games. As an added benefit, there will be much less stalling then.

I am leaning towards a penalty of ~1000/sqrt(Ngames) applied to all current scores since it is trivial to implement and works sufficiently well.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:09:33
crafty35a Level 3	The problem with applying an arbitrary penalty to scores on top of the current system is that we would be sacrificing the accuracy of the ratings. If we think it is a big problem to have people in the top 10 with only 10 games completed, then I think the better solution is to increase the length of the provisional period.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:33:49

Gaia

Level 25

Now that there’s a restriction on who players can have games with (within 20% of your rating), I’m not so sure we’ll see meteoric rises to the top as before.

I voted to keep the current Bayesian ELO system. I definitely prefer bayeselo over the traditional ELO model because it’s a more accurate representation of skill level, which utilizes more relevant data (the wins/ losses of your previous opponents over the last 3 months), resulting in the dynamism of ratings which constantly reflect this. In a few weeks the stalled games will be complete and we’ll see even more refined ladder rankings. I would like to see this system play out for at least a while longer. Patience! :)

I would be open to modification of the current Bayesian model after the 3 month mark, but have yet to see exactly which modifications would be better suited to the WarLight ladder. The decayed-history system seems to benefit those who play fewer games (which will be me once I join the ladder), while penalizing those who play more frequently. Alternatively, I’m strongly opposed to the TrueSkill system which would penalize decent players who are not able to play a large number of games in a shorter time frame. The Whole-History Rating system sounds interesting, but how exactly does it take into account the varying levels of players? I read an abstract which indicates this system to be more accurate in predictions than decayed-history and TrueSkill algorithms, but I’m not sure if it can or should be applied here.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 13:40:31

fatguyinalittlecoat

Level 3

I haven't voted, and I think the current ladder does a pretty good job in terms of accuracy.

My only issue with the ladder as currently constituted is the "Doushi" problem, where players deliberately slow games to a crawl to avoid taking a loss. I'm not talking about someone taking a day to think over a move or playing a few moves past where he should probably surrender. Those things are completely understandable and not a big deal. I'm talking about waiting until just before the boot timer is finished, then making one move (despite the fact that he could make two), then waiting another 3 days to do the same thing. I have a game against Doushibag where he's doing this. I have another game against spikeknights where I suspect he may be doing it as well. I would support any change in the system that creates a disincentive for this strategy.

VOTE: Should the ladders switch to a traditional ELO model?: 3/22/2011 15:50:49
TeddyFSB Level 60	Crafty, the 1000/sqrt(Ngames) penalty is not arbitrary; it is roughly what the TrueSkill option uses. The uncertainty of your rating is ~400/sqrt(Ngames), so if your score after applying the penalty is 1800, this means that, with roughly 99% probability, your skill is at least 1800. Or we can use 650/sqrt(Ngames), and then it will be roughly 95% probability.
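
The probabilities quoted above can be sanity-checked with the normal distribution. The sketch below assumes the rating error is normally distributed with sigma = 400/sqrt(Ngames), as stated in the post. Since both the penalty and sigma scale with 1/sqrt(Ngames), the confidence level does not depend on the number of games. The function name is hypothetical.

```python
import math


def one_sided_confidence(penalty_scale: float, sigma_scale: float = 400.0) -> float:
    """P(true skill >= mu - penalty) for a normally distributed rating error.

    penalty_scale and sigma_scale are the numerators of c/sqrt(Ngames);
    the sqrt(Ngames) factor cancels in the ratio.
    """
    z = penalty_scale / sigma_scale
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(one_sided_confidence(1000.0), 3))  # 2.5 sigma
print(round(one_sided_confidence(650.0), 3))   # 1.625 sigma
```

Under these assumptions, 1000/sqrt(Ngames) corresponds to a little over 99% and 650/sqrt(Ngames) to just under 95%, in line with the figures in the post.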
