
Posts 1 - 9 of 9   
More interesting 1v1 Ladder Visuals: 2/15/2019 23:38:23

Nick 
Level 58
Hi everyone, it seems my last post (some visuals on stalling in the 1v1 ladder) generated a lot of community interest. I mentioned I have a few blog posts upcoming, and I figured I would offer a few more interesting tidbits from those posts as a bit of a teaser.

Without further ado, I present you with visuals on how well Warzone's Elo rating system actually predicts match outcomes. It's actually far better than I expected.

A much deeper analysis will be presented in the blog posts, but for this post I will simply open with a few facts.

1) A key idea that Elo is built around is that ratings are calculated to represent the relative win probabilities of two players when they face off.

2) These visuals take a simple approach to measuring the rating system's accuracy: they group the predicted win probabilities for each game into small bins (in intervals of 1%). Think of it as measuring the prediction accuracy of Warzone's ratings separately for games where the projected win probability of the superior player is in the interval (50%, 51%] (and hence the opponent's projected win probability is in [49%, 50%)), then repeating the same for the superior-player interval (51%, 52%] vs. [48%, 49%), and so on. Then they check whether the win rates of those superior players line up with what they should be (i.e., in games where the superior player is predicted to win 84% of the time, is their win rate 84%?). I first provide a plot of exactly this. Then I provide a related plot showing the errors for each of those intervals (the difference between the actual and predicted win rates). Smaller binning will be examined further in the blog posts.

3) This analysis was conducted on all 1v1 ladder Warzone games where both players had ratings, which amounts to 140,635 1v1 ladder games. If you are curious about separating these based on when Warzone changed how the expiration window on games works, see my upcoming blog post :)

4) I am also providing a histogram of the counts of games won by players with each predicted win probability. It's not super informative beyond showing that its mean is larger than 0.5 and that Warzone's 1v1 ladder does a pretty good job of assigning games that are as even as possible, but it can be nice to look at.
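
For anyone who wants to try the binning described in 2) on their own data, here is a rough sketch in Python. It assumes the textbook Elo expected-score formula with a 400-point scale, which may not match Warzone's exact parameters, and a hypothetical list of (rating_a, rating_b, a_won) tuples. Neither comes from the actual analysis behind these plots, so treat it as an illustration only.

```python
import math
from collections import defaultdict

def elo_expected(rating_a, rating_b, scale=400.0):
    """Textbook Elo expected score for player A.

    Assumption: the standard logistic formula with a 400-point scale;
    Warzone's real parameters may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def calibration_by_bin(games):
    """Group games into 1% bins by the favorite's predicted win probability.

    games: iterable of (rating_a, rating_b, a_won) tuples (hypothetical format).
    Returns {bin_upper_pct: (n_games, mean_predicted, actual_win_rate, error)},
    where bin_upper_pct = 51 stands for the interval (50%, 51%] and
    error = actual_win_rate - mean_predicted.
    """
    # Per bin: [game count, sum of predicted probabilities, favorite wins]
    bins = defaultdict(lambda: [0, 0.0, 0])
    for rating_a, rating_b, a_won in games:
        if rating_a == rating_b:
            continue  # no superior player, so the game fits no bin
        p_a = elo_expected(rating_a, rating_b)
        # Always look at the game from the favorite's point of view.
        if p_a > 0.5:
            p_fav, fav_won = p_a, bool(a_won)
        else:
            p_fav, fav_won = 1.0 - p_a, not a_won
        upper = math.ceil(p_fav * 100)  # upper edge of the 1% bin
        entry = bins[upper]
        entry[0] += 1
        entry[1] += p_fav
        entry[2] += int(fav_won)

    result = {}
    for upper, (n, p_sum, wins) in sorted(bins.items()):
        predicted = p_sum / n
        actual = wins / n
        result[upper] = (n, predicted, actual, actual - predicted)
    return result

# Made-up ratings and results, only to show the call shape.
example_games = [(1900, 1500, True), (1900, 1500, False), (1500, 1900, False)]
example_result = calibration_by_bin(example_games)
```

Plotting the actual win rate against the mean predicted probability per bin would give something like the first plot, and plotting the error per bin would give something like the second.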

I am going to leave the analysis out of this (saving it for the blog) other than saying that those error values are way lower than I think many would expect. Warzone's Elo ratings actually do a really good job. But please do offer up your own takes.

Best,

Nick

https://imgur.com/gallery/tE7gbJz







Edited 2/16/2019 04:53:39
More interesting 1v1 Ladder Visuals: 2/15/2019 23:39:31

Nick 
Level 58
Also, if someone could please let me know how to get these images posted in a format that will fit the Warzone forums correctly that would be awesome, thanks!
More interesting 1v1 Ladder Visuals: 2/16/2019 01:42:11


DanWL 
Level 63
I think each forum post is around 350px wide, so try not to go over that.
More interesting 1v1 Ladder Visuals: 2/16/2019 17:37:59


chriger
Level 61
Does this analysis include boot wins before territory selection?

Looking at the last 20 ladder games, 3 were boot wins before picking territories (15%).

Not sure if that's similar to a larger group of games, but seems it'd skew the statistics if there's enough boot wins.
More interesting 1v1 Ladder Visuals: 2/17/2019 17:18:20

Nick 
Level 58
@chriger

Yes, it does include those games. Your question is a very good one, and one that I considered several times. However, I decided to keep games ending in boots in the sample, and here is why.

Consider the fact that your rating is altered the same way by a win or a loss via boot as it is by a win or loss via surrender or other more gameplay-driven methods. Hence, the rating implicitly includes each player's probability of getting booted. The rating is a function of likelihood to win, not simply of who the superior player is. Players who get booted once are often more likely to be booted again (obviously not true for everyone), so these games should be kept when evaluating the rating accuracy.

I really do appreciate the question!
More interesting 1v1 Ladder Visuals: 2/18/2019 13:31:22


Tristan 
Level 58
Some interesting stuff there.
I've resized the images so they don't devour our screens. ;)



More interesting 1v1 Ladder Visuals: 2/18/2019 14:54:01


Beep Beep I'm A Jeep 
Level 64
Interesting, thank you!
Very unexpected result.
More interesting 1v1 Ladder Visuals: 2/18/2019 15:13:13

Nick 
Level 58
@Tristan Thanks!
More interesting 1v1 Ladder Visuals: 2/18/2019 15:27:22


Tristan 
Level 58
No problem.
You can resize the pictures when you upload them. I use https://postimages.org/ and 640x480 looks fine.