Warzone


Posts 121 - 140 of 924 <<Prev 1 2 3 4 ... 6 7 8 ... 27 ... 46 47 Next >>

Multi-day ladder: 11/13/2016 10:53:20

krunx

Level 63

Activity/elo-system:

Personally, I would recommend adjusting the K-value depending on activity, as Memele mentioned. But I would not make it linear, as then there would be no appeal to playing 9 games in parallel instead of 1. In addition, I would set a minimum number of games per month to appear in the rankings.
In theory, this makes a player's Elo value more accurate.
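
A minimal sketch of the non-linear idea, assuming a square-root scaling that nobody in the thread actually specified: `k_for_activity`, `max_swing`, `k_base=32` and the floor `k_min=15` are all hypothetical (15 echoes the "very high activity" K Memele mentions). Because K shrinks more slowly than 1/n, the total rating swing available from 9 parallel games stays larger than from 1 game, so playing in parallel keeps some appeal.

```python
import math


def k_for_activity(active_games: int, k_base: float = 32.0, k_min: float = 15.0) -> float:
    """Hypothetical K-factor that shrinks sub-linearly with the number of parallel games."""
    if active_games < 1:
        raise ValueError("active_games must be at least 1")
    # Square-root scaling (an assumption), never dropping below the floor k_min.
    return max(k_min, k_base / math.sqrt(active_games))


def max_swing(active_games: int) -> float:
    """Total rating change available if every parallel game is won (or lost)."""
    return active_games * k_for_activity(active_games)


for n in (1, 4, 9):
    print(n, k_for_activity(n), max_swing(n))
```

With a purely linear rule (K = k_base / n), `max_swing` would be the same for every n, which is exactly the lack of appeal described above.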

In practice you are right: inflation through activity pushes players towards playing more games. This is a decision one has to make: the most exact rating system for measuring skill, or more games. It is a general decision every creator of a ranking system faces, and it will fundamentally influence the behavior of players on the multi-day ladder. In my eyes, the ranking system is one of the biggest levers the developer of a game/ladder has.

I do not have deep knowledge of the history of the Warlight ladder, but in my eyes the current rating system of the ladder is one major reason why people do "ladder runs" and then leave the ladder.

start-value/ladder-runs:
In my eyes, not starting with a fixed value has advantages, but one has to ensure that no one uses alts. I do not know how complex this problem (API data) is in this case.

Multi-day ladder: 11/13/2016 12:28:30

Memele

Level 60

Once you have 100 games, no matter what you do, your rating is very stable and players feel like they're not making progress even though they are winning.

If you use a good correction, this should not happen. For example, with the one I proposed earlier, VERY high activity means K=15, which is still high enough. For comparison, FIDE uses 20 for almost all players...
The only problem would be having much more Elo than your rivals, but that is not something that can be solved easily; it happens in chess too (Carlsen struggles to reach 2900 because he lacks 2800+ rivals).

On the other hand, you can get an inflated rating by playing just 20 games.

If the initial K is low, you could inflate it a bit, but nothing too exaggerated.

One potential fix to the long running time is to update the previously computed Elo using the recent games, but this has other drawbacks. If there is a bug in the system and I need to make a game disappear, it is hard to do, as the previous rating will need correction. On the other hand, running a fresh Elo computation every time does not have this problem.

This can be solved by modifying it manually if an error appears, but that's more work for you ^^U
The errors are mostly people not joining, right? If that's the case, is there some way to make the Elo change 0 in those games?
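
A minimal sketch of a fresh recomputation that gives voided games an Elo change of 0, assuming a hypothetical `voided` flag on each game (for example, set when a player never joined), a fixed K of 32 and a 1500 start. Since ratings are rebuilt from the full game list every time, removing or voiding a game only requires flipping the flag and rerunning.

```python
from dataclasses import dataclass


@dataclass
class Game:
    winner: str
    loser: str
    voided: bool = False  # hypothetical flag, e.g. a player never joined


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def recompute_ratings(games, initial: float = 1500.0, k: float = 32.0):
    """Replay every game from scratch; voided games contribute no rating change."""
    ratings = {}
    for g in games:
        if g.voided:
            continue  # Elo change of 0 for this game
        rw = ratings.setdefault(g.winner, initial)
        rl = ratings.setdefault(g.loser, initial)
        delta = k * (1.0 - expected_score(rw, rl))
        ratings[g.winner] = rw + delta
        ratings[g.loser] = rl - delta
    return ratings


print(recompute_ratings([Game("A", "B"), Game("B", "A", voided=True)]))
```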

Multi-day ladder: 11/14/2016 05:17:21

Deadman

Level 64

@krunx

Personally, I would recommend adjusting the K-value depending on activity, as Memele mentioned. But I would not make it linear, as then there would be no appeal to playing 9 games in parallel instead of 1.

Yes. This is my main worry. The proposals so far encourage players to play 1 at a time as opposed to 9.

In addition, I would set a minimum number of games per month to appear in the rankings.

This is a bit tricky. While I agree with your sentiment, I need to be careful that I don't disqualify too many players from attaining a rank as that discourages participation too.

This is a decision one has to make: the most exact rating system for measuring skill, or more games. It is a general decision every creator of a ranking system faces, and it will fundamentally influence the behavior of players on the multi-day ladder.

Completely agree. While we strive for accurate ratings, if it comes at the cost of reduced activity, that is not a good thing. An active player pool ensures variety in match-ups and makes the CLOT much more attractive. Activity also allows us to estimate "true Elo" more accurately. But in the worst case, I'd rather sacrifice some accuracy than activity.

@Memele

If you use a good correction, this should not happen. For example, with the one I proposed earlier, VERY high activity means K=15, which is still high enough. For comparison, FIDE uses 20 for almost all players...

I haven't worked out the math yet. But my gut tells me that in the system proposed, it is better to go 18-2 than 90-10. Would you agree with this? On the same note, what is the objection to an artificial rating boost (just for ranking and display)? This boost will not apply to the matchmaking system or the Elo calculations. It is just an incentive to play more. It seems to work really well for the seasonal ladder, where everyone ensures that they play the minimum number of games necessary. If they do not play N games, they fall significantly behind. However, if everyone plays this number of games, there is no relative gain and we fall back to relying on true Elo to judge players.
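
One way to sketch the display-only boost, assuming hypothetical parameters: `min_games` plays the role of N and `max_bonus` is an arbitrary bonus size; the player data is made up. Matchmaking and Elo updates keep using the raw rating, and only the ranking uses `display_rating`. Once every player reaches `min_games`, all bonuses are equal and the order falls back to Elo.

```python
def display_rating(elo: float, games_in_period: int,
                   min_games: int = 10, max_bonus: float = 200.0) -> float:
    """Hypothetical display-only rating used for ranking, never for matchmaking or Elo updates.

    The bonus grows linearly until the player has played `min_games` games in the
    period and then stays at `max_bonus`, so games beyond N give no further edge.
    """
    progress = min(games_in_period, min_games) / min_games
    return elo + max_bonus * progress


# name -> (raw Elo, games played this period); made-up values
players = {"A": (1650.0, 12), "B": (1700.0, 3), "C": (1620.0, 10)}
ranking = sorted(players, key=lambda name: display_rating(*players[name]), reverse=True)
print(ranking)
```

Here B has the highest raw Elo but drops to last on the display ranking because it has not reached N games yet.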

This can be solved by modifying it manually if an error appears, but that's more work for you ^^U
The errors are mostly people not joining, right? If that's the case, is there some way to make the Elo change 0 in those games?

Yeah. It can be solved manually, but it is a big pain to do on a large scale. The errors can be due to bugs in the system, which happen every now and then. This design is based solely on the ability to run the CLOT with minimal human intervention. However, if it leads to a broken system due to poor scalability, I will consider the alternative.

p.s - Thanks for the continued discussion. It is very helpful and will lead to better solutions.

Multi-day ladder: 11/14/2016 17:01:29

Memele

Level 60

For calculating Elo, I understand that all games are used so that bugs can be dealt with. Maybe a solution that avoids long calculation times and the need to expire games could be:
We set a period of time, let's say 3 months (but it could be different). You have a variable, initial elo (=1500), and create an auxiliary variable, elo-1. After 1 month we save each player's Elo to elo-1. We do the same after 2 and 3 months (elo-2 and elo-3). At this point the 3 months (the time period we chose before) have passed.
One month after that we do:
initial elo = elo-1
elo-1 = elo-2
elo-2 = elo-3
elo-3 = new elo this month
After this, it doesn't matter if games expire: the effect those old games had on Elo is already included through the change in the initial elo, so you don't need games more than 3 months old for the calculations.
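
A sketch of Memele's rolling scheme under simplifying assumptions: games are (winner, loser) pairs, K is a fixed 32, and new players start at 1500. Instead of saving elo-1 to elo-3 as separate variables, this version folds the oldest month's games into the base ("initial elo") when that month leaves the window, which has the same effect as "initial elo = elo-1" while still respecting any correction made to a game inside the window. `RollingSnapshots` and `replay` are hypothetical names.

```python
from collections import deque


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def replay(base, games, k=32.0, default=1500.0):
    """Replay (winner, loser) games on top of a base snapshot; returns a new dict."""
    ratings = dict(base)
    for winner, loser in games:
        rw = ratings.get(winner, default)
        rl = ratings.get(loser, default)
        delta = k * (1.0 - expected_score(rw, rl))
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return ratings


class RollingSnapshots:
    """Hypothetical helper: a base snapshot plus the games of the last few months."""

    def __init__(self, window_months: int = 3):
        self.window = window_months
        self.base = {}         # the "initial elo"; missing players start at 1500
        self.months = deque()  # one list of games per month, oldest first

    def close_month(self, games):
        self.months.append(list(games))
        if len(self.months) > self.window:
            # The oldest month leaves the window: fold it into the base,
            # which plays the role of "initial elo = elo-1".
            self.base = replay(self.base, self.months.popleft())

    def current(self):
        """Current ratings: only the games inside the window are replayed."""
        ratings = self.base
        for month in self.months:
            ratings = replay(ratings, month)
        return ratings
```

Only the last `window_months` of games ever need to be replayed; anything older is already baked into `base`.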

The only problem with this is if a bug affects games more than a month old. I assumed that bugs happen to more recent games, but maybe I'm wrong. Depending on this, the system could be changed, but I hope my idea is understandable.

Multi-day ladder: 11/14/2016 17:37:17

Ollie

Level 62

The downside of letting games expire is that it will encourage people to make runs. On the other hand, it will also be a motivation for people who have improved their skill. The reason why the RT ladder has so many problems with runs on alts is that it is nearly impossible to get a good ranking after a bad start. So people who are new to the ladders/strategy games could get discouraged by a bad start and by never being able to compete for the top spot.

Multi-day ladder: 11/14/2016 17:41:49
Memele

Level 60

@Ollie But that's because the WL ladders use a different Elo system. With normal Elo a bad beginning is not that big a deal; if you improve, you will go up.

Multi-day ladder: 11/14/2016 23:08:54

Deadman

Level 64

@Ollie, yeah, I agree with Memele. The concern you raised is only due to Bayesian Elo and will not occur on this CLOT. An early loss is not a big deal imo.

@Memele,
Yep. I understood what you said :) I will implement it soon.

What about my other point? Do you dislike this? And if so, why?

But my gut tells me that in the system proposed, it is better to go 18-2 than 90-10. Would you agree with this? On the same note, what is the objection to an artificial rating boost (just for ranking and display)? This boost will not apply to the matchmaking system or the Elo calculations. It is just an incentive to play more. It seems to work really well for the seasonal ladder, where everyone ensures that they play the minimum number of games necessary. If they do not play N games, they fall significantly behind. However, if everyone plays this number of games, there is no relative gain and we fall back to relying on true Elo to judge players.

Multi-day ladder: 11/14/2016 23:42:06
Ollie

Level 62

Ahh, I didn't know that. I have never worked with Elo rating systems. Forget about what I said :)

Multi-day ladder: 11/15/2016 08:37:38

Memele

Level 60

What about my other point? Do you dislike this? And if so, why?
The more games, the more accurate the Elo, so 18-2 would be better than 90-10 depending on how reliable the initial calculation is. That's why I propose a lower K at the beginning, to dampen this effect. This shouldn't discourage players, because once they switch to the normal K, they should be able to adjust their Elo (if it is not close enough to their true level) relatively fast. In fact, the artificial initial boost is not really a big deal, apart from the "runs". A normal player who doesn't care about that would eventually reach their true Elo after the initial 20 games, whether it is lower or higher than the initial one. The only problem is that a system that lets you get a high Elo "easily" will encourage runs.
For the variable K, there is no need to change it much, or maybe at all (after the initial games). In the chess example I used, Elo is calculated per tournament rather than per game, so the impact is bigger; going game by game, it may be good enough as it is, like in online chess. But, as I said before, it could still be useful for the initial ranking.
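
A sketch of a damped starting K, assuming hypothetical values (Memele gives no numbers beyond the 20-game period): K = 16 for a player's first 20 games and 32 afterwards. Every game here is against a 1500-rated opponent to keep the example small; `k_factor`, `update` and `run` are illustrative names.

```python
def k_factor(games_played: int, provisional_games: int = 20,
             k_initial: float = 16.0, k_normal: float = 32.0) -> float:
    """Hypothetical schedule: a damped K for the first `provisional_games` games."""
    return k_initial if games_played < provisional_games else k_normal


def update(rating: float, opponent: float, score: float, games_played: int) -> float:
    """One standard Elo update using the scheduled K (score: 1 win, 0.5 draw, 0 loss)."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k_factor(games_played) * (score - expected)


def run(results, opponent: float = 1500.0, start: float = 1500.0) -> float:
    """Apply a sequence of results against equally rated opponents."""
    rating = start
    for games_played, score in enumerate(results):
        rating = update(rating, opponent, score, games_played)
    return rating


# A 20-0 start moves the rating less than it would with the full K of 32.
print(round(run([1.0] * 20), 1))
```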

Multi-day ladder: 11/15/2016 23:51:55

krunx

Level 63

What about my other point? Do you dislike this? And if so, why?

The more games, the more accurate the Elo, so 18-2 would be better than 90-10 depending on how reliable the initial calculation is.

In my eyes, the K-factor depends only on the number of games one is able to play in a period of time. So I would compare the number of games per player per month in Warlight with the number of games in chess and initialize the K-factor accordingly.

There are also some psychological factors:
1st: How much luck is involved in a Warlight game? It is frustrating to lose a game and lose a lot of points only because you had bad luck. Just imagine the following situation:
A, Elo 2000; B, Elo 1900 => A wins 2 out of 3 games (if I am right). But let's imagine that in the game a situation appears (with a very high probability) where the winning chances are 50%/50%. Then A will most probably lose Elo against B.
2nd: Someone who plays actively usually wants to see his Elo increase. But sometimes there are barriers and it won't rise that fast. This is sometimes very demotivating.
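
The numbers in the 1st example can be checked with the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). A K of 32 is an assumption. With a 100-point gap, A's expected score is about 0.64, so "2 out of 3" is close; if the game effectively turns into a coin flip, A's average result is 0.5 and A loses roughly 4.5 points per such game.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def coin_flip_drift(r_a: float, r_b: float, k: float = 32.0) -> float:
    """Average Elo change for A when the game is really a 50/50 coin flip."""
    return k * (0.5 - expected_score(r_a, r_b))


print(f"A's expected score vs B: {expected_score(2000.0, 1900.0):.3f}")
print(f"A's average change per coin-flip game: {coin_flip_drift(2000.0, 1900.0):.2f}")
```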

I am not sure how, but I would definitely try to actively push activity and keep players motivated. Maybe a small inflation of the system is desirable, as it will push activity.

One also has to think about the time it takes to balance the Elo system in the ladder. Wouldn't a higher K-factor lead to faster balancing (I know it wouldn't be that accurate at the start, but at least one would see different Elo numbers)?
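
A rough way to look at the balancing question, under a strong simplifying assumption: a player whose true rating is 1800 starts at 1500, always faces accurately rated 1800 opponents, and each game's result is replaced by its average (0.5), so there is no randomness. `games_to_converge` and its parameters are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def games_to_converge(k: float, start: float = 1500.0, true_rating: float = 1800.0,
                      tolerance: float = 50.0, max_games: int = 10_000):
    """Games until the rating is within `tolerance` of the true rating.

    Assumes opponents are rated exactly at `true_rating`, so the real expected
    score is 0.5, and each game uses that average result instead of a random one.
    Returns None if the tolerance is never reached within `max_games`.
    """
    rating = start
    for n in range(1, max_games + 1):
        rating += k * (0.5 - expected_score(rating, true_rating))
        if abs(true_rating - rating) < tolerance:
            return n
    return None


print("K=16:", games_to_converge(16.0))
print("K=32:", games_to_converge(32.0))
```

In this noise-free model a larger K always converges faster; with real, random results a larger K also makes ratings fluctuate more, which is the accuracy cost mentioned above.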

Edited 11/15/2016 23:52:59

Multi-day ladder: 11/17/2016 15:59:14
J_Dog33340

Level 58

So why does the number of games matter in an Elo system? I googled what Elo was and I didn't see anything related to that.

Multi-day ladder: 11/17/2016 20:16:32

Memele

Level 60

So why does the number of games matter in an Elo system? I googled what Elo was and I didn't see anything related to that.
I will give an easy example with approximate numbers, not using the real formula:

Player A has 1700 Elo now, but a "real strength" of 1800. If he plays against a 1700, he wins or loses 16 points (K=32), but he should win more often than he loses because he "is" an 1800. Let's see what happens if he plays 1 game at a time:
A vs 1700, wins --> +16 --> 1716 elo
A vs 1700, wins --> (now he gets a bit less, let's say 15 points) --> 1731 elo
A vs 1700, wins --> +13 --> 1744

Now let's see what would have happened if he played 3 games at the same time at the start:
A vs 1700, 3 wins, each one 16 points --> 1748

The difference is only 4 points, but with more games and bigger Elo differences it grows a bit. As I said, with game-by-game calculations it's not that big a deal; that's why I focused more on the initial Elo calculation ;)
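
The example above can be rerun with the actual Elo formula (standard expected score, K = 32, every opponent rated 1700). `sequential` rates each win before the next game starts, while `simultaneous` rates all three wins against the starting rating, which is how the parallel case is modeled above. With the real formula the simultaneous case still ends at exactly 1748, while the sequential one ends at roughly 1745.8, so the gap comes out closer to 2 points than 4.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def sequential(rating: float, opponents, k: float = 32.0) -> float:
    """Each win is rated before the next game starts."""
    for opp in opponents:
        rating += k * (1.0 - expected_score(rating, opp))
    return rating


def simultaneous(rating: float, opponents, k: float = 32.0) -> float:
    """All games start at the same rating, so every win is rated against the old value."""
    return rating + sum(k * (1.0 - expected_score(rating, opp)) for opp in opponents)


opponents = [1700.0] * 3
print(round(sequential(1700.0, opponents), 1))
print(round(simultaneous(1700.0, opponents), 1))
```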

Edited 11/17/2016 20:16:57

Multi-day ladder: 11/17/2016 21:01:08

Math Wolf

Level 64

From what I read, a solution to much of the discussion above could be to not sort/rank on the calculated (mean) rating, but to correct for activity (and accuracy) by subtracting the standard deviation (see TrueSkill, which subtracts 3 times the standard deviation, a harsher version of this).

This is a conservative ranking that gives advantage to players whose rating is more accurate, often by playing more.
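
A minimal sketch of ranking on a conservative estimate, assuming each player has both a mean rating `mu` and an uncertainty `sigma`. Plain Elo does not track `sigma`, so this presumes a system that does (for example Glicko or TrueSkill), and the player values below are made up. TrueSkill's leaderboard uses c = 3; the sketch defaults to c = 1 to match the milder version described above.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    name: str
    mu: float     # estimated (mean) rating
    sigma: float  # uncertainty; shrinks as a player plays more games


def conservative(r: Rating, c: float = 1.0) -> float:
    """Rank on mu - c * sigma (TrueSkill leaderboards use c = 3)."""
    return r.mu - c * r.sigma


players = [Rating("veteran", 1800.0, 40.0), Rating("newcomer", 1850.0, 120.0)]
by_mean = [p.name for p in sorted(players, key=lambda p: p.mu, reverse=True)]
by_conservative = [p.name for p in sorted(players, key=conservative, reverse=True)]
print(by_mean, by_conservative)
```

The newcomer leads on the mean, but the veteran, whose rating is more certain from playing more, leads on the conservative score.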

I don't see why you would let games expire. Elo was originally made for non-expiring games, I believe, and all non-expiring updating rating systems simply work better than those that use expiring games.

Multi-day ladder: 11/17/2016 22:58:08
Sułtan Kosmitów

Level 64

Very much agree with MW about expiration.

Multi-day ladder: 11/17/2016 23:26:20
Memele

Level 60

I don't see why you would let games expire. Elo was originally made for non-expiring games, I believe, and all non-expiring updating rating systems simply work better than those that use expiring games.

To shorten calculation time. He explained it above, and that's one of our discussions :)

Edited 11/17/2016 23:26:28

Multi-day ladder: 11/17/2016 23:39:56

player12345
Level 61

I don't see why you would let games expire. Elo was originally made for non-expiring games

ELO was originally used for chess, which is a single game with no parameters. This MD ladder is a grab-bag of 35+ templates which may change over time.

We can expect that some players will have a higher likelihood of winning on certain templates. So in that regard, ELO might be a poor choice. Expiring games doesn't improve on that, but it helps ensure ratings stay relevant as the template grab-bag changes.

Tracking an ELO for each template and then somehow combining those ratings might be interesting.

Multi-day ladder: 11/18/2016 06:31:07

krunx

Level 63

Tracking an ELO for each template and then somehow combining those ratings might be interesting.

This runs into the problem that you can block templates, so the number of games on each template may be very different, and therefore the Elos may not be that comparable.

Furthermore, how do you combine these ratings? Also, I think it isn't that intuitive. How do you match players then? Right now you match them by Elo and then choose the template.

Edited 11/18/2016 06:31:21

Multi-day ladder: 11/18/2016 14:47:36

J_Dog33340

Level 58

Player A has 1700 Elo now, but a "real strength" of 1800. If he plays against a 1700, he wins or loses 16 points (K=32), but he should win more often than he loses because he "is" an 1800. Let's see what happens if he plays 1 game at a time:
A vs 1700, wins --> +16 --> 1716 elo
A vs 1700, wins --> (now he gets a bit less, let's say 15 points) --> 1731 elo
A vs 1700, wins --> +13 --> 1744

Now let's see what would have happened if he played 3 games at the same time at the start:
A vs 1700, 3 wins, each one 16 points --> 1748

The difference is only 4 points, but with more games and bigger Elo differences it grows a bit. As I said, with game-by-game calculations it's not that big a deal; that's why I focused more on the initial Elo calculation ;)

Wouldn't that be the reason for using Elo? I was asking why the number of games matters.

Edited 11/18/2016 14:47:58

Multi-day ladder: 11/18/2016 20:52:43

player12345
Level 61

This runs into the problem that you can block templates, so the number of games on each template may be very different

Yes, choosing/blocking templates further distorts the meaning of a single ELO rating.

However, blocking is a nice feature that makes the competition more fun. It allows players to specialize and avoid templates they don't like or are not good at. Actually, template blocking seems to increase the need for multiple ELOs and a composite metric.

Furthermore, how do you combine these ratings? Also, I think it isn't that intuitive.

Here's an attempt:

Consider a system with 1 Elo per template, where you get points based on your ranking in each template's Elo system. Say you are ranked 3rd on Guiroma. One way is for points to be given by f(k) = 1/k, i.e. 1/3 points. If more people join the ladder, your points won't change as long as your rank doesn't.

If one person had rank 1 in all 35 templates, they would have 35 points. If somebody was ranked last in all templates, they would have just above zero points.

How do you match players then? Right now you match them by Elo and then choose the template.

You would match based on points and randomly choose the template. Then the outcome of that game affects only the Elo for that template.
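
A sketch of the f(k) = 1/k point system, assuming per-template Elo ratings are already available as a nested dict; `template_points` and the second template name are hypothetical, the ratings are made up, and ties are not handled. A player who blocks a template simply does not appear in that template's ratings and earns nothing from it.

```python
from collections import defaultdict


def template_points(template_ratings):
    """template_ratings: {template: {player: elo}} -> {player: total points}.

    Rank k on a template (1 = highest Elo) is worth 1/k points; points are summed
    over all templates the player appears in.
    """
    points = defaultdict(float)
    for ratings in template_ratings.values():
        ranked = sorted(ratings, key=ratings.get, reverse=True)
        for rank, player in enumerate(ranked, start=1):
            points[player] += 1.0 / rank
    return dict(points)


ratings = {
    "Guiroma": {"A": 1800.0, "B": 1700.0, "C": 1600.0},
    "OtherTemplate": {"A": 1750.0, "C": 1700.0},  # B has blocked this template
}
print(template_points(ratings))
```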

Edited 11/19/2016 01:35:37

Multi-day ladder: 11/18/2016 21:21:03
player12345

Level 61

The above point system is almost certainly not practical as is, but maybe it could be tweaked.

