Checking if Elo system is oppressive [With proofs]

ChatGPT is evolving. The free version can't do much, sure, but the most advanced one runs analysis: you wait a while and it comes back with graphs and tables.

For comparison, current global rapid leaderboard on chess.com:
The peak is at 400, but I guess that's because most people don't play much Rapid, so their initial rating (400) stays around 400. Those who actually play mostly sit in the 100-300 range, and after the peak we see the same decreasing slope I had in my green graph. Occasionally those 400s step into a fight. Their actual strength is very random, but a low-Elo player gets punished as if that opponent's 400 rating were real!

The data is not coming from the language model itself. It's the result of actual program execution, so it is indeed a "custom written program" in this case.
And I think you overestimate the complexity of this task.

500 vs 1200 - a 1.75% win chance for the 500 player.
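That 1.75% comes straight from the standard Elo expected-score formula. A minimal sketch (the function name is mine, not anything chess.com uses):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# 500 vs 1200: a 700-point gap
print(round(expected_score(500, 1200) * 100, 2))  # prints 1.75
```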
I understand what Elo is perfectly.
Hidden Elo strength in this case would be a comparison to an engine's (like Stockfish's) Elo. If a player wins 50% of games against Stockfish at a 1400 Elo setting, that's what I call the hidden Elo strength of that player.

1263 and 497? You misunderstood my table. 1263 in that table is the hidden Elo strength, and 497 is the resulting Elo after simulation. At the same time, a different player in the simulation has 1239 hidden strength (close to the previous player's) but a final Elo of 1042. That's the problem: players with nearly identical actual strength end up with wildly different ratings.

No one said anything about losing 50% of games across a large Elo difference.

Why do you think so? It's actually precise to the very last digit and easy to prove by running Stockfish 1400 against Stockfish 1701 many times: about a 15% score rate in the long run. Easy to calibrate. And it doesn't matter anyway; that's just a way to say that each player has a certain strength even if the player is unrated.
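The calibration works in reverse too: invert the same expected-score formula to get an implied rating from a long-run score against an opponent of known strength. A sketch (the helper name is mine):

```python
import math

def implied_rating(opponent_rating: float, score_rate: float) -> float:
    """Invert the Elo expected-score formula: given a long-run score rate
    against an opponent of known rating, return the implied own rating."""
    return opponent_rating + 400.0 * math.log10(score_rate / (1.0 - score_rate))

# A 15% long-run score against a 1701-rated engine implies roughly 1400:
print(round(implied_rating(1701, 0.15)))  # prints 1400
```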
UPD to whoever is reading this: start reading from the last page and go back, as I reveal proofs there that are more substantial and easier to understand. The simulation below was just a topic-starter. You'll find more revelations on further pages.
Using ChatGPT's powers I simulated 1,000,000 chess games in a pool of 1000 players. Pairing was rating-based with a small diffusion to emulate the online-presence factor. The win/loss factor was exactly as prescribed by Elo. All players had a hidden strength in Elo terms: 90% of players from 1000 to 1400, 10% of players from 1400 to 2800. Initial rating was 200; the rating floor was 100.
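For anyone who wants to reproduce the idea, the setup above can be sketched roughly as follows. The K-factor, the diffusion width, and the absence of draws are my assumptions (not stated in the post), and I run fewer games here for speed:

```python
# Sketch of the described simulation, under assumptions: K=32 updates,
# sort-by-rating pairing with Gaussian noise ("diffusion"), no draws,
# and game outcomes drawn from the Elo expected score of hidden strengths.
import random

random.seed(1)
N_PLAYERS, N_GAMES = 1000, 100_000  # the post used 1,000,000 games
INITIAL, FLOOR, K = 200.0, 100.0, 32.0

def expected(a: float, b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((b - a) / 400.0))

# Hidden strengths: 90% uniform in 1000-1400, 10% uniform in 1400-2800
hidden = [random.uniform(1000, 1400) if random.random() < 0.9
          else random.uniform(1400, 2800) for _ in range(N_PLAYERS)]
rating = [INITIAL] * N_PLAYERS

for _ in range(N_GAMES // (N_PLAYERS // 2)):
    # Rating-based pairing with diffusion: sort by noisy rating, pair neighbours
    order = sorted(range(N_PLAYERS),
                   key=lambda i: rating[i] + random.gauss(0, 50))
    for a, b in zip(order[::2], order[1::2]):
        p_a = expected(hidden[a], hidden[b])   # true win chance from hidden strength
        s_a = 1.0 if random.random() < p_a else 0.0
        e_a = expected(rating[a], rating[b])   # Elo-expected score from ratings
        rating[a] = max(FLOOR, rating[a] + K * (s_a - e_a))
        rating[b] = max(FLOOR, rating[b] + K * ((1.0 - s_a) - (1.0 - e_a)))

# Now compare final ratings of players with near-identical hidden strength.
```

With enough games you can then look for the effect claimed in the post: pairs of players whose hidden strengths are close but whose final ratings diverge widely.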
Graphs:
Blue: initial strength distribution.
Green: ratings after the simulation show that the largest group is minimal-Elo players. The mid-Elo group received an artificial bump despite the fact that players' strengths were constant during the simulation!
Full table with data for each player (names are all fake based on names of real great players and names repeat but that doesn't matter because each player has unique id):
https://pastebin.com/raw/JqGKun3K
Conclusion:
The best of the best climbed to the top easily.
Low-Elo players unfairly end up scattered across various rating ranges, apparently because of luck, not lack of skill. And here you can't blame the virtual players for lack of skill, because game results were dictated by their actual hidden strength.
So in the end we have cases like:
That means actual strength could be 1200, but the rating could be 500 OR 1000.
Or look at this oppressed guy:
Magnus is weaker than Gajdosko, but Gajdosko is stuck at 100. Is this fair?
This all aligns with my observations and experience here on chess.com, and it explains why many people are astonished by the randomness in the apparent strength of opponents who have the same rating.
Thoughts?