A 3000 could easily beat a 2000, but could a 4000 easily beat a 3000?

EndgameEnthusiast2357

Well, the ratings of the people I draw against will also get higher, lessening the difference. Both rating changes need to be taken into account. With tens of thousands of rated players, and millions of games, things can get complicated!

llama

Yeah, that's why for ratings to be as accurate as possible people's opponents should be random and you shouldn't play rematches.

A common way for people's peak rating to be higher than their usual rating is playing someone with a rating close to theirs that they happen to match well against (due to style, or the other person is sick, or some other reason).

Then when they play random opponents again, their rating falls back to where it should be.

EndgameEnthusiast2357
Telestu wrote:

Yeah, that's why for ratings to be as accurate as possible people's opponents should be random and you shouldn't play rematches.

A common way for people's peak rating to be higher than their usual rating is playing someone with a rating close to theirs that they happen to match well against (due to style, or the other person is sick, or some other reason).

Then when they play random opponents again, their rating falls back to where it should be.

I agree, and I also think it's weird how computers are rated the same way humans are. A computer can calculate millions more positions than a human, so why are they still on the 3000-4000 scale? They should be in their own category in the trillions!

Elroch
baddejimme wrote:
EndgameStudier wrote:
Telestu wrote:

Yeah, that's why for ratings to be as accurate as possible people's opponents should be random and you shouldn't play rematches.

A common way for people's peak rating to be higher than their usual rating is playing someone with a rating close to theirs that they happen to match well against (due to style, or the other person is sick, or some other reason).

Then when they play random opponents again, their rating falls back to where it should be.

I agree, and I also think it's weird how computers are rated the same way humans are. A computer can calculate millions more positions than a human, so why are they still on the 3000-4000 scale? They should be in their own category in the trillions!

Engine ratings (CCRL) and FIDE ratings are entirely different things. Engines are rated by playing against other engines, with each engine getting the same hardware, usually on the same machine. Ratings don't go up all that fast because new hardware doesn't make it easier for engines to win against other engines.

True. Engine+hardware ratings must have gone up much faster than the ratings of engines on fixed hardware. Indeed, it would be possible to make a large advance by taking an existing chess engine and efficiently adapting it to a very powerful hardware platform (which would involve greater parallel computation).

That being said, engine ratings on restricted hardware should be roughly comparable with the human scale if traced back to their origins and adjusted for any change in hardware specification, because the original ratings given to engines WERE based on human-computer games.

One thing that skews the scale is if a decision is made to change the rules for hardware, and not to give all the engines the boost in rating associated with the performance boost they will receive.

It's easy to get the impression that Stockfish(3443) is 600 points higher than Magnus Carlsen(2843) but those numbers have nothing to do with each other.

They have something to do with each other, and might not be far off accurate.

If true, it would mean Magnus would only score about 3%, i.e. draw roughly 1 game in 16, which sounds awful, but I know of no-one who gets any draws against unhandicapped Stockfish 9 with normal rules.
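For anyone who wants to check that arithmetic, here is a minimal sketch in Python (it just uses the standard Elo logistic formula; the 600-point gap and the "all of the weaker side's points come from draws" assumption are taken from the discussion above, not from any official source):

# Expected score of the stronger player under the standard Elo logistic model
def expected_score(rating_diff):
    return 1 / (1 + 10 ** (-rating_diff / 400))

gap = 600                         # e.g. Stockfish 3443 vs Carlsen 2843, roughly
weaker = 1 - expected_score(gap)  # weaker side's expected score
print(round(weaker, 3))           # ~0.031, i.e. about 3%
# If every half-point comes from a draw, that is one draw per 1/(2*weaker) games
print(round(1 / (2 * weaker)))    # ~16 games per draw

Of course, this is only what the model predicts; as discussed further down the thread, Elo is least reliable at exactly these large gaps.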

 

SmyslovFan

A 600-point gap suggests that the stronger player would win about 97% of the points.

I think Carlsen could still get more than 6 draws out of 100 games against Stockfish. 

lfPatriotGames
baddejimme wrote:
EndgameStudier wrote:
Telestu wrote:

Yeah, that's why for ratings to be as accurate as possible people's opponents should be random and you shouldn't play rematches.

A common way for people's peak rating to be higher than their usual rating is playing someone with a rating close to theirs that they happen to match well against (due to style, or the other person is sick, or some other reason).

Then when they play random opponents again, their rating falls back to where it should be.

I agree, and I also think it's weird how computers are rated the same way humans are. A computer can calculate millions more positions than a human, so why are they still on the 3000-4000 scale? They should be in their own category in the trillions!

Engine ratings (CCRL) and FIDE ratings are entirely different things. Engines are rated by playing against other engines, with each engine getting the same hardware, usually on the same machine. Ratings don't go up all that fast because new hardware doesn't make it easier for engines to win against other engines.

 

It's easy to get the impression that Stockfish(3443) is 600 points higher than Magnus Carlsen(2843) but those numbers have nothing to do with each other.

I'm the first to admit I don't know anything about how ratings are calculated, but it seems to me there must be SOME relation between a computer's rating and a person's rating. If a computer is rated at 3500 and a human is rated at 2800, are they really not 700 points apart? If the two ratings have nothing to do with each other, why have them so close and appear as if they are related?

lfPatriotGames
baddejimme wrote:
lfPatriotGames wrote:

~snip~

I'm the first to admit I don't know anything about how ratings are calculated, but it seems to me there must be SOME relation between a computer's rating and a person's rating. If a computer is rated at 3500 and a human is rated at 2800, are they really not 700 points apart? If the two ratings have nothing to do with each other, why have them so close and appear as if they are related?

The thing is, you cannot say "Stockfish is equivalent to a 3443 FIDE rating" without specifying what machine Stockfish is running on. I could run it on a high end server or a Raspberry Pi.

 

Yes, the numbers do look plausible for a PC, and I believe the ratings have been loosely calibrated based on the small number of published human vs computer games, but engines and humans aren't directly comparable and Stockfish's 3443 rating is meaningless when applied to vs human games.

I don't understand any part of that. But I'll take your word for it. At first I thought that if a 3443 rating is meaningless, it could mean an average beginner could beat it, since it's not comparable. But I think you are saying the computer is a lot better, just not in a way that maps onto how its rating and a human's rating are calculated.

SmyslovFan

To say that computer ratings are meaningless is overstating the case. Computer ratings were once tied to human ratings. But the problem arises when computers only play other computers, and play thousands of games against other computers. CCRL ratings are only distantly related to human Elo, but they are still related. 

 

Of course computers are vastly superior to humans. Of course, it's become very difficult for a human to get even a draw against a computer. But, given enough incentive, and rest between games, I believe a 2800 rated human will be able to draw at least one in four games as White against even the strongest Stockfish engine.  There are just too many openings that White can choose that end in stale equality. AlphaZero is a different beast, but the stats (72 draws out of 100 games against a non-standard Stockfish comp) suggest that a 2800 should still be able to draw at least occasionally as White. This suggests that computer ratings may be exaggerated at least a bit. 

 

If I'm right, and a 2800 rated human can score as much as 1 out of 10, then that drops the difference to less than 450 points. So instead of 3400, we're talking about a 3250 rated player.  If a 2800 scores 1/20 (1 draw every ten games), even that would only give the comp a ~3300 rating. 
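A rough check of those numbers, using the plain logistic inversion rather than FIDE's lookup table (so the figures come out slightly lower than the round numbers above):

import math

def rating_gap_from_score(weaker_score):
    # Invert the Elo expectation: the gap at which the weaker side scores weaker_score
    return 400 * math.log10((1 - weaker_score) / weaker_score)

for s in (0.10, 0.05):               # 1 point in 10 games, 1 point in 20 games
    print(s, round(rating_gap_from_score(s)))
# 0.10 -> ~382 points, i.e. roughly a 3180 engine against a 2800 human
# 0.05 -> ~512 points, i.e. roughly 3310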

 

But I'll be the first to admit that's just supposition based on games played at strange time controls, or odds, and limited incentive. I'm also basing that on the supposition that a 2800 can be persuaded to strive for a draw every game. I saw Naka reach equality several times in computer games, then push too hard and lose.

Elroch

Not meaningless. Rather, the hardware needs to be allowed for.

drmrboss
lfPatriotGames wrote:
baddejimme wrote:
lfPatriotGames wrote:

~snip~

I'm the first to admit I don't know anything about how ratings are calculated, but it seems to me there must be SOME relation between a computer's rating and a person's rating. If a computer is rated at 3500 and a human is rated at 2800, are they really not 700 points apart? If the two ratings have nothing to do with each other, why have them so close and appear as if they are related?

The thing is, you cannot say "Stockfish is equivalent to a 3443 FIDE rating" without specifying what machine Stockfish is running on. I could run it on a high end server or a Raspberry Pi.

 

Yes, the numbers do look plausible for a PC, and I believe the ratings have been loosely calibrated based on the small number of published human vs computer games, but engines and humans aren't directly comparable and Stockfish's 3443 rating is meaningless when applied to vs human games.

I don't understand any part of that. But I'll take your word for it. At first I thought that if a 3443 rating is meaningless, it could mean an average beginner could beat it, since it's not comparable. But I think you are saying the computer is a lot better, just not in a way that maps onto how its rating and a human's rating are calculated.

Multiple reasons:

1. There were not enough games between humans and engines to calculate Elo. (The current sample of games has a big margin of error, but engine Elo rating lists are anchored to comparable Elo using those sample games. Some rating lists use Shredder 12 on 1 CPU as 2800 Elo.)

A few games were played human vs machine, such as Kramnik's matches around 2004 or 2006, and some GM games vs Hiarcs. (Humans lost badly even on PCs 10 years ago, and even against smartphones.)

2. Engine Elo varies with hardware and time control.

3. Elo calculations will be quite inaccurate when the Elo difference between two players is >400. (A 400 Elo difference means about 90% of the points go to the stronger player; an 800 rating difference means about 99%.)

For example, if Stockfish plays a human and wins 10/10, then it is impossible to calculate an Elo for either: the implied rating difference would be infinite. (The FIDE Elo calculator caps the difference at 800 points, though, so you would get something like human 0, Stockfish 800.)

A human might get a couple of draws against Stockfish over 100 games or more, but no one is willing to sponsor human players to play against engines. (Most GMs wouldn't be interested in playing against engines anyway.)
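To illustrate why a short 10-0 sweep pins the rating difference down so poorly, here is a small sketch (it assumes the standard logistic expectation, treats games as independent and ignores draws; the 800-point cap mentioned above is FIDE's convention, not something computed here):

def expected_score(diff):
    return 1 / (1 + 10 ** (-diff / 400))

# Probability that the stronger side sweeps a 10-game match, for various true gaps
for gap in (400, 600, 800, 1200):
    p_sweep = expected_score(gap) ** 10
    print(gap, round(p_sweep, 2))
# 400 -> ~0.39, 600 -> ~0.73, 800 -> ~0.91, 1200 -> ~0.99
# A clean sweep is reasonably likely under any of these gaps, which is why ten games
# cannot separate "600 points stronger" from "effectively infinitely stronger".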

lfPatriotGames

OK now I kind of get it. Not meaningless, just different.

Elroch
drmrboss wrote:
~snip~

Multiple reasons:

1. There were not enough games between humans and engines to calculate Elo. (The current sample of games has a big margin of error, but engine Elo rating lists are anchored to comparable Elo using those sample games. Some rating lists use Shredder 12 on 1 CPU as 2800 Elo.)

A few games were played human vs machine, such as Kramnik's matches around 2004 or 2006, and some GM games vs Hiarcs. (Humans lost badly even on PCs 10 years ago, and even against smartphones.)

True, these are sources of uncertainty. But not huge uncertainty.

2. Engine Elo varies with hardware and time control.

True, and this has to be carefully allowed for when comparing humans and computers. Regarding time controls, you simply have to pick one: you can't reliably convert between them, even if they are strongly correlated.

3. Elo calculations will be quite inaccurate when the Elo difference between two players is >400. (A 400 Elo difference means about 90% of the points go to the stronger player; an 800 rating difference means about 99%.)

This doesn't matter at all. Elo attempts to predict scores between players with any difference in rating, but the ratings themselves are determined mainly by the games against players who are quite close in rating (the reason is that these are the games with the maximum uncertainty in the result, so the Elo calculation obtains more information from them).

For example, if Stockfish plays a human and wins 10/10, then it is impossible to calculate an Elo for either: the implied rating difference would be infinite. (The FIDE Elo calculator caps the difference at 800 points, though, so you would get something like human 0, Stockfish 800.)

A human might get a couple of draws against Stockfish over 100 games or more, but no one is willing to sponsor human players to play against engines. (Most GMs wouldn't be interested in playing against engines anyway.)

They are not strong enough for this to be reliably informative (see earlier point).

What could be done is to have competitive games against weaker computers, and connect the resulting ratings to computer ratings by a continuum of play across the range of computers up to the strongest.

If you think about it, this is exactly how our ratings are consistent with Carlsen's. It's not that we play him and he gets 100% against us; it's that there is a continuum of play between amateur level and the top players. This system works rather well.
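A small numerical illustration of the point above about close games carrying the most information, again assuming the usual logistic expectation (the last column is the variance of a single game's result, which is proportional to how much each game tells you about the rating gap):

def expected_score(diff):
    return 1 / (1 + 10 ** (-diff / 400))

for diff in (0, 100, 200, 400, 800):
    p = expected_score(diff)
    print(diff, round(p, 3), round(p * (1 - p), 3))
# diff 0   -> score 0.5,   variance 0.25  (most informative games)
# diff 400 -> score 0.909, variance 0.083
# diff 800 -> score 0.99,  variance 0.01  (each game tells you almost nothing)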

 

llama
Elroch wrote:
drmrboss wrote:
~snip~

3. Elo calculations will be quite inaccurate when the Elo difference between two players is >400. (A 400 Elo difference means about 90% of the points go to the stronger player; an 800 rating difference means about 99%.)

This doesn't matter at all.

That's simply not true. Elo would be the first to admit his formula loses predictive power when the differences are very large, and actual usage has proven it.

https://en.chessbase.com/post/the-elo-rating-system-correcting-the-expectancy-tables


 

 

EndgameEnthusiast2357

This is getting interesting. It seems I've missed a lot!

Elroch
Telestu wrote:
Elroch wrote:
drmrboss wrote:

 

3. Elo calculations will be quite inaccurate when the Elo difference between two players is >400. (A 400 Elo difference means about 90% of the points go to the stronger player; an 800 rating difference means about 99%.)

This doesn't matter at all.

That's simply not true. Elo would be the first to admit his formula loses predictive power when the differences are very large, and actual usage has proven it.

https://en.chessbase.com/post/the-elo-rating-system-correcting-the-expectancy-tables


 

 The point is that the Elo system is not defined with the primary objective of predicting the precise score between two players who have very different ratings. It is about predicting results between any two players whose ratings are reasonably close (say up to several hundred points) in a wide range of abilities (say 0 to 3600).

So a computer being 1200 points stronger than you is more about the results of 13 hypothetical players whose ratings are 100 points apart, with you at one end and the computer at the other. Predictions are quite close to accurate for players 200 or 300 points apart, but they get increasingly uncertain beyond that: unusual results distort the estimate for much longer, so much larger samples are needed. In addition, any inaccuracies in the usual assumption that odds can be multiplied grow with the size of the gap, as you mentioned in your link.
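Here is what that chain of hypothetical players looks like numerically, under the usual assumption that the win odds multiply across 100-point steps (that multiplicativity is exactly the assumption that gets strained at big gaps, as the linked ChessBase article discusses):

# Win-to-loss odds (ignoring draws) implied by one 100-point step
step_odds = 10 ** (100 / 400)        # about 1.78 : 1

# Chain twelve 100-point steps to span a 1200-point gap
total_odds = step_odds ** 12         # = 10 ** (1200/400) = 1000 : 1
expected = total_odds / (1 + total_odds)
print(round(total_odds), round(expected, 4))   # 1000, 0.999

In practice the observed odds at each step don't multiply quite so cleanly, which is why the predictions in the far tail drift away from reality.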

EndgameEnthusiast2357

The rating system needs improvement. There is no way to accurately measure ability. I could lose because I miscalculated a hard tactic, not because I'm worse than the other player. I wish people wouldn't say stupid shit like: "oh, you're too low rated, you cannot play me." Like, it's a game, and you refuse to spend 2 minutes even letting me try. The narcissistic grandmasters tend to do that. They'll also say: "oh, if you pay me 50 bucks, you can play me." Like, I have to spend money to do something you like doing anyway? Pffftt. Ridiculous.

pawn8888

A 3443 rating isn't that bad, since it only goes up to 3500. It would probably take a miracle for a human to even get a draw against Stockfish. I've looked a few times at its 3-minute games and it never loses. Computers don't make mistakes and know all the tactics, plus they calculate everything a lot faster. AlphaZero seems to beat Stockfish, so I guess it's about 3450 or so.

EndgameEnthusiast2357
pawn8888 wrote:

A 3443 rating isn't that bad, since it only goes up to 3500. It would probably take a miracle for a human to even get a draw against Stockfish. I've looked a few times at its 3-minute games and it never loses. Computers don't make mistakes and know all the tactics, plus they calculate everything a lot faster. AlphaZero seems to beat Stockfish, so I guess it's about 3450 or so.

They are just starting to improve the AI. Computers and people should be rated differently. People are good at the logic and patterns of chess, while computers are good at speed, calculating all the different positions. A person will understand why a line is winning, but a computer will play it only if it comes across it among the millions of positions it number-crunches. Computers are starting to improve at pattern recognition, long tactics, and optimization, however.

jsaepuru
Elroch wrote:
MickinMD wrote:

If you look at 3400 vs 3000 rated chess engines, the 3400 wins the vast majority of games.

I would guess that 3000 would have no chance against a 4000 engine and 3000 would require a blunder by a 4000 human to win.

The definition of a k-point Elo difference is that the stronger player scores 1 / (1 + 10^(-k/400)). (This is a logistic function, or sigmoid, for those familiar with such things.)

For k= 400, this is 91% and for k=1000 it is 99.7%, regardless of how strong the players are.

For stronger players, the only difference is that there are more draws. This is why when two strong players with a 400-point difference meet, there are almost 80% wins for the stronger player and almost 20% draws, with the weaker player winning once in a blue moon. When two weaker players with such a difference meet, the stronger wins more often than in the strong pairing, but the weaker also wins more often to a similar extent, displacing some of the draws.

So, how long would an always-victorious champion take to reach an Elo of 4000?
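Just for fun, a rough simulation of that question. Everything here is an assumption for illustration: a FIDE-style update with K = 10, an endless supply of 2800-rated opponents whose ratings never move, no cap on the rating difference used in the calculation, and a win every single game:

def expected_score(diff):
    return 1 / (1 + 10 ** (-diff / 400))

K = 10            # assumed K-factor for an established top player
rating = 2800.0   # our always-victorious champion starts here
opponent = 2800.0
games = 0
while rating < 4000:
    rating += K * (1 - expected_score(rating - opponent))  # gain from a win
    games += 1
print(games)      # roughly 17,000-18,000 games under these assumptions

If I remember the FIDE rules correctly, differences above 400 points are treated as 400 for the calculation, which would keep the gain per win near 0.8 points and shorten the climb to a couple of thousand games, at least on paper.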

drmrboss

[Diagram: a 16-piece endgame position]

 

Whether there will ever be a 4000 Elo depends on the likelihood of a mistake by the opponent.

In the above diagram, which is a 16-piece position, the current 3400-rated Stockfish at a decent time control will almost never lose against a perfect engine.

 

So as long as the 3400 Stockfish never makes a mistake, its opponent will never get beyond 3400 Elo.

 

So what is the chance of Stockfish making a mistake against a 100% solved engine (a perfect engine)?

The chance of making mistakes is higher when there are more choices, which usually happens between move 10 and move 40. The first 10 opening moves are heavily analysed, with a very low chance of a mistake, and a late ending like the one shown in the figure is unlikely to produce a mistake from Stockfish.

 

If the 3400 Stockfish has a 98% probability of making the correct decision on every move over 30 uncertain moves, that is 0.98^30 ≈ 54% overall accuracy, which gives about +200 Elo to the perfect engine.

 

A 95% probability of a correct move gives 0.95^30 ≈ 21% overall accuracy, or about +400 Elo to the perfect engine.

 

A 90% probability of a correct move gives 0.90^30 ≈ 4% overall accuracy, or about +676 Elo to the perfect engine.

 

So a 4000 Elo will only happen if the current 3400 Elo Stockfish has just 90% accuracy on each and every move for 30 moves.
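drmrboss's arithmetic can be reproduced with a few lines; the key assumption (his, and repeated here) is that any single mistake over the ~30 "uncertain" moves loses the game to a perfect engine, while a mistake-free game is drawn:

import math

def perfect_engine_edge(per_move_accuracy, uncertain_moves=30):
    clean_game = per_move_accuracy ** uncertain_moves       # chance of a mistake-free game
    # Perfect engine wins every game containing a mistake and draws the clean ones
    perfect_score = (1 - clean_game) + 0.5 * clean_game
    elo_edge = 400 * math.log10(perfect_score / (1 - perfect_score))
    return clean_game, elo_edge

for acc in (0.98, 0.95, 0.90):
    clean, edge = perfect_engine_edge(acc)
    print(acc, round(clean, 2), round(edge))
# 0.98 -> ~55% clean games, about +170 Elo (the post rounds this to +200)
# 0.95 -> ~21% clean games, about +370 Elo (post: +400)
# 0.90 -> ~4% clean games,  about +670 Elo (post: +676)

The exact Elo edges differ a little from the figures above because of rounding and the simplifying assumptions, but the overall picture is the same: a perfect player would only look a few hundred points better than Stockfish, not thousands.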