The rating of a perfect player

Sort:
osintx
Elroch wrote:

I believe computers have not yet reached a level where draws are steadily increasing. They just make deeper errors than they used to. Top human chess players, by contrast, choose lines that are safer or riskier based on factors that are irrelevant to computers. Some top professionals probably dislike losing more than they like winning: this is a utility function which is different to the points scored.

I disagree. Even for computers there is a definite correlation between higher ratings and higher percentage of draws. Just look at the computer chess rating list and you will notice this trend:

http://www.computerchess.org.uk/ccrl/4040/rating_list_pure.html

If you download the games PGN file, do some processing to throw out games where the rating difference between the two players was more than 100 points, and make a histogram of average player rating vs. percentage of draws, you get data like this:

1850,19.53
1900,24.23
1950,22.64
2000,24.93
2050,24.55
2100,31.33
2150,28.55
2200,29.13
2250,28.58
2300,26.90
2350,26.90
2400,30.99
2450,31.57
2500,33.15
2550,35.27
2600,35.92
2650,38.15
2700,37.59
2750,35.20
2800,40.58
2850,43.13
2900,46.32
2950,43.47
3000,48.11
3050,50.41
3100,58.02
3150,65.69
3200,59.09
3250,58.46
3300,61.66

Make a scatter plot of that data, do a linear regression, and the fitted line reaches a draw percentage of 100% at a rating of about 4820.

http://www.alcula.com/calculators/statistics/linear-regression/
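As a sketch of the calculation described above (my own minimal reconstruction, not the poster's actual script), an ordinary least-squares fit on the posted (rating, draw %) pairs gives an intercept with the 100% line near the quoted figure:

```python
# Hypothetical reconstruction of the linear extrapolation described above,
# using the (rating, draw %) histogram data posted in this thread.
ratings = list(range(1850, 3301, 50))
draw_pct = [19.53, 24.23, 22.64, 24.93, 24.55, 31.33, 28.55, 29.13,
            28.58, 26.90, 26.90, 30.99, 31.57, 33.15, 35.27, 35.92,
            38.15, 37.59, 35.20, 40.58, 43.13, 46.32, 43.47, 48.11,
            50.41, 58.02, 65.69, 59.09, 58.46, 61.66]

n = len(ratings)
mean_x = sum(ratings) / n
mean_y = sum(draw_pct) / n

# Ordinary least-squares slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ratings, draw_pct))
         / sum((x - mean_x) ** 2 for x in ratings))
intercept = mean_y - slope * mean_x

# Rating at which the fitted line reaches a 100% draw rate.
rating_at_100 = (100 - intercept) / slope
print(round(rating_at_100))  # roughly 4820
```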

However, I think that as computers get closer to perfect play the percentage of draws increases more rapidly. So my educated guess is that the rating of the perfect player is closer to 4000.

Fritzbayer1

I would guess about 3000-3100; let me know if I'm close.

Elroch

The guess that as computers get closer to perfect play the draw percentage increases more rapidly is unfounded.

A more natural expectation would be that the proportion of decisive games would decrease exponentially, so that a certain increase in ratings would approximately halve the fraction of decisive games. This is loosely based on the notion that the heuristic inaccuracy of moves (in centipawns) decreases not linearly but more like exponentially.
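To make that halving model concrete (this is my own illustrative fit on the data posted earlier in the thread, not a calculation from the post): fitting a line to the log of the decisive percentage against rating gives the Elo gain that halves the fraction of decisive games under this model.

```python
import math

# Illustrative only: fit decisive% ~ A * 2^(-rating / h) to the thread's
# data by least squares on the log of the decisive percentage.
# 'h' is then the rating gain that halves the fraction of decisive games.
ratings = list(range(1850, 3301, 50))
draw_pct = [19.53, 24.23, 22.64, 24.93, 24.55, 31.33, 28.55, 29.13,
            28.58, 26.90, 26.90, 30.99, 31.57, 33.15, 35.27, 35.92,
            38.15, 37.59, 35.20, 40.58, 43.13, 46.32, 43.47, 48.11,
            50.41, 58.02, 65.69, 59.09, 58.46, 61.66]
log_decisive = [math.log(100.0 - d) for d in draw_pct]

n = len(ratings)
mean_x = sum(ratings) / n
mean_y = sum(log_decisive) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ratings, log_decisive))
         / sum((x - mean_x) ** 2 for x in ratings))

halving_elo = math.log(2) / -slope
print(round(halving_elo))  # Elo gain per halving of decisive games
```

Note that under this exponential model the decisive fraction never reaches zero at any finite rating, so perfect play shows up as an asymptote rather than a rating at which every game is drawn.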

Elroch
Fritzbayer1 wrote:

I would guess about 3000-3100; let me know if I'm close.

Not sure what this meant, but are you aware that there are computers with ratings around 3400 even when limited to slow 4-core machines (so considerably higher with fast hardware)?

Elroch

The recent resounding victory of AlphaZero over Stockfish contains lots of empirical hints about the topic of this forum.

Firstly, AlphaZero's strength plateaued entirely after a certain stage of self-learning. [EDIT: on reflection, we cannot read too much into this. Sometimes neural networks run into a plateau and later leave it to find considerable further improvement].

Secondly, there was a steady, large reduction in the gain in strength with computing time (or processing speed) for both Stockfish and AlphaZero, right across the range from 0.1 seconds to 1 minute per move (the maximum used), and the graph suggests both would have asymptotes not far above their ratings at 1 minute per move (i.e. 100 times more time would not increase ratings much, and a further 100 times would do far less).

However, the suggested asymptotes are different, which weakens the case that either architecture is asymptotic to perfect play with increased computing time (it may be that space limits are the reason for this).

See the paper.

mcris

This graph from the paper reveals all. If Stockfish had had 2 GB of RAM or some more computing power, AZ would have lost.

Elroch

That is an unfounded claim. Where is your evidence that Stockfish would be a spectacular 100 points stronger by merely increasing the hash table? Note that hash table changes are empirically associated with tens of Elo point changes and it is not always better to have more (as there are competing efficiencies).

ami_anjali
Artsew wrote:

Whilst I understand the intent of your question, the Elo of a 'perfect' chess player is determined largely by the strength of the #2 player.

As Elo calculations go, I believe you hit your ceiling at around a rating difference of 800, meaning that defeating someone rated 800 or more points below you will no longer boost your rating. So if we assume that the second-highest-rated player has a rating of, let's say, 2800, and the perfect chess player defeats the #2 every time, then his/her rating will become no more than 3600.

This is very helpful. Sums up everything as far as the theoretical side is concerned.

mcris
Elroch wrote:

That is an unfounded claim. Where is your evidence that Stockfish would be a spectacular 100 points stronger by merely increasing the hash table? Note that hash table changes are empirically associated with tens of Elo point changes and it is not always better to have more (as there are competing efficiencies).

This is not my claim. The SF developers spoke about the too-small RAM used for SF8 in this match.

Slow_pawn

I liked the idea of AlphaZero learning chess on its own and playing that well after a short time. I've only read an article and reviewed one game, so I don't really know much about it, but if it is legit, Google just taught Stockfish how to be better. The next match will be different. All of this has made me wonder, though: if a human had the same deep calculation ability as an engine, who would be better then? As for the RAM, I think the Stockfish programmers should have considered that going into the match. It's not as if they were blind, with no idea of the other team's technical abilities.

Elroch
mcris wrote:
Elroch wrote:

That is an unfounded claim. Where is your evidence that Stockfish would be a spectacular 100 points stronger by merely increasing the hash table? Note that hash table changes are empirically associated with tens of Elo point changes and it is not always better to have more (as there are competing efficiencies).

This is not my claim. The SF developers spoke about the too-small RAM used for SF8 in this match.

That does not in itself justify the claim that a huge 100-point improvement could be achieved by increasing the hash table. [I believe the hash table is most of the memory usage, and the idea of top-end hardware having only 1 GB of RAM these days is ridiculous, so I continue to believe it was the hash table size that was chosen to be 1 GB].

The SF developers (like the DeepMind paper) refer to hash table size, not RAM, and do not provide any indication of the expected impact.

Here is the Stockfish developer's message:

"The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (a lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions.

On the other hand, there is no doubt that AlphaZero could have played better if more work had been put into the project (although the "4 hours of learning" mentioned in the paper is highly misleading when you take into account the massive hardware resources used during those 4 hours). But in any case, Stockfish vs AlphaZero is very much a comparison of apples to orangutans. One is a conventional chess program running on ordinary computers, the other uses fundamentally different techniques and is running on custom designed hardware that is not available for purchase (and would be way out of the budget of ordinary users if it were).

 

From another perspective, the apples vs orangutans angle is the most exciting thing about this: We now have two extremely different (both on the hardware and the software side) man-made entities that both display super-human chess playing abilities. That's much more interesting than yet another chess program that does the same thing as existing chess programs, just a little better. Furthermore, the adaptability of the AlphaZero approach to new domains opens exciting possibilities for the future.

For chess players using computer chess programs as a tool, this breakthrough is unlikely to have a great impact, at least in the short term, because of the lack of suitable hardware for affordable prices.

For chess engine programmers -- and for programmers in many other interesting domains -- the emergence of machine learning techniques that require massive hardware resources in order to be effective is a little disheartening. In a few years, it is quite possible that an AlphaZero like chess program can be made to run on ordinary computers, but the hardware resources required to _create_ them will still be way beyond the budget of hobbyists or average sized companies. It is possible that an open source project with a large distributed network of computers run by volunteers could work, but the days of hundreds of unique chess engines, each with their own individual quirks and personalities, will be gone." 

Elroch
ami_anjali wrote:
Artsew wrote:

Whilst I understand the intent of your question, the Elo of a 'perfect' chess player is determined largely by the strength of the #2 player.

As Elo calculations go, I believe you hit your ceiling at around a rating difference of 800, meaning that defeating someone rated 800 or more points below you will no longer boost your rating. So if we assume that the second-highest-rated player has a rating of, let's say, 2800, and the perfect chess player defeats the #2 every time, then his/her rating will become no more than 3600.

This is very helpful. Sums up everything as far as the theoretical side is concerned.

Except that unfortunately it is mostly wrong. With a big rating difference, your rating just rises very slowly as you win games. Under the statistical assumptions of the Elo system, you can still reach your correct rating by playing games against a player with any rating (though the assumptions are obviously shaky if the difference in ratings is very large).

For a rating difference of 800 points, the Elo formula predicts an expected score of about 99%. This means that winning games against a player rated 800 points below you increases your rating about 50 times more slowly than winning games against a player with the same rating.
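The arithmetic can be checked with the standard logistic Elo expected-score formula (a minimal sketch; gain per win is proportional to one minus the expected score):

```python
# Standard logistic Elo expected-score formula.
def expected_score(rating_diff):
    """Expected score for a player rated `rating_diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

e_800 = expected_score(800)  # = 100/101, i.e. about 99%

# Rating gained per win is K * (1 - expected score), so compare the gain
# against an equal opponent with the gain against one rated 800 below.
gain_ratio = (1 - expected_score(0)) / (1 - e_800)
print(round(e_800, 4), round(gain_ratio, 1))  # 0.9901 50.5
```

So beating an opponent 800 points weaker yields roughly a fiftieth of the rating gain of beating an equal opponent.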

mcris
Elroch wrote:
mcris wrote:
Elroch wrote:

That is an unfounded claim. Where is your evidence that Stockfish would be a spectacular 100 points stronger by merely increasing the hash table? Note that hash table changes are empirically associated with tens of Elo point changes and it is not always better to have more (as there are competing efficiencies).

This is not my claim. The SF developers spoke about the too-small RAM used for SF8 in this match.

That does not in itself justify the claim that a huge 100-point improvement could be achieved by increasing the hash table. [I believe the hash table is most of the memory usage, and the idea of top-end hardware having only 1 GB of RAM these days is ridiculous, so I continue to believe it was the hash table size that was chosen to be 1 GB].

There is no use having 3, 4, or more GB of RAM if the hash size is set to only 1 GB.

HobbyPIayer

AlphaZero is playing at around 3600. But it's also the first self-learning AI player (meaning, in the grand scheme of things, its thought-processes will be considered "primitive" by future standards).

So it's likely that future AI players will be even smarter, and more creative, than AlphaZero.

My guess is "perfect" play would be around 3800 or so. Maybe 4000.

Elroch
mcris wrote:
Elroch wrote:
mcris wrote:
Elroch wrote:

That is an unfounded claim. Where is your evidence that Stockfish would be a spectacular 100 points stronger by merely increasing the hash table? Note that hash table changes are empirically associated with tens of Elo point changes and it is not always better to have more (as there are competing efficiencies).

This is not my claim. The SF developers spoke about the too-small RAM used for SF8 in this match.

That does not in itself justify the claim that a huge 100-point improvement could be achieved by increasing the hash table. [I believe the hash table is most of the memory usage, and the idea of top-end hardware having only 1 GB of RAM these days is ridiculous, so I continue to believe it was the hash table size that was chosen to be 1 GB].

There is no use having 3, 4, or more GB of RAM if the hash size is set to only 1 GB.

You would need enough for any other demands on RAM by the program - probably quite limited but important - and the OS.

Cavatine

Anyway, I wanted to add that Google now has some very awesome hardware for sale, to run neural net computations that are very energy-efficient.  I know nothing about them except there is this news article from somewhere in my browser history ...

https://www.extremetech.com/computing/263951-mit-announces-new-neural-network-processor-cuts-power-consumption-95

I can't find the article I was thinking of, which showed a chip on a circuit board, with a shiny metal heat sink on it.  I guess it was on Reddit when I saw it.

Elroch

To be clear, that's MIT research, not Google, and it seems a rather extreme example. I am not sure how many AI practitioners would be happy about the reported 2-3% loss in accuracy. For some purposes it would be great, for others not so much!

Google's TPUs are reportedly very efficient, and are available as a supercomputer cloud service, but they have announced no plans to sell TPUs.

Elroch
osintx wrote:
Elroch wrote:

I believe computers have not yet reached a level where draws are steadily increasing. They just make deeper errors than they used to. Top human chess players, by contrast, choose lines that are safer or riskier based on factors that are irrelevant to computers. Some top professionals probably dislike losing more than they like winning: this is a utility function which is different to the points scored.

I disagree. Even for computers there is a definite correlation between higher ratings and higher percentage of draws. Just look at the computer chess rating list and you will notice this trend:

http://www.computerchess.org.uk/ccrl/4040/rating_list_pure.html

If you download the games PGN file, do some processing to throw out games where the rating difference between the two players was more than 100 points, and make a histogram of average player rating vs. percentage of draws, you get data like this:

1850,19.53
1900,24.23
1950,22.64
2000,24.93
2050,24.55
2100,31.33
2150,28.55
2200,29.13
2250,28.58
2300,26.90
2350,26.90
2400,30.99
2450,31.57
2500,33.15
2550,35.27
2600,35.92
2650,38.15
2700,37.59
2750,35.20
2800,40.58
2850,43.13
2900,46.32
2950,43.47
3000,48.11
3050,50.41
3100,58.02
3150,65.69
3200,59.09
3250,58.46
3300,61.66

Make a scatter plot of that data, do a linear regression, and the fitted line reaches a draw percentage of 100% at a rating of about 4820.

http://www.alcula.com/calculators/statistics/linear-regression/

However, I think that as computers get closer to perfect play the percentage of draws increases more rapidly. So my educated guess is that the rating of the perfect player is closer to 4000.

Rather a slow response by me here ...

What matters more is the way the advantage of white changes over time. Empirically, it appears that this is increasing over time. It has now reached a level where the wins between roughly equally matched players are overwhelmingly with the white pieces (this phenomenon was already visible at the highest level of human chess, but it is not clear if factors beyond the theoretical ones were at play there).

HarleyK314

I found the exponential curve of best fit based on data from the CCRL website, and got an estimate of 4257 Elo as the point at which an engine will draw every game. So yeah, give it about 20 years (based on the current growth rate of slightly over 30 Elo per year), and I reckon chess engines will reach a point where draws are basically inevitable.

Elroch

You may be extrapolating something else. When most of the wins are with white, the draw percentage says no more than how close the ratings of the two participants are. So if the proportion of draws is rising, this means the ratings of the participants are getting closer, not that the participants are getting near to perfect.

As an example, two closely matched players who were lousy at getting each other to blunder (say, Carlsen and Caruana) might get 100% draws (they did in 2018), but might have ratings in the 2800s, thus very far from perfect.