Progress of 8 piece tablebase?

tygxc

@41

"If the latest versions of Stockfish still perform poorly in tablebase endings, I want to see it!"
++ No, it does very well in e.g. KRPP vs. KRP: top 1 engine move is table base exact.
However KNN vs. KP is a known anomaly.

bigD521

Would it be feasible to create a variety of tablebases, each with both a specific set of pieces and a specific goal?

2 Kings and up to 14 pawns. Goal: to promote, or to promote x number of pieces, with no mate required. Mating is allowed if done with the pawns, or on promotion. (Useful to study pawn breaks, or maneuvering pawns and king.)

2 Kings, and perhaps up to 3 pieces (Q, R, B, N), and 13 pawns. Goal 1: to mate, if possible without promoting, or with up to x promotions. Goal 2: just promoting without mating. (This seems unlikely to me.) The purpose is to increase the number of pieces on the board. Mating should be allowed if possible with king, pieces and pawns, or the line ends after x promotions.

tygxc

@43
"Purpose is to increase pieces on board."
++ How endgame table bases are generated:

  1. Start from the 2 kings.
  2. Add 1 man to get 3 men.
  3. From that, add 1 man for 4 men.
  4. From that, add 1 man for 5 men.
  5. From that, add 1 man for 6 men.
  6. From that, add 1 man for 7 men.
  7. From that, add 1 man for 8 men, which is now work in progress.
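
The core idea at each step is backward induction: mark the checkmates, then repeatedly work back one ply at a time until nothing new turns up. Below is a minimal illustrative sketch of that idea for the tiny KQ vs. K class, assuming the python-chess package; it is a slow toy, not how the real generators (which use retrograde analysis over compactly indexed, symmetry-reduced positions) are built.

```python
import itertools
import chess

def enumerate_positions():
    """Yield every legal KQ vs. K position (White: K+Q, Black: K), either side to move."""
    for wk, wq, bk in itertools.permutations(range(64), 3):
        for turn in (chess.WHITE, chess.BLACK):
            board = chess.Board(None)
            board.set_piece_at(wk, chess.Piece(chess.KING, chess.WHITE))
            board.set_piece_at(wq, chess.Piece(chess.QUEEN, chess.WHITE))
            board.set_piece_at(bk, chess.Piece(chess.KING, chess.BLACK))
            board.turn = turn
            if board.is_valid():
                yield board

def solve():
    """Return {EPD: distance to mate in plies} for the positions White wins."""
    epds = [b.epd() for b in enumerate_positions()]
    dtm = {epd: 0 for epd in epds if chess.Board(epd + " 0 1").is_checkmate()}
    ply = 0
    while True:                                  # work back one ply per pass
        ply += 1
        newly = {}
        for epd in epds:
            if epd in dtm:
                continue
            board = chess.Board(epd + " 0 1")
            values = []
            for move in board.legal_moves:
                board.push(move)
                values.append(dtm.get(board.epd()))  # None = not (yet) known to be won
                board.pop()
            if board.turn == chess.WHITE:
                # Won in `ply` plies if some move reaches a position lost in ply-1
                if any(v == ply - 1 for v in values):
                    newly[epd] = ply
            else:
                # Lost in `ply` plies only if every defence reaches a position already
                # solved as a White win, and the best defence lasts exactly ply-1 plies
                if values and all(v is not None for v in values) and max(values) == ply - 1:
                    newly[epd] = ply
        if not newly:                            # fixed point: everything left is drawn
            return dtm
        dtm.update(newly)
```

Even this three-man toy takes a while in pure Python; the 7- and 8-man generators deal with trillions of positions and need heavily optimised indexing and disk-based passes.
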
MARattigan
cobra91 wrote:
MARattigan wrote:
cobra91 wrote:

...

Regarding your conjecture: based on how far the available chess software has progressed at this point, I'd tend to agree. ...

I've posted half a dozen examples on the same thread where the tablebases show it's false for recent versions of Stockfish, so no, it's not a valid conjecture.

I'll take a look at this when I have a bit more time. If the latest versions of Stockfish still perform poorly in tablebase endings, I want to see it!

This is SF15 attempting mate in 36. (Intel(R) Pentium(R) CPU J3710 @ 1.60GHz, 2GB hash, 1 hour on the clocks.)

His opponent would not have needed to study the endgame to draw. SF15 draws by force for him.

But SF15 doesn't perform poorly in endgames. If I'd made it mate in 35 it would manage it every time (and SF14 - alone among the SFs - would manage a 36 move mate, I believe, but no mate in 37 with the pawn in the same position). Most humans haven't looked at the endgame and wouldn't manage the mate or a convincing defence against a tablebase.

I don't think engines perform poorly in endgames. In some endgames I've had a very good look at they seem to be a bit crap, but they're sh*t hot at all the ones I haven't had a very good look at.

I believe the idea that engines are bad at endgames is an illusion due to humans being capable of understanding positions with few men using techniques not available to engines (real intelligence). I think both humans and engines get worse very quickly as you add more men to the board. But humans obviously deteriorate quicker than the engines. 

And to answer @tygxc's point it's not an anomaly. SF does badly against a tablebase in difficult positions, which are also the ones humans find difficult. That can be for example in KQPvKQ or KQvKNN and a host of 6 and 7 man endgames (including KRPP vs. KRP) where we can check from tablebases (and which I can't play either). It would probably reliably beat me in difficult mates from those, but it has no chance against a tablebase or even against itself.

bigD521

@44

Yes 8 men are in consideration/progress

Either you did not comprehend my question, or you gave me an obscure reply.

MARattigan
bigD521 wrote:

@44

Yes 8 men are in consideration/progress

Either you did not comprehend my question, or you gave me an obscure reply.

But I think so far it's only DTC, without the 50 move rule taken into account. I may be wrong.

Your second sentence is no doubt correct, but you don't sketch any method for producing the tablebases in question. You'd need some estimate of how long it would take, and with a lot of pawns and goals that could probably be forever.

But I think there is already software that will start from goals and produce mini tablebases (or the same curtailed at a specific position). It would obviously be possible given well-defined goals. I'd try an internet search if you don't get any answers.

What did you mean by "(This seems unlikely to me)" by the way?

CraigIreland

An 8 man tablebase would help engines, but I wonder what benefit it would have for humans. A layer of AI built on top of it could be used to explain what can be learned from it. Perhaps GMs with access to that learning could benefit from it.

MARattigan

Nah. Just the thing for humans.

All you have to do is remember about 500 trillion positions for each of about 10000 endgame classifications and what the index is for each, then it's a piece of cake.

If you forget any you'll probably be able to look them up online.

bigD521
MARattigan wrote:
bigD521 wrote:

@44

Yes 8 men are in consideration/progress

Either you did not comprehend my question, or you gave me an obscure reply.

But I think so far it's only DTC, without the 50 move rule taken into account. I may be wrong.

Your second sentence is no doubt correct, but you don't sketch any method for producing the tablebases in question. You'd need some estimate of how long it would take, and with a lot of pawns and goals that could probably be forever.

But I think there is already software that will start from goals and produce mini tablebases (or the same curtailed at a specific position). It would obviously be possible given well-defined goals. I'd try an internet search if you don't get any answers.

What did you mean by "(This seems unlikely to me)" by the way?

I did some searching and did not come up with anything.

My second sentence was pawns only. I am a poor basic computer user and know nothing about programming. Therefore I cannot sketch anything.

My third sentence, with minor and major pieces, I did not think would work. Perhaps too many pieces? Perhaps not being able to both mate and promote? Could the code only be written to either mate or promote, not both? Is limiting the number of promotions not possible?

What got me thinking a while back was that perhaps it really isn't about pieces, but about how many positions are achievable. 7 men can mean 5 queens, which could give approx. 50 first moves, with king moves added on top. Therefore if one reduces the number of available moves, more pieces could be added. With 7 pawns each side, even all on the 2nd and 7th ranks, it still amounts to a maximum of 14 pawn moves, with perhaps 8 more from the king. The same applies to the third line.
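
As a rough sanity check on that count, here is a hypothetical position (kings on their home squares, seven pawns each still on their starting ranks) counted with the python-chess package; the FEN is just an illustration, not a position proposed in the thread:

```python
import chess

# Hypothetical position: kings on e1/e8, pawns a2-g2 and a7-g7.
board = chess.Board("4k3/ppppppp1/8/8/8/8/PPPPPPP1/4K3 w - - 0 1")
print(board.legal_moves.count())   # 16: 14 pawn moves plus 2 king moves
# With the king out in the open it would gain a few more moves, as suggested above,
# but the branching factor stays far below that of a position with several queens.
```
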

Perhaps instead of a setup as we have now, a list of options for the user to select from. Then it would be all presented as a package.

7 piece

2K and 14 Pawns, up to 7 pawns each color.

2K, 1Q and 2 (R, N, B) and x pawns

tygxc

@49

"All you have to do is remember about 500 trillion positions"
++ No, all you have to do is to remember the conclusion draw/won/lost and the method to achieve that.
Example: KNN vs. KP is a win if the pawn is no further than the Troitsky line.
The method is to block the pawn with a knight, drive the king to a corner with the king and the other knight then release the blockade of the pawn and approach the blocking knight to checkmate.
There is no need to memorize any positions, only the outcome and the method.

cobra91
MARattigan wrote:
cobra91 wrote:
MARattigan wrote:
cobra91 wrote:

...

Regarding your conjecture: based on how far the available chess software has progressed at this point, I'd tend to agree. ...

I've posted half a dozen examples on the same thread where the tablebases show it's false for recent versions of Stockfish, so no, it's not a valid conjecture.

I'll take a look at this when I have a bit more time. If the latest versions of Stockfish still perform poorly in tablebase endings, I want to see it!

This is SF15 attempting mate in 36. (Intel(R) Pentium(R) CPU J3710 @ 1.60GHz, 2GB hash, 1 hour on the clocks.)

[...]

Okay, I assume no 4-man tables were used here, as the blunder 6. Nf3?? would surely never be played by a machine that can look up the KNN vs. K ending and instantly see it's drawn. Can I also assume that, even when armed with that resource, SF15 still fails to make consistent progress in KNN vs. KP?

Second question: What is the approximate T3 accuracy of SF15 in this ending? That is, how consistently is the optimal tablebase move (or an equally good move) among the engine's top 3 choices?

Can you show an example for KQP vs. KQ or KRPP vs. KRP? Unlike KNN vs. KP, these endings are reasonably common in serious play, and I want to see exactly what is meant by "badly" and "difficult" (I do have a rough idea, but am still very curious). If such positions really can't be handled adequately without full tablebase support (by directly looking up the best move), chess could be significantly further from solved than I'd previously thought.
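
For what it's worth, here is a rough sketch of how a T3 figure like this could be measured with the python-chess package, a local Stockfish binary and Syzygy files; the paths, the time control, and the choice to treat "equally good" as "preserves the tablebase win/draw/loss value" are all my assumptions, not anything established in this thread:

```python
import chess
import chess.engine
import chess.syzygy

def move_wdl(tb, board, move):
    """Tablebase value of `move` from the mover's point of view (2 = win ... -2 = loss)."""
    board.push(move)
    try:
        return -tb.probe_wdl(board)        # probe_wdl reports from the opponent's side
    finally:
        board.pop()

def t3_accuracy(fens, engine_path="./stockfish", syzygy_dir="./syzygy", think_time=1.0):
    """Fraction of positions where a WDL-optimal move is among the engine's top 3 choices."""
    hits = total = 0
    with chess.syzygy.open_tablebase(syzygy_dir) as tb, \
         chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            values = {m: move_wdl(tb, board, m) for m in board.legal_moves}
            best = max(values.values())
            info = engine.analyse(board, chess.engine.Limit(time=think_time), multipv=3)
            top3 = [entry["pv"][0] for entry in info if "pv" in entry]
            if any(values[m] == best for m in top3):
                hits += 1
            total += 1
    return hits / total if total else 0.0
```

A stricter version would check DTZ or DTM optimality rather than just preservation of the win/draw/loss value.
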

cobra91
tygxc wrote:

The argument is this:
From figure 2a of this scientific paper https://arxiv.org/pdf/2009.04374.pdf 
At 1 s/move: 88.2% draw, 11.8% decisive games, i.e. games with an odd number of errors
At 1 min/move: 97.7% draw, 2.3% decisive games, i.e. games with an odd number of errors
Extrapolating
At 1 h/move: 2.3% * 2.3% / 11.8% = 0.44% decisive games, i.e. games with an odd number of errors, i.e. 1 error in 227 games.
At 60 h/move: 0.44% * 2.3% / 11.8% = 0.087% decisive games, i.e. 1 error in 1144 games.
Thus top 2 moves: 1 error in 1144² = 1.3 * 10^6 games
Thus top 3 moves: 1 error in 1144³ = 1.5 * 10^9 games
Thus top 4 moves: 1 error in 1144^4 = 1.7 * 10^12 games
Assuming 100 positions/game: 1 error in 1.7 * 10^14 positions

Had to read a 98-page PDF before I could respond - thanks for that! grin

Based on the graphs from page 7, it looks like you are trying to extrapolate an inverse trend curve... using just two data points! Not exactly rigorous science, unless there is additional corroborating data for decisive games at, e.g., 2 min/move, 30 s/move, 15 s/move, etc.

To be fair, though, this topic is rather unique, in that we aren't really too concerned with precise figures. It is only necessary to estimate the error rate to within a few orders of magnitude. I think a statistical argument like yours could be very promising, but requires more data which is somewhat lacking in the case of AlphaZero. As usual, the best bet is probably to just use the strongest/most recent Stockfish iteration, obtaining data from long series of self-play games at systematically varying time controls.
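
For reference, the quoted extrapolation can be reproduced in a few lines, purely under the assumptions it states (a constant decisive-game ratio per 60x increase in thinking time, about 100 positions per game, and independence of errors across the top-N moves); reproducing the arithmetic is not an endorsement of those assumptions:

```python
decisive_1s, decisive_1min = 0.118, 0.023     # figure 2a of the cited paper
ratio = decisive_1min / decisive_1s           # assumed to hold for every 60x time step

decisive_1h = decisive_1min * ratio           # ~0.44% decisive at 1 h/move
decisive_60h = decisive_1h * ratio            # ~0.087% decisive at 60 h/move
games_per_error = 1 / decisive_60h            # ~1144 games per error

for n in range(1, 5):
    # assumed independence: the top-n error rate is the top-1 rate to the n-th power
    print(f"top {n} moves: 1 error in {games_per_error ** n:.2e} games")

print(f"at 100 positions/game: 1 error in {100 * games_per_error ** 4:.2e} positions")
```
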

tygxc

@53

"extrapolate an inverse trend curve... using just two data points"
++ Extrapolating from data we have.

"we aren't really too concerned with precise figures" ++ Right.

"It is only necessary to estimate the error rate" ++ Right.

"use the strongest/most recent Stockfish iteration, obtaining data from long series of self-play games at systematically varying time controls."
++ That would be more precise, but precision is not needed.
The difference between top 1 move: 1 error in 10^5 positions,
top 2 moves: 1 error in 10^10 positions,
top 3 moves: 1 error in 10^15 positions,
and top 4 moves: 1 error in 10^20 positions is huge anyway.

MARattigan
cobra91 wrote:
MARattigan wrote:
cobra91 wrote:
MARattigan wrote:
cobra91 wrote:

...

Regarding your conjecture: based on how far the available chess software has progressed at this point, I'd tend to agree. ...

I've posted half a dozen examples on the same thread where the tablebases show it's false for recent versions of Stockfish, so no, it's not a valid conjecture.

I'll take a look at this when I have a bit more time. If the latest versions of Stockfish still perform poorly in tablebase endings, I want to see it!

This is SF15 attempting mate in 36. (Intel(R) Pentium(R) CPU J3710 @ 1.60GHz, 2GB hash, 1 hour on the clocks.)

[...]

Okay, I assume no 4-man tables were used here, as the blunder 6. Nf3?? would surely never be played by a machine that can look up the KNN vs. K ending and instantly see it's drawn. Can I also assume that, even when armed with that resource, SF15 still fails to make consistent progress in KNN vs. KP?

Your first assumption is correct. Your second assumption is not correct. (Edit: Apologies, armed only with 4 man tablebases your second assumption almost certainly is correct, though I haven't tried it.) With Syzygy any program will play perfectly. The point is you can't give it tablebases beyond 7 men at the moment.

Second question: What is the approximate T3 accuracy of SF15 in this ending? That is, how consistently is the optimal tablebase move (or an equally good move) among the engine's top 3 choices?

I can tell you that all SF's moves were objectively accurate except 6.Nf3. Have to confess I don't know what T3 accuracy is, but here's what Coach makes of it. (You'll probably need to zoom to about 250% in your browser.)


I can also tell you that all my moves were objectively accurate, so I feel a bit miffed about scoring only 94.2.

Edit: I should have read your post more carefully. You explain T3. See later post.

Can you show an example for KQP vs. KQ or KRPP vs. KRP? Unlike KNN vs. KP, these endings are reasonably common in serious play, and I want to see exactly what is meant by "badly" and "difficult" (I do have a rough idea, but am still very curious). If such positions really can't be handled adequately without full tablebase support (by directly looking up the best move), chess could be significantly further from solved than I'd previously thought.

There's a couple of examples of KRPPvKRP here, the first a draw (with the 50 move rule in force) and the second a win. They're SF15 v SF15 rather than SF15 v Syzygy because I don't have any Syzygy tablebases (or the room to incorporate 7 men).

SF15 manages to lose 3 of the 12 games from the drawn position and fails to win 6 of the 12 games from the winning position. It would probably do significantly worse against Syzygy.
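
A minimal sketch of how this kind of self-play test could be reproduced, assuming the python-chess package and a local Stockfish binary; the engine path, game count and time control below are placeholders rather than the settings used above:

```python
import chess
import chess.engine

def selfplay(fen, engine_path="./stockfish", games=12, movetime=1.0):
    """Play `games` engine-vs-engine games from `fen` and tally the results."""
    results = {"1-0": 0, "1/2-1/2": 0, "0-1": 0, "*": 0}
    for _ in range(games):
        board = chess.Board(fen)
        with chess.engine.SimpleEngine.popen_uci(engine_path) as white, \
             chess.engine.SimpleEngine.popen_uci(engine_path) as black:
            while not board.is_game_over(claim_draw=True):
                engine = white if board.turn == chess.WHITE else black
                played = engine.play(board, chess.engine.Limit(time=movetime))
                board.push(played.move)
        results[board.result(claim_draw=True)] += 1
    return results

# Example: put the FEN of the test position here (none was given in the post above).
# print(selfplay("<FEN of the test position>"))
```
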

I can dig out or reconstruct a KQPvKQ example if you want; I haven't done a series of these over the time ranges shown in the examples referred to above.

But it can even screw up KRvK. In the position below White has mate in 16 at the point (shown) where I defend it against SF15, but the position is complicated by the fact that many positions (in the sense of the 3-fold repetition rule) have previously been repeated.

It fails to mate!

In fact, if you have practiced mating in the shortest number of moves in KRvK you will also notice that it routinely makes slips in accuracy from positions with ply count 0 (ergo no repetitions) though it won't blow any points from those positions.

As for what I mean by "badly" and "difficult", I was using the terms in a qualitative sense. A discussion of how you might quantify these would no doubt be interesting (and involved), but I don't attempt to start one in this post.

 

MARattigan
tygxc wrote:

@49

"All you have to do is remember about 500 trillion positions"
++ No, all you have to do is to remember the conclusion draw/won/lost and the method to achieve that.
Example: KNN vs. KP is a win if the pawn is no further than the Troitsky line.

Really? Choose a side and post your move.

Either side to play, ply count 0

 

This is a win (and within the 50 move rule) if that's what you mean, but they're not all - most are not.

White to move, ply count 0, Black wins in 62

 

You rather beg the question of what happens if the pawn is not "no further than the Troitsky line". (Perhaps those positions don't appear in your 10^17 sensible positions.)

You also give no details of how you discovered your statement from a tablebase, which was what @CraigIreland was asking about.

The method is to block the pawn with a knight, drive the king to a corner with the king and the other knight then release the blockade of the pawn and approach the blocking knight to checkmate.
There is no need to memorize any positions, only the outcome and the method.

OK, then take up my challenge and do it! (I've conveniently blocked the pawn with a knight for you already.) I'll post my responses.

 

cobra91
tygxc wrote:

"extrapolate an inverse trend curve... using just two data points"
++ Extrapolating from data we have.

Yes, what I meant to ask was, "Is there more self-play data for AlphaZero, at different time controls, than what I referred to in post #53?" If not, an alternate data source may be needed (see below).

It's true that not as much precision as usual is needed, but with none at all there is no statistical basis for a meaningful conclusion. In other words, errors of more than a few orders of magnitude are somewhat problematic. For instance, your final figure above (1 error in 10^20 positions) differs from what you arrived at in post #39 (1 error in 10^14 positions) by 6 orders of magnitude.

The point of gathering self-play data for a wider variety of time controls is not to aim for high precision in the extrapolated calculations, but rather to establish with high confidence that assumed trends in error rate are, in fact, genuinely reliable. Is there a truly consistent inverse relationship between time per move and the expected decisive game percentage, and consequently the expected error rate? And what about the assumed exponential decay of the error rate with each additional top engine move considered? It sounds reasonable, but is there any data at all to support it? Once again, the precision of the calculations matters far less than the soundness of their statistical bases.

tygxc

@57

"Is there a truly consistent inverse relationship between time per move and the expected decisive game percentage" ++ It is only logical there is.
In 60 h a computer can calculate deeper than in 1 h, than in 1 minute, than in 1 second.
If a computer could calculate say 200 moves deep, then it would never err as it would exhaust all legal positions.

"And what about the assumed exponential decay of the error rate with each additional top engine move considered?" ++ That comes from probability.
(Probability of top 1 move wrong and top 2 move wrong)
= (probability of top 1 move wrong) * (probability of top 2 move wrong)
= (probability of top 1 move wrong)²
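
Written out, with $E_k$ the event that the engine's $k$-th choice move is an error, the step relied on here is

$$P(E_1 \cap E_2) = P(E_1)\,P(E_2) = P(E_1)^2,$$

where the first equality treats the two events as independent and the second treats them as equally probable.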

MARattigan
tygxc wrote:

@57

...
If a computer could calculate say 200 moves deep, then it would never err as it would exhaust all legal positions.

...

Er, you sure about that?

White to play and mate in 549

 

MARattigan
tygxc wrote:

@57

"Is there a truly consistent inverse relationship between time per move and the expected decisive game percentage" ++ It is only logical there is.
In 60 h a computer can calculate deeper than in 1 h, than in 1 minute, than in 1 second....

You obviously missed this the first time I posted it for you.

 
 

Blunder rates against think time in basic rules games (no 50-move rule).

Blunder rates against think time under competition rules (50-move rule in force).

MARattigan
tygxc wrote:

...

"And what about the assumed exponential decay of the error rate with each additional top engine move considered?" ++ That comes from probability.
(Probability of top 1 move wrong and top 2 move wrong)
= (probability of top 1 move wrong) * (probability of top 2 move wrong)
= (probability of top 1 move wrong)²

Probability that @tygxc thinks he's a teapot = 1.

When are you going to stop pretending you haven't been invited to apply your "calculations" to this series of games where the results can be checked against the tablebases?
