Chess engine analysis... What is it good for?

pcalugaru

It's a must for correspondence chess and for checking for blunders... but that's about it.

I'm at a crossroads with it. I'm becoming more convinced by the day that modern chess engine evaluation isn't as important as I thought it was.

Hear me out.

(For the sake of the topic, I'm going to refer to an opening I use as Black.)

I have been playing the Center Counter Defense, specifically 3...Qa5, the Mieses variation, and modern chess engines rate the line at about +0.56, favoring White. I started playing it long before I came into contact with a modern chess engine. To me, the tempo-wasting ...Qxd5 and ...Qa5 have always been a way of "luring" White into familiar positions (the cost is the lost tempi), and usually, barring any bad moves, Black catches up with White in development late in the opening stage and gets a decent game.

The +0.56 never factors in the way I'd always assumed. What does factor in is knowing the pawn structure and how one should be playing the position.

Personally (and yes, I understand it's partially my skill level), it's hard for me to get the "how I should be playing the position" from analyzing with a modern chess engine. I believe in many cases it's because the analysis is based on what the engine would play against someone of similar strength.

What's Stockfish 17's rating? 3200+? That translates to a computer playing at 3200 Elo rating the opening line at +0.56 against another computer playing at the same strength.

People generally look at this optimistically or pessimistically.

Example: they say that, as (insert White or Black) against an opponent rated 3200, the best they can do is a half-pawn advantage in this line.

But I'm not 3200 Elo. I'm not even 2200 Elo, so in reality this +0.56 evaluation means nothing to me.
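For a rough sense of scale, here is a minimal sketch that turns an evaluation in pawns into an expected score using a logistic curve. The function name and the slope constant `K` are assumptions chosen for illustration (a curve of the kind some analysis sites use); this is not how Stockfish itself defines its numbers, and it ignores the playing strength of the two sides entirely.

```python
import math

# Assumed logistic slope per centipawn; illustrative only, not an engine constant.
K = 0.00368208

def expected_score(pawns: float) -> float:
    """Map an evaluation in pawns (White's point of view) to an expected score in [0, 1]."""
    centipawns = pawns * 100
    return 1 / (1 + math.exp(-K * centipawns))

# Print the expected score for a few sample evaluations.
for ev in (0.0, 0.56, 1.0, 3.0):
    print(f"{ev:+.2f} -> {expected_score(ev):.1%}")
```

Under that assumed curve, +0.56 is worth roughly five percentage points of expected score over a dead-equal position, and it says nothing about how easy the position is for a human to play.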

Then, looking at the evaluations of some popular main lines: in the KID, White has +0.60 or even better, yet it's very popular. Or the Ruy Lopez: in many lines the positions are equal, yet these are also popular.

IMO, since I am not 3200 Elo, using a modern chess engine to dictate my understanding of an opening (or basing my choice of openings on the evaluations of said engine) isn't realistic and isn't practical.

Thoughts? Comments?

 

Aria_Esk09

.

RalphHayward

Using an engine in Correspondence is unpreventable but very bad form indeed, using one in Daily is outright bannable cheating (and the mods' capacity to spot such things seems pretty good based on the various compensatory rating adjustments I've received). But working with an engine in pre-game or post-game position analysis is like trying lines against a blunder-free but uninspired tame GM. Yes, we have to lead the analysis and they take no account of how playable a position will be for humans, but they do stop us from perpetrating arrant stupidity in an objective sense and generally not seeing the wood for the trees.

ThrillerFan

You are wrong there. Using an engine in correspondence is not "bad form". It is only chess.com and USCF Correspondence that ban engines, which can often lead to some questionable rulings.

Daily chess is not correspondence chess. You have to play both to really see the difference; daily is like baby correspondence. The main place for correspondence chess is the International Correspondence Chess Federation, or ICCF, where I spent most of my COVID time rather than on the joke that blitz chess is here. There, if you don't use an engine, plain and simple, you lose!

As far as the OP's post, he is right on, and very few players under 2000 or under 30 years of age understand this.

Engines are phenomenal at finding deep tactics, from the forced 14-move sequence that wins a pawn at no cost to the sacrifice of the bishop that leads to either mate, the winning back of material, or a far superior position with, say, an unstoppable pawn 18 moves down the road.

That is the strength of a computer. In correspondence chess, these are the types of moves you have to trust. You don't just say "Oh, it says Bishop takes pawn on d5, let's play it." You have the computer figure out the best moves subsequently for both sides. Don't just play the line it gives from 21.Bxd5 to move 38. Give it a few minutes to figure out the best move for each move along the way. It might very well flip the assessment when it finds a miracle draw line for Black at move 33. But if it does not find that (it probably won't 9 times out of 10), only then do you play 21.Bxd5.
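That checking routine can be sketched roughly as follows. Everything here is hypothetical scaffolding: `analyse` stands in for whatever engine call you use (assumed to return an evaluation in pawns from White's point of view), `play` applies a move to a position, and `min_eval` is an arbitrary cutoff. The sketch walks the suggested line one move at a time and re-evaluates after each, so a late refutation shows up before you commit to the first move.

```python
from typing import Callable, Hashable, Sequence, Tuple

def line_holds(
    position: Hashable,
    line: Sequence[str],
    analyse: Callable[[Hashable], float],       # hypothetical: eval in pawns, White's view
    play: Callable[[Hashable, str], Hashable],  # hypothetical: returns the new position
    min_eval: float = 1.0,                      # assumed cutoff for "still clearly better"
) -> Tuple[bool, int]:
    """Re-evaluate after every move of the line.

    Returns (True, len(line)) if the evaluation never drops below min_eval,
    otherwise (False, index of the move after which it first dropped).
    """
    for i, move in enumerate(line):
        position = play(position, move)
        if analyse(position) < min_eval:
            return False, i
    return True, len(line)

# Toy stand-ins so the sketch runs without an engine: a position is just the
# tuple of moves played, and the "engine" changes its mind once "Kg7" appears.
def toy_play(pos, move):
    return pos + (move,)

def toy_analyse(pos):
    return 0.0 if "Kg7" in pos else 2.4

print(line_holds((), ["Bxd5", "exd5", "Qxd5"], toy_analyse, toy_play))
print(line_holds((), ["Bxd5", "Kg7", "Qxd5"], toy_analyse, toy_play))
```

With a real engine, `analyse` would be the slow part, which is where the "give it a few minutes" per move comes in.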

But when it comes to openings, it is TERRIBLE at assessing positions. I mean, it is fine at opening blunders. Like after 1.e4 e5 2.Bc4 Nc6 3.Qh5, it knows that 3...Nf6 loses. But to say the Scandinavian is +0.56 and the French Classical is +0.68 and the French Winawer is +0.42 or whatever is hogwash. I see so many posts here, and teenagers at tournaments, saying things like "1...e5 is a mistake - Stockfish says the Sicilian is better by 0.24 points." That is complete BS.

Computers are also horrible at endgames. You put a computer by itself in an endgame with KRN vs KR, and it will tell you it is +3 unless one side can win material immediately.

Nowadays, computers have a built-in 7-piece tablebase, so all a computer has to figure out is how to get down to 7 pieces on the board, not how to get down to mate. It knows the quickest way to win KBN vs K not because of any calculation skills but because of the endgame tablebases. That is also why, depending on the position, with maybe 8 to 11 pieces on the board, it will suddenly spit out all zeroes: it has figured out that all possible responses to a certain move lead to either a tablebase draw or worse for the opposing side, but that your move does not force a win against best play.
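As a small illustration of that cutoff, here is a sketch that counts the pieces in a FEN string and checks whether the position is small enough for a 7-piece tablebase. The function names and the `limit` parameter are made up for the example, and no actual tablebase lookup is performed.

```python
def piece_count(fen: str) -> int:
    """Count the pieces (kings included) in the placement field of a FEN string."""
    placement = fen.split()[0]
    # Letters are pieces; digits are runs of empty squares; '/' separates ranks.
    return sum(ch.isalpha() for ch in placement)

def in_tablebase_range(fen: str, limit: int = 7) -> bool:
    """True if the position has few enough pieces for a `limit`-piece tablebase."""
    return piece_count(fen) <= limit

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
kbn_v_k = "8/8/8/4k3/8/8/8/KBN5 w - - 0 1"

print(piece_count(start), in_tablebase_range(start))
print(piece_count(kbn_v_k), in_tablebase_range(kbn_v_k))
```

Above the limit the engine has to rely on its own search, which is why the evaluation can change abruptly once a line reaches tablebase territory.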

So over the last 40 years, computers have resolved the endgame problem, and have always been tactically strong, but position assessment, especially in the opening, is weak as are some, but not all, aspects of positional play.

In the opening, anything less than +1 should be taken with a grain of salt. Later on, if one move is +2.5 and another is +0.19, the +2.5 move is better assuming it is White to move.

If the difference between 31.Nf3 and 31.Bh4 is +0.35 vs +0.27, the knight move is not automatically better, and this is the biggest mistake amateurs make, thinking these numbers are gospel. They are not.
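One way to make this concrete is to treat every move within some margin of the engine's top choice as a practical tie and leave the decision to the human. A rough sketch follows; the function name and the 0.3 margin are arbitrary assumptions, not anything an engine documents.

```python
from typing import Dict, List

def practically_equal(evals: Dict[str, float], margin: float = 0.3) -> List[str]:
    """Return the moves whose evaluation (White's view, White to move)
    is within `margin` pawns of the best one, best first."""
    best = max(evals.values())
    close = [move for move, ev in evals.items() if best - ev <= margin]
    return sorted(close, key=lambda move: evals[move], reverse=True)

# The two cases from the post: a tiny gap and a decisive gap.
print(practically_equal({"31.Nf3": 0.35, "31.Bh4": 0.27}))
print(practically_equal({"move A": 2.5, "move B": 0.19}))
```

In the first case both moves survive the filter, so the choice comes down to understanding the position; in the second only the +2.5 move does.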

Also, in addition to the fact that Bh4 may actually be better than Nf3 a number of moves down the road, there is also the human element. Say White is winning. One move is +7.6 but involves 4 sacrifices, and Black can check you 26 times before he runs out of checks. Or you can force a massive trade down that leads to rook and 5 pawns vs rook and 3 pawns with Black's rook and king both in passive positions, but it is only +3.9. Give me the endgame. It is far easier for the human to execute. It still wins, and so the massive sacrifice is not "better."

Hope this helps.

RalphHayward

@ThrillerFan I stand corrected on current Correspondence etiquette. Thank you for taking the time and trouble to put me straight, and for all of your insights. I myself am rather out of touch with the world of 'proper' correspondence chess as it is today, having been out-of-chess for many years. Using computer assistance used to be howlingly bad form but I'm actually rather glad to discover that things have moved on in that respect.

ThrillerFan

Yep, which is why I say chess.com and USCF Correspondence are archaic. Sure, if you use a computer on every move, people can tell. I had a USCF Correspondence game in the early 2010s against an 1100. It was a Sokolsky. I was white. Every move he is talking about how he is underrated. He claims he drew 2 2000s and beat a 1700 over the weekend. Little did he know that I could look up his tournament history and see that he really had 2 draws and 2 losses against nobody over 1350.

Well, he plays 19...O-O-O (yes, that move and move number; I remember that much). It is the only time I have ever filed a computer usage complaint. All that came of it was a stern reminder about not using computers. The damage was done. I got a draw in 47 moves against an 1100. I put the position in the computer after White's 19th move and, lo and behold, 19...O-O-O was about -6. 28 moves later, I draw.

I am sure I have been (forbidden chess.com word) upon in other games, but they were not obvious like this one.

They won't catch you blunder-checking every move. They will catch you taking best moves every time. I am 100% sure I have faced numerous blunder-checkers in USCF.

RalphHayward

@ThrillerFan I am sorry to hear that you were preyed upon by that utter blighter.

Personally, I don't see the point of using external tools to feed one one's moves: for me, the whole point is a dance of two minds (I prefer Botvinnik's view of chess, as the art of making logic beautiful, over views which see it as a battle) and with external assistance one is not creating one's own dance. But I guess it's like those ghastly people who buy essays or have them written by someone else: if they're just "in it for the result" it probably makes sense to them. Which to my mind leaves them in a position in which they are as much to be pitied as blamed (but definitely blamed too).

ThrillerFan

In reality, correspondence chess, TRUE correspondence chess, is a different beast than over the board. It is meant to be computer assisted. How do I know 26 moves of Winawer Poisoned Pawn theory? Correspondence Chess. How am I expanding my Petroff knowledge? Finding typical pitfalls and maneuvers? Correspondence Chess.

Now you might think all games would be draws. They are not. The draw ratio is high, yes. In roughly 650 games on ICCF, I have 88 wins, which is an official stat. They don't list your number of losses, but if I counted them, it would probably be between 70 and 80. So decisive games occur roughly 20 to 25 percent of the time, though I have had tournaments with 12 draws before.
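The arithmetic behind that estimate can be checked with a few lines. The inputs are the figures from the post (about 650 games, 88 wins, an estimated 70 to 80 losses); the function name is just for illustration.

```python
def decisive_share(wins: int, losses: int, games: int) -> float:
    """Fraction of games that ended decisively (win or loss)."""
    return (wins + losses) / games

GAMES = 650   # approximate total quoted in the post
WINS = 88     # official figure quoted in the post

# The losses are only an estimate, so evaluate both ends of the range.
for losses in (70, 80):
    share = decisive_share(WINS, losses, GAMES)
    print(f"{losses} losses: {WINS + losses}/{GAMES} = {share:.1%} decisive")
```

That works out to about 24 to 26 percent decisive games, so around the top of the quoted range.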

mikewier

The conversation seems to have drifted away from the original question.

The OP is a club-level player and is playing others at his level. If he is comfortable playing the positions that arise from his opening, even if the engine rates it as +.56, I say go ahead and play it.

There is value in being familiar with the pawn structures, tactical opportunities, and endings that commonly arise from an opening. The time saved due to familiarity may compensate for the .56 disadvantage. At the club level, few players are able to convert a .56 advantage out of the opening into a win.

Re the discussion of engines and postal chess, I used to play in USCF postal tournaments, in which engines are NOT allowed. I quit postal chess when I became convinced that some players were using engines.

MARattigan
I'd quibble with some of the things regarding engines in ThrillerFan's post.

...

But when it comes to Openings, it is TERRIBLE at assessing positions. I mean, opening blunders it is fine at. Like e5 2.Bc4 Nc6 3.Qh5, it knows that 3...Nf6 loses. But to say the Scandinavian is + and the French classical is + and the French Winawer is + or whatever is hogwash. I see so many posts here and teenagers at tournaments saying things like "1...e5 is a mistake - Stockfish says the Sicilian is better by points." That is complete BS.

Yes it is BS, but that's no fault of Stockfish. It doesn't anywhere document that its evaluations can be used in that way. The BS is entirely on the part of the user. And if it says the Scandinavian is + and the French classical is + and the French Winawer is + that's not hogwash. It's correct; those are its evaluations (except the chess.com editor has kindly removed them). If you want to read something into them that's not there, that's no fault of Stockfish.

When you say its assessments are TERRIBLE, if you're talking about Stockfish - really? If you play it on equal terms, how often do you come out of the opening in a significantly better position?

Computers are also horrible at endgames. You put a computer by itself in an endgame with KRN vs KR, it will tell you it is +3 unless one side can win material immediately.

I disagree with both of those statements. Taking them in reverse order here are assessments of two KRN vs KR positions and a KRN vs KR position (all White to play with a ply count of 0) by Stockfish 17 sans tablebase.

The additions vary between +8.51 and -5.07 at depth 30. I'd guess it would declare mate in well under an hour in all three positions.

And horrible is as horrible does. Let's consider a few basic endgames.

First KQvKNN. If we go back to the end of the second world war, the standard reference work was Fine's Basic Chess Endings. It has this to say:

Starting with White's third move the tablebase gives mate in 31, but Fine, the leading endgame authority at the time and a one-time world championship contender, couldn't see it. Here is Stockfish 17 sans tablebase playing the position against Rybka with the Nalimov tablebase, which puts up a perfectly accurate defence, at a relatively fast time control of 40 moves in 15 minutes.


Just two moves slipped on the mate.

The tablebase says that the side with the queen wins about three quarters of the positions.

Müller and Lamprecht, which is possibly the standard current work on practical endings (not a reference), assesses the endgame as generally drawn (with some caveats) on the basis of results taken from a large database of recorded games. This suggests that humans can't play the endgame in general.

So how does that compare with Stockfish? Versions up to SF14 can manage mates up to 45 moves deep against the Nalimov tablebase compared with an average mate depth of 16 that probably applies to games in M&L's database.

So Stockfish is not horrible compared with humans in that endgame. I think much the same applies to KQvKBN and KQvKBB.

Now consider KBNvK. There used to be a collection of master games that finished up in this ending on chessgames.com. Sixteen games, all appearing in winning positions, all players 2500+ (except one just under). Two draws! Later the famous fail by the women's world champion got added. 

The accuracy of all SF versions up to 15.1 (and possibly beyond, I haven't checked) in the endgame is amazingly bad, but nothing like bad enough to get even a single fail in 16 games except possibly once in a blue moon (and there are engines much more accurate than Stockfish in the endgame).

So Stockfish is not horrible compared with humans in that endgame either.

Next KNNvKP. That is an endgame that grandmasters routinely screw up. I'll take two examples.

Firstly, in Topalov v Karpov 2000 (Topalov a future world no. 1 and Karpov widely regarded as the greatest endgame player ever), the players reached a drawn position in the endgame and between them blundered three half points in the first 9 moves.

Secondly this. Gurevich is a strong grandmaster who went on to win the tournament, but he offered a draw in a mate in 15 position (and the author of the article didn't notice there was a win either).

Here is SF playing the position against Rybka/Nalimov at 40/15.


Perfectly accurate.

So Stockfish is not horrible compared with those humans in that endgame either.

Admittedly it leaves something to be desired. If a human has studied the endgame enough it will be outplayed. I've spent a quite lunatic amount of time studying the white wins, so I can comprehensively outplay versions of SF up to 14 (I haven't checked later versions) but if the ending resolves to KQvKNN I'm pretty certain the reverse would be true. All versions from 8 to 14 will reliably play any endgame up to depth at least almost 30 moves against Syzygy. And again there are engines that are more accurate than SF in this endgame.

I believe when it comes to harder endgames with more pieces on the board SF will compare even more favourably with humans. The performance of both will deteriorate, but humans faster than SF. 
 

...

In the opening, anything less than +1 should be taken with a grain of salt. Later on, if one move is +2.5 and another is +0.19, the +2.5 move is better assuming it is White to move.

I think that's only usually the case and at a human or engine level of play. You can't necessarily assume a relatively high number means a win. In this SF16vSF16 game (White to mate in 52), for example, White hat SF16 evaluates the position as +88.50 on move 17, but manages only a draw. On the previous ply black hat evaluates the position as 1.42 in favour of White. Both positions are drawn.

...

darkunorthodox88
ThrillerFan wrote:

But when it comes to Openings, it is TERRIBLE at assessing positions. I mean, opening blunders it is fine at. Like 1.e4 e5 2.Bc4 Nc6 3.Qh5, it knows that 3...Nf6 loses. But to say the Scandinavian is +0.56 and the French classical is +0.68 and the French Winawer is +0.42 or whatever is hogwash. I see so many posts here and teenagers at tournaments saying things like "1...e5 is a mistake - Stockfish says the Sicilian is better by 0.24 points." That is complete BS.

If engines were hogwash at openings, they wouldn't have revolutionized our entire opening books over the last 2 decades. Now if someone publishes a repertoire book that isn't engine-checked, the book is almost worthless.
You are confusing people not knowing how to use engine evaluations with engines not being good at evaluating openings. If you give the engine enough depth, manually play out the positions to verify its initial findings, correctly use database statistics to supplement them, and never forget the human touch of how easily playable a position is regardless of eval, then engines are a fantastic opening resource.

Uhohspaghettio1

Engines are worth something. There are two mistaken types of people: people who think engines are EVERYTHING and people who think engines are NOTHING.

Even yesterday I'm sort of embarrassed to admit I learned a surprising lot about the whole Qh4+ type of attack and how white should play against it, about how to deal with black's Bb4 along with Ne4 attack on the Nc3 in those types of positions. For example a3 wouldn't have really occurred to me after Nxc3 but it's by far the best computer move. Neither would Qg4. And after Nxg3, Qf2 is the only serious move. (The problem with me is that I see things and then forget them completely, so I'm really trying to make sure to review such things many times to get them to stick.)

All this I learned from the computer but they are definitely reasonable and common moves that a human can understand and learn from. You could for sure learn this the old-fashioned way by going through games. However that would just take longer and you might be missing bits of information like "what if I go here?". Yes you could work it out on your own, but that takes time. Sure there are benefits to figuring things out on your own as well, but it's hard to believe that computers don't play some good role, especially as the top players all consider them invaluable.

Unfortunately it may be a "rich get richer" scenario where the good players know how to use computers to their benefit, while the poor players misuse them and end up going down all the wrong paths, getting all the wrong ideas, wasting their time and handicapping their development when they could be learning the fundamentals better.

darkunorthodox88

lol apology accepted

ThrillerFan
MARattigan wrote:

Computers are also horrible at endgames. You put a computer by itself in an endgame with KRN vs KR, it will tell you it is +3 unless one side can win material immediately.

I disagree with both of those statements. Taking them in reverse order here are assessments of two KRN vs KR positions and a KRN vs KR position (all White to play with a ply count of 0) by Stockfish 17 sans tablebase.

The additions vary between +8.51 and -5.07 at depth 30. I'd guess it would declare mate in well under an hour in all three positions.

How old are you? Are you even old enough to remember the days before tablebases?

Your argument counters nothing. Re-read what I said. Tablebases are what resolved the computer's problems with endgame assessment.

It is not the skill of Stockfish that leads to all zeroes. It is the addition of the tablebases. First 5-piece in the late 90s, then 6-piece, and now 7-piece. The tablebase is not part of Stockfish. It is an add-on that Stockfish uses to properly assess RN vs R or RB vs R or anything else with 7 or fewer pieces. Without that add-on, computers by themselves, as I said before, are horrible at endgame evaluation. That tablebase is like a cookbook to a husband that can't cook. Not part of his intelligence (or artificial intelligence in the case of computers).

Most programs now (Stockfish, Rybka, Fritz, etc.) already add those tablebases to the program, but the raw program by itself is still just as bad. The same goes for opening books: they need to be added to the computer, and again, many of those add-ons are installed before the product is sold. They are not part of the computer itself. Think of it as buying a laptop that already has Word and Excel on it. Word and Excel are not part of the laptop. They are add-ons installed before it is sold. That is what the tablebases and opening books are. Word and Excel.

MARattigan

Old enough to remember the days before chess engines or PCs.

If you read it, all my examples were Stockfish without any access to a tablebase. Neither Stockfish nor Rybka comes with a tablebase incorporated into the program; they use one only if you give them a path to one as an option. (Don't know about Fritz; don't you have to fork out for it?)

My thesis was that the raw program SF is generally better than humans at endgames, the exceptions being a few of the simplest endgames where a human can excel if they've had a very good look at the endgame.
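To illustrate the point about the path option, here is a minimal Python sketch of the UCI command a GUI or script would send to Stockfish to point it at Syzygy files. `SyzygyPath` is the option name Stockfish uses; the helper name `syzygy_setoption` and the directory in the example are made up for illustration. If the command is never sent, the engine simply plays without tablebases.

```python
def syzygy_setoption(path: str) -> str:
    """Build the UCI command that tells Stockfish where its Syzygy files are.

    `path` is whatever directory the user has downloaded the tablebase
    files into (hypothetical here). Nothing is bundled with the engine.
    """
    return f"setoption name SyzygyPath value {path}"


# Example with a hypothetical directory:
command = syzygy_setoption("/home/user/syzygy")
print(command)
```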

yetanotheraoc
Uhohspaghettio1 wrote:

Engines are worth something. There are two mistaken types of people: people who think engines are EVERYTHING and people who think engines are NOTHING.

Clever. But I think there are more types of mistaken people than just two.

The engines with NNUE play very differently than the older style alpha-beta engines. They play interesting stuff in the openings and are actually not horrible in the endgame either. But spaghettio's two types of people make the same mistakes regardless of which type of engine they are using (or not using).

Think of an engine as a high-powered laser cutter in your metal working shop. You shouldn't use it for everything, but in the right hands on the right job it can turn out excellent results. In the wrong hands on the wrong job (for example trying to use it on every job) it can turn out a lot of scrap. Now imagine a trade show where you want to display your metal working prowess, but you can't bring the laser cutter with you. Kind of analogous to a chess tournament where you can't use the engine. This changes the whole equation.

Beyond a certain strength, which engines passed some time ago(*), it doesn't really matter how strong an engine is. What matters is whether you are using the engine to improve your in-game decision-making skills for when the engine won't be available. And that's a hard problem to solve, because training the wrong way with an engine can easily make your in-game thinking worse. I see this all the time at the club where instead of analysing their own games they have the engine do it and draw all the wrong conclusions. Wrong not because the engine made a mistake necessarily, but because their attention is focused on the individual move (the "inaccuracy") and not on how their own wrong in-game thinking led them to miss the correct move. "I blundered" is too sweeping a category and not the sort of fine-grained error that can be targeted for improvement.

(*) Caveat: Minimum required engine strength depends on the player strength.

darkunorthodox88

If you put a simple endgame into an engine and it gives you an eval like +3, what you must pay attention to is the steadiness of the eval. If it is steadily crawling upwards, you can be fairly confident that the endgame is winning. If you are at high depth and the eval remains fairly stable, it means that even after many moves the engine can't see any improvement to the position. That doesn't mean it's not winning; sometimes the win really is just beyond its search horizon. But more often than not, you can make a pretty good guess at sufficient depth whether it's winning or not.
Only a moron sees an engine give an eval that would mean winning in the middlegame and assumes it's also winning in such an endgame position. Were you to have enough brute force for ridiculous depth, a true +3 in the middlegame or even the opening would be worked out to 1-0 and the eval bar would keep going upward. (The corollary is that a truly winning position implies the existence of a mate in X, where X can often be a very large number of moves; engines don't need to see that far to know it's winning.)
SparkNotes version: if an engine still thinks the advantage is steady after calculating dozens of moves ahead, it is likely a draw (or an unusual fortress position), even if the eval bar swears you are far ahead in material.
Gotta use common sense, people.
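
The steadiness heuristic above can be sketched roughly in Python. This is only an illustration, not anything an engine provides: the function name `classify_eval_trend`, the 0.5-pawn threshold, and the sample numbers are all assumptions. It takes evals recorded at increasing search depth, from the point of view of the side that is ahead, and labels the trend.

```python
def classify_eval_trend(evals, min_change=0.5):
    """Label how an engine eval (in pawns) moves as search depth grows.

    `evals` is ordered from shallow to deep search. `min_change` is an
    arbitrary threshold chosen for illustration, not an engine setting.
    """
    if len(evals) < 2:
        raise ValueError("need evals from at least two depths")
    change = evals[-1] - evals[0]
    if change >= min_change:
        return "rising"    # crawling upwards: likely a real win
    if change <= -min_change:
        return "falling"   # the advantage is evaporating
    return "steady"        # possibly a draw or a fortress


# Made-up sample data:
print(classify_eval_trend([3.0, 3.4, 4.1, 5.2]))  # rising
print(classify_eval_trend([3.0, 3.1, 2.9, 3.0]))  # steady
```

A steady result is only a hint: as noted above, the win may simply lie beyond the search horizon.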