Inconsistent analyses

itinsecurity

Why do the game reports vary so much? If I open a game report for the same game several times, I can get very different results each time. I’m guessing this is related to computing time or computing power somehow, but analysing the same game on the same device with the same settings gives a different result every time, and the differences are sometimes quite large.

WBillH

The computer will update its thinking as it spends time on the analysis.

Anyone who knows how chess.com has its engines and compute resources configured isn't making public statements about it, so we can only make educated guesses.

In theory, if the engine reaches the same depth (and runs on a single thread), two different analyses will be identical for that engine; multi-threaded search introduces some nondeterminism even at a fixed depth.

I gave up using the chess.com analysis for anything but a quick way of seeing the interesting points in a game. I use an engine running on my computer for more complete study. I'll often load a game before I go to bed, set it to run at a fairly deep level, and when I wake up the next morning, it's ready. It's humbling to see what it thinks of my games vs what the chess.com quick analyses show. The longer the computer runs, the more I suck. 😁
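If you want to try something similar yourself, here's a minimal sketch using python-chess, assuming you have a local Stockfish binary; the engine path, depth and PGN filename below are placeholders to adjust:

```python
import chess
import chess.pgn
import chess.engine

ENGINE_PATH = "/usr/local/bin/stockfish"   # placeholder: point this at your own Stockfish
DEPTH = 30                                 # deeper is slower but more stable

with open("my_game.pgn") as f:             # placeholder filename
    game = chess.pgn.read_game(f)

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
engine.configure({"Threads": 1})           # one thread keeps fixed-depth results repeatable

board = game.board()
for move in game.mainline_moves():
    info = engine.analyse(board, chess.engine.Limit(depth=DEPTH))
    score = info["score"].white()          # evaluation from White's point of view
    print(f"{board.fullmove_number}. {board.san(move)}: {score}")
    board.push(move)

engine.quit()
```

Left running overnight at a high depth, this gives far more stable numbers than the quick site review.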

itinsecurity
Yeah, I know that feeling :)

I just found it odd that for the same game, just now, it listed 1, 2 or 3 blunders for each of the two players, in different combinations each time. Also a different number of best moves, interesting points, etc. The only thing that was consistent was the graph.

And in the analyses it said things like “e6 is an inaccuracy. Nc6 is best”, but if you try that, it says “Nc6 is good. e6 is best”.
🤦‍♂️
genacgenac

Sometimes a white move will be labeled a Mistake at +2.0, with the suggested alternative labeled a Good Move at +1.0. What gives?

MitchDerise

Same game, same device, same level of analysis, 5 minutes apart. Completely different recommendation for the best move. (Screenshots: Game 1, Game 2.)

magipi

When the difference is tiny (like here), it is pointless to label one of the moves as "best move".

More importantly, "Game review" is always nonsense because the depth of the analysis is so shallow. Switch to the "Analysis" tab and give the engine a few seconds (at least) to think it through.

genacgenac

Larger question: if a human can only calculate to a depth of less than 5 (GMs are not humans), isn't a shallower analysis more useful? Is the next shoe to drop some AI that factors in what your human opponent is most likely to do? Pro tip: open the flap on the back of your opponent's skull to confirm wet matter and no wiring harness.

Duckfest

Engine evaluations vary each time. I don't know the exact reason why this happens, but if I had to guess, I would say resources. This doesn't only happen on chess.com. Most of the time I use my local instance of Stockfish to analyze positions, and I still get the same variance in results.

Move evaluations will be different each time, but usually not by much. The evaluation will be accurate within a +/- 0.30 margin. That's why the post-game analysis can be misleading.
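A rough way to see that spread for yourself, assuming you have python-chess installed and a Stockfish binary on your PATH (just a sketch, not how chess.com actually runs its reviews): analyse the same position several times with a fixed time budget and compare the numbers.

```python
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")   # assumes the binary is on your PATH
engine.configure({"Threads": 4})                            # multiple threads make the search nondeterministic

board = chess.Board()   # starting position; paste any FEN you want to test instead

for run in range(5):
    info = engine.analyse(board, chess.engine.Limit(time=1.0))   # fixed time, like a quick review
    cp = info["score"].white().score(mate_score=100000)
    print(f"run {run + 1}: depth {info['depth']}, eval {cp / 100:+.2f}")

engine.quit()
```

The evals land close together, but rarely on exactly the same number.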

Take this position for example.

Considering the variance in the exact evaluation of the position, none of these moves can be considered the 'best move' without taking other factors into account. It's a matter of personal preference which move you choose, but if you play any of these moves you'll be fine.

Part of the problem is the visual representation of the engine evaluation to two decimal places. In the position I posted, Nf3 is the second-best move and c4 is third, because of a 0.01 difference. That difference is so small it is totally meaningless, even at GM level. If the output were shown as a bar chart, you would not be able to tell which move was better, which paradoxically would make it a more accurate representation.
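As an illustration of that precision problem, here is a sketch (again assuming python-chess and a local Stockfish on your PATH) that prints the top engine lines both to two decimals and rounded into coarse quarter-pawn buckets; the near-ties collapse into the same bucket:

```python
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")   # assumes Stockfish is on your PATH
board = chess.Board()   # substitute the position you are studying

# Ask for the top four lines in one search.
infos = engine.analyse(board, chess.engine.Limit(depth=25), multipv=4)
for info in infos:
    move = info["pv"][0]
    cp = info["score"].white().score(mate_score=100000)
    bucket = round(cp / 25) * 25          # quarter-pawn buckets instead of centipawn precision
    print(f"{board.san(move):6} exact {cp / 100:+.2f}   coarse {bucket / 100:+.2f}")

engine.quit()
```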

@genacgenac: "isn't a more shallow analysis more useful?". I've been wondering about that myself. If I analyze a position at depth 40 and follow the best lines to see the positions they lead to, it starts to deviate completely from the actual game within the first few moves. Essentially, Stockfish's evaluation of each move is based on future positions that have nothing to do with the game that is actually happening. That's why I started experimenting with lower depths: I figured that positions that can occur three or five moves from now are far more relevant than positions that can occur 40 moves from now according to Stockfish.

The reason I decided, after all, that higher-depth evaluations are better is that lower depths aren't good at determining the value of structural weaknesses and inevitable outcomes. There are positions that are +3 at depth 10, rise to +5 at depth 18, and are +8 by the time the engine reaches depth 30 (for example). I don't have an example at hand, but think of a late-middlegame position with lots of tactical opportunities in the short run, where, after most pieces are traded off, one of the players has a better pawn structure that can lead to a win. Anything can happen in the next 10 moves, but once the dust settles one player has a winning position and the other doesn't. I prefer to see the Stockfish evaluation based on the long-term prediction.
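If you want to watch that effect yourself, here is a sketch (same assumption: python-chess plus a local Stockfish) that streams the evaluation of one position as the search deepens:

```python
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")   # assumes Stockfish is on your PATH
board = chess.Board()   # substitute the position you want to test

# Stream the engine's output and print the evaluation at each depth it completes.
with engine.analysis(board, chess.engine.Limit(depth=30)) as analysis:
    for info in analysis:
        if "score" in info and "depth" in info:
            cp = info["score"].white().score(mate_score=100000)
            print(f"depth {info['depth']:2}  eval {cp / 100:+.2f}")

engine.quit()
```

In quiet positions the number settles early; in positions with long-term structural factors it can keep climbing well past the depths a quick Game Review reaches.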

@genacgenac: "what your human opponent is most likely to do?". Opening your opponent's skull might work OTB; I've yet to manage it in online games. That's why I use openingtree.com: I find it just as valuable to know what my opponent will do as to know what Stockfish would do.