A problem with the chess engine?

BKPete

When I review games, I notice that the best line recommended by the chess.com engine quickly deviates from its own original view of the best line! This leads to some very odd evaluations of each move.

An example below:

Here, chess.com recommends c5 as shown with an engine evaluation of +2.75.

But in its 3 best lines below its initial recommendation - it quickly moves away from what it thought was best... if black replies Nf6 the evaluation is just +1.59 (and in fact, it thinks the slightly better reply from black should be Ne7 with a rating of +1.54).

This means that what chess.com states as 'best move' or 'mistake' or 'miss' etc. are often not correct - because playing those lines for just one or two moves quickly deviates to a very different evaluation.

I get that there is a limit to how many moves the engine would look ahead and each move that is played the engine gets a deeper insight - but this seems to suggest the engine only looks very few moves ahead in the initial review. If so, its evaluation seems a little pointless / misleading - even its own engine immediately contradicts the initial evaluation.

If these inconsistencies happen only occasionally - I can understand it, but they seem to happen in almost all games I review - I'm rapidly losing faith in the accuracy rating!

Am I missing something?
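To make the question concrete, here is a minimal sketch of why a fixed-depth search can contradict itself one move later. The tree below is invented purely for illustration; only the numbers +2.75, +1.59 and +1.54 are borrowed from the example above, and the names `Node` and `search` are hypothetical. A real engine uses far more sophisticated search and evaluation than this plain minimax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    static_eval: float  # quick estimate from White's point of view, in pawns
    children: list = field(default_factory=list)


def search(node, depth, white_to_move):
    """Plain depth-limited minimax: stop at the horizon and trust static_eval."""
    if depth == 0 or not node.children:
        return node.static_eval
    scores = [search(child, depth - 1, not white_to_move) for child in node.children]
    return max(scores) if white_to_move else min(scores)


# Hypothetical tree: the position after White's move, with Black to play.
# Each branch hides a defensive resource three plies down.
after_nf6 = Node(2.75, [Node(2.75, [Node(1.59)])])
after_ne7 = Node(2.90, [Node(2.85, [Node(1.54)])])
root = Node(2.80, [after_nf6, after_ne7])

shallow = search(root, 2, white_to_move=False)        # horizon stops before the defence
stepped = search(after_nf6, 2, white_to_move=True)    # same depth, one move further on
deeper = search(root, 3, white_to_move=False)         # one extra ply from the start

print(shallow, stepped, deeper)  # 2.75 1.59 1.54
```

With the same search depth, stepping one move into the line moves the horizon one move further, so the hidden defence becomes visible and the evaluation drops. That matches the behaviour I am seeing, assuming the initial review really does use a shallower search.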

tygxc

@1

"When I review games, I notice the best line recommended by chess.com engine quickly deviates from its own original view of the best line!" ++ Yes

"And this leads to some very odd evaluations of each move." ++ Yes

"it quickly moves away from what it thought was best" ++ Deeper look

"the evaluation is just +1.59 and in fact, it thinks the slightly better reply from black should be Ne7 with a rating of +1.54" ++ +1.59 and +1.54 mean basically the same: white is winning.

"what chess.com states as 'best move' or 'mistake' or 'miss' etc. are often not correct" ++ Right

"there is a limit to how many moves the engine would look ahead and each move that is played the engine gets a deeper insight" ++ Yes

"the engine only looks very few moves ahead in the initial review" ++ Yes

"its evaluation seems a little pointless / misleading"
++ It is a matter of resources and time. Chess.com cannot allocate a supercomputer to analyse your game. You do not want to wait 8 hours for a reliable analysis.

BKPete
Thanks for taking the time to reply.

So, if this is normal behaviour, I'm thinking that the analysis tool is probably worse than no review at all, because its suggested lines will, at best, be badly sub-optimal and, at worst, may be worse than my natural play! So its evaluation will be bad learning.

Agreed, I don't want to wait 8 hrs for analysis, but I'd much prefer to wait long enough for analysis which adds value, rather than something fast and misleading.

Is there any way to adjust the depth / time that the analysis tool uses?

I imagine for expert players the analysis is much, much worse than their own play, which would make the tool pointless!

tygxc

@3

"So its evaluation will be bad learning." ++ It is not that bad. It will usually be right. When two moves are about as good it may fluctuate between the two, but then the difference is not important.

"Is there any way to adjust the depth / time that the analysis tool uses?" ++ You can download an engine and run it on your computer with the depth / time you like.

"for expert players the analysis is much much worse than their own play" ++ Yes

Martin_Stahl
BKPete wrote:
Thanks for taking the time to reply.
So, if this is normal behaviour, I'm thinking that the analysis tool is probably worse than no review at all, because its suggested lines will, at best, be badly sub-optimal and, at worst, may be worse than my natural play! So its evaluation will be bad learning.
Agreed, I don't want to wait 8 hrs for analysis, but I'd much prefer to wait long enough for analysis which adds value, rather than something fast and misleading.
Is there any way to adjust the depth / time that the analysis tool uses?
I imagine for expert players the analysis is much, much worse than their own play, which would make the tool pointless!

The engine is going to be much better than almost any player in most instances, but a full line given at a single point may not turn out to be the best line as you step down the variation. While you may understand your natural play better than the engine suggestions, the latter are almost always going to be stronger.

You can change the strength of the Review between Fast, Standard, Deep, and Maximum. Maximum is going to be the best and most accurate review here. That said, in some positions, letting the engine sit on a position a lot longer may find better lines. That has to be done manually, and you would need to identify the positions where it makes the most sense.
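For anyone who wants to let an engine sit on a position with their own depth or time limit, as suggested earlier in the thread, a downloaded engine is normally driven through the UCI text protocol. The sketch below only builds the command lines and parses the engine's `info` output; actually starting the engine and writing these lines to its standard input is left out. The function names `uci_commands` and `parse_info` are made up for this example, and the `MultiPV` option is assumed to be supported by the engine (Stockfish supports it).

```python
def uci_commands(fen, depth=None, movetime_ms=None, multipv=3):
    """Build the UCI lines for analysing one position with a chosen limit."""
    if (depth is None) == (movetime_ms is None):
        raise ValueError("give exactly one of depth or movetime_ms")
    limit = f"depth {depth}" if depth is not None else f"movetime {movetime_ms}"
    return [
        "uci",
        f"setoption name MultiPV value {multipv}",  # number of lines to show
        "isready",
        f"position fen {fen}",
        f"go {limit}",
    ]


def parse_info(line):
    """Extract depth, score and principal variation from a UCI 'info' line.

    Returns None for lines without a score. Note that UCI scores are given
    from the point of view of the side to move, in centipawns ('cp') or as
    moves to mate ('mate').
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info" or "score" not in tokens:
        return None
    depth = int(tokens[tokens.index("depth") + 1]) if "depth" in tokens else None
    i = tokens.index("score")
    kind, value = tokens[i + 1], int(tokens[i + 2])
    pv = tokens[tokens.index("pv") + 1:] if "pv" in tokens else []
    return {"depth": depth, "kind": kind, "value": value, "pv": pv}


# Example: analyse the starting position to depth 30, then read one reply line.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(uci_commands(start, depth=30)[-1])
print(parse_info("info depth 20 multipv 1 score cp 159 nodes 1000 pv e2e4 e7e5"))
```

Using `movetime_ms` instead of `depth` gives the "let it think for N seconds" behaviour; the two limits are kept mutually exclusive here just to keep the sketch simple.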

Martin_Stahl
Azurecloudhart wrote:

engine is still trash at explaining its lines

Engines don't explain, they just give moves. The site, with Game Review and explanations, is building code to try to translate that into useful and accurate information. They are constantly working on that process, so it should improve with time.

BKPete
Martin_Stahl wrote:

You can change the strength of the Review between Fast, Standard, Deep, and Maximum. Maximum is going to be the best and most accurate review here. That said, in some positions, letting the engine sit on a position a lot longer may find better lines. That has to be done manually, and you would need to identify the positions where it makes the most sense.

Thanks - great to know - I'll experiment! (I doubt that the Standard-strength engine has a rating above 3000, though, as claimed.)

BKPete

Just to clarify: there seem to be two engines (or depths) that exist in the review....

- The engine which does the initial review (which includes the accuracy rating) - this seems to play very weakly.

- But the engine which looks at best alternative lines as you scroll through each move (default 3 lines) looks very strong - I can believe this has a rating > 3000!

But the concern for me is the engine for the initial review, because that classifies 'mistakes', 'misses' etc. as a summary, and this assessment is often very misleading. Controlling the strength of the engine for that process would be very useful - I'll experiment to see whether changing the settings influences that engine.

Martin_Stahl

The post-game quick review is very weak. The full Game Review is run server side and is based on your settings. Maximum is the best, will provide the best results, and uses Stockfish 16 NNUE.

The browser based analysis can be set to a number of different options, including Stockfish 16 NNUE.

The method used for Game Review doesn't run to a set depth for every position; it uses a different method where it can go deeper in some cases. So, there are positions where browser analysis can get deeper than the review did, but in most cases it shouldn't make a huge difference in the individual move analysis. The Show Lines option is defined at review time, so it is more likely that analysis will find better options as that variation is explored further.

BKPete
Martin_Stahl wrote:

So, there are positions where browser analysis can get deeper than the review did but in most cases it shouldn't make a huge difference in the individual move analysis.

Thanks very much for the more detailed explanation - it is making better sense. But...

I still believe the quick analysis is too weak, and in many cases its assessment DOES make a material difference. I will keep my eye on this and look for better examples. What I'm looking for are cases where a move is classed as a 'mistake' or 'miss' and the initial review claims the 'best move' alternative would shift the evaluation by more than +/- 1 point, but when you play out that 'best move' and follow it through with best engine play for 2 or 3 moves, the evaluation returns to the same level as the 'mistake/miss' - implying there is a relevant fault with the initial review.

If I find good examples like this from new games I review, I'll post them here. If I don't, I'll also post an update here.

In the meantime, I'm interested to know if anyone else has similar experience to me. (I'm a 1600-rated player.)

I am only motivated by improving the features of chess.com - the quick analysis is such a good idea, so it's very frustrating if it simply doesn't work well enough to be useful.

Martin_Stahl

The issue is that there are some positions where only going deeper is going to allow a better evaluation. The site has to balance the time taken to run a review with the overall strength of the review and won't be able to always capture those more complex positions.

BKPete

Sure - but if those situations crop up in most games it means the quick analysis is almost useless, possibly worse than useless when it presents bad guidance. And there are so many ways to make this work better without draining resources... stepping through games manually consumes the extra resource to look for deeper / better lines in any case (as per my original example) - why not use that insight to update the quick review dynamically? It could be an optional feature if some people don't want to bother.

(I know of one person who told me not to use the chess.com engine to review my games because it's so bad. I'm sure now it's only the quick review which is bad, but this guy has already given up on chess.com and uses a different site now. I'd much prefer chess.com took on feedback to improve its features rather than push people away.)

BKPete

Update to my previous post:

I finished one game yesterday. I had one 'miss', no 'mistakes'. The quick review suggested there was a better move that would have taken the position from -0.08 to -1.29, but on chess.com's own deeper analysis that better move actually just took the position to -0.58.

Not such a bad miss, but I'll give it the benefit of the doubt since there certainly was a better move. Perhaps a little misleading, but it's interesting that this was the first and only mistake/miss I've reviewed so far.
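The check I have in mind can be written down in a few lines. This is only a sketch of my own test, not anything chess.com does: the names `ReviewedMove`, `claimed_swing`, `confirmed_swing` and `is_overstated` are made up, the 1-point threshold is the one I mentioned earlier, and I am treating -0.08 as the baseline the review compares against.

```python
from dataclasses import dataclass


@dataclass
class ReviewedMove:
    baseline: float   # evaluation the review starts from (pawns)
    claimed: float    # evaluation the quick review gives for its suggested move
    confirmed: float  # evaluation after actually stepping into that move


def claimed_swing(move):
    """How much the quick review says the suggested move would have gained."""
    return abs(move.claimed - move.baseline)


def confirmed_swing(move):
    """How much the suggested move gains once the engine looks again."""
    return abs(move.confirmed - move.baseline)


def is_overstated(move, threshold=1.0):
    """True when the claimed gain exceeds the threshold but the confirmed one does not."""
    return claimed_swing(move) > threshold and confirmed_swing(move) <= threshold


# The 'miss' from the game above.
miss = ReviewedMove(baseline=-0.08, claimed=-1.29, confirmed=-0.58)
print(round(claimed_swing(miss), 2), round(confirmed_swing(miss), 2), is_overstated(miss))
# 1.21 0.5 True
```

So by this test the miss is overstated (a claimed 1.21 against a confirmed 0.50), although the suggested move is still genuinely better than what I played.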

tygxc

The quick review thinks -1.29 clearly winning, but a deeper analysis reveals hidden defences and an unclear result -0.58, maybe winning, maybe drawing.
Remember all those -1.29, -0.58 etc. are approximations. After unlimited time only draw / win / loss would be the true and certain evaluation.

BKPete
tygxc wrote:

The quick review thinks -1.29 clearly winning, but a deeper analysis reveals hidden defences and an unclear result -0.58, maybe winning, maybe drawing.
Remember all those -1.29, -0.58 etc. are approximations. After unlimited time only draw / win / loss would be the true and certain evaluation.

Yes - but it doesn't need deep analysis to correct this - chess.com corrects itself immediately when looking at the position after just half a move. This is my point: it seems a simple thing to fix, and a wasted opportunity not to!

I finished my next game:

I had one blunder: rated as +2.64 in initial analysis immediately corrected to +1.71 by chess.com. Best move was +0.51 so not a misleading message just an irritating evaluation.

I also had one miss: rated as +3.25 by initial analysis immediately corrected to +2.07 (why?) But best move was -1.70(!) so certainly a miss.

No other mistakes/misses in this game. Perhaps there is a pattern emerging with strange ratings overstating the impact - but both these moves were big errors so forgivable assessment.

I'll keep posting all mistakes/misses, but the real concern is probably cases where the initial rating is off by +/- 1 point and my move was acceptable, even if sub-optimal, as in my first 2 examples. In such circumstances the initial rating would be misleading.

(I realise I'm currently not looking for cases the other way around e.g. when I play a bad move which isn't detected by the initial analysis).