Stockfish 15.1 new evaluation

Duckfest

Stockfish just released 15.1 (https://stockfishchess.org/blog/2022/stockfish-15-1/)

One of the paragraphs in the article:

This release also introduces a new convention for the evaluation that is reported by search. An evaluation of +1 is now no longer tied to the value of one pawn, but to the likelihood of winning the game. With a +1 evaluation, Stockfish has now a 50% chance of winning the game against an equally strong opponent. This convention scales down evaluations a bit compared to Stockfish 15 and allows for consistent evaluations in the future.

This low-key change of the engine evaluation surprises me a little, because the conventional interpretation is so widely accepted. What are your thoughts on this?

ubuntux

It's absolutely necessary. They could wait no longer. Finally, they use a sensible evaluation function.  

> the conventional interpretation is so widely accepted

The problem is that the conventional interpretation does not even come remotely close to what the centipawns actually mean today. The new evaluation function is much easier and straightforward to interpret. 

Back in the very old days, 100 centipawns were meant to be approximately equal to one pawn of material advantage. Those times are long gone. 

Duckfest

Thank you for your reply. I was surprised this change had no impact on the community, but I guess it's part of a longer trend I wasn't aware of. 

When I switched to 15.1, I noticed the difference immediately. Positions that I called +6 in the morning changed to +4 the same day with the new engine.  These new differences were much larger than the difference between 14 and 15. 

Part of my confusion is about the reversal of evaluation and performance. The new definition, "With a +1 evaluation, Stockfish has now a 50% chance of winning the game against an equally strong opponent", has consequences. Under the old definition, Stockfish would eventually become strong enough to have a 50% winning chance in a +0.5 position. With the new definition, that same +0.5 position would now be called a +1 position, because the evaluation is tied to Stockfish's ability to have a 50% winning chance.

It's no longer Stockfish that gets better given a certain evaluation, it's the evaluation that gets worse as Stockfish gets better. The obvious downside is that every evaluation written down in text becomes obsolete. Every game annotation where I mention a +x evaluation is no longer valid. Doesn't this also mean that the definitions of blunders, mistakes and inaccuracies will change down the line? Eventually, when Stockfish can win a game after a single inaccuracy, will that inaccuracy be a blunder?

Based on what you said, it's a process that's been going on for a while. Can you elaborate on these two things you said:
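The rescaling described above can be illustrated with a toy sketch. It assumes the displayed number is simply the engine's internal score divided by whatever internal score currently corresponds to a 50% win rate; the function name, the parameter name and the sample numbers are all hypothetical, and nothing here is taken from Stockfish's source code.

```python
def displayed_eval(internal_score, internal_at_50pct_win):
    """Rescale an internal score so the 50%-win score is displayed as +1.00.

    Both arguments are in the same (arbitrary) internal units.
    This is a toy model of the convention, not Stockfish's actual code.
    """
    return internal_score / internal_at_50pct_win


# Hypothetical numbers: the same position, scored 0.5 internally,
# shown by a weaker and by a stronger version of the engine.
weaker = displayed_eval(0.5, internal_at_50pct_win=1.0)    # needs 1.0 to win half its games
stronger = displayed_eval(0.5, internal_at_50pct_win=0.5)  # needs only 0.5

print(f"weaker engine shows   {weaker:+.2f}")
print(f"stronger engine shows {stronger:+.2f}")
```

Under these assumptions the position itself has not changed; only the yardstick has, which is why old written annotations and new engine output stop being directly comparable.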

> the conventional interpretation does not even come remotely close to what the centipawns actually mean today

What does it mean today?

> The new evaluation function is much easier and straightforward to interpret. 

How exactly?

Tony121145

With this new 15.1 evaluation I have noticed a difference in engine matches. Before, with SF 14/15 playing against all other engines using the Silversuite X 50 preset openings with #22: Sicilian Dragon Yugoslav Attack, SF as White would start playing out of book with a big advantage of +1/+1.2 or more and win every game fairly easily. With 15.1 the initial evaluation is around +0.56 and it does not press the attack as strongly as 15 did. It still wins, with only the occasional draw, against LC0 v.0.29 with an Nvidia RTX 3080 graphics card, but it takes longer against other top engines in that particular opening, which seems to give White a rather large and unfair advantage straight out of book. Does anyone know how much stronger SF 15.1 is than SF 15? I test mainly at Rapid or longer time controls using an 8-core PC with 1024 MB hash and 8 threads.

magipi
Duckfest wrote:

> The new evaluation function is much easier and straightforward to interpret. 
>
> How exactly?

That is the million-dollar question. The article talks about what +1.0 means, but it is silent about 2.0 or 6.0 or 26.0. We can only guess, except I can't even imagine what a reasonable guess would be. Maybe +26 means that the stronger side wins 26 out of 27? Maybe. And how does that translate to the old evaluation, aside from the extremely vague remark "This convention scales down evaluations a bit compared to Stockfish 15"?
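One possible reading, offered only as a guess, is that the evaluation feeds a logistic (S-shaped) win-probability curve anchored so that +1.0 gives exactly 50%. The sketch below shows what other values would mean under that assumption. The `spread` parameter is hypothetical and chosen arbitrarily; the release note quoted in this thread gives no such number, so the percentages printed for anything other than +1.0 are purely illustrative.

```python
import math


def win_probability(eval_pawns, spread=0.4):
    """Chance of winning (not merely avoiding a loss) for the side with the advantage.

    Assumes a logistic curve anchored at eval +1.0 -> 50%.
    `spread` is a made-up steepness value, not a Stockfish constant.
    """
    return 1.0 / (1.0 + math.exp((1.0 - eval_pawns) / spread))


for ev in (0.0, 0.5, 1.0, 2.0, 6.0):
    print(f"{ev:+.1f} -> {win_probability(ev):.1%} win chance (illustrative)")
```

Whatever the real steepness is, a curve of this shape would saturate quickly, so large values such as +6 or +26 would all mean "practically winning" rather than encode a ratio like 26 out of 27.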

Tony121145

Yes, it's unclear what the change to the evaluation function in 15.1 actually means. I'll repeat some games that SF 15 only managed to draw against various top engines to see if 15.1 manages to find a few wins, bearing in mind the large number of draws usually involved with Dragon 3.2 and LC0 0.29. 

For anyone interested, try it with this endgame study. SF 15 found the solution immediately but took a while to find the mate.

White: Kb1, Rc2 and Rh5, Ph4   Black: Kb7, Qa7, Pb6 and e7 ... White to move and win. SPOILER ALERT solution below.
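To load the position into an engine or GUI, it helps to have it as a FEN string. The small helper below builds one from the piece list given above; the function name is made up for illustration, and the castling, en passant and move-counter fields are assumed to be the usual defaults for a composed study.

```python
def placement_to_fen(pieces, side_to_move="w"):
    """Build a FEN string from a dict mapping squares ('b1') to FEN piece letters.

    Uppercase letters are White pieces, lowercase are Black.
    Castling and en passant are assumed unavailable; counters start at 0 and 1.
    """
    ranks = []
    for rank in range(8, 0, -1):          # FEN lists rank 8 first
        row, empty = "", 0
        for file in "abcdefgh":
            piece = pieces.get(f"{file}{rank}")
            if piece is None:
                empty += 1                # count consecutive empty squares
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
        if empty:
            row += str(empty)
        ranks.append(row)
    return f"{'/'.join(ranks)} {side_to_move} - - 0 1"


# The study as described: White Kb1, Rc2, Rh5, Ph4; Black Kb7, Qa7, Pb6, Pe7.
study = {"b1": "K", "c2": "R", "h5": "R", "h4": "P",
         "b7": "k", "a7": "q", "b6": "p", "e7": "p"}
fen = placement_to_fen(study)
print(fen)  # 8/qk2p3/1p6/7R/7P/8/2R5/1K6 w - - 0 1
```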

 

 

 

Solution: The winning move is 1. Rhc5!! when capturing with 1 ... bxc5 is answered by 2. Ra2! and the black queen's only escape squares allow a pin or a skewer with 3. Rb2. And if Black captures the second rook, the h4 pawn will queen and win. Black has other possibilities on his first turn, but all result in the loss of the queen.

I don't know the source of the study, but I got it published in Chess magazine about 3 years back by John Saunders, who couldn't find the source either. A very elegant puzzle and extremely tricky for humans, as the first move is far from obvious and I don't know anyone who has found it when shown. Not so for SF, though even this silicon giant takes a few minutes to find the #22!

Tony121145

With the same openings, time limits (usually rapid 30'+8") and engine parameters, SF 15.1 found some wins against top engines in games that SF 15 only drew, so a definite improvement. 

Hope someone found the study/puzzle interesting.