Lucas Chess Program Elo Accuracy

Chess123aa

I've been using a chess game analyzer called Lucas Chess to analyze some of my games. It gives some of my well-played games a playing Elo of approximately 2000-2300. My opponents' Elos in these games averaged around 1500-1700, as they faltered while playing against me. Although I find my play in these games to be very accurate, my actual rating is somewhere between 1500 and 1700 FIDE, based on my own gauging of my skill level. Could anyone else who has experience with Lucas Chess tell me how well those analyses correspond to their true rating?

Note: My Chess.com rating is about 1200; however, I've been stuck in some sort of "elo-trap" where I'm paired with players who have tons of experience but are also underrated. I am not sure whether this phenomenon is common.

p8q

You are not the only one who noticed the 1200 barrier. But we can't talk about that here; if you want to talk about it, you can join this forum:

 

I'm trying to investigate how accurate the Lucas Chess Elo performance is. So far, in all my experiments it has been quite accurate when analyzing OTB humans and offline chess engines at different Elo settings (Rodent IV, Shredder, Chessmaster 11, and many CCRL list engines). Usually it is accurate to within ±100 points, and sometimes I noticed ±200. It is rarely off by more than 200 points.

Take into account that humans sometimes perform 500 points lower or higher than their actual rating; they are less stable than engines (sometimes they play tired, worried, inspired, etc.). But on average it matches the human OTB rating.
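A rough way to see this is to compare the spread of single-game values with the uncertainty of their average. The Python sketch below is only an illustration: the per-game numbers are made up (not taken from anyone's real games), `summarize` is a hypothetical helper, and the standard error formula assumes the games are independent.

```python
import math
import statistics


def summarize(per_game_scores: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for per-game scores."""
    if len(per_game_scores) < 2:
        raise ValueError("need at least two games to estimate the spread")
    mean = statistics.mean(per_game_scores)
    spread = statistics.stdev(per_game_scores)  # sample standard deviation
    return mean, spread / math.sqrt(len(per_game_scores))


# Made-up per-game Elo performances for one player (illustration only).
scores = [1200.0, 2100.0, 1500.0, 900.0, 1800.0, 2300.0, 1400.0, 1600.0]
mean, sem = summarize(scores)
print(f"mean {mean:.0f}, standard error {sem:.0f}")
```

The more games go into the average, the smaller the standard error gets, which is one reason single games can be far off while the average can still land near the real rating.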

For engines, Lucas Chess Elo performance is more accurate, since engines always perform at the same strength (give or take 100 points).

All that is according to my experience and experiments. Anyone who can prove me wrong is welcome.

MGleason

I checked a few of your games.  Sorry to disappoint you, but I don't think you're anywhere near 1500-1700 FIDE.  Your 1200 rating looks pretty accurate for your level of play.  That doesn't mean you can't get there, but you're not there yet.

You keep dropping pieces and missing tactical opportunities that your opponent gives you.  I would recommend tactics puzzles and Puzzle Rush training.  That will help you to spot more tactical opportunities and avoid giving your opponent tactical opportunities.  Cut those blunders down and your rating will climb hundreds of points.

Lucas Chess rating performance simply tells you how closely you matched the engine.  It doesn't tell you how difficult those moves were to find.  So a simple drawn endgame where it's very easy to find moves that don't change the evaluation will result in a Lucas Chess rating performance well over 3000, which is obviously ridiculous.

If all you're looking for is a metric that tells you how much you blundered, it's OK.  But if you want something that actually tells you how good you are, it's a pretty poor predictor of that.

MGleason

BTW, someone else investigated Lucas Chess's Elo performance metric.  It gives 3500 any time you match the top engine move (or play a move that has the same evaluation).  It gives zero for any move that is 100 centipawns (one full pawn; a centipawn is 1/100 of a pawn) inferior to the best move.

It doesn't factor in the complexity of the position or how difficult the move is to find.
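To make that concrete, here is a rough Python sketch of a scoring rule of this kind. It is not Lucas Chess's actual code: the names `move_score` and `game_score` are hypothetical, and the straight-line interpolation between the two endpoints is purely an assumption for illustration. The thing to notice is that the difficulty of the position never enters the calculation.

```python
# Hypothetical per-move score based only on centipawn loss.
# The two endpoints follow the description above; the linear interpolation
# between them is an assumption, not Lucas Chess's documented formula.

MAX_SCORE = 3500          # score for matching the engine's best evaluation
ZERO_SCORE_LOSS_CP = 100  # centipawn loss at which the score reaches zero


def move_score(best_eval_cp: int, played_eval_cp: int) -> float:
    """Score one move from the evaluation it gives up, in centipawns."""
    loss = max(0, best_eval_cp - played_eval_cp)
    if loss >= ZERO_SCORE_LOSS_CP:
        return 0.0
    return MAX_SCORE * (1 - loss / ZERO_SCORE_LOSS_CP)


def game_score(moves: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-move scores; every position counts equally."""
    if not moves:
        raise ValueError("no moves to score")
    return sum(move_score(best, played) for best, played in moves) / len(moves)
```

Under these assumptions, a string of trivial recaptures and a string of hard-to-find only-moves get exactly the same score, as long as the evaluation is preserved in both.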

Very reasonable and strong moves that are not "best" get scored harshly.  Positional errors that don't change the evaluation by a full 100 centipawns but that any coach would point out as a significant error are scored much higher than they deserve.

It's completely useless for anything other than a metric of "how often did I blunder?", but it's not even very good at that.

Sure, if you take an average across a very large number of games, a stronger player will tend to score better than a weaker player.  But the actual number it comes up with is completely meaningless, and scores from individual games are essentially useless.

Immaculate_Slayer
MGleason wrote:

I checked a few of your games.  Sorry to disappoint you, but I don't think you're anywhere near 1500-1700 FIDE.  Your 1200 rating looks pretty accurate for your level of play.  That doesn't mean you can't get there, but you're not there yet.

You keep dropping pieces and missing tactical opportunities that your opponent gives you.  I would recommend tactics puzzles and Puzzle Rush training.  That will help you to spot more tactical opportunities and avoid giving your opponent tactical opportunities.  Cut those blunders down and your rating will climb hundreds of points.

Lucas Chess rating performance simply tells you how closely you matched the engine.  It doesn't tell you how difficult those moves were to find.  So a simple drawn endgame where it's very easy to find moves that don't change the evaluation will result in a Lucas Chess rating performance well over 3000, which is obviously ridiculous.

If all you're looking for is a metric that tells you how much you blundered, it's OK.  But if you want something that actually tells you how good you are, it's a pretty poor predictor of that.

I think his strength doesn't have much to do with the post and he didn't want advice about that

p8q
Immaculate_Slayer wrote:
 

I think his strength doesn't have much to do with the post and he didn't want advice about that

Exactly :)

p8q
MGleason wrote:

I checked a few of your games.  Sorry to disappoint you, but I don't think you're anywhere near 1500-1700 FIDE.  Your 1200 rating looks pretty accurate for your level of play.  That doesn't mean you can't get there, but you're not there yet.

You keep dropping pieces and missing tactical opportunities that your opponent gives you.  I would recommend tactics puzzles and Puzzle Rush training.  That will help you to spot more tactical opportunities and avoid giving your opponent tactical opportunities.  Cut those blunders down and your rating will climb hundreds of points.

Lucas Chess rating performance simply tells you how closely you matched the engine.  It doesn't tell you how difficult those moves were to find.  So a simple drawn endgame where it's very easy to find moves that don't change the evaluation will result in a Lucas Chess rating performance well over 3000, which is obviously ridiculous.

If all you're looking for is a metric that tells you how much you blundered, it's OK.  But if you want something that actually tells you how good you are, it's a pretty poor predictor of that.

Thank you @MGleason for taking the time to look into my games, but I don't feel disappointed; I know my blunders and my weaknesses, and I was aware of them from the beginning.

As I said in previous posts, I know I'm rated 1200 in blitz because in fast time controls I miss simple tactics that make me lose. I don't want to improve that; I don't want to become a good blitz player. I'm a slow chess player and that's what I like. If you check out my daily games, you will see I don't need Puzzle Rush or any kind of blitz training to be good at slow chess (that is, the daily games I didn't abandon; I quit a tournament because I didn't have enough time in that period).

My point was... well, you can re-read all the previous posts; I think I made myself clear enough.

I don't think you are right about Lucas Chess rating performance. You can see there are "black" moves (not the top engine matches) that are rated at different values according to complexity and how good the move was. It's true some top engine moves (blue ones) are rated 3500, but that's how it must be, since you made the same move Stockfish 14 suggested, and that's an engine move. So a Stockfish 14 game will be rated close to 3500, and human games will be rated lower than that.

But the point is not the exact Elo number, but the difference between some players and others: the difference between GMs, 1100 beginners and @xQcow1 (whose way of playing we know for sure). According to your conclusion about the Lucas Chess Elo performance, you are telling me that 1100 beginners play more top engine moves than GMs. As you can see, that doesn't make any sense. Does that mean that 1100 beginners' moves are closer to Stockfish 14 than top GMs' moves? Strange, because then 1100 players would beat GMs, since Stockfish 14 beats GMs.

"So a simple drawn endgame where it's very easy to find moves that don't change the evaluation will result in a Lucas Chess rating performance well over 3000, which is obviously ridiculous."

I don't think that's ridiculous at all. That's just the way it is. If you play the same moves Stockfish 14 plays, then your rating performance is the same as Stockfish 14. Why is that ridiculous?

Throughout this forum I've been talking about middlegame Elo performance. I didn't talk at all about opening or endgame Elo performance, because in those phases humans tend to play perfect engine moves by the nature of the position, so I'm not taking them into account. I also said in previous posts that this is the reason certain players are so difficult to detect in those types of endgames (and openings): the nature of the positions forces humans to play the same moves as the engine, so it's impossible to detect if someone .... in those moves.

Even if we look just at how much we blunder or make mistakes, without taking Elo performance into account, you will see that 1100 beginners' games are completely different from @xQcow1's (with @xQcow1 having a higher rating).

 

p8q
MGleason wrote:

BTW, someone else investigated Lucas Chess's Elo performance metric.  It gives 3500 any time you match the top engine move (or play a move that has the same evaluation).  It gives zero for any move that is 100 centipawns (one full pawn; a centipawn is 1/100 of a pawn) inferior to the best move.

It doesn't factor in the complexity of the position or how difficult the move is to find.

Very reasonable and strong moves that are not "best" get scored harshly.  Positional errors that don't change the evaluation by a full 100 centipawns but that any coach would point out as a significant error are scored much higher than they deserve.

It's completely useless for anything other than a metric of "how often did I blunder?", but it's not even very good at that.

Sure, if you take an average across a very large number of games, a stronger player will tend to score better than a weaker player.  But the actual number it comes up with is completely meaningless, and scores from individual games are essentially useless.

I don't know who looked at the Lucas Chess Elo performance metric, but he's wrong. It's not just 3500 and 0 values. There are a lot of in-between Elo values according to the position or how good the chosen move was.

If you make a terrible blunder, it's a 0 rating, and that's as it should be. If you play a perfect move, that's 3500, and that's as it should be too. Between those values, there are lots of moves where complexity and different evaluations are taken into account. The only downside is simple, straightforward games, but I didn't take those into account when drawing conclusions. A GM and a beginner play those the same way. For example, if you put a queen in front of a pawn and the beginner takes the queen, you (as a human) will not be able to tell the rating from that move, and you don't know if the player is a beginner or a GM, because anybody would take that queen. In that case, Lucas Chess (as an engine) will not be able to tell either. So I didn't take those types of straightforward games into account.

"Sure, if you take an average across a very large number of games, a stronger player will tend to score better than a weaker player.  But the actual number it comes up with is completely meaningless, and scores from individual games are essentially useless."

Of course, I already said that in previous posts.

"It's completely useless for anything other than a metric of "how often did I blunder?", but it's not even very good at that."

I don't think that's correct, but even if you are right and we don't take the Elo performance value into account at all, we can still see that on average @xQcow1 makes many more blunders and mistakes than the rest of the 1100 beginners.

MGleason

@p8q, I was actually responding to @Chess123aa, not you, when I was talking about his level.

BTW, I downloaded LucasChess to try it out.  I'm not impressed by the Elo scores.  Other parts of the software may be valuable - it has a lot of features I didn't explore - but the Elo scores look to be worse than useless - they're so inaccurate that they're completely misleading.

I strongly disagree that a perfect move that is very easy to spot should count as 3500.  Simply matching the engine does not mean that you're playing at a 3500 level.  Playing at a 3500 level does not mean you played the same move as the engine.  It means playing with a level of tactical accuracy and positional understanding comparable to a 3500 player.  There are some positions in which it is simply impossible to demonstrate 3500 strength.

It also appears to weight all positions roughly equally.  This is extremely simplistic.

Accurate play in complex and difficult positions should contribute significantly to a high score, much more so than in easy positions.  Accurate play in a complex position is where a strong player can show their strength.  Inaccurate play in easy positions should contribute significantly to a low score, much more so than inaccurate play in difficult positions where anyone can blunder; inaccurate play in easy positions is where weak players show their true weakness.  LucasChess ignores this and weights all positions equally.
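One way to picture this kind of weighting is sketched below in Python. It is only an illustration of the idea, not something LucasChess or any other tool is known to implement. It assumes every move comes with an accuracy in [0, 1] and a position difficulty in [0, 1]; how to measure difficulty is left open, and the names `MoveRecord`, `weight`, and `weighted_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveRecord:
    accuracy: float    # 1.0 = matched the best move, 0.0 = blunder (assumed scale)
    difficulty: float  # 1.0 = very hard position, 0.0 = trivial (assumed scale)


def weight(move: MoveRecord) -> float:
    """Accurate moves count more in hard positions, errors more in easy ones."""
    return (move.accuracy * move.difficulty
            + (1 - move.accuracy) * (1 - move.difficulty))


def weighted_accuracy(moves: list[MoveRecord]) -> Optional[float]:
    """Difficulty-weighted mean accuracy, or None if no move carries weight."""
    total = sum(weight(m) for m in moves)
    if total == 0:  # e.g. only perfect moves in trivial positions
        return None
    return sum(weight(m) * m.accuracy for m in moves) / total
```

With this weighting, a perfect move in a trivial position (accuracy 1, difficulty 0) gets weight 0, so a long easy endgame adds nothing to the score, while a blunder in a trivial position counts in full.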

That kind of simplistic evaluation will give wildly skewed accuracy scores.  And you can't even use it very well to compare players.  It will conclude that a player who likes to create chaotic positions where both players are likely to blunder is playing poorly.  It will massively overestimate the strength of a player who avoids sharp, high-risk positions and prefers quiet positions where blunders are infrequent.

A long, easy endgame where "perfect" moves are easy to find will get scored as well over 3000, even though a 900-strength player could play that with half his brain.  A highly complex middlegame with multiple blunders will score poorly, even in positions so difficult that even a titled player is likely to blunder.

If you take an average from a very large number of games and compare them, a stronger player will usually score better than a weaker player, but not necessarily if the stronger player likes wild, chaotic positions.  Stylistic differences can influence their scores as much or more than their strength of play.  And in individual one-off games, the results are all over the place and completely meaningless.

LucasChess has other features that might be valuable, but this Elo evaluation is not useful at all.

p8q

If it's possible that beginners chess players can always choose to play easy positions where they can't blunder, then any beginner chess player will do so while playing vs Magnus Carlsen and always get a draw. Actually he will always draw vs the top chess engines of the world all the time, if he keeps that strategy of not complicating himself.

Nakamura himself tries to play uncomplicated positions against Komodo, because he knows chess engines are better at tactics than humans. So if Nakamura can't steer the game into simple positions, why would beginners be able to do so in 90% of their games?

"I strongly disagree that a perfect move that is very easy to spot should count as 3500.  Simply matching the engine does not mean that you're playing at a 3500 level.  Playing at a 3500 level does not mean you played the same move as the engine. "

OK, I respect your opinion. I don't agree with you on this point, but I respect your personal opinion.

Anyway, I said in previous posts that the accuracy of the Elo performance value in Lucas Chess doesn't matter. What matters are the comparisons. The exact number doesn't matter. I also said that just comparing the number of mistakes and blunders (not taking rating values into account) already makes you think.

"A long, easy endgame where "perfect" moves are easy to find will get scored as well over 3000, even though a 900-strength player could play that with half his brain.  A highly complex middlegame with multiple blunders will score poorly, even in positions so difficult that even a titled player is likely to blunder."

I know that; that's why I never took straightforward easy games into account. I scored 3400 in that type of game and I know I'm not that rating LOL XD So I didn't choose those games.

When beginners play chess, they get into complicated positions all the time, simply because chess complicates itself unexpectedly. A single unexpected move by your opponent can complicate everything and you can't control that; sometimes even a mistake from your opponent will complicate the position to the point that you are the one who loses.

I don't think beginners always play easy chess positions and GMs always play complicated ones. I know there are sometimes straightforward games without much complication, but I never took those games into account.

"If you take an average from a very large number of games and compare them, a stronger player will usually score better than a weaker player"

I already did that.

"LucasChess has other features that might be valuable, but this Elo evaluation is not useful at all."

OK, discard the Elo evaluation if you don't like it. Take the number of mistakes and blunders, that could be useful, right? Chess.com analysis does that too. Because if that's useful, then my conclusions are exactly the same as with the Elo performance values.

Well, I don't think this conversation belongs in this forum; it's coming from other topics and another forum, so I will not keep talking about it here.

Thank you @MGleason for taking the time to answer.

 

MGleason

I'll respond to this one:

"If it's possible that beginners chess players can always choose to play easy positions where they can't blunder, then any beginner chess player will do so while playing vs Magnus Carlsen and always get a draw."

Nope.  A strong player is very good at creating complications where his opponents are likely to go wrong.  A strong player is also good at accumulating little advantages.  A weak player doesn't know how to avoid this.

Weak players playing against each other sometimes fail to do this, and so have games where it never reaches a complex position.  And you've also heard of a "GM draw" where a draw is good enough for both players and neither player wants to take a risk and so they both just go for an easy drawn position and never seek out complications.

But when a strong player wants to try to win, a weak player is not good enough to be able to avoid the complications and difficulties that the strong player wants to create for him.

"Take the number of mistakes and blunders, that could be useful, right?"

Some of the same issues actually apply to that, too.  In a complex and difficult game, both players are likely to make more errors.  In a simple, straightforward game, both players are less likely to make significant errors.

Across a large sample size a stronger player will usually make fewer errors, but if he loves to play aggressively and make speculative sacrifices, he might actually make a lot of moves that the engine considers to be a blunder but that are difficult for a human opponent to refute.

So even there, you really do need to factor the difficulty of the position into the equation.

The problem with simplistic metrics based on engine evaluation is that they are just that: simplistic.