Science of Chess: What makes a move seem "Brilliant?"


My favorite chess book of all time has to be Bobby Fischer's "My 60 Memorable Games." I was given a copy as a gift when I was about 9 years old or so, and I've read and re-read it too many times to count since then. I know players who are more serious than me use annotated collections like this to study and learn, but I always treated it more like an opportunity to watch amazing chess unfold in the diagrams (and on my own board if I felt ambitious) any time I wanted. Part of the fun with Fischer's book is the play-by-play commentary supporting each game. There is a ton of personal detail, funny anecdotes, and guest appearances from other players.

My personal copy of Fischer's "My 60 Memorable Games," which is absolutely battered from 30 years of re-reading!

Reading through and playing these games was a bit of a challenge for me when I was a school-age patzer. The games in this edition are written out in descriptive notation (1. P-K4, for example) rather than the algebraic notation (1. e4) I was learning to use. This made reproducing the games in my head or on a board a little more like a code-breaking exercise, but I liked the challenge and the mildly archaic look of the text. Besides the slightly odd notation, the moves are also richly annotated with marks signaling blunders, questionable moves, intriguing moves, and brilliant moves. These annotations made the text come alive and posed new challenges for me as a young player: What made the bad moves so bad? I often looked at them and thought they seemed perfectly natural until I kept going to see what Fischer had in store for his opponent. Maybe more puzzling to me in some cases were the good moves: What made some of them so noteworthy? Again, I could always keep playing and read the annotations to find out, but often the real brilliance of the moves remained obscure to me.

The picture above illustrates one of Fischer's most legendary brilliant moves. His sacrifice with 15...NxP! (or if you prefer, 15...Nxf2!) sets off a beautiful combination that earned this game against Robert Byrne from the 1963/64 US Championship a Brilliancy Prize. As a student in chess classes at the Murrysville Chess Club, we analyzed this game one week (guided by the very patient Jay Griffin), so I ended up with a decent understanding of why this move was so impressive. For the most part, however, Fischer's brilliance (and his opponents' in a few cases) was something I often just had to take his word for.

Brilliant moves online

When I first started playing chess online a few years ago, the prospect of reviewing my games afterwards with an engine was very exciting. From a learning and improvement perspective, seeing the moves that were labeled as blunders and inaccuracies is probably more useful, but I couldn't (and still can't!) help being excited to see the occasional moves that were labeled as exceptionally good. If you play at chess.com, their game review algorithm assigns labels to your moves that include "Great" and "Brilliant," so you can imagine your own games marked up like the ones I read about in Fischer's book. For ordinary players like me, these brilliant moves ought to be fairly rare, and in my case they certainly are! Here's my breakdown of various move types from chess.com's "Insights" panel for my Blitz performance.

As much as I like seeing those rare !! symbols tacked onto my own game, I don't take them too seriously. After all, my overall level of play hardly warrants adjectives like Great or Brilliant in general, so these annotations probably ought to be considered in the context of my skill and my opponents' typical skill. But are they? How exactly does the engine decide which moves are Great and which are Brilliant? Here is an example of a position from one of my games in which I managed to make not just one, but two "Brilliant" moves back to back. See if you can spot the first one (Black to move).

I doubt that this would have impressed anyone if it happened in a game between titled players, but I was glad I saw Nf3! My opponent followed up with h3, making my Bxh3! response another "Brilliant" move. Both of these moves have the characteristics that chess.com turns out to use to assign the Brilliant label: the algorithm pretty much requires that you sacrifice a piece, and that the sacrifice be "good" in the sense that it does not weaken your position. There are a few more bells and whistles per their support page, namely that you also can't already be so far ahead that you'd still be winning convincingly without making this move. Another way to say that is that a Brilliant move has to be a sort of narrow path to a winning position: you can't just sacrifice material when you're overwhelmingly ahead and get credit for it.
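To make that recipe a bit more concrete, here's a minimal sketch of that kind of rule in Python. To be clear, this is my own paraphrase with made-up thresholds and inputs, not chess.com's actual implementation.

```python
# A minimal sketch of the kind of rule chess.com describes, NOT their actual
# implementation. The thresholds and inputs here are my own assumptions.

def is_brilliant(is_piece_sacrifice: bool,
                 win_prob_before: float,
                 win_prob_after: float,
                 best_alternative_win_prob: float) -> bool:
    """Label a move Brilliant if it gives up material, keeps (or improves)
    the player's winning chances, and the player wouldn't already be winning
    easily without it."""
    if not is_piece_sacrifice:
        return False                               # the label requires a sacrifice
    if win_prob_after < win_prob_before - 0.02:
        return False                               # the sacrifice must not hurt the position
    if best_alternative_win_prob > 0.90:
        return False                               # no credit if you'd be winning easily anyway
    return True

# Example: a sound sacrifice in a roughly equal position
print(is_brilliant(True, 0.55, 0.70, 0.55))        # True
# Example: sacrificing while already completely winning
print(is_brilliant(True, 0.97, 0.98, 0.97))        # False
```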

The project of figuring out how to reasonably assign brilliancy to players' moves raises an interesting cognitive question: What makes moves seem brilliant to human players? When Bobby Fischer decided that a move he or one of his opponents made deserved to be considered remarkable, I know that his decision was based on a deep understanding of the game. Part of that understanding is a subjective sense of which moves are both very good (which is easy to define with a modern engine) and difficult to see (which is much harder to define). We don't call an obvious winning move Brilliant, after all - part of what makes these moves seem worth noting is that they are easy to miss (for a human). The rough definition I gave you in the previous paragraph is one attempt to make those two qualities concrete: maintaining a winning position makes a move good, and the algorithm assumes that piece sacrifices are generally tough to see. Who thinks of throwing away a piece straightaway, after all?

GM Mikhail Tal, master of the sacrifice. Image by Croes, Rob C. for Anefo, CC0, via Wikimedia Commons

OK, maybe some players think of that first. All joking aside, this is the tricky and fascinating part of trying to automate annotations like this. You essentially need a model of what players will and won't think of easily, and such models are not easy to develop! The target article I'll tell you about next is an attempt to tackle this challenge head-on by comparing human annotations of brilliancy to those assigned by different computational models. How can we best capture, with an objective computational criterion, the subjective impressions of surprise, strength, and creativity that lead humans to call a move Brilliant?

Modeling perceptions of move brilliance

In this study, the authors needed to do two things: (1) Collect a large-ish number of human ratings of move brilliance (and other annotations like Great, Bad, etc., to serve as a comparison class) from real games. (2) Compare models designed to guess those human labels using quantities we can calculate from the position itself.

Human judgments of move aesthetics

I'm going to refer to the assignment of labels to moves as judgments of move aesthetics from now on. This is a useful catch-all for these kinds of evaluations that a few different researchers have used (see Osborne, 1964 and Humble, 1993) and that highlights the idea that we're really talking about judgments of something like beauty, creativity, and surprise. At any rate, to get a decent sample of judgments of move aesthetics, the authors collected about 7000 moves from studies available on Lichess that human annotators had labeled as Brilliant (820 moves), Good (1637 moves), or Other (4518 moves). From now on, this is the problem they're going to try to solve: Can they work out which of these categories a move belongs to? Perhaps more importantly, can they provide insight into what's different about the way a model evaluates the moves in these different groups?

An example of a Lichess study with clumsy annotation by the author added for illustration purposes only!

Building a model of aesthetics from chess engines

The next part of the authors' work is much harder: How do you decide what to calculate from the position itself that might have something to do with human perceptions of aesthetics? The goal here wasn't just to get the answer right (or as right as they could) but also to understand what measurements or features supported good computational guessing about human judgments. To arrive at these insights, the authors used two different chess engines, Leela and Maia, as the basis for separate models. There is a lot one could say about these two engines, but the key distinction the authors draw in their work is the difference in how the two engines learned to play: Leela learns through self-play (over 1 billion games), while Maia (McIlroy-Young et al., 2020) learns from games that humans of different skill levels have played with each other.

Using both of these models, the authors could calculate a number of computational features from a data structure called the game tree. If you're not familiar with this term, it refers to the branching structure of possible move sequences away from a starting position. That starting position is usually called the root of the game tree and we can draw branches that extend away from it for each possible next move in the game. Each extension can then serve as the next node from which more possible moves can branch away, eventually yielding a hierarchy of everything that could happen from the position we started with. For a simple game like Tic-Tac-Toe, the game tree is not terribly large. For chess, the game tree is enormous - so large that engines generally don't try to examine the whole thing, but adopt some way to sample promising branches and select the best option from there.
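If it helps to see the idea in code, here's a tiny, generic sketch of a game tree node in Python. The Node class and helper functions are purely illustrative assumptions of mine, not anything from the paper or either engine.

```python
# A toy illustration of a game tree: each node holds a position and its
# children, one child per legal move. Nothing here is specific to the paper.

from dataclasses import dataclass, field

@dataclass
class Node:
    position: str                                   # any representation of the position (e.g., a FEN string)
    children: dict = field(default_factory=dict)    # maps a move to the resulting child Node

    def expand(self, legal_moves, make_move):
        """Add one child node per legal move from this position.
        legal_moves and make_move are callables supplied by a game implementation."""
        for move in legal_moves(self.position):
            self.children[move] = Node(make_move(self.position, move))
        return self.children

def count_nodes(node: Node) -> int:
    """Size of the (partially expanded) tree rooted at this node."""
    return 1 + sum(count_nodes(child) for child in node.children.values())
```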

A portion of the Tic-Tac-Toe game tree - not too intimidating, right? By Traced by User:Stannered, original by en:User:Gdr - Traced from en:Image:Tic-tac-toe-game-tree.png, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1877696

Using the trees generated by Leela and Maia, the authors were able to obtain a collection of measurements describing how the starting position (before the critical move was made) was being evaluated. I don't want to take you all the way into the weeds here because it does get rather technical, but these included things like whether the critical move was even included in the 10 trees each engine opted to examine, and how making the critical move was evaluated: How often did the engine decide that making the critical move improved the position vs. weakening it? How often did the engine think the move led to an actual advantage for the player? What was the winning percentage for the player that the engine estimated in different parts of the game tree? Together, these and a number of other measurements intended to describe the actual shape (width and depth) of sub-trees inside the larger game tree were the vocabulary the authors used with each engine to make good guesses about human move aesthetics. They extracted these features from both models and varied the number of nodes in the tree that the engines could inspect from 10 all the way to 100,000 by factors of ten.
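To give a flavor of what such features might look like in code, here's a rough sketch that reuses the toy Node class from the earlier snippet. These are my own paraphrases of the kinds of measurements described, not the authors' exact feature definitions.

```python
# A rough sketch of the *flavor* of features described above, computed from a
# partially expanded game tree. Assumes the toy Node class sketched earlier
# (any object with a .children dict will do); not the paper's actual features.

def subtree_depth(node) -> int:
    """Length of the longest branch below this node."""
    if not node.children:
        return 0
    return 1 + max(subtree_depth(c) for c in node.children.values())

def subtree_width(node) -> int:
    """Number of immediate continuations the engine expanded from this node."""
    return len(node.children)

def extract_features(root, critical_move, win_prob_before, win_prob_after):
    """Bundle a few tree-shape and evaluation features for one position/move pair."""
    in_tree = critical_move in root.children
    return {
        "critical_move_in_tree": in_tree,
        "win_prob_before": win_prob_before,
        "win_prob_after": win_prob_after,
        "win_prob_delta": win_prob_after - win_prob_before,
        "critical_subtree_depth": subtree_depth(root.children[critical_move]) if in_tree else 0,
        "critical_subtree_width": subtree_width(root.children[critical_move]) if in_tree else 0,
    }
```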

Brilliant moves might be good moves that look bad (if you don't think too hard)

I'll start with the answer to the main question: Can the authors accurately classify moves according to the aesthetic categories humans assigned them to? Briefly, yes! In fact, the authors found that they were able to do this at above-chance levels (70% or so) using just the engine's estimate of win probability before the move, after the move, and the difference between the two. Including more of their features improved the classification further, demonstrating that describing the nature of the game tree more thoroughly (including those descriptors of the width and depth of sub-trees) made it easier to guess which moves seemed brilliant and which did not.
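Just to illustrate how a classifier could work with only three numbers per move (win probability before, win probability after, and the difference), here's a minimal sketch using scikit-learn. Everything here is a placeholder: the features and labels are randomly generated rather than taken from the paper's data, and I'm not claiming the authors used logistic regression, so the printed accuracy will hover around chance.

```python
# A toy three-feature classifier: NOT the authors' pipeline, just a sketch of
# how [win_prob_before, win_prob_after, delta] could feed a simple model.
# Requires numpy and scikit-learn.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: 1000 moves with random win probabilities and random labels
# (0 = Other, 1 = Good, 2 = Brilliant). Real features would come from an engine.
probs = rng.random((1000, 2))                            # columns: [before, after]
X = np.column_stack([probs, probs[:, 1] - probs[:, 0]])  # add the delta as a third feature
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.2f}")  # ~0.33 on random labels
```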

Besides looking at the accuracy of guessing human move aesthetics, though, the authors also included some interesting analyses of how the two models under consideration evaluated moves in the Brilliant, Good, and Other categories when fewer nodes were included (10 is the lower bound, remember) and when more were included (10^5 is the upper bound). First, let me show you how the Winning Chance of these three kinds of moves changes on average when Leela gets to use fewer nodes or many more.

Adapted from Figure 2 of Zaidi and Guerzhoy, 2024 - Win Chance increases for each move type as the number of nodes increases.

I know the axes might be a little tough to read, so let me tell you the key things to notice here. As you move from left to right across each panel, the number of nodes that Leela gets to inspect increases by powers of 10. These squeeze-y shapes are a violin plot of the data, which is intended to give you a sense of how the data are distributed across y-values at each stop on the x-axis. A skinnier part of the shape means that there are fewer data points there, while a wider part means lots of data is concentrated there. I hope you can see that a meaningful difference between Brilliant moves (leftmost panel) and the Other moves (rightmost panel) is that the red line connecting the mean values of these shapes has a steeper slope on the left. What does this mean? It suggests that the thing about Brilliant moves is that they start to look better when the engine gets to look at much more of the game tree. A shallow look isn't so exciting, but a deeper look makes a Brilliant move start to shine.
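If you've never worked with violin plots, here's a tiny synthetic example of how one gets made and read. The numbers below are invented just to mimic the "win chance rises with node budget" pattern; they have nothing to do with the paper's actual data.

```python
# A tiny synthetic violin plot, just to show how to read the figure above:
# wider regions mean more data points at that y-value. Made-up numbers only.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
node_budgets = [10, 100, 1000, 10000, 100000]
# Pretend "win chance" distributions that shift upward as the node budget grows
data = [np.clip(rng.normal(loc=0.3 + 0.1 * i, scale=0.1, size=200), 0, 1)
        for i in range(len(node_budgets))]

fig, ax = plt.subplots()
ax.violinplot(data, showmeans=True)
ax.set_xticks(range(1, len(node_budgets) + 1), labels=[str(n) for n in node_budgets])
ax.set_xlabel("Nodes inspected")
ax.set_ylabel("Estimated win chance")
plt.show()
```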

Here's what the same analysis reveals about Maia.

Adapted from Figure 2 of Zaidi and Guerzhoy, 2024 - Win Chance increases for each move type as the number of nodes increases.

The Maia data might look much the same to you at a casual glance, but there's something even more compelling here. Not only is that red line steeper for the Brilliant moves, but check out just how dramatically different the violin shapes are in that leftmost panel. Compared to the data from Leela, the neat thing here is that the Brilliant moves are not only evaluated as weaker when Maia looks at only 10 nodes, they actually appear to be losing moves on average. Good moves show a weaker version of this pattern, but not so dramatically as the Brilliant ones. The thing about brilliancy as far as Maia is concerned is that a Brilliant move looks like a terrible one at first, but reveals itself as a winner when the engine has the resources to evaluate it more thoroughly. Even the step up from 10 nodes to 100 shifts the distribution of values for Brilliant moves from the absolute floor to at least equal chances. While the authors offer evidence that understanding more about the game tree helps us model move aesthetics more completely, this is a neat way to think about a simple computational definition of what it means for a move to look Brilliant.

Conclusions and next steps

I'm still really interested in how we can use computational models to understand subjective human judgments about the game, and I think this study was a great addition to that literature. I can't help but think back to my book of Fischer's memorable games and wonder how these engines would do at guessing his annotations. The dataset considered here has the benefit of being compiled by a wide range of players, which means more variability in judgments of move aesthetics was built into the sample. I think that's a good thing, but it could also be a neat historical project to consider how aesthetic judgments about moves made during different eras might be more or less predictable. The game has changed many times over the centuries, after all, and what looks brilliant and unexpected to one generation may be part of what a new generation considers straightforward. For now, this study is an intriguing look at how we can make humans' subjective impressions of the game more concrete.

Now, please bear with me for a short announcement that I think some of you may be interested in...

Chessable Research Awards - Submit your proposals by May 15th!

Some of you who read my Science of Chess posts may be scientists yourselves and may have your own interesting questions about the game that you'd like to research. The latest round of the Chessable Research Awards is now open for submissions and there is plenty of time to put together a proposal before their May 15th deadline. One of my students (the intrepid Alex Knopps) was awarded one of these grants last year and we've been excited to be able to pursue his ideas about collaboration in chess analysis with this support. If you'd like to chat about our experience with the CRA, feel free to drop me a line or check in with the Chessable Science team.

Support Science of Chess posts!

Thanks as always for reading! If you're enjoying these Science of Chess posts and would like to send a small donation my way ($1-$5), you can visit my Ko-fi page here: https://ko-fi.com/bjbalas - Never expected, but always appreciated!

References

Humble, P.N. (1993) Chess as an art form. British Journal of Aesthetics, 33, 59-66.

McIlroy-Young, R., Sen, S., Kleinberg, J. & Anderson, A. (2020) Aligning superhuman AI with human behavior: Chess as a model system. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1677-1687.

Osborne, H. (1964) Notes on the aesthetics of chess and the concept of intellectual beauty. British Journal of Aesthetics, 4, 160-164.

Zaidi, K. & Guerzhoy, M. (2024) Predicting user perception of move brilliance in chess. arXiv preprint, https://arxiv.org/abs/2406.11895

Monthly posts describing research into the cognitive science and neuroscience of chess.