Did Hikaru Cheat?

By evilmathcat

A lot has been said about Kramnik's cheating accusations regarding Hikaru's performance.
From a data science perspective, I took a different approach and included Magnus and Fabiano in the study as benchmarks.
Here's my take on it. You can also check the YouTube videos and the DataLens dashboard with the results for Hikaru, Magnus and Fabiano:
# Links:
email: evilmathcat@yandex.com
YouTube Video about the accusation: https://youtu.be/93M37WTjSCQ
YouTube Video on how I built the Project: https://youtu.be/qUoqV9T0-Us
Link to Datalens with the Blitz Streak Analysis for all players: Statistical Dashboards

# Intro
Former World Chess Champion GM Vladimir Kramnik indirectly accused GM Hikaru Nakamura, currently ranked number 2, of cheating.
He said that Hikaru's performance in a series of online games, where he scored 44.5/45, is unlikely and should be looked into.
This got me interested.
I initially thought it was one, or a combination, of these three things:
1. "If this is a high-level player, maybe he is seeing things most people don't",
or
2. "he could just be disgruntled because he is not as active anymore and has lost his sharpness",
or
3. "he is just pulling some 9000 IQ marketing gimmick."

Nevertheless, I thought it was worth investigating.
As a self-taught statistician, I devote my focus to becoming proficient at it.
I love statistics, but I don't trust academia. People who are never questioned lose sight of reality.
This is the real world, not a vacuum.

So here we are: steering clear of biased academia, focusing on science applied to the real world, and avoiding overly complicated nonsense.

By no means am I bashing anyone in this project; if anyone reading this feels that way, that is on you.
What I am questioning is whether Kramnik's claims are sound from a statistical perspective.
What are Kramnik's "statistically based claims"?
What is his understanding of statistics?
We are now on my playground.
And for that, I will develop my approach with statistical principles backed up by science, not by empty words.

With that out of the way, let's get started.
As a first step, I wanted to hear each side's perspective so that I could then decide on a formal plan.

--------------------------
Kramnik's perspective:
--------------------------
While I was watching some of his games, Kramnik kept reporting and blocking people for cheating, doing "the procedure" as he calls it. He kept pointing to the opponent's chess.com accuracy score as being too high, which from his perspective can't possibly be legitimate. This is worth investigating.

I also noticed he tends to lose on time a lot.
He is much slower with the mouse, and when his clock is running out, he blunders, rage-quits, or both.

At the time of this project, Kramnik played a match against GM Jose Martinez Alcantara, known as "Jospem". The event was a total circus.
They played online even though they were both physically in the same room, with arbiters looking over their shoulders.
The match was commentated by IM Gotham Chess and IM Rosen.
Kramnik kept complaining throughout the event, and when there was an issue with the computers, he quit.
The result was 15.5 to 11.5 in favor of Jospem. Kramnik lost.

So at first sight, without any quantitative work done yet, it seems that he is either, again:
1. losing his mind
or
2. doing this on purpose as a marketing gimmick.

------------------------
Hikaru's perspective:
------------------------
Hikaru made a significant shift from the offline chess world to the online world as a streamer. He still plays over-the-board (OTB) events, but he seems more focused on the online side.
You can see this in the sheer number of games he has played on Chess.com.
As of 20/08/2024, Hikaru had played 56,639 games.
In comparison, world number 1 Magnus Carlsen had played 5,411 games.

So this is one key finding:
Chess is known not to provide much financial benefit.
These players devote decades of their lives to improving at the game, and there is not much money in it. Eric from Chessbrah took the time to explain this in his video.
Some players, such as Magnus, Hikaru and a few others, are paid, but most aren't.
So at first sight, the shift from offline to online makes sense: if he earns more by spending a larger share of his time in the comfort of his home, with less stress, while still doing what he does well, his results are likely to go up.

GM Magnus Carlsen even commented on this in an interview, stating that he started doing better when he transitioned to online.

---------------------------------
About the days in Question:
---------------------------------

The games were 3-minute Blitz, a format where Hikaru excels, played on 16th and 17th November 2023, so over 2 days.
If he had been playing Magnus, against whom I could not find this kind of winning track record, that would be one thing, but that is not the case.
He played against 6 different players across these 2 days (one of whom played on both days).

All of them were rated lower than him, by 315 points on average.

Even without further checking, it is highly unlikely that he cheated. Quite the contrary: it is extremely likely that he would win, and continue to do so in the future.

It would be like me explaining univariate descriptive and inferential statistics concepts to someone who knows nothing about statistics on a day when I was completely exhausted.
Not only am I 100% confident I would get my message across, but I would also do it without much thinking, since it is second nature to me at this point.

You can even ask this question backwards:
"How likely is it for any of these players,
1. with less experience
2. and rated 315 points lower,
to consistently beat him?"
Unlikely, right?
It seems so.

Using the Elo formula alone, with values averaged across both days:

Hikaru's average Blitz Elo, 16th-17th Nov 2023: 3266
Opponents' average Blitz Elo, 16th-17th Nov 2023: 2951
Difference: 315
Expected score per game (win probability, with a draw counted as half a win): 85.98%

No surprises here.
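
The 85.98% figure can be reproduced with the standard Elo expected-score formula. This is a minimal sketch; the function name `elo_expected_score` is just an illustrative choice, and the two ratings are the averages quoted above.

```python
def elo_expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score: 1 / (1 + 10 ** ((R_opp - R) / 400)).

    The result is an expected score per game, so a draw counts as half a win.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# Averaged Blitz ratings for 16th-17th Nov 2023, as quoted in the text.
hikaru_avg = 3266
opponents_avg = 2951

print(f"{elo_expected_score(hikaru_avg, opponents_avg):.2%}")  # prints 85.98%
```

Note that this is the expectation for a single game between players with these average ratings, not for a whole run of games.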

In addition, we can also address Hikaru's win rate, both from FIDE and Chess.com's perspective (white and black)

*Bullet games are not included in this study, since I established Blitz as a minimum time control.  They are significant though, particularly in Chess.com, given these players have a large volume of played games in this time control.

So now that we have this first look at the days in question and the players' historical performance, here's the approach I took.
I wanted to assess 2 things:
1. Is the player's average winning streak greater than 44 games?
2. Does the player have winning streaks greater than 44 games?
And then check these questions against our benchmarks (Magnus and Fabiano).

-----------------------------------------------------------------------

My Procedure...

1) Data Sources: Chess.com, FIDE, PGNMentor.com

2) Tools: Statistics + Python + Stockfish + Excel + DataLens

-----------------------------------------------------------------------

# Data Retrieval:
1. I scraped all chess.com PGN games for all three players: Magnus, Hikaru and Fabiano.

# Data Processing:
2. I sorted the games in each PGN chronologically, down to the minute.
3. I combined all PGNs into one per player.
4. I added the following headers to each game:
a) ID: so I could revisit a specific game in question.
b) Time_Control: Daily, Classical, Rapid, Blitz or Bullet, since we are interested in Blitz.
c) *Event: Online/Offline. This data comes from chess.com, so all games are online. Even for events held in venues, the players are still using computers, so I considered those online. Nevertheless, the program works with all games, OTB or online, since I added the names of all existing chess venues to the script. It can classify whether an event was held online or offline even with incomplete PGNs.
d) Opponent Elo Difference: this shows whom the player played against, so we can check results per opponent, average overall, and bin opponents into Elo intervals of different widths.
e) Number of Moves: self-explanatory, yet I wanted to have this available so that in the future we can analyse specific game patterns.
f) Calculated Stockfish Accuracy: I computed it myself with Python + Stockfish, since the retrieved chess.com PGNs don't include the accuracy per player.
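
To illustrate the Time_Control header, here is a rough sketch of how a PGN `TimeControl` value could be mapped to a category. It is not the actual script: the function name, the 40-move estimate for increments, and the cut-offs (under 3 minutes is Bullet, under 10 is Blitz, under 60 is Rapid) are all assumptions made for this example.

```python
def classify_time_control(time_control: str) -> str:
    """Map a chess.com-style TimeControl value to a category label.

    Assumed input shapes: '180' (seconds), '180+2' (seconds + increment),
    '1/86400' (daily chess: one move per 86400 seconds).
    """
    if "/" in time_control:
        # Assumption: in chess.com exports a slash only appears in daily games.
        return "Daily"
    base, _, increment = time_control.partition("+")
    try:
        # Assumption: estimate total thinking time over a 40-move game.
        estimated_seconds = int(base) + 40 * int(increment or 0)
    except ValueError:
        return "Unknown"  # '-', '?', or anything else we cannot parse
    if estimated_seconds < 180:    # assumed cut-off: under 3 minutes
        return "Bullet"
    if estimated_seconds < 600:    # assumed cut-off: under 10 minutes
        return "Blitz"
    if estimated_seconds < 3600:   # assumed cut-off: under 60 minutes
        return "Rapid"
    return "Classical"


print(classify_time_control("180"))  # prints Blitz
```

With these assumed cut-offs, the 3-minute games in question (`"180"`) land in the Blitz bucket, which is the one the rest of the study filters on.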

A note about the streaks:

I defined a streak as at least 2 consecutive wins, with at most one draw allowed in between. A loss breaks a streak. A change of game format (for example blitz, then rapid, then blitz again) also breaks a streak. Finally, a streak must also run over consecutive days.
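
The streak rule can be sketched in a few lines of Python. This is only one possible reading of the definition, with several assumptions: a streak is scored as wins plus 0.5 for the single allowed draw (which would explain values such as 113.5), the draw only counts if it sits between two wins, a second draw ends the streak, and a format change or a gap in days is encoded by the caller as a break marker such as `"X"`. The function name is hypothetical.

```python
def streak_scores(results):
    """Return the score of every streak in a sequence of results.

    results: iterable of 'W' (win), 'D' (draw), 'L' (loss); any other
    symbol (e.g. 'X' for a format change or a day gap) breaks the streak.
    """
    scores = []
    wins = 0            # wins in the current streak
    draw_used = False   # the one allowed draw has been confirmed by a later win
    pending = False     # a draw is waiting for a following win

    def close():
        nonlocal wins, draw_used, pending
        if wins >= 2:   # a streak needs at least 2 wins
            scores.append(wins + (0.5 if draw_used else 0.0))
        wins, draw_used, pending = 0, False, False

    for result in results:
        if result == "W":
            if pending:  # the draw sat between two wins, so it counts
                draw_used, pending = True, False
            wins += 1
        elif result == "D" and wins > 0 and not draw_used and not pending:
            pending = True
        else:            # loss, second draw, leading draw, or break marker
            close()
    close()
    return scores


print(streak_scores("WWDWLWW"))  # prints [3.5, 2.0]
```

Under this reading a trailing draw is not counted, so `"WWDL"` yields a streak of 2.0 rather than 2.5.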

With this out of the way, here are some images of the process, starting with the PGN processing.

Here is an example with the added information:

# Data Manipulation:
5. I then created multiple dataframes:
a) Player games filtered by blitz: the format we are interested in.

b) Consecutive winning streaks: calculated from the blitz games. Each streak length is accompanied by its frequency for each player ID. For example, a player may have had a streak of 10 on 4 separate occasions.

c) Streak details: this gives us more information about the game IDs, opponent's Elo, piece color and accuracy, so that we can look up the specifics of any game from its original ID.

# Data Analysis:
6. I then performed a summary statistical analysis on the dataset. As I already stated, summary statistics suffice in this case. I also treated the data not as a sample but as the whole population: we have every game each player has played on the site, so we are not estimating anything at this stage.
At first glance, as you are about to see, the streak distribution is highly positively skewed for all players. That makes sense, as short streaks are far more common than long ones, which are clearly outliers.
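
For readers who want to reproduce this kind of summary, here is a minimal sketch using only Python's standard library. The function name, the dictionary keys and the "inclusive" quantile method are assumptions for illustration, so other tools may give slightly different percentiles. The last two entries correspond to the two questions asked earlier.

```python
import statistics


def summarize_streaks(streaks, threshold=44):
    """Descriptive summary of a list of streak scores (needs 2+ values)."""
    ordered = sorted(streaks)
    # 'inclusive' treats the data as the whole population, not a sample.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    percentiles = statistics.quantiles(ordered, n=100, method="inclusive")
    mean = statistics.mean(ordered)
    return {
        "n": len(ordered),
        "mean": mean,
        "median": q2,
        "mode": statistics.mode(ordered),
        "p1": percentiles[0],
        "q1": q1,
        "q3": q3,
        "p99": percentiles[98],
        "highest": ordered[-1],
        # Question 1: is the average streak greater than the threshold?
        "mean_above_threshold": mean > threshold,
        # Question 2: how many streaks reach the threshold or more?
        "streaks_at_or_above_threshold": sum(s >= threshold for s in ordered),
    }


# Toy data, not taken from the study.
print(summarize_streaks([2, 2, 3, 5, 10], threshold=5))
```

Because the distribution is strongly right-skewed, the median and the upper percentiles say more about what is typical than the mean does.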

Here are the summary statistics for each player.

Analysis for Hikaru Nakamura's streaks:
Statistical Summary for Hikaru Nakamura:
Total Analyzed games: 58245

number of Streaks(n): 3729

Mean: 8.6

Median: 5.5

Mode: 2.0

P1: 2.0

P5: 2.0

Q1: 3.0

Q2 (Median): 5.5

Q3: 10.5

P99: 45.0

Highest streak found: 113.5

Frequency of the highest streak: 1

Test 1: Is the player’s average winning streaks greater than 44 games?
No significant evidence was found that the average streak is greater than 44

Test 2: Does the player have winning streaks greater than 44 games?
Number of streaks of 44 or more games: 40

Longest streak: 113.5




Analysis for Magnus Carlsen's streaks:
Statistical Summary for Magnus Carlsen:
Total Analyzed games: 5540

number of Streaks(n): 405

Mean: 6.73

Median: 5.0

Mode: 2.0

P1: 2.0

P5: 2.0

Q1: 3.0

Q2 (Median): 5.0

Q3: 8.5

P99: 27.96

Highest streak found: 50.5

Frequency of the highest streak: 1

Test 1: Is the player’s average winning streaks greater than 44 games?
No significant evidence was found that the average streak is greater than 44

Test 2: Does the player have winning streaks greater than 44 games?
Number of streaks of 44 or more games: 1

Longest streak: 50.5




Analysis for Fabiano Caruana's streaks:
Statistical Summary for Fabiano Caruana:
Total Analyzed games: 4777

number of Streaks(n): 343

Mean: 5.43

Median: 4.0

Mode: 2.0

P1: 2.0

P5: 2.0

Q1: 2.5

Q2 (Median): 4.0

Q3: 6.75

P99: 17.0

Highest streak found: 38.5

Frequency of the highest streak: 1

Test 1: Is the player’s average winning streaks greater than 44 games?
No significant evidence was found that the average streak is greater than 44

Test 2: Does the player have winning streaks greater than 44 games?
Number of streaks of 44 or more games: 0

Longest streak: 38.5




Conclusion

Besides the work I've shown here, for the techies reading this, I also tried other techniques, such as SARIMAX to account for performance differences, and Monte Carlo analysis.
We could make a case for Monte Carlo analysis by binning the Elo difference range of the players he played against and estimating the probability of similar streaks by iteration.
Yet after doing it, I didn't see the point. This case is straightforward using just descriptive statistics on his dataset and common sense.
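
For anyone curious what such a Monte Carlo run could look like, here is a simplified sketch. It is not the code used in this project. The function names are hypothetical, each game gets its own win probability (for example the Elo expected score of its rating-difference bin), draws are ignored, and games are treated as independent, which real games are not.

```python
import random


def longest_win_run(outcomes):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for won in outcomes:
        run = run + 1 if won else 0
        best = max(best, run)
    return best


def prob_of_run(win_probs, run_length, trials=2000, seed=0):
    """Estimate the chance that a simulated game history contains at least
    one run of `run_length` consecutive wins.

    win_probs: one assumed win probability per game, in playing order.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        outcomes = [rng.random() < p for p in win_probs]
        if longest_win_run(outcomes) >= run_length:
            hits += 1
    return hits / trials
```

A call such as `prob_of_run([0.86] * 1000, 45)` estimates how often a run of 45 wins turns up somewhere in 1,000 games at an assumed 86% win probability per game. The estimate depends heavily on how many games are simulated, which is why the volume of games played matters so much when comparing players.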

In addition, the Elo formula already accounts for the expected winning probability between two players. Expected values are not the same as observed values, but even accounting for outside influences, it's not like Hikaru played against 100 people. He played against 6 people across 2 days, 6 people rated on average 315 points below him. So it comes down to focus and proficiency. And his results do not fall outside what the Elo formula leads us to expect.

It is clear that Hikaru has many more streaks than Fabiano and Magnus. But he also plays more online than both of them combined. So there are no surprises here. And it's not as if there were no track record of Hikaru's performance across many time controls and events, both offline and online. Anyone can check his playing history.

I would make a case for looking at lower-rated opponents who come from a less financially secure place than Hikaru. That is another story: you dedicate your life to a sport that pays almost no money, and if you don't finish in the top places, once expenses are accounted for, you lose money. This is worth looking into, but it is beyond the scope of this study. It's Hikaru we are focusing on here.

A better question to ask is: how likely is it that Magnus, Fabiano, or Alireza could accomplish the same thing if they wanted to, especially considering that Hikaru's average opponent is rated much lower than him? Hikaru didn't cheat on those days - he doesn't need to. Kramnik has either lost his mind or is pulling a marketing stunt. Either way, he’s getting attention. I'm on my own side, not anyone else's. All these chess players are a bunch of arrogant prima-donnas. I just find it distasteful to attack people for views.