Python Script to download entire game archive for a specific user & convert it into a CSV

datasherlock

Chess has always had a special place in my heart. It's a sport, a hobby and the thing I fall back on to de-stress or just have some fun.

As part of my ongoing effort to employ analytics and data engineering to everyday life, I decided to build a dashboard on top of my dataset of 2000+ games. That's roughly 60,000 moves!

For the uninitiated, chess games can be exported in a special 'PGN' format. So I had to start by writing a Python script that could programmatically access all my games through the API, download them, parse each PGN and append it to a relational structure before I could start building any visualization.

Today, I managed to finish part 1 of this exercise, i.e.:

1. Download game archive from chess dot com servers

2. Clean and Parse each file into a readable format

3. Load into a relational structure.
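The three steps above could be sketched roughly as follows. This is not the code from the gist; it is a minimal illustration using only the standard library, and it skips the download step by working on an inline sample. The function names (split_games, parse_game, games_to_rows, rows_to_csv), the column names and the sample game are all hypothetical, and the parser assumes simple movetext without nested variations.

```python
import csv
import io
import re

# Matches PGN tag pairs such as: [White "alice"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}
FIELDS = ["game_id", "white", "black", "result", "ply", "color", "move"]


def split_games(pgn_text):
    """Split multi-game PGN text into one string per game.

    Assumption: every game starts with an [Event "..."] tag.
    """
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        games.append("\n".join(current))
    return games


def parse_game(game_text):
    """Return (headers, moves) for a single game."""
    headers, movetext_lines = {}, []
    for line in game_text.splitlines():
        match = TAG_RE.match(line)
        if match:
            headers[match.group(1)] = match.group(2)
        else:
            movetext_lines.append(line)
    movetext = " ".join(movetext_lines)
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments} such as clock times
    movetext = re.sub(r"\d+\.+", " ", movetext)     # drop move numbers like "1." and "1..."
    moves = [tok for tok in movetext.split() if tok not in RESULT_TOKENS]
    return headers, moves


def games_to_rows(pgn_text):
    """Flatten all games into one row per half-move (ply)."""
    rows = []
    for game_id, game_text in enumerate(split_games(pgn_text), start=1):
        headers, moves = parse_game(game_text)
        for ply, san in enumerate(moves, start=1):
            rows.append({
                "game_id": game_id,
                "white": headers.get("White", ""),
                "black": headers.get("Black", ""),
                "result": headers.get("Result", ""),
                "ply": ply,
                "color": "white" if ply % 2 == 1 else "black",
                "move": san,
            })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample game, not taken from any real archive.
SAMPLE_PGN = """[Event "Live Chess"]
[White "alice"]
[Black "bob"]
[Result "1-0"]

1. e4 {[%clk 0:09:59.9]} 1... e5 {[%clk 0:09:58]} 2. Qh5 Ke7 3. Qxe5# 1-0
"""

if __name__ == "__main__":
    print(rows_to_csv(games_to_rows(SAMPLE_PGN)))
```

One row per half-move keeps the data at the level of every move, so per-game numbers can be derived later by grouping on game_id.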

If this excites you, you can use the script to parse your own archives. Just update the user variable to your username and you should be good to go.

https://gist.github.com/DataSherlock/58e6285dbd11cbba9d29b32c5480521d

I will share the link to the dashboard once I finish.

notmtwain

Yes, sounds very nice.

Did you learn anything? 

Is there any useful data in the API that isn't provided to users in stats?

datasherlock

It's purely educational, though it does provide the opportunity to create some cool aggregate-level meta metrics, like the times of day you played/won/lost the most games, the color you won/lost the most with, the openings you prefer, and many more.
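As an illustration of one such metric, here is a minimal sketch that counts wins, losses and draws by color. The function name results_by_color, the dictionary keys and the sample games are hypothetical; the actual columns produced by the script may differ.

```python
from collections import Counter


def results_by_color(games, username):
    """Count (color, outcome) pairs for `username`.

    Each game is a dict with "white", "black" and "result" keys, where
    result is the PGN result string: "1-0", "0-1" or "1/2-1/2".
    """
    name = username.lower()
    counts = Counter()
    for game in games:
        if game["white"].lower() == name:
            color = "white"
        elif game["black"].lower() == name:
            color = "black"
        else:
            continue  # the user did not play in this game
        result = game["result"]
        if result == "1/2-1/2":
            outcome = "draw"
        elif result in ("1-0", "0-1"):
            white_won = result == "1-0"
            outcome = "win" if white_won == (color == "white") else "loss"
        else:
            continue  # unfinished or unknown result
        counts[(color, outcome)] += 1
    return counts


# Hypothetical sample data.
GAMES = [
    {"white": "alice", "black": "bob", "result": "1-0"},
    {"white": "bob", "black": "alice", "result": "1-0"},
    {"white": "alice", "black": "carol", "result": "1/2-1/2"},
    {"white": "carol", "black": "alice", "result": "0-1"},
]

if __name__ == "__main__":
    for key, total in sorted(results_by_color(GAMES, "alice").items()):
        print(key, total)
```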

datasherlock

Here's the link to a dashboard I created using this dataset: https://public.tableau.com/views/MyChessJourney-Visualized/MyChessJourney?:language=en&:display_count=y&publish=yes&:origin=viz_share_link&:showVizHome=no

ollie_g

Awesome project, datasherlock! Quick question: does the API allow us to access our 4-player chess games, or just 2-player?

LeeEuler

Nice project!

mister_bludgeon

Nice work. It reflects good workmanship and enviable patience. The first suggestion that comes to mind is to make the username a command-line option and have the script display a usage statement when invoked with no arguments. If you want to create a GitHub repo, I would be honored to contribute a PR (pull request, for all you Normals: a way of proposing changes to a project).
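A minimal sketch of that suggestion, using the standard argparse module. The function names build_parser and parse_args are hypothetical, and how the parsed username would be passed into the rest of the script is an assumption.

```python
import argparse
import sys


def build_parser():
    """Build a parser with the username as a required positional argument."""
    parser = argparse.ArgumentParser(
        description="Download a user's game archive and convert it into a CSV.",
    )
    parser.add_argument("username", help="username whose games should be downloaded")
    return parser


def parse_args(argv):
    """Parse argv; print the full help text and exit with code 2 if it is empty."""
    parser = build_parser()
    if not argv:
        parser.print_help(sys.stderr)
        raise SystemExit(2)
    return parser.parse_args(argv)
```

In the script this would be called as parse_args(sys.argv[1:]), and the resulting args.username would replace the hard-coded user variable.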

datasherlock

Thank you ollie_g, LeeEuler and mister_bludgeon! Not sure why I didn't get any notifications when you responded to my post. Sorry for replying so late.

@mister_bludgeon - Thank you for the kind words. I have it on GitHub - https://github.com/DataSherlock/chess-analytics

I don't plan to actively maintain it though. This was just something I had wanted to do for a long time and found time for during the Christmas holidays. I will make the refinement you suggested, and some more that I noticed, during my next vacation!

All your replies truly made my day.

datasherlock
ollie_g wrote:

Awesome project, datasherlock! Quick question: does the API allow us to access our 4-player chess games, or just 2-player?

The API I used just gave the 2 player versions.

MickeyDooley

This is awesome. I also included this to disable SSL certificate verification:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Do you use Tableau? Seems like there is opportunity here for some very cool dashboard work.

datasherlock

Yes. It offers the opportunity to present stats a little differently from what chess.com officially shows since my script generates data at the level of every move. 

 

LessnerZack

Hi @datasherlock, very cool project! I just sent you an email with a question I had about the script. Thank you!

Bigfooter2011

Hi @datasherlock, thank you for this. It was a very helpful program and is really commendable as I myself am relatively new to both chess and programming. Keep it up!

llama47

It threw some warnings, but it worked. The files are separated by month and can be opened directly in ChessBase. Nice work, thanks.

chesslover0003

Does this script still work? I have it running on Debian but it doesn't appear to get any PGNs.

Kurtpy

Thanks @datasherlock, but I had some problems with the JSON. Any advice on how to fix it? When I open the URL in a browser there are no problems; the data is there in JSON format.
Traceback (most recent call last):
File "D:\Chess\ChessPGNParser.py", line 173, in <module>
main()
File "D:\Chess\ChessPGNParser.py", line 151, in main
getPGN(user)
File "D:\Chess\ChessPGNParser.py", line 27, in getPGN
for url in json.loads(pgn_archive_links.content)["archives"]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
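"Expecting value: line 1 column 1 (char 0)" means the response body was empty or was not JSON at all, for example an HTML error page. Since the same URL works in a browser, one possible cause (an assumption, not confirmed anywhere in this thread) is that the server treats the script's request differently, for instance because it is sent without a User-Agent header. A minimal sketch of a check that shows what actually came back before decoding; the function name extract_archive_urls is hypothetical:

```python
import json


def extract_archive_urls(status_code, body):
    """Return the list stored under "archives", or raise a readable error.

    `status_code` is the HTTP status and `body` the raw response bytes.
    """
    if status_code != 200:
        raise RuntimeError(f"HTTP {status_code}; body starts with: {body[:200]!r}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(
            f"Response was not JSON; body starts with: {body[:200]!r}"
        ) from exc
    return payload["archives"]
```

If the script fetches the URL with the requests library (a guess based on the .content attribute in the traceback), this could be called as extract_archive_urls(pgn_archive_links.status_code, pgn_archive_links.content). The error message would then show the status code and the start of the body, which should make the real cause visible.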