I corrected the URL to the download website. Sorry for the inconvenience.
Free chess game database with over 11 million games (Scid vs. PC database format)

I've updated the database file, re-imported all PGN Mentor games and cleaned up the database again. As a result, there are approx. 200000 more games in the database, including the latest TWIC update (1525).
I'm planning to update the database every two to three months.

I have searched the database for the number of games of some well-known historical and current players and listed them on a separate page: https://LumbrasGigaBase.com/notable-players-in-the-pgn-database/

Now you went through all this wasted trouble because you never first asked anyone here on Chess.com how to properly construct an opening database. Sheesh!!

Hello out there,
Thanks to the advice of a few users on lichess, I was able to reduce the number of duplicate games considerably. Sadly, there aren't 11 million games in the database anymore, but still well over 9 million.
I first tried to eliminate the duplicates in SCID with at least one of "Same site", "Same event" or "Same round" still checked. But even after this, around half a million duplicates were found. I checked the first 100 games using the SCID dialog and found that the moves were indeed identical, but the location, the event name and even the rounds were vastly different. Going through half a million games by hand is practically impossible...
So I got fed up and deleted the rest as well. :D
The current stats are now:
- 9.386.860 Games
- 467.825 Players
- 94.887 Events
- 26.367 Locations
Regards,
Michael/Lumbra74
P.S.: I don't think it's useful to have players weaker than 1800 in the database. These players are still in there, of course. I got hold of a database featuring GMs at the beginning of their chess careers, and I imported these games after I had removed the games where both players were below 1800 Elo.

This seems like a great database, although I think excluding the games where both players don't have an Elo rating could be a mistake, as this would exclude games from before there was an established rating system, even though the players were still incredibly strong.

I didn't remove these kinds of games. I removed only the games where BOTH players were below 1800.
A lot of the games without any Elo are historical ones, and it would be more than sad to lose these!
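To make the rule concrete, here is a minimal Python sketch of the filter as described in words above. It is not the SCID procedure that was actually used; the function name and the handling of a game where only one player is rated are assumptions for illustration.

```python
from typing import Optional

THRESHOLD = 1800  # cutoff mentioned in the post


def keep_game(white_elo: Optional[int], black_elo: Optional[int]) -> bool:
    """Return False only when BOTH ratings are known and below the threshold.

    Assumption: a missing rating (None) never counts as "below 1800",
    so unrated historical games are always kept.
    """
    both_rated = white_elo is not None and black_elo is not None
    if both_rated and white_elo < THRESHOLD and black_elo < THRESHOLD:
        return False
    return True


# Unrated historical game: kept
print(keep_game(None, None))
# Both players below 1800: removed
print(keep_game(1700, 1750))
# Only one player below 1800: kept
print(keep_game(1700, 2400))
```

So a game is dropped only if both ratings exist and both are under 1800; everything else stays in.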
Is there a way we could download a "tailored" database? For example, I just want TWIC and maybe a couple others. It would cut down on the download size and time.

Not really, the SCID database is kind of monolithic. If more people are interested in just TWIC, I might consider uploading a second database. But I will not put a big amount of work into that. (It shouldn't even be necessary, because the source files are available separately, one for each week.)
It says on their site, "A compilation of the games can be had for a donation of £30. There have been over million games in TWIC since the first edition on 17th September 1994."
And at chesspublishingDOTcom they have a 12 month offer for $99 which also has a "Gold Plus" section.
I wonder how much overlap there is. The £30 is almost $38; together with the $99, that comes to about $137.
The idea wouldn't be to make it available each week. It could be something where you pay a yearly price, and you could update quarterly or bi-yearly. People would select the sources they wanted when they made a payment and then you would know ahead of time the interest.
Chess.com has sectioned master games, and lichess seems to have more to offer to the free users, but a site like chessgamesDOTcom often has games I am looking for that neither of them has.
Maybe someday someone will combine TWIC and ChessPublishing at a price like 60 USD/year. That would be 5 per month, 15 quarterly, 30 bi-yearly. ChessPublishing could look at it this way: while they may have a group of GMs and IMs hired to write these articles, they also have older articles which many of us haven't read. So if, in one of their 12 categories, an opening they write a new article about is also covered in their archive, they could provide the archived article in the 60 USD deal. If readers want the updated article and others, they could then pay them directly an additional cost, maybe something like $5 for a monthly bundle including the new articles the reader wanted. Perhaps this is kind of what the Gold Plus section is like.
But the way they have it now, you can only get one section at their 20 USD deal. I would rather get a scaled down mix instead of only one section.

Your database is a fantastic resource, and at a price that can't be beaten!
Thank you for your hard work and for sharing. I've bought you a coffee! ☕

I was able to reduce the duplicate games much further by following a hint from a user of another forum. It's all implemented in the database version 2024-02-06.
The parameters I used this time to clean up the duplicates (quite aggressive) were:
- first 4 letters of the player names
- same colors
- same year
- same result
- same moves
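For anyone who wants to try the same criteria outside of SCID, here is a rough Python sketch of how the five parameters above could be turned into a comparison key. This is not how SCID does it internally; the `Game` record, the function names, the lowercasing of names and the idea of comparing moves as one string are all assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    game_id: int
    white: str
    black: str
    year: int
    result: str
    moves: str  # assumption: movetext already normalized to one string


def dedupe_key(g: Game) -> tuple:
    # White and Black keep their own slots, so "same colors" is enforced.
    # Lowercasing the 4-letter prefixes is an extra assumption.
    return (g.white[:4].lower(), g.black[:4].lower(), g.year, g.result, g.moves)


def find_duplicates(games):
    """Group game IDs whose keys match; return only groups with 2+ games."""
    groups = defaultdict(list)
    for g in games:
        groups[dedupe_key(g)].append(g.game_id)
    return [ids for ids in groups.values() if len(ids) > 1]


games = [
    Game(1, "Carlsen, Magnus", "Caruana, Fabiano", 2019, "1-0", "e4 e5 Nf3"),
    Game(2, "Carlsen,M", "Caruana,F", 2019, "1-0", "e4 e5 Nf3"),
    # Same players with colors swapped: not treated as a duplicate
    Game(3, "Caruana, Fabiano", "Carlsen, Magnus", 2019, "1-0", "e4 e5 Nf3"),
]
print(find_duplicates(games))  # [[1, 2]]
```

Site, event and round are deliberately left out of the key, which is what makes this matching aggressive: differently labelled copies of the same game still collapse into one group.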
Current content of the database
- 8.675.290 Games
- 466.802 Players
- 85.834 Events
- 26.668 Locations
- "Source" tags added
A SOURCE tag has been added to all games
- "TWIC" – for all games from the TWIC download
- "Lumbras Giga Base" – for all games whose source is not clearly identifiable, as they come from precompiled databases.
- "Britbase" – The British Chess Game Archive
- "ChessOK" – chessok.com still offers PGNs free of charge.
- "Chess Nostalgia" – Nothing to find anymore about this website.
- "PGN Mentor" – for all games downloaded from PGN Mentor
PGN Mentor and TWIC are the sources I trust the most to be quite clean. So I imported them last and removed the duplicate games with the lower database ID.
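The "import the trusted sources last, then drop the lower ID" idea can be sketched in a few lines of Python. This is only an illustration, not the SCID function itself; it assumes that database IDs grow in import order, and `keep_latest` and its `key` parameter are hypothetical names.

```python
def keep_latest(games, key):
    """Keep, for each duplicate group, only the game with the highest ID.

    games: iterable of (game_id, record) pairs
    key:   function mapping a record to a hashable duplicate key
    Assumption: a higher ID means the game was imported later.
    """
    latest = {}
    for game_id, record in games:
        k = key(record)
        if k not in latest or game_id > latest[k][0]:
            latest[k] = (game_id, record)
    # Return survivors ordered by database ID
    return sorted(latest.values(), key=lambda item: item[0])


games = [
    (1, ("Carl", "Caru", "e4 e5")),  # from an older compilation
    (2, ("Kasp", "Karp", "d4 d5")),
    (3, ("Carl", "Caru", "e4 e5")),  # same game, imported later
]
survivors = keep_latest(games, key=lambda record: record)
print([game_id for game_id, _ in survivors])  # [2, 3]
```

Because the trusted sources go in last, their copies get the higher IDs and are the ones that survive.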
Regards,
Michael/Lumbra74
P.S.: TWIC 1526 is also added

Nice. I noticed that there was still some duplication, especially when looking at historical games. I'll check out the new database!

Mate, I run a chess YouTube Channel (Adventures of a Chess Noob) and I'm actually using your database for some of the research that I'm doing for a book that I'm writing. Would you be happy for me to promote your database in a YouTube video some time soon? Or would you prefer to work on your database for a little while longer first?
Cheers.

I've just released a new version of the database, which now includes the "Lichess Elite Database" and the games pulled from the lichess broadcast system.
Now in the file formats:
- si4 (Scid vs.PC/MAC)
- si5 (Scid 5.0)
- PGN (divided into different time ranges)
Contents:
- 12.494.360 Games
- 529.315 Players
- 86.010 Events
- 26.667 Locations

The structure of the database is now complete. All data has been cleaned up as far as possible.
New games will be added weekly and then uploaded - probably on Tuesdays or Wednesdays, depending on the release of the data from TWIC. I also will release a monthly "cumulative" update as a PGN file, containing all games from the previous month.
In addition to a database for Scid vs.PC/MAC (si4), a version for Scid 5.0 (si5) is now online, as well as individual PGN files for different time periods or years.
The database now contains:
- More than 12.490.000 Games
- More than 528.000 Players
- More than 85.000 Events
- More than 26.600 Locations

@Lumbra74: I've just published an article/blog on your fantastic database along with a video on my channel! https://www.chess.com/blog/vitualis/free-massive-chess-database-get-lumbrasgigabase-with-13-million-games-chess-chats-6
I've created a blog post for future information/discussions:
https://www.chess.com/blog/Lumbra74/lumbras-giga-base-free-massive-game-collection-in-scid-and-pgn-format
---
Hello out there,
I've created a database with over 11 million games. I started based on several existing databases, some of which were already several years old in this form. A game collection has now been created from the following sources:
The data preparation process
After merging the databases, a number of measures were taken to compress the database:
Contents of the database
The database is available for free at https://LumbrasGigaBase.com
How can you support me?
I love coffee! You are welcome to buy me a coffee!
The initial creation of the database, the cleansing, research for sources etc. was quite time-consuming. However, this has now been completed, so further maintenance is no longer a major problem. But of course I also pay for this website – so if anyone likes the database and wants to help keep this site going, please feel free to support me on my website.
Regards,
Lumbra/Michael