
Lumbra's Gigabase - Free, massive game collection in Scid- and PGN-format
Why was a new database created?
Some time ago, I searched for freely available PGN files or databases that are still being maintained, but was unsuccessful. Either such projects could no longer be found or they had not been updated for years. So I decided to create my own database with Scid vs. PC/MAC. I started from several existing databases and PGN files and tagged each source with a SOURCE tag, so you can search by source using Scid vs. PC/MAC or Scid 5.0.
The database can be found on a website I have set up:
https://lumbrasgigabase.com
Tag | Description |
Britbase | For all games from the British Chess Game Archive |
ChessNostalgia (*) | Nothing can be found on this site any more. |
ChessOK (*) | chessok.com still offers PGNs free of charge (until the end of 2020). |
Chessopolis (*) | PGNs may still be offered, but only behind a paywall. |
Convekta (*) | Possibly a publisher of chess literature. Only dealers selling Convekta products can still be found. |
LichessBroadcast | For the games that are pulled through the Lichess broadcast system |
LichessEliteDatabase | Standard Lichess games, filtered to keep only games of players rated 2400+ against players rated 2200+, excluding bullet games. Source: https://database.nikonoel.fr. From this source, all classical, rapid and blitz games in which both players are rated above 2550 Lichess ELO are added. |
LumbrasGigaBase | All games from existing databases, where the origin cannot be clarified, have been given this tag. |
PGNMentor | Extensive archive with individual files for players, openings, opening variations and various tournaments. |
TWIC | For all games from the TWIC download. |
(*) Found in the Github project
You can search for the SOURCE tag in Scid via the menu Search –> General –> Extra tags.
The data preparation process
After merging the databases, a number of measures were taken to compress the database and eliminate duplicates:
- All games with fewer than 10 half-moves have been deleted.
- All player names, tournament locations, rounds etc. were corrected using Scid’s maintenance function, as far as Scid was able to do so.
- All games in which both players are rated below 1800 ELO have been deleted.
- ECO codes have been added to all games.
- All remaining games were checked for duplicates. The following parameters had to match for a game to be declared a duplicate:
  - The same first 4 letters of the player names.
  - The same player colours.
  - The same moves.
  - The same result.
- The files were processed with the program pgn-extract. Unnecessary tags were removed and some were renamed so that the information is available in standard tags (primarily the date).
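The filter and duplicate rules listed above can be sketched roughly as follows. This is an illustration in Python only, not the actual Scid or pgn-extract processing. The dictionary layout and the names `keep_game`, `duplicate_key` and `clean` are hypothetical. How games with a missing rating were treated is not stated; in this sketch a missing rating is passed as 0 and therefore counts as below the threshold, which is an assumption. Name matching here is case-sensitive, also an assumption.

```python
def keep_game(game, min_plies=10, min_elo=1800):
    """Apply the length and rating filters to one game."""
    # "moves" holds one entry per half-move (ply)
    if len(game["moves"]) < min_plies:
        return False
    # A game is dropped only when BOTH players are below the threshold
    if game["white_elo"] < min_elo and game["black_elo"] < min_elo:
        return False
    return True


def duplicate_key(game):
    """Build the comparison key for duplicate detection.

    White comes first and Black second, so the same two players with
    swapped colours produce a different key.
    """
    return (
        game["white"][:4],      # first 4 letters of White's name
        game["black"][:4],      # first 4 letters of Black's name
        tuple(game["moves"]),   # identical move sequence
        game["result"],         # identical result
    )


def clean(games):
    """Filter the games and keep the first game of each duplicate group."""
    seen = set()
    kept = []
    for game in games:
        if not keep_game(game):
            continue
        key = duplicate_key(game)
        if key in seen:
            continue
        seen.add(key)
        kept.append(game)
    return kept
```

Comparing only the first 4 letters lets differently spelled entries such as "Carlsen, Magnus" and "Carlsen,M" match, at the cost of occasionally treating two distinct players with similar names as the same person when moves and result also coincide.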
Content of the database
An example of the players contained in the database can be found here.
- More than 13,200,000 games
- More than 590,000 players
- More than 36,000 events
- More than 26,000 locations
Under German law, chess games without annotations cannot be copyrighted, but commentary can. Therefore, the entire database was cleared of all annotations and variations. The German Chess Federation clarified this question in a short article in 2006. The expert opinion mentioned in the article is available online and can be downloaded:
DSB expert opinion on the question “Is there a copyright on chess games” (PDF in German)
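To illustrate what clearing annotations and variations means at the PGN level, here is a small Python sketch. It is not the method used for the database, which relied on Scid and pgn-extract. It assumes standard PGN syntax: `{...}` for comments, nested `(...)` for variations, and `$n` for numeric annotation glyphs. Whether glyphs were removed as well is an assumption, and the function name `strip_annotations` is hypothetical.

```python
import re


def strip_annotations(movetext):
    """Remove comments, variations and NAGs from PGN movetext."""
    out = []
    depth = 0          # nesting depth of ( ... ) variations
    in_comment = False # True while inside a { ... } comment
    for ch in movetext:
        if in_comment:
            # Everything up to the closing brace is discarded
            if ch == "}":
                in_comment = False
            continue
        if ch == "{":
            in_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Keep only characters of the main line
            out.append(ch)
    text = re.sub(r"\$\d+", "", "".join(out))  # drop NAGs such as $1
    return " ".join(text.split())              # normalise whitespace
```

For example, `strip_annotations("1. e4 {best by test} e5 (1... c5) 2. Nf3 $1 Nc6 1-0")` yields `1. e4 e5 2. Nf3 Nc6 1-0`.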
Future updates
The database will be updated weekly, usually on Tuesdays, after the release of the most recent TWIC file. The following files will be uploaded:
- Database files (si5, si4 format)
- A differential PGN-file containing the new games since release of the last database.
- A monthly PGN-file, containing the new games
For example: the current database was released on 02/27/2024. The last database of March will be released on 03/26/2024. That is a span of 4 weeks, so the monthly update file will contain all weekly updates between the two releases.
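The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch under the assumption of strictly weekly releases; the function name `weekly_releases` is hypothetical.

```python
from datetime import date, timedelta


def weekly_releases(previous_release, last_release_of_month):
    """List the weekly release dates after previous_release,
    up to and including last_release_of_month."""
    releases = []
    current = previous_release + timedelta(weeks=1)
    while current <= last_release_of_month:
        releases.append(current)
        current += timedelta(weeks=1)
    return releases


# Dates taken from the example above (both are Tuesdays)
march = weekly_releases(date(2024, 2, 27), date(2024, 3, 26))
print(len(march))  # number of weekly updates in the monthly file
```

With these two dates the list contains the releases of 5, 12, 19 and 26 March 2024, which are the 4 weekly updates that make up the monthly file.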
How can you support me?
I love coffee! You are welcome to buy me a coffee!
The initial creation of the database, the cleansing, research for sources etc. was quite time-consuming. However, this has now been completed, so further maintenance is no longer a major problem.
But of course I also pay for this website – so if anyone likes the database and wants to help keep this site going, please feel free to support me on Buy Me A Coffee.