How to store chess openings in a relational database? - Chess Forums

Sqod

Apr 11, 2017

0

#1

Does someone here have good ideas of how to create an opening database using a relational database? I've been unable to find specific information on this.

One reason I want to do this is to get a computer to generate automatically certain statistics, especially variances and counts of piece placements on given squares, that are too lengthy to do by hand. To my knowledge no move database does such operations. I would also like a database that consistently detects transpositions, which some don't. In general, I want to do things that aren't possible with commercial databases or free databases. Clearly I need to directly use database queries instead of relying on the limited modification features allowed by commercial software that prevent direct database access (e.g., Improve Your Chess with Text Annotations in Chessbase; https://www.youtube.com/watch?v=NaMcb041dNY ). I'm thinking of playing around with the free database OpenOffice Base.

To implement an opening database it appears to me that the programmer will need to do at least the following:

(1) Put each ply (e.g., e4) into a separate field.

(2) Create each record based on numerous fields, where each field is one ply of the opening (e.g., 1. e4; 1...c5; 2. Nf3; 2...d6; etc.), which will require at least 40 ply, since some openings extend for at least 20 moves. I don't know what the primary key would be, however.

(3) Convert each record into a single number or string that describes the position, so that any record generating the same position string will be recognized as the same position, and therefore as a transposition of the same opening. This is not a problem since there is a lot of information on this--it's usually done by hash tables...

https://chessprogramming.wikispaces.com/Repetitions

...but I'd want to create my own scheme.

Each of these intentions incur a number of programming problems that need to be solved to work around the limitations of relational databases. I looked briefly at hierarchical databases, which are tree-structured databases, as an alternative, but there is little information on those, and many negative conclusions about their performance and flexibility (e.g., Hierarchical & Network Database Models;
https://www.youtube.com/watch?v=1Wnt1XIYMoM&t=1071s).

marknatm

May 31, 2017

0

#2

Sounds interesting, but wouldn't the amount of time needed to do this be overwhelming compared to the benefits you could potentially get from it?

Sqod

May 31, 2017

0

#3

marknatm wrote:

Sounds interesting, but wouldn't the amount of time needed to do this be overwhelming compared to the benefits you could potentially get from it?

It's not clear if by "you" you mean me, or a given person in general. I'm not thinking of my personal benefit, but of the long-term advance it would bring to chess. This is what is known as "innovation," in contrast to doing what everybody else has been doing for the last few centuries. I have very good reasons to believe there exists a new type of math that applies to chess and that the database statistics combined with that math and an abstract notational system would greatly increase the speed at which players could learn chess well, as well as possibly advance artificial intelligence for computer chess programs.

mgx9600

May 31, 2017

0

#4

Do you have any schema ideas?

Just a back-of-the-envelope sketch, I don't think RDBMS is best suited for this task because it looks highly procedural (I'm thinking FENs of board positions which doesn't lend itself to relational queries). Hence, I'm curious to see your schema idea (and ,by extension, the relational queries).

Sqod

May 31, 2017

0

#5

mgx9600 wrote:

Do you have any schema ideas?

Just a back-of-the-envelope sketch, I don't think RDBMS is best suited for this task because it looks highly procedural (I'm thinking FENs of board positions which doesn't lend itself to relational queries). Hence, I'm curious to see your schema idea (and ,by extension, the relational queries).

No, for the most part I don't have anything written. Most of what I want would be functions that operate on move trees, so using a relational database would likely require a lot of workarounds since relational databases can't store trees in a natural tree-like manner. However, just about the only database software available, especially free software, is for relational databases, not for more exotic types. Then there is the problem of transpositions, since chess move trees aren't really trees but rather directed graphs, including cycles, which complicates the situation even more. That's why I was going to start by merely playing around first: to get a feel for what could be done with existing software, and to get a feel for how much would require workarounds, and how difficult those workarounds would be.

marknatm

May 31, 2017

0

#6

Well, good luck with your endeavor. It's an interesting idea.

survifit

May 31, 2017

0

#7

Have you considered something like Neo4J as an open-source graph database rather than a RDBMS? It may make it easier to directly translate your ideas into the database structure.

Sqod

Jun 1, 2017

0

#8

survifit wrote:

Have you considered something like Neo4J as an open-source graph database rather than a RDBMS? It may make it easier to directly translate your ideas into the database structure.

I'd never heard of that application, thanks. It looks pretty good...

Natural Language Processing with Graphs
https://www.youtube.com/watch?v=BVMx24dtko0

...but I'll have to think about all this some more, especially with respect to the specific things I want to do. For example, one thing I'd really like to do is to describe where a given unit typically goes for each position. For example, where does Black usually put his QB in Philidor's Defense? Or where does Black usually place his KR in the Cochrane Gambit? These are questions that aren't easily answered, or if they are, they are rarely published anywhere. To answer such questions you'd have to search a tree, often to its ends, and then post a summary of the results at the current position. Another question I'd like to answer is how valuable each average displayed outcome is, by specifying the variance as well as the mean. There is much more I'd like to know for each position, including much more abstract summaries that don't involve numbers, but I could use answers to those first two types of questions right away as an aid to learning openings more rapidly.

MikeCrockett

Jun 1, 2017

0

#9

@OP "One reason I want to do this is to get a computer to generate automatically certain statistics, especially variances and counts of piece placements on given squares, that are too lengthy to do by hand. To my knowledge no move database does such operations."

Chessbase provides this capability.

Cherub_Enjel

Jun 1, 2017

0

#10

Well, I'm not sure how valuable this will be for players who aren't masters.

Below the master level, the current method of studying openings is "perfect" in that it doesn't matter very much.

An analogy would be like optimizing the type of pencil you use on an examination - as long as it can write without major inconvenience, it will not decide the result of the exam.

Williamfwm

Jun 1, 2017

0

#11

(1) Put each ply (e.g., e4) into a separate field.

No. You never design a database schema this way. I recommend you read up on "normalization". Familiarize yourself with the "normal forms" of a relational database.

(3) Convert each record into a single number or string that describes the position, so that any record generating the same position string will be recognized as the same position, and therefore as a transposition of the same opening. This is not a problem since there is a lot of information on this--it's usually done by hash tables...

https://chessprogramming.wikispaces.com/Repetitions

...but I'd want to create my own scheme.

I wouldn't make up your own scheme until you've at least familiarized yourself with the existing schemes and why they're used. Zobrist hashing is used for quick lookups because a bitfield is smaller and quicker to manipulate relative to a full FEN string. However, this is more relevant to chess *engines*, which are doing millions of manipulations per second on position data.

For database lookups, I would just use FEN* as the unique key identifying a position (which is in line with what you described). If it results in the same FEN, but follows a move order belonging to a different line, then it's a transposition.

For counts of piece placements and whatnot, that's where hashes come in handy. You use one 64-bit bitfield for each piece type (00000000 11111111 .....etc... is the initial pawn position, and so forth). This can assist you with quick lookups. But again, since we're talking about a learning tool and not an engine, you can get away with doing slower analysis on the fly, rather than pre-computing redundant data.

However, since a line is defined by certain moves in certain orders, the question of where does one "usually put a piece" doesn't make sense unless you define what you're after more clearly. Do you mean where does the knight usually go including all the sidelines, or where the knight goes once we're out of book? Because if you define one main line by one ordering of moves, then logically each knight only goes in one place: the place corresponding to the move that's defined as part of the line!

For the former, you would have a few attributes defining 1. which opening it is (e.g. French) 2. which variation is it (e.g. Advance, Exchange, Winawer, etc) and then which sideline it is. For the later, you would just parse through a ton of PGNs which are annotated as belonging to the line of interest.

* The board placement portion of it, not the extra info like ply number

camter

Jun 1, 2017

0

#12

One big mother of all databases this one, even if "relational".

Would likely spend a century getting set up.

Sqod

Jun 2, 2017

0

#13

MikeCrockett wrote:

Chessbase provides this capability.

Which capability? Variances, or default unit placement?

I just took a look (I have an account there but I rarely use it) but I just don't see either of those capabilities. In fact, their stats don't even show the individual results for [White, draw, Black], which I've seen on two other online databases, but rather they show only a single stat called "Result," which I'm guessing is only [White].

game_designer

Jun 2, 2017

0

#14

I have designed and built databases for global banks for longer than I care to remember.

I have also written a lot of code for chess...

You basically have 3 types of relational database design: transactions, reporting and consolidation.

Transactions: Low level detail, optimised for inserting, updating or deleting data, usually normalised to 3rd and sometimes 4th normal form, rare to use higher forms. An example: Sales invoices.

Reporting: You take the data from above and GROUP BY on a regular basis to roll up or summarise the data, say done overnight, usually only normalised to 2nd normal form unless using a deep star or snowflake schema. Optimised for reporting or querying data.

Consolidation: I work with this a lot, take reporting from above but add extra steps, consolidation adjustments, this is for banks at the group level, financial disclosure, statutory accounts, group consolidations, blah, blah.

What you need is a combination of the first two.

Start with the reporting design to store positions in a fact table.

You could use a Zobrist hash key to represent each position (game state) and reduce this to a primary key.

I personally however would prefer to use a bit board composite primary key to represent each position and I would reduce this to an auto number.

Then use the transactional design to store the "transactions"

Here is a rough moves table, the games table is a parent table above it, player names, year, result, blah, blah.

move_id 45001 PK (auto number)

game_id 16003 FK (from parent table)

move_number 23 Actual game move number

move_W Nf3 String, actual move (Note 1)

move_B Rc8 String, actual move (Note 1)

state_W 72468559 Bridged FK from fact table explained above for positions

state_B 52673474 Bridged FK from fact table explained above for positions

PK = Primary Key

FK = Foreign Key

Note 1

Could break down into 2 fields, use number for piece, use number for square, this avoids using strings, better for database, data on actual data page, not a reference pointing to other memory.

It's the last two that are most important, the bridge across to the positions table (facts table)

Either represents a Zobrist hash key or a Bit Board Composite Key.

You really need to study this crap man if you want to do this, you can't just slap it together, that is not how databases work.

Sqod

Jun 2, 2017

0

#15

Williamfwm wrote:

No. You never design a database schema this way. I recommend you read up on "normalization". Familiarize yourself with the "normal forms" of a relational database.

I'm already familiar with normalization and normal forms of databases (I have an advanced degree in computer science, by the way), although admittedly I'm out of practice in database design. For those unfamiliar with this topic, here's a good intro:

https://en.wikipedia.org/wiki/Database_normalization

For starters, here's one person who sounds like he's trying to create the same kind of database, and who shows his schema...

https://stackoverflow.com/questions/13305067/databases-for-chess-games

His table called "tblGamerMoves" holds individual moves, although I don't understand his "MoveID" field because he doesn't define it in the chart. That ID and its meaning raise a lot of questions like: Should ...d5 be considered a different move depending on the move number? What if the move number is different but it's a transposition of the same opening setup? If each move is successfully associated with all transpositions of a given opening, what should be done if that move is played in an unnamed opening, or in an endgame?

I don't see there is any practical way to get around putting single moves in individual fields for a versatile opening database. How could a database even give the standard statistics of which move has the best average outcome if you didn't allow the game score to be truncated or split into single moves? Many of the queries I have in mind involve access and comparison of single moves such as: "What percentage of the time in the Najdorf Sicilian does Black play ...Bb7 if White does *not* play Bg2?" "What are the statistics for [White, draw, Black] for financhettoing kingside in a Double King's Pawn Game (1. e4 e5) compared to the statistics for *not* fianchettoing in a Double King's Pawn Game?" These are useful but fairly complex queries that involve looking at either player playing certain *types* of moves at any time throughout the game, in combination with considering the moves that both players have already made at some time in the game. Obviously a great deal of search would be necessary to answer such simple questions, and single moves would need to be accessed and compared extensively. By analogy, imagine a database of addresses where you were mostly interested in only the numerical part of the address: you'd have to create a function to extract that part of the field constantly. That would be very inefficient.

At the moment I'm at a loss for coming up with a schema that would facilitate that process, and so far I haven't found much detailed info on the schemas used by online chess databases. Most discussions I've found so far online are about the annoyance of using PGN for storing games in a database--entire games as single entities, not their moves--presumably because the questions that most often interest chess players are questions about the games as entities, like "Did Alekhine ever play Rubinstein?" or "What was Fischer's overall score against Geller?" I'm mostly interested in single moves, and their associated openings and plans.

game_designer

Jun 2, 2017

0

#16

I edited my post above to include move_number in the table design.

Just forgot it the first time.

Williamfwm

Jun 2, 2017

0

#17

I don't think it's as hard as you're making it out to be. Things like fianchetto detection need to be hand-coded rules you implement as you need them. You can store this information alongside as you go. This shouldn't be solved at the database level. Trying to make a do-everything Uber Schema to accommodate everything you could ever want is a serious design mistake.

FM Kretinovich

Jun 2, 2017

0

#18

Sqod wrote:

Does someone here have good ideas of how to create an opening database using a relational database? I've been unable to find specific information on this.

One reason I want to do this is to get a computer to generate automatically certain statistics, especially variances and counts of piece placements on given squares, that are too lengthy to do by hand. To my knowledge no move database does such operations. I would also like a database that consistently detects transpositions, which some don't. In general, I want to do things that aren't possible with commercial databases or free databases. Clearly I need to directly use database queries instead of relying on the limited modification features allowed by commercial software that prevent direct database access (e.g., Improve Your Chess with Text Annotations in Chessbase; https://www.youtube.com/watch?v=NaMcb041dNY ). I'm thinking of playing around with the free database OpenOffice Base.

To implement an opening database it appears to me that the programmer will need to do at least the following:

(1) Put each ply (e.g., e4) into a separate field.

(2) Create each record based on numerous fields, where each field is one ply of the opening (e.g., 1. e4; 1...c5; 2. Nf3; 2...d6; etc.), which will require at least 40 ply, since some openings extend for at least 20 moves. I don't know what the primary key would be, however.

(3) Convert each record into a single number or string that describes the position, so that any record generating the same position string will be recognized as the same position, and therefore as a transposition of the same opening. This is not a problem since there is a lot of information on this--it's usually done by hash tables...

https://chessprogramming.wikispaces.com/Repetitions

...but I'd want to create my own scheme.

Each of these intentions incur a number of programming problems that need to be solved to work around the limitations of relational databases. I looked briefly at hierarchical databases, which are tree-structured databases, as an alternative, but there is little information on those, and many negative conclusions about their performance and flexibility (e.g., Hierarchical & Network Database Models;
https://www.youtube.com/watch?v=1Wnt1XIYMoM&t=1071s).

Sqod wrote: Does someone here have good ideas of how to create an opening database using a relational database? I've been unable to find specific information on this.One reason I want to do this is to get a computer to generate automatically certain statistics, especially variances and counts of piece placements on given squares, that are too lengthy to do by hand. To my knowledge no move database does such operations. I would also like a database that consistently detects transpositions, which some don't. In general, I want to do things that aren't possible with commercial databases or free databases. Clearly I need to directly use database queries instead of relying on the limited modification features allowed by commercial software that prevent direct database access (e.g., Improve Your Chess with Text Annotations in Chessbase; https://www.youtube.com/watch?v=NaMcb041dNY). I'm thinking of playing around with the free database OpenOffice Base.To implement an opening database it appears to me that the programmer will need to do at least the following:(1) Put each ply (e.g., e4) into a separate field.(2) Create each record based on numerous fields, where each field is one ply of the opening (e.g., 1. e4; 1...c5; 2. Nf3; 2...d6; etc.), which will require at least 40 ply, since some openings extend for at least 20 moves. I don't know what the primary key would be, however.(3) Convert each record into a single number or string that describes the position, so that any record generating the same position string will be recognized as the same position, and therefore as a transposition of the same opening. This is not a problem since there is a lot of information on this--it's usually done by hash tables...https://chessprogramming.wikispaces.com/Repetitions...but I'd want to create my own scheme.Each of these intentions incur a number of programming problems that need to be solved to work around the limitations of relational databases. I looked briefly at hierarchical databases, which are tree-structured databases, as an alternative, but there is little information on those, and many negative conclusions about their performance and flexibility (e.g., Hierarchical & Network Database Models; https://www.youtube.com/watch?v=1Wnt1XIYMoM&t=1071s). just go get chessbase

game_designer

Jun 2, 2017

0

#19

For your positions table (facts table) you want to avoid using a Zobrist hash key.

It's fine when you write code for chess to detect repetition of position, you will probably never have hash key collisions.

The problem however when you use it in a database is that you may have millions of games in the database and you can not be certain that no hash key collisions exist across different games in the database.

A Zobrist hash key basically collapses (or reduces) game state to a single UINT64 (unsigned 64 bit number)

To ensure data integrity just represent game state as a composite key, each value in the facts table below is a UINT64 data type.

state_id PK (auto number)

player_id Player to move, W or B.

castle_rights Permanent castling rights

passing_rights En passant capture square

WK

WQ

WR

WB

WN

WP

BK

BQ

BR

BB

BN

BP

state_id is the bridge to the moves table (transactions)

The next 3 are part of the game state so must be included, for example you can have the same position but different player to move or different castling rights.

state_id is the primary key

from player_id to BP (Black Pawn) inclusive is the table key, a unique composite index across all those fields.

You will need to read up on Bit Boards used in chess programming to understand all this crap.

You would need to write code to parse a PGN file to calculate all that crap, update the facts table, blah, blah.

game_designer

Jun 2, 2017

0

#20

You need to be careful which database you use, it must support the ulong (UINT64) data type and must support bit operations using bit functions in SQL.

I recommend MySQL it supports that.

Forget about say SQL Server, it is not supported.

Then jump ahead in the future and you have the database and an interface like on this website using Menu | Learn | Analysis

You drop pieces on the board and the program creates the SQL, say drop white pawns on f2, g3, h2 and a white bishop on g2, black to move, no castling rights.

SQL looks in facts table first and uses bit operations to return a list of all state_id's that match the criteria.

Can then use this list to bridge across to the transactions table (moves) and could work up to the games table from there, etc, etc.

You now have the proper foundation and all the data is in a database, so you could then use SAS for all the fancy data analytics to determine patterns, trends, blah, blah, just like the banks do with Risk in all their big database systems.

Note that if you split the moves, like Nf3, as I suggested above then every value in every data field is a number, pretty cool huh.

Warlord