Update...
Per private correspondence, one person here confirmed my suspicion: relational databases are absolutely the wrong tool for what I want to do, so I gave up on this project... sort of. I'm still planning to build a very simple, smaller database for experimentation (such as for holding structured data and positional attributes of interest), but I'm not going to even try to get a conventional database to do tree searches. The headaches and contortions are pointless. I'd like to start a blog here, if I can ever justify the time, and my first article will probably be about some insights I got from trying to force a conventional (= relational) database to do what I want with chess moves. Maybe someday I'll find a software tool focused on searching trees or directed graphs that I could modify for chess needs, but that, or writing my own graph-searching software, is about the only thing that would motivate me to tackle such a project again.
Follow-up...
In case I don't have time to post a more technical discussion today, here are the three stories I was going to tell about my experiences with people who needed a graph analysis tool instead of a database:
(1) In 2003 I was on loan to a company that was working on a project to analyze communication patterns. They showed me a database of communications in which each entity initiating or receiving a communication was identified only by a several-digit code number, together with the times the communication started and stopped. I was told that the actual data was classified, but that my general work and upcoming report were not. They told me they wanted to know what patterns I could find in the data. That was a fun project since it was so open-ended. It was as if they didn't even know how to approach the problem, and they were basically telling me, "Tell us anything you can about this mess of data."
I began by asking myself obvious questions that related to the implied graph itself: Who was initiating the most communications? Graphically that corresponds to the node with the most *outgoing* links, which is roughly the concept of a "hub" in network science (https://en.wikipedia.org/wiki/Hub_(network_science_concept)), although the Wikipedia definition does not distinguish between outgoing and incoming links. Who was receiving the most communications? That corresponds to the node with the most *incoming* links. Was everybody connected to everybody else through some chain of contacts, meaning a message starting from one entity could eventually have reached everyone, or were there disconnected groups that never communicated outside their own circle? That corresponds to separate, non-intersecting subgraphs. All such SQL queries were relatively easy to do.
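Just to make it concrete, the easy queries looked roughly like the following (the table and column names are invented for illustration, since the real schema was classified):

```sql
-- Hypothetical schema, for illustration only:
--   communications(sender_id, receiver_id, start_time, stop_time)

-- Who initiated the most communications? (node with the most outgoing links)
SELECT sender_id, COUNT(*) AS sent
FROM   communications
GROUP  BY sender_id
ORDER  BY sent DESC;

-- Who received the most communications? (node with the most incoming links)
SELECT receiver_id, COUNT(*) AS received
FROM   communications
GROUP  BY receiver_id
ORDER  BY received DESC;
```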
The company had a SQL expert on whom I relied for the more difficult queries, especially those that required subqueries. Toward the end of my project I began coming up with more ideas relating to network topology, such as cycles, and also the concept of periodic patterns, such as a daily broadcast from a single entity. At this point the SQL expert balked and said those would be very difficult queries for SQL, which to me suggested he didn't know how to do them. That made an impression on me: SQL doesn't work well with graph-like domains that require following links from node to node to determine network topology. In other words, SQL was the wrong tool for that job.
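For what it's worth, modern SQL dialects can express reachability with recursive common table expressions, but even the basic question "can any entity eventually reach itself?" (a cycle) already turns into something contorted. Roughly, in PostgreSQL-style syntax against the same invented table:

```sql
-- Sketch only: find entities that can eventually reach themselves (a cycle),
-- using the invented communications(sender_id, receiver_id, ...) table above.
WITH RECURSIVE reach (origin, current, depth) AS (
    SELECT sender_id, receiver_id, 1
    FROM   communications
  UNION ALL
    SELECT r.origin, c.receiver_id, r.depth + 1
    FROM   reach r
    JOIN   communications c ON c.sender_id = r.current
    WHERE  r.depth < 20        -- crude guard so the recursion terminates
)
SELECT DISTINCT origin
FROM   reach
WHERE  origin = current;
```

And that still says nothing about periodic patterns such as a daily broadcast; every new topological question needs another hand-rolled traversal like this, which is exactly the kind of contortion I mean.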
(2) Around 2011 a business I passed in La Jolla, California caught my eye because its name suggested it did software work with semantics and textual data. That was odd, partly because that was the type of work I knew well, and partly because all the businesses in that area, only two blocks from the beach, catered to affluent tourists: Lamborghini and Porsche dealerships, restaurants, tuxedo stores, hotels, etc. Their business seemed incongruous there, more suited to a tech center like Silicon Valley. I went in to inquire and found out that--sure enough--they did A.I.-related software work: their system analyzed text using Hebbian learning, the mechanism by which biological neurons are thought to learn, which they used to cluster text.
I was familiar with Hebbian learning due to my A.I. background. In Hebbian learning a weight (w)--a single value--becomes associated with the link between every two nodes through repeated exposure (statistical learning). The man told me they might have work for me, but that they were waiting on a "big check" expected soon from some big investors. He said their current work was most typically in reading their clients' data files and converting them to an XML format so that the company's software could operate on them, so that is the work for which they could initially use me. If their software could find the types of patterns a prospective client wanted, that would lead to a customized version of their software that they would sell to the client.
He told me that their company had once been approached by a military person to find out whether the company's software could help analyze some complicated data the military had. It was another case of "We have this mess of data. Can you tell us anything about it?" Unfortunately, the man telling me this said their software was not well tailored to such a general task, so when it was run on the data the company was supplied, it did not satisfy the potential client's requirements.
Here is a rough visual comparison of the fundamental component of a Hebbian learning system (the top diagram) with the fundamental component of a directed graph system (the bottom diagram):
Hebbian learning looks much simpler, doesn't it? It is, and I knew that a system based on that simple structure (no directionality, limited data, and only simple numerical values) would be too simple to produce anything powerful enough to solve difficult problems like those in the bulleted list in my previous thread, all of which relied on directed graphs. I didn't say anything because I wanted the job, and besides, I figured I could talk their inventor into adding complexity if it were needed during customization.
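To make "simple" concrete: in its textbook form the Hebbian update is just one scalar adjustment per pair of co-active nodes, roughly

    delta_w(i,j) = eta * x(i) * x(j)

that is, a single undirected weight that gets strengthened whenever nodes i and j are active together (eta being the learning rate). There is no direction and no data attached to the link beyond that one number, whereas a directed graph link has a direction and can carry arbitrary attached data.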
From what I just told you, most of you can figure out the rest of the story: their "big check" never arrived, they stopped responding to my e-mails, and soon their storefront disappeared, all of which suggests they went out of business, ultimately in part because their software could not tackle the general graph analysis problems that people needed to solve.
(3) In 2010 I worked on a commercial project for predicting heart attacks. Given the standard, non-invasive body measurements that doctors routinely gather--namely blood pressure, heart rate, and body temperature--our task was essentially to use data from known heart attack victims to learn to predict whether a person was going to have a heart attack. As you can imagine, this was an extremely difficult task. After all, if it were that simple, somebody would likely have figured it out long ago and it would already be in use.
One of the key problems for us was that there was no publicly available model of the human body, so nobody, not even doctors, could tell us what would happen if, for example, one value rose while another fell. Without some mathematical model the data could not be mapped to a diagnosis, and statistical methods, even A.I.-based ones, seemed to be at a loss, probably because there were no evident patterns (plus we figured we needed more data from more invasive techniques). Eventually we found out that there did exist some makeshift models in the form of artificial human bodies on which doctors train, but such state-of-the-art training aids, and the programmers behind them, were unknown and unavailable to us.
Our software people discussed the problem and I lamented that no general system analysis software seemed to exist whereby an engineer could just attach sensors to a given system--whether an aircraft component, human body, or 3-variable math equation--and have the software figure out how it worked based on empirical evidence, and maybe propose a model. This seemed yet another case of "We have this system that can provide us data, but how can we interpret patterns from it?" It didn't seem that hard to produce such a system, but maybe it was. Even if the human body had some component analogous to a capacitor in an electrical circuit that delayed the effect of a change, it seems that its presence could be detected or at least hypothesized by looking at enough sensor readings over time.
Anyway, those are my stories. From just three experiences, the picture that seems to be emerging is: (1) not even the military has a software system that can tackle the general problem of analyzing directed graph data, which suggests that no such software exists (although these experiences were admittedly at least several years ago); (2) such a software system is much needed in every field, whether chess, medicine, or warfare; and (3) SQL databases are the wrong tool for the job, especially for chess move analysis.
Of course I could be wrong about all this, but this set of stories is partly why I'm getting pessimistic about a database doing what I want for chess. Anyway, enough of that conjecture. Later I'll post more specific examples of the types of information I want from chess game analysis, and see if anybody can solve the more concrete, down-to-earth problem of supplying a SQL statement that will accomplish it.