Glicko-2 rating system technical question

Twixtfanatic

Looking at the description of Glicko-2 at http://glicko.net/glicko/glicko2.pdf 

Before calculating anyone’s rating, rating deviation RD, or volatility sigma, a value tau needs to be determined, which “constrains the change in volatility over time.” Different abstract games may call for different values of tau.

The Little Golem server features lots of abstract games and has attracted many strong players of these games, relatively speaking. For an obscure game, 100 players is a lot. The Elo system is used there, which IMO is probably the best choice for fewer than 1,000 players, because of its greater transparency without losing much, if any, accuracy.

I’m just curious about how this value tau should be arrived at. LG has a large database of game outcomes, and everyone’s rating at the start of a tournament is listed. Also, each player has their own graph of rating over time. I get that such data could be used to see what effect different values of tau would have on the overall predictive accuracy of a Glicko-2 rating system. But I’m unclear on how accuracy should be measured. Maybe just browsing a sample of rating graphs would provide a reasonable estimate of tau for a specific game, but I have at best a vague idea of how to do that.
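(For what it’s worth, one standard answer from forecasting is to score each candidate tau with a proper scoring rule, such as log-loss or the Brier score, applied to the pre-game win probabilities a Glicko-2 run with that tau would have produced for every historical game. A minimal sketch of the scoring side; the (probability, outcome) input format is my own assumption, not anything LG provides:

```python
import math

def log_loss(predictions):
    """Mean negative log-likelihood of the actual outcomes.

    `predictions` is a list of (p, s) pairs: p is the pre-game
    probability that player A wins, s is A's actual score
    (1 = win, 0 = loss, 0.5 = draw).  Lower is better.
    """
    eps = 1e-12  # avoid log(0) on overconfident predictions
    total = 0.0
    for p, s in predictions:
        p = min(max(p, eps), 1.0 - eps)
        total += -(s * math.log(p) + (1.0 - s) * math.log(1.0 - p))
    return total / len(predictions)

def brier_score(predictions):
    """Mean squared error between probability and outcome.  Lower is better."""
    return sum((p - s) ** 2 for p, s in predictions) / len(predictions)
```

The tau whose replay of the database gives the lowest average score wins; replaying once per candidate, say tau = 0.2 through 1.2 in steps of 0.1, is cheap for a few hundred players.)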

Thanks for any clue.

Heh, maybe I should ask Professor Glickman?

notmtwain
Twixtfanatic wrote:

I’m just curious about how this value tau should be arrived at. […] But I’m unclear on how accuracy should be measured.

Doesn't that linked paper suggest values for Tau? (In step 1)

Twixtfanatic

It provides a range of values.

“Reasonable choices are between 0.3 and 1.2, though the system should be tested to decide which value results in greatest predictive accuracy. Smaller values of τ prevent the volatility measures from changing by large amounts, which in turn prevent enormous changes in ratings based on very improbable results. If the application of Glicko-2 is expected to involve extremely improbable collections of game outcomes, then τ should be set to a small value, even as small as, say, τ = 0.2.”

This does not tell me how to measure, or “score,” a specific value of tau for accuracy at predicting game outcomes when applied retroactively over a database of game outcomes spanning several years. Nor does it quantify the notion of “extremely improbable collections of game outcomes.”
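(One piece of the puzzle is in the paper itself: tau never appears directly in a prediction. The pre-game win probability comes from the expected-score formula of Step 3, E = 1/(1 + exp(−g(φ_j)(μ − μ_j))), after converting ratings to the Glicko-2 scale per Step 2; tau only shapes predictions indirectly, through how the volatilities, and hence the RDs, evolve between periods. A sketch of that formula:

```python
import math

SCALE = 173.7178  # Glicko-2 scale constant from the paper (Step 2)

def g(phi):
    # Attenuation factor: the more uncertain the opponent's rating
    # (larger phi), the less the rating gap tells us (Step 3)
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / (math.pi ** 2))

def expected_score(r, r_j, rd_j):
    """Predicted score for a player rated r vs an opponent (r_j, RD_j)."""
    mu, mu_j = (r - 1500) / SCALE, (r_j - 1500) / SCALE
    phi_j = rd_j / SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
```

So any accuracy score for a given tau would compare these expected scores against the actual results of the games in the database.)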

Twixtfanatic

Of course, I could score a specific value of tau any way I want. For example, suppose each rating period spans one month. Call each game in which the higher-rated player lost at least one rating point an “error game.” Sum the absolute values of rating points lost in error games during each month, and divide by the total number of games completed that month to arrive at a monthly error score (a lower score is better). Or, instead of a plain sum, take the square root of the sum of the squares of points lost. The overall score for a specific tau might be the average monthly score over the last six months. A secondary criterion might be how quickly the monthly error drops after the first month or two.
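(The scheme above is easy enough to sketch in code; the input format here, a signed rating-point change for the higher-rated player in each completed game, is my own assumption:

```python
import math

def monthly_error_score(points_lost, rms=False):
    """One month's error score under the scheme sketched above.

    `points_lost` holds, for every game completed that month, the
    rating points the higher-rated player lost (zero or negative
    if they gained).  Hypothetical input format.
    """
    errors = [p for p in points_lost if p >= 1]  # the "error games"
    if rms:
        total = math.sqrt(sum(p * p for p in errors))
    else:
        total = sum(errors)
    return total / len(points_lost)

def overall_score(months, rms=False):
    """Average monthly error score, e.g. over the last six months."""
    return sum(monthly_error_score(m, rms) for m in months) / len(months)
```

Run once per candidate tau over the same replayed history and pick the tau with the lowest overall score.)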

Which all sounds wonderful, but I’m just pulling this out of my, uh, hat. I’m no statistician. I seek guidance! Thanks for your time.