Data analysis: Difference between Male/Female ratings - Chess Forums

0110001101101000

Jan 12, 2016

0

#1

They look unexpectedly close (for all the talk we sometimes have on these forums).

Maybe a little more detail would be useful? For example, I can't tell whether the gap is 100 points or 200. It would also be interesting to know how many ratings make up each graph. For example, how many female FIDE players are over 45?

The bell curve is unexpectedly slender for women in some... as you noted, a high concentration. I wonder what some explanations for this might be.

paperplanes1

Jan 12, 2016

0

#2

[COMMENT DELETED]

wilford-n

Jan 12, 2016

0

#3

My thoughts are that, at the level of GMs and above, these graphs justify Nigel Short's controversial statement about female chess players. While the graphs appear close, look only at the segment above 2500, and remember that the area under the graph represents the total percentage of the population who has a GM rating. At all age ranges, that area is significantly higher for men than women. Overall, the percentage of men reaching 2500 or above is around 10 times that of women. (It's hard to estimate from such low-resolution graphs.)

It's important to note that we're talking about percentages of all chessplayers by gender here, not simply raw numbers, which skew even more strongly in favor of men. Currently, only 33 women have earned the full GM title, compared to around 1400 men who hold it. Put another way, there are over 40 male GMs for each female GM.

It may not be politically correct to state these facts, but the simple truth is that statistics don't lie (contrary to certain popular expressions about "lies and damn lies"). We may argue about why such a gender gap exists, but we can't use a few anecdotal examples to pretend it isn't really there.

InfiniteFlash

Jan 12, 2016

0

#4

0110001101101000 wrote:

Maybe a little more detail would be useful? For example, I can't tell whether the gap is 100 points or 200.

It would also be interesting to know how many ratings make up each graph. For example, how many female FIDE players are over 45?

I am unsure if I can extract a rating gap (the difference between the percentages of Male and Female plotted). I will try to see if I can store each percentage as a value and calculate the residuals.

These are the data set sizes I have stored by gender, then partitioned by age group.

The reason FIDE_only_female and FIDE_only_Male have fewer combined observations than FIDE_population is that these two data sets only include Standard Rated players. There are about 340000 players that do not have standard ratings.

Elubas

Jan 12, 2016

0

#5

Yeah I mean I think this is a counter to the idea that the gap is all about participation rates. If the concentration were to stay like that and then eventually you get an equal amount of female and male players, you would still have fewer females reaching the top level of chess. There might be other explanations, but the participation argument just doesn't seem strong, at least as a standalone idea. It's too vague.

watcha

Jan 12, 2016

0

#6

These figures look familiar to me for some reason.

chessfreak

Jan 12, 2016

0

#7

Males are superior. It's that simple.

InfiniteFlash

Jan 12, 2016

0

#8

Deleted.

watcha

Jan 13, 2016

0

#9

As I said, these figures look familiar to me. The reason is that I made my own attempt to process the FIDE players list data, just out of curiosity. For this purpose I developed a program which creates detailed statistics based on the XML file that can be downloaded from FIDE ( http://ratings.fide.com/download.phtml ). I was also interested in female vs. male performance and participation, so the generated files contain this information.

Yesterday I did not have time for this, but today I looked into it, because I'm also interested in the historical development of female participation. Unfortunately the XML data has only been available since August 2012.

In August 2012 the stats for all players that have a standard rating looked like this:

In December 2015 the same stats:

As you can see, even in this short period of about three and a half years, female participation has increased from 8.61% to 9.79%, a rise of more than a full percentage point.

It is kind of a hassle to rewrite my program to parse the TXT files, which have been available since 2001, but I'm looking forward to the results.

watcha

Jan 13, 2016

0

#10

InfiniteFlash wrote:

I'll make sure to take a look at historical data given by fide.

Please look into the standard deviation of rating.

Higher extreme values among men are explained ( at least partly ) by the supposed higher standard deviation of ratings. The common wisdom is that men and women have on average equal abilities; however, men, having a greater standard deviation, produce higher ( and lower ) extreme values.

To be honest I don't see this higher standard deviation in my files. No matter how I look at it, men don't have a significantly higher standard deviation. I may have made a mistake, so I would welcome it if a statistician looked into this issue and computed the standard deviations.

The main focus group I'm interested in is what I call the middle age active population ( 20-40 years old, active ). Note that not all players on the list are active. Inactive status is indicated in the 'flag' field of the records, with "i" for inactive men and "wi" for inactive women. Also, not every record has a birthday, so you also have to take care of missing ages.

In the middle age active group I found only a tiny bit higher male standard deviation which seems statistically insignificant. I would welcome any independent confirmation or rejection of this result.
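For anyone who wants to check this independently, here is a minimal sketch of the filtering and the standard deviation calculation. It is not my actual program: the record layout (dicts with `sex`, `rating`, `birth_year` and `flag` keys), the function names and the sample records are all made up for illustration, and it assumes the population standard deviation is the one wanted. Only the "i" / "wi" flag convention and the 20-40 age window come from the description above.

```python
from statistics import mean, pstdev

def is_active(flag):
    # Inactive players carry an "i" in the flag field ("i" or "wi").
    return "i" not in (flag or "")

def is_middle_age(birth_year, list_year, lo=20, hi=40):
    # Records without a birth year cannot be placed in an age group.
    if birth_year is None:
        return False
    return lo <= list_year - birth_year <= hi

def rating_stats(players, sex, list_year):
    """Count, mean and population SD of ratings for active 20-40 year olds."""
    ratings = [
        p["rating"]
        for p in players
        if p["sex"] == sex
        and is_active(p.get("flag"))
        and is_middle_age(p.get("birth_year"), list_year)
    ]
    if not ratings:
        return None
    return {"n": len(ratings), "mean": mean(ratings), "sd": pstdev(ratings)}

# Made-up records, only to exercise the filters.
sample = [
    {"sex": "M", "rating": 2100, "birth_year": 1990, "flag": ""},
    {"sex": "M", "rating": 1900, "birth_year": 1985, "flag": "i"},   # inactive
    {"sex": "M", "rating": 1700, "birth_year": None, "flag": ""},    # no age
    {"sex": "M", "rating": 2300, "birth_year": 1980, "flag": ""},
    {"sex": "F", "rating": 2000, "birth_year": 1992, "flag": "w"},
    {"sex": "F", "rating": 1800, "birth_year": 1988, "flag": "wi"},  # inactive
    {"sex": "F", "rating": 2200, "birth_year": 1960, "flag": "w"},   # too old
    {"sex": "F", "rating": 1600, "birth_year": 1995, "flag": "w"},
]

male_stats = rating_stats(sample, "M", 2015)
female_stats = rating_stats(sample, "F", 2015)
print(male_stats, female_stats)
```

Swapping `pstdev` for `stdev` gives the sample standard deviation instead; with lists this large the two should be nearly identical.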

watcha

Jan 13, 2016

0

#11

Stats for middle age active population December 2015 list:

Male standard deviation is 308.2 vs female 303.82, practically the same.

wilford-n

Jan 13, 2016

0

#12

watcha wrote:

Male standard deviation is 308.2 vs female 303.82, practically the same.

Actually, a difference of 1.4% in standard deviation is very significant, especially when you're looking at the top and bottom ends of a normal distribution. Let's look at the effect of the standard deviation alone on the numbers of players reaching 2500 or above. We'll neglect the difference in average rating by gender and look at the effects of standard deviation alone. So we'll simply use the average rating value of 1873.30 as a starting point.

For men, 2500 is 2.0334 standard deviations above the mean. This puts 2500 at the 97.896 percentile, or 2.104% reach GM rating (in this idealized Gaussian model). For women, 2500 is 2.0627 standard deviations above average. This puts 2500 at the 98.040 percentile, or 1.960% reach GM rating (again, in our simplified model).

By this analysis, men are 7.35% more likely to reach GM level than women... and that's ignoring the 50+ point difference in the mean for men and women. If we include the different averages in the model, the percentage above 2500 changes to 2.176% and 1.325% respectively, meaning men are 64.2% more likely to reach GM rating than women.

This tells us two things: First that the difference in standard deviations is a significant factor, and second, that the difference in the mean is a greater factor. The inescapable conclusion (based on a flawed Gaussian model) is that men are indeed better chessplayers than women, both as a whole and at the highest levels of competition. Discarding the model and looking at the actual distribution actually strengthens this conclusion, as the double-maximum on the female rating distribution artificially increases the standard deviation, making it an unreliable tool when you stray very far from the mean.
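The equal-mean part of that calculation is easy to reproduce. Below is a minimal sketch under the same idealized Gaussian assumption, using only the mean and the two standard deviations quoted above; the function name `fraction_above` is just an illustrative choice. The results should land close to the quoted percentages, give or take rounding in the last digit.

```python
import math

def fraction_above(threshold, mean, sd):
    """Share of a normal(mean, sd) population rated above `threshold`."""
    z = (threshold - mean) / sd
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))

MEAN = 1873.30      # common mean used in the simplified model
SD_MALE = 308.2     # middle age active, December 2015 list
SD_FEMALE = 303.82

men = fraction_above(2500, MEAN, SD_MALE)
women = fraction_above(2500, MEAN, SD_FEMALE)

print(f"men above 2500:   {men:.3%}")
print(f"women above 2500: {women:.3%}")
print(f"ratio men/women:  {men / women:.4f}")
```

Passing each group's own mean instead of `MEAN` gives the second variant of the model.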

InfiniteFlash

Jan 13, 2016

0

#13

Deleted.

IAmAquarius

Jan 13, 2016

0

#14

Always gotta be careful with statistics. Today people in live chat were trying to tell me that being a good chess player irl had nothing to do with being a good player on the internet, i.e. Chess.com, because there was no statistical rating correlation.

Always gotta be careful with statistics.

InfiniteFlash

Jan 13, 2016

0

#15

Guys, I screwed up hard, please see post #16 for my update.

InfiniteFlash

Jan 13, 2016

0

#16

Deleted.

SilentKnighte5

Jan 13, 2016

0

#17

So the participation for women went up ~1% and the average rating went down 200?

watcha

Jan 13, 2016

0

#18

Actually I have a little GUI app that generates charts out of the collected data.

I have four main categories for filtering for age and activity: 1) none ( all players ), 2) middle age ( 20-40 years old ), 3) active ( players not having an 'i' in their flag ), 4) middle age, active ( both middle age and active ).

Here are the rating distribution charts for the four categories:

1) none

2) middle age

3) active

4) middle age, active

watcha

Jan 13, 2016

0

#19

SilentKnighte5 wrote:

So the participation for women went up ~1% and the average rating went down 200?

The 1 percentage point rise pertains to all players. The reason behind the rating drop is that female participation is increasing at very young ages, and understandably the expected rating at a young age is very low. This makes the average go down with higher participation.
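A toy example of this effect, with entirely made-up ratings (these are not FIDE data): nobody's rating changes, yet the group average falls once low-rated newcomers are counted.

```python
from statistics import mean

# Made-up ratings for illustration only.
established = [2000, 2050, 1950, 2100]   # players already on the list
newcomers = [1200, 1300, 1250]           # young players joining later

before = mean(established)
after = mean(established + newcomers)

print(f"average before influx: {before:.1f}")
print(f"average after influx:  {after:.1f}")
```

The drop comes purely from who is in the pool, not from anyone playing worse.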

Female participation as a function of age:

watcha

Jan 13, 2016

0

#20

Here is the comparison of middle age, active stats between 2012 and 2015:

August 2012, middle age, active:

December 2015, middle age, active:

As you can see, the rating drop is less marked here, and the male average rating has also dropped, even more than the female average rating. The reason is that many young players come in, and they don't reach their peak rating until the age of 30, pushing even the middle age, active average down.

Age distribution of rated players: