Hi,
I have been experiencing an abnormal number of server errors. They happen about a quarter of the time I get on Chess.com. They used to happen very rarely. Is there anything I can do, or can you fix it?
Thank you,
theegreentree
Hi,
I have been having a lot of server error issues when I get on the website. Definitely more than normal. It happens about one in four times, when it used to happen very rarely. Are there any steps I can take to reduce the server errors, or are you able to fix it?
Thank you,
theegreentree
The site is experiencing really high loads and it's a site-wide thing. Staff are actively working on optimizations and capacity increases to alleviate the issues.
Ah, I see. Thanks for letting me know. Having so many people on a site is usually a good sign, as it means people love your website. Thank you for making Chess.com a better place!
I don't accept the excuse - it is not plausible that there has suddenly been an increase of a magnitude that could not be anticipated using common sense. It doesn't fit the facts either - since usage is very bursty, there should be occasional glitches due to the occurrence of peaks in activity unknown in the past. Instead, there are persistent very common errors all the time: the system is not fit for purpose.
It's all part of a general degradation in the quality of the site over time. For example, for a long time, when you posted a new post in a forum, you would then see the forum with the new post. Now, due to some combination of poor performance and bad code, the forum is displayed with the post missing. One refresh sometimes makes the new post appear. Sometimes it takes two refreshes or more. This sort of poor performance is essentially unknown on the modern Internet except on chess.com. It would be really good if someone with influence considered why this is and did something about it (seems a desperate wish).
I feel being over-polite about this has been inappropriate for some time now.
Please be patient. The chess.com IT people are all hands on deck and working hard. New servers need to be put into service, and that will take a few weeks.
The site certainly has been seeing increases in traffic for a few weeks and most of the issues are occurring during normal peak times. It's that the peaks are a lot higher.
My understanding is that the reason for the higher peaks isn't really known and is likely a combination of a lot of different things. As with the pandemic and Queen's Gambit spikes, it's an unexpected increase, higher and more sustained than the loads and growth that were planned for.
During higher loads, submissions, such as forum posts, are queued and asynchronously written to the DB. So when things are overloaded, you can see a delay between posting and being able to view the post.
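A minimal sketch of that queuing behaviour, purely as an illustration. The class and method names here are made up, and nothing below reflects Chess.com's actual code; it only shows why a post can be accepted but not yet visible when the writer falls behind.

```python
from collections import deque


class WriteBehindForum:
    """Toy model: posts are accepted at once but written to the store later."""

    def __init__(self):
        self.pending = deque()  # accepted submissions not yet written
        self.store = []         # stands in for the database table

    def submit(self, post):
        # The request returns as soon as the post is queued.
        self.pending.append(post)

    def flush(self, max_items):
        # A background worker would call this; under load it falls behind.
        written = 0
        while self.pending and written < max_items:
            self.store.append(self.pending.popleft())
            written += 1
        return written

    def view(self):
        # Page loads only see what has actually been written.
        return list(self.store)


forum = WriteBehindForum()
forum.submit("first post")
forum.submit("second post")
before = forum.view()       # nothing has been written yet
forum.flush(max_items=1)    # the worker only manages one write
after_one = forum.view()
forum.flush(max_items=1)
after_two = forum.view()
```

Here `before` is empty even though both posts were accepted, which is the "post is missing until I refresh" effect described above.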
It's also possible that reads are load-shared across replicated database instances, and that there are multiple web servers, some of which could be reading from a DB instance that doesn't yet have the replicated data. I don't have much insight into the site's architecture, just how it's possible to handle loads and spread connections, so I could be somewhat off in the explanation.
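To make the replica idea concrete, here is a small sketch under the same caveat: it is a guess at the general pattern, not the site's real architecture, and `Primary`, `Replica` and `RoundRobinReader` are hypothetical names. Reads alternate between two replicas, and only one of them has caught up with the latest write.

```python
import itertools


class Primary:
    def __init__(self):
        self.log = []  # ordered list of writes

    def write(self, post):
        self.log.append(post)


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.applied = 0  # how many log entries have been copied over

    def catch_up(self):
        self.applied = len(self.primary.log)

    def read(self):
        return self.primary.log[:self.applied]


class RoundRobinReader:
    """Spreads page loads across replicas in turn."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def read(self):
        return next(self._cycle).read()


primary = Primary()
replica_a, replica_b = Replica(primary), Replica(primary)
reader = RoundRobinReader([replica_a, replica_b])

primary.write("new post")
replica_b.catch_up()         # only one replica has replicated so far

first_load = reader.read()   # served by replica_a: post missing
second_load = reader.read()  # served by replica_b: post visible
```

The first page load misses the post and the next one shows it, which would match needing one or more refreshes before a new post appears.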
For several months (but probably less than a year) there has been a delay in posting long enough to mean that the forum is consistently displayed without the new post. It worked for many years before that.
If anything, there should generally be economies of scale - providing a similar service to a larger membership is more economical. Where is another company that has not successfully avoided such problems?
I know that asynchronous writing to the DB has been in place for a while. It's a mechanism to spread out load in the databases. Unfortunately, during high loads, it certainly becomes more noticeable.
As mentioned, I don't know the current architecture, but splitting content areas into their own databases and servers is likely planned, if it's not currently being done. Scaling ends up needing to be done by splitting out services.
@erik has posted before about how early design decisions can cause unexpected growth pains later, such as what's happening now. My understanding is that effort has been going towards resiliency since the end of the previous spikes too. This recent traffic was unexpected.
The last comment does suggest some recognition of the source of the problem and what needs to be replaced/migrated. I don't claim expertise, just some common sense and limited experience, but there are people who could be usefully consulted.
"I feel being over-polite about this has been inappropriate for some time now."
I'm coming to agree with this. About 10% of all my games are being aborted on a regular basis since late December or so, or worse, sometimes the server hiccup resolves fast enough that the game isn't aborted and I've lost meaningful clock time. A single tweet 10 days ago merely acknowledging that there's a problem, with no follow-up information, isn't enough to make me want to stick around and pay them for a subscription indefinitely, simply hoping that someday the core game-playing functionality of the site starts working again. I could speculate that these things take time to fix, but it's not my job to speculate here.
More information through official channels please, ideally with an expected timetable for a resolution.
I like playing puzzles on my iPhone. I frequently have to stop because I get either “our servers made a dubious move” or a similar error, which causes the puzzle to repeat over and over. Very annoying, especially as a subscriber.
...or still-worse: in my last 3 games this morning:
1. winning position, abort due to server error
2. totally legitimate loss with no server errors ;p
3. overwhelmingly winning position, server error and no access to the game to make a move; 15 minutes later the server comes up and marks it as a loss on time because I had the move when the server went down.
Yeah I'm done here.
Yeah, I lost because of a server error that chess.com said was their fault. I do understand that server errors might happen if lots of people visit their site. I just hope it is fixed soon though.
The site is actively working on optimizations and capacity increases to alleviate issues. It takes time to make major changes but my understanding is that work is their primary priority.
How much time (roughly), and where should I look for ongoing official announcements on the issue?
I don't think there is an official ETA and there are multiple efforts in progress to assist in reducing issues. Each likely has different potential times to complete.
I would assume that once things are more stable, due to some of those efforts, there might be an article/news item released, but I don't know that for sure.
So they will be issuing refunds to those who paid for premium memberships and can't access the site due to these server errors, right?
....right?
I don't think anything has been stated, but the site's #1 priority is getting things stable and working on keeping things stable even with additional growth.
Once things are stable, they should be able to decide what they're going to do and will have a better idea what the member impact was; for example, how long.
https://www.chess.com/blog/CHESScom/chess-is-booming-and-our-servers-are-struggling
If you need help, please contact our Help and Support team.