More efficient way of obtaining a user's most recent games? Proposed fix if not

ck_russ

I'm developing software that needs to access a user's most recently completed game. I see that I can get the games in near real time using, e.g.:

https://api.chess.com/pub/player/ck_russ/games/2019/02

But this is something I would probably want to poll every 10 seconds or so, potentially for numerous players, since users may also want to use the software on games played by their friends or favorite players. As each archive could easily grow to thousands of games, it seems like this would be a drain on both chess.com's resources and those of my users. Is there a more efficient way to obtain recently completed games?

If there isn't, I'd propose an API endpoint along the lines of api.chess.com/pub/player/ck_russ/games/after/TIMESTAMP, which just returns the zero or more games completed after TIMESTAMP.
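Until something like that exists, the same filtering can be done client-side once the monthly archive has been downloaded. A minimal sketch, assuming the archive JSON has a top-level "games" list and that each game carries an "end_time" Unix timestamp (both are assumptions about the response shape; `games_after` is a hypothetical helper name):

```python
from typing import Any, Dict, List


def games_after(archive: Dict[str, Any], timestamp: int) -> List[Dict[str, Any]]:
    """Return the games in a decoded monthly archive that ended after `timestamp`.

    Assumes (not confirmed in this thread) that the archive looks like
    {"games": [{"end_time": <unix seconds>, ...}, ...]}.
    """
    games = archive.get("games", [])
    # Games without an "end_time" are treated as not newer than the cutoff.
    return [game for game in games if game.get("end_time", 0) > timestamp]
```

This saves no bandwidth, since the whole archive is still fetched; it only gives the caller the "0 or more games after TIMESTAMP" view.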

 

skelos

If I were designing such a thing with the current endpoints, I'd wait for a user to request:

1. Latest game list, or

2. Analysis of a particular game.

 

Unless you really want/need your analysis to be available before someone asks, that would only hit api.chess.com when some data is wanted.

You don't say in any detail what type of analysis you intend to do, but:

1. chess.com offers in-browser analysis (via Stockfish.js), which is near-instant but more limited for free members

2. Download of any game which can then be fed to an engine.

 

If you're doing something like #1, chess.com has chosen that as one of their encouragements for people to move up to premium membership.

ck_russ

I don't think the details of my software are terribly important. In your suggestion you end up with the exact same problem: you have to populate the list of games. So how do you do this in something vaguely real time (e.g. 10 seconds)? Having a user keep clicking refresh is just too 2010.

skelos
ck_russ wrote:

I don't think the details of my software are terribly important. Even in your suggestion you end up with the exact same problem. You have to populate the list of games. And having a user have to keep clicking refresh is just too 2010.

Let me speak plainly, then I'll shut up:

1. The nature of your software is important, as if it undermines one of chess.com's "premium" features why should they help you with it?

2. Within the current constraints of read-only endpoints with fixed data (for caching) and realistic load limits, you dismiss something that might, you know, work for both you and chess.com as "too 2010".

 

I'd suggest growing up. You ask a question and then deride the answer and indirectly the person who answered you. You sound like you were born about 2005. If you can't do the math, that means you're behaving childishly.

WhiteDrake

A lot of developers are concerned with performance of some sort (in this case bandwidth) before they even start coding. I suggest you build your project and come back when you find that your application has some kind of performance problem. If I understand correctly, you want the application to download the bundle of matches containing the last game, parse the JSON, and get the data about the latest match, all in under 10 seconds. Is that difficult with the current monthly bundle endpoints?

My advice is to add the refresh button anyway. What if the user of your application is temporarily offline (e.g. a bad wifi or mobile connection)? The application should ideally retry with increasing delays (e.g. try after 8s, then after another 16s, then another 32s, another 64s...); if the user fixes their connection issues, they can hit the refresh button. I’m pretty sure Gmail had a similar try-to-reconnect feature in 2010 already, so your project might want that.
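The retry schedule above could be sketched roughly as follows. This is only an illustration: the function names, the 64-second cap, and the attempt limit are assumptions, and `fetch` stands for whatever call the application uses to reach the API.

```python
import time
from typing import Callable, Iterator


def backoff_delays(first: float = 8.0, factor: float = 2.0,
                   cap: float = 64.0) -> Iterator[float]:
    """Yield waits of 8, 16, 32, 64, 64, ... seconds (doubling, then capped)."""
    delay = first
    while True:
        yield min(delay, cap)
        delay *= factor


def retry_until_ok(fetch: Callable[[], bool], max_attempts: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `fetch` until it returns True, waiting longer after each failure."""
    delays = backoff_delays()
    for attempt in range(max_attempts):
        if fetch():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return False
```

A manual refresh button would simply call `fetch` directly and, on success, restart the schedule from the first delay.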

ck_russ

The current endpoint works great. The only issue I was concerned about was the total bandwidth usage. I haven't played that many games this month, and my archive is currently about 400kb. Let's say a user would like to use the software for a couple of hours a day. As my archive will also grow over the month, let's say it averages about 600kb in size.

That's 30days/month * 2hours/day * 60minutes/hour * 60seconds/minute / 10seconds/query * 600kb = 12.96GB/month.
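The estimate above can be reproduced in a few lines, using the same inputs (30 days, 2 hours a day, one poll every 10 seconds, 600kb per response). The function name is made up for illustration, and 1 GB is taken as 1,000,000 kb to match the figure above.

```python
def monthly_bandwidth_gb(days: int = 30, hours_per_day: int = 2,
                         poll_interval_s: int = 10, archive_kb: int = 600) -> float:
    """Estimate monthly download volume in GB for full-archive polling."""
    seconds_of_use = days * hours_per_day * 60 * 60   # 216,000 s of use per month
    polls = seconds_of_use // poll_interval_s         # 21,600 requests
    total_kb = polls * archive_kb                     # 12,960,000 kb
    return total_kb / 1_000_000                       # decimal GB


print(monthly_bandwidth_gb())  # 12.96
```

This assumes every poll transfers the full archive, i.e. no 304 responses and no compression.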

Pretty hefty bandwidth cost for chess games. It's not even in 4k! Definitely a good point on adding some manual feature/dynamic polling adjustment for users on worse connections or those who'd rather avoid any sort of automatic updates.

andreamorandini

You can poll the currently played games at https://api.chess.com/pub/player/erik/games and keep track of which ones are in that list. Only when a game disappears from that list do you need to hit the game archive endpoint.

 

By doing this you should be able to lower the request sizes considerably.
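A rough sketch of that bookkeeping, assuming the current-games response is JSON with a top-level "games" list whose entries each have a "url" (an assumption about the response shape; both helper names are hypothetical):

```python
from typing import Any, Dict, Set


def ongoing_urls(payload: Dict[str, Any]) -> Set[str]:
    """Collect the URLs of the games listed in a decoded current-games response."""
    return {game["url"] for game in payload.get("games", []) if "url" in game}


def finished_since(previous: Set[str], current: Set[str]) -> Set[str]:
    """Games seen on the previous poll but missing now have presumably ended."""
    return previous - current
```

On each poll, compute `ongoing_urls` for the new response, compare it with the previous set, and fetch the monthly archive only when `finished_since` is non-empty.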

ck_russ

That seems like something that would definitely be workable. Unfortunately, the docs indicate that it's only for daily games. I just tested it to make sure, and it indeed did not update for me or my opponent while we were playing a Live game.

 

Another API possibility here would be to have a 'tail' option. For instance, https://api.chess.com/pub/player/ck_russ/games/2019/02/tail would just return the last, e.g., 10 lines of the archive. Maybe a little bit hacky, but it would enable either completely static delivery or a server-side operation with a resource cost very near zero.

WhiteDrake

We’re seeing again and again that it’d be useful if the API were consistent across daily and live chess. A few days ago a member here wanted to access live tournaments via the tournaments endpoint (which for some reason doesn’t contain live tournaments). Now it’s the games endpoint which doesn’t contain live games. Do I see a pattern?

bcurtis

Live chess games are not sharable in any way until they are finished. This has to do with how the data are stored during a game, and won't change for this API. Once a game is finished, all games are treated the same to the best of my knowledge.

Polling a game archive endpoint will not cause server load or bandwidth issues. If you poll faster than every 5 seconds, then our CDN cache will intercept the excess (if you poll super fast, like multiple times per second, they may block you as a denial-of-service attack!). If you poll slower than once every 5 seconds, our server will be contacted; this will typically be to ask if the CDN cache is still valid, and most times this is true. The CDN will either respond with the data or with a "304 Not Modified" response if you are properly sending the "If-None-Match" or "If-Modified-Since" cache control header. Read more: https://www.chess.com/news/view/published-data-api#pubapi-general-caching

We have blocked applications that are trying to harvest data systematically. So long as your application is obtaining data that a player wants you to get (not just "hey, I want to grab some data!"), then you should be fine. Be aware of rate limits (which do not apply to cached content), and supply a user-agent with contact information and we will let you know if your application is causing problems: https://www.chess.com/news/view/published-data-api#pubapi-general-rate-limits and more discussion at https://www.chess.com/clubs/forum/view/rate-limiting

Hope that helps.

ck_russ

That does help. Thanks a lot for all the information there.

Really, I think the 304 + If-Modified-Since would be absolutely ideal. Can you confirm that this is working for URLs such as https://api.chess.com/pub/player/ck_russ/games/2019/02 ? I was setting the If-Modified-Since header to a few minutes before the present (and also tried a time in the future to avoid any possible time zone issues) and was never getting a 304. I did get a 304 using the same client and headers when connecting to e.g. https://www.w3.org , which was a site somebody recommended for testing If-Modified-Since implementations.

Just let me know if it'd be useful and I can set some unique agent string if you'd like to cross check the logs.

--

I went ahead and set the user agent. The agent is "Hi this is ck_russ". The If-Modified-Since time is set to 1 minute before the time of sending. The page is https://api.chess.com/pub/player/ck_russ/games/2019/02 . I'll do a few more grabs of the page if that might help.

bcurtis

Our server is properly returning the 304 header, but it is possible that the CDN does not consider your script to be capable of handling the dates properly (a common problem in scripting) and so it delivers the full data to you. In my tests, I get a similar result. However, I always get the 304 response when I supply the "If-None-Match" header with the "ETag" value from the previous response. Based on your user-agent, I only see one request about 23 hours ago, which received a "200" response. All the rest will have been "304".

ck_russ

Awesome! Swapping over to the If-None-Match with the ETag works and is a perfect solution - getting delivered 304s as expected! Much appreciated! Thanks.
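For anyone landing here later, a minimal sketch of the If-None-Match approach using only the Python standard library. The user-agent string and function names are placeholders, not anything chess.com prescribes; note that `urllib` reports a 304 by raising `HTTPError`, so it is caught and treated as "nothing new".

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

# Placeholder: replace with your own app name and contact details.
USER_AGENT = "example-chess-app/0.1 (contact: you@example.com)"


def build_headers(etag: Optional[str]) -> Dict[str, str]:
    """Send If-None-Match only once an ETag from an earlier response is known."""
    headers = {"User-Agent": USER_AGENT}
    if etag:
        headers["If-None-Match"] = etag
    return headers


def fetch_if_changed(url: str,
                     etag: Optional[str] = None) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, etag); body is None when the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=build_headers(etag))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Unchanged since the last poll: keep using the cached copy and ETag.
            return None, etag
        raise
```

Typical use is to keep the last body and ETag per player, call `fetch_if_changed(url, etag)` on each poll, and re-parse the archive only when the returned body is not None.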