Post 2 - My understanding of Chess.com's HTML code
If I understand correctly, the moves on chess.com are displayed in a <div id="moveList_vertical"> container. Inside the container there is one <span> per move.
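Each <span> has an id containing the number of the move it represents. The number counts half-moves (plies), so White's and Black's moves each get their own number.
For example, 1. e4 would be <span id="movelist_1"> and 3. Bb5 would be <span id="movelist_5">.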
Inside that span there is an <a> tag with a very similar id, and the text of that <a> is the move itself.
One quirk: in the HTML the knight is written as "S", so "Sf3" is Nf3.
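Here is a rough sketch of how the structure above could be turned into PGN movetext once the spans are scraped. It is not my deleted code, just an illustration under some assumptions: the moves have already been pulled out of the page (for example with JSOUP) into a map from span id to move text, every id is exactly movelist_N with N counting half-moves from 1, and a leading "S" always stands for a knight. The class name MoveListToPgn and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds PGN movetext from scraped (span id -> move text) pairs.
// Assumes ids look like "movelist_N", where N is the half-move (ply) number starting at 1.
class MoveListToPgn {

    // Extracts N from an id such as "movelist_5".
    static int plyFromId(String spanId) {
        String prefix = "movelist_";
        if (!spanId.startsWith(prefix)) {
            throw new IllegalArgumentException("Unexpected id: " + spanId);
        }
        return Integer.parseInt(spanId.substring(prefix.length()));
    }

    // Assumption: a leading "S" (as in "Sf3") means a knight, which PGN writes as "N".
    static String toSan(String moveText) {
        String move = moveText.trim();
        return move.startsWith("S") ? "N" + move.substring(1) : move;
    }

    // Sorts the moves by ply and puts "1.", "2.", ... in front of White's moves.
    static String movetext(Map<String, String> movesById) {
        TreeMap<Integer, String> byPly = new TreeMap<>();
        for (Map.Entry<String, String> e : movesById.entrySet()) {
            byPly.put(plyFromId(e.getKey()), toSan(e.getValue()));
        }
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<Integer, String> e : byPly.entrySet()) {
            int ply = e.getKey();
            if (ply % 2 == 1) {                 // odd ply = White's move
                tokens.add((ply + 1) / 2 + ".");
            }
            tokens.add(e.getValue());
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Example input only, not the moves of game 1.
        Map<String, String> scraped = Map.of(
            "movelist_1", "e4", "movelist_2", "e5",
            "movelist_3", "Sf3", "movelist_4", "Sc6",
            "movelist_5", "Bb5");
        System.out.println(movetext(scraped)); // prints: 1. e4 e5 2. Nf3 Nc6 3. Bb5
    }
}
```

Sorting by the number in the id, instead of trusting the order the spans are read in, keeps the movetext correct even if the scraped pairs arrive out of order (Map.of itself has no guaranteed iteration order).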
Post 1
I'm homeless and need @erik to hire me. To prove myself worthy, I will try to scrape chess.com.
My goal is to get the PGN of a game. I know you can just download it, but no, I'm doing it the hard way.
This is the chess.com game I will try to scrape: the first game ever played on the chess.com servers!
https://www.chess.com/daily/game/1
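Since the end goal is a PGN, it helps to know what the finished file has to look like. The PGN standard puts seven tag pairs first (Event, Site, Date, Round, White, Black, Result), then a blank line, then the movetext ending with the result. This sketch fills in only what I actually know, the game URL as the Site tag, and uses the standard's "unknown" placeholders ("?", "????.??.??" and "*") for everything not scraped yet. PgnBuilder and build are hypothetical names, and "1. e4 e5" is placeholder movetext.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: wraps movetext in the PGN Seven Tag Roster.
class PgnBuilder {

    static String build(Map<String, String> knownTags, String movetext) {
        // Seven Tag Roster in the order the PGN standard uses;
        // "?", "????.??.??" and "*" are the standard's placeholders for unknown values.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("Event", "?");
        tags.put("Site", "?");
        tags.put("Date", "????.??.??");
        tags.put("Round", "?");
        tags.put("White", "?");
        tags.put("Black", "?");
        tags.put("Result", "*");
        tags.putAll(knownTags); // replaces placeholders, keeps the roster order

        StringBuilder pgn = new StringBuilder();
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            // Tag values are quoted, so backslashes and quotes inside them get escaped.
            String value = tag.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            pgn.append('[').append(tag.getKey()).append(" \"").append(value).append("\"]\n");
        }
        // A blank line separates the tags from the movetext, which ends with the result.
        pgn.append('\n').append(movetext).append(' ').append(tags.get("Result")).append('\n');
        return pgn.toString();
    }

    public static void main(String[] args) {
        String pgn = build(Map.of("Site", "https://www.chess.com/daily/game/1"), "1. e4 e5");
        System.out.print(pgn);
    }
}
```

Once the real moves, players, date and result are scraped, they just go into the knownTags map and the movetext argument.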
I will be using Java and a library called JSOUP. I still have my failed scraping attempt in NetBeans. It uses JSOUP to fetch the HTML document of the webpage and then tries to extract the PGN from it.
It seems I had a depressive episode and deleted all the code for getting the PGN, though, so I will have to rewrite that. The very simple code that fetches the HTML document is still working. It looks like this:
And this is what I get when I tell Java to print it out:
Wait for my next post, where I explain my understanding of chess.com's HTML structure and how I will try to scrape it!