Alpha Zero




I don't think it has, for example, a variable somewhere that says a knight is worth 3 pawns. You could perhaps infer a value from how it plays, but I don't think you can set one.


DeepMind did another iteration in which they didn't give the algorithm any information about the rules, and according to the paper, the new version got just as strong as the previous ones that had that information. As I recall, though, the paper didn't offer any insights into playing style.

Yes, DeepMind's updated version of AlphaZero was "MuZero".
It mastered chess (and many other games) without being told the rules of the game.
According to the company's research, it "matched" AlphaZero's playing strength after one million training steps.
Some might find it interesting, though, that MuZero's Elo curves were still trending upward, whereas AlphaZero's Elo appears in the charts as a flat horizontal reference line.
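In Go, MuZero's Elo rose past AlphaZero's. (In Atari, where AlphaZero was never applied, MuZero was compared against R2D2 rather than AlphaZero, and it surpassed that baseline too.)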
In Shogi, MuZero's Elo seemed to equal AlphaZero's.
In Chess, MuZero showed a very slight upward trend, just barely passing AlphaZero's Elo line at one million training steps.
https://xlnwel.github.io/blog/images/application/MuZero-Figure-2.png
They seem to have stopped training there, satisfied that MuZero had matched AlphaZero's strength. But the curve suggests that if MuZero had been allowed to continue, its chess Elo would have kept rising at a slow, steady rate.
You can read more about it here:
https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules
https://www.nature.com/articles/s41586-020-03051-4.epdf?sharing_token=kTk-xTZpQOF8Ym8nTQK6EdRgN0jAjWel9jnR3ZoTv0PMSWGj38iNIyNOw_ooNp2BvzZ4nIcedo7GEXD7UmLqb0M_V_fop31mMY9VBBLNmGbm0K9jETKkZnJ9SgJ8Rwhp3ySvLuTcUr888puIYbngQ0fiMf45ZGDAQ7fUI66-u7Y%3D
For whatever reason, not knowing the rules of the game led to MuZero finding improvements over AlphaZero.

AlphaZero learned by playing millions of games against itself, and it developed a certain style that Kasparov described as similar to his own and Tal's: a sacrificial style (though, as Garry points out, it is not completely accurate to say that AlphaZero sacrifices).
So one question that occurs to me early on: is it possible that AlphaZero would play with a different style if it started from scratch again, or would millions of games always lead to a similar knowledge base?
If two AlphaZeros that had developed independently played each other, would they split the results exactly 50/50, or is it possible that one develops a superior style and dominates the other?
What do you think?