Alpha Zero

Ubik42

Oct 9, 2021

I just started the book "Game Changer" about Alpha Zero, a fascinating topic for me because I am both a chessplayer and spent much of my career as a programmer.

Alpha Zero learned by playing millions of games against itself, and it plays with a certain style that Kasparov described as being similar to his own and Tal's: a sacrificial style (though it is not completely accurate to say Alpha sacrifices, as Garry points out).

So one question that occurs to me early: is it possible Alpha Zero would play with a different style if it started again from scratch, or would millions of games always lead to a similar knowledge base?

If you had 2 Alpha Zeroes that had developed independently play each other, would they split exactly 50/50, or is it possible one would develop a style superior to the other and dominate?

What do you think?

NikkiLikeChikki

Oct 9, 2021

Leela Chess Zero, which is modeled on the AlphaZero papers, has been set and reset dozens of times. It always ends up playing the same. The law of large numbers comes into play.

Ubik42

Oct 9, 2021

Ah....I suspected it might be that, but it would have been so much more fun if it would develop different strengths and styles. Thanks!

NikkiLikeChikki

Oct 9, 2021

The strengths and styles are developed by tweaking the training algorithm. I believe, though I’m not certain, that you can modify it to value aggressive play, or some other minor things. Honestly, I’m not sure. I just happened to be watching Stockfish vs. Leela in a Twitch chat. They talk about the nerdiest stuff in there that’s way way way over my orange-dyed head, but I do recall someone saying something like that. You can always poke around inside one of those chats and ask some questions.

Ubik42

Oct 9, 2021

Hmm.... Maybe I will come across that in my reading, but while I can see how you would do this for Stockfish by giving it different valuations, it would seem much trickier to do it to Alpha Zero because afaik it doesn't really have valuations in some easy spot to manipulate.

I don't think it has, for example, a variable somewhere where it says a knight is worth 3 pawns. You could maybe infer a value by how it plays. I don't think you can set it.
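To make the contrast concrete, here is a rough sketch of the kind of hand-set valuation a traditional engine could expose. The `PIECE_VALUES` table and the `material_score` function are made up for illustration and are not taken from Stockfish or any real engine; the point is only that such numbers sit in one editable place, which a self-trained network does not offer.

```python
# Hypothetical piece values in pawns; illustrative only, not from a real engine.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}


def material_score(fen_board: str) -> int:
    """Material balance from White's point of view.

    fen_board is the piece-placement field of a FEN string, where
    uppercase letters are White pieces and lowercase are Black pieces.
    Digits and slashes (empty squares, rank separators) are skipped.
    """
    score = 0
    for ch in fen_board:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue  # not a piece letter
        score += value if ch.isupper() else -value
    return score


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_score(start))  # balanced material at the starting position
```

Changing the knight's entry from 3 to, say, 4 would change how this toy evaluator judges every trade, which is the sort of knob I don't think Alpha Zero has.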

NikkiLikeChikki

Oct 9, 2021

Like I said, I don’t know. The training goal is “to win.” You could train passivity by saying the first goal is not to lose, and the secondary goal is to win. You could make the training goal to win within 40 moves, or else it’s a loss. This might lead to aggressive play. You could make a training goal be “if moves are within x evaluation, use your queen.” There’s lots of manipulation possible.
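A minimal sketch of what those alternative goals might look like as the reward handed to the learner at the end of each self-play game. Everything here is hypothetical: the `training_reward` function, the goal names, and the 0.9 draw value are invented for illustration, and I don't know how any real project would actually do this. The plain “win” case uses +1 for a win, 0 for a draw, and -1 for a loss.

```python
def training_reward(result: str, moves_played: int,
                    goal: str = "win", move_limit: int = 40) -> float:
    """Reward for one finished game, from the learner's point of view.

    result is "win", "draw" or "loss". The goal names and numbers are
    assumptions made for this sketch, not settings from a real engine.
    """
    if result not in ("win", "draw", "loss"):
        raise ValueError(f"unknown result: {result}")
    if goal == "win":
        # Plain objective: only the game outcome matters.
        return {"win": 1.0, "draw": 0.0, "loss": -1.0}[result]
    if goal == "dont_lose":
        # Not losing comes first: a draw is worth almost as much as a win.
        return {"win": 1.0, "draw": 0.9, "loss": -1.0}[result]
    if goal == "win_fast":
        # Anything other than a win inside the move limit counts as a loss.
        return 1.0 if result == "win" and moves_played <= move_limit else -1.0
    raise ValueError(f"unknown goal: {goal}")


print(training_reward("win", 55, goal="win_fast"))  # a slow win is scored as a loss
```

Under "dont_lose" a drawish line scores nearly as well as a risky winning attempt, which is how passivity could creep in, while "win_fast" punishes slow wins and so might push toward aggression.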

Ubik42

Oct 9, 2021

Oh, so retrain it from scratch with a different goal? Hmm.

Martin_Stahl

Oct 9, 2021

DeepMind did another iteration where they didn't put any rules information into the algorithm and, according to the paper, the versions got just as strong as the previous ones that had that information. Though it didn't offer any insights on playing style, as I recall.

Stil1

Oct 10, 2021

Yes, DeepMind's updated version of AlphaZero was "MuZero".

It mastered chess (and many other games) without knowing the rules of the game at all.

According to the company's research, it "matched" AlphaZero's playing strength after one million self-training steps.

Though some might find it interesting that MuZero's Elo charts showed a continued upward trend, compared to AlphaZero's Elo line.

In Go, MuZero's Elo soared past AlphaZero's. (The Atari panel is a separate comparison, since AlphaZero was not run on Atari.)

In Shogi, MuZero's Elo seemed to equal AlphaZero's.

In Chess, MuZero showed a very slight upward trend, just barely passing AlphaZero's Elo line at one million training steps.

https://xlnwel.github.io/blog/images/application/MuZero-Figure-2.png

They seem to have stopped it there, satisfied that MuZero matched AlphaZero's strength. But it seems, if MuZero had been allowed to continue, its chess Elo would have continued to rise at a slow, steady rate.

You can read more about it here:

https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules

https://www.nature.com/articles/s41586-020-03051-4.epdf?sharing_token=kTk-xTZpQOF8Ym8nTQK6EdRgN0jAjWel9jnR3ZoTv0PMSWGj38iNIyNOw_ooNp2BvzZ4nIcedo7GEXD7UmLqb0M_V_fop31mMY9VBBLNmGbm0K9jETKkZnJ9SgJ8Rwhp3ySvLuTcUr888puIYbngQ0fiMf45ZGDAQ7fUI66-u7Y%3D

For whatever reason, not knowing the rules of the game led to MuZero finding improvements over AlphaZero.

NikkiLikeChikki

Oct 10, 2021

Google has WAY more important things to do than work on chess. They are currently using the AI to unlock the secrets of protein folding and have made significant progress. This has the potential to help in the fight against the many diseases that kill lots of people when proteins fold improperly in the body. Google didn’t work on MuZero because it cared about chess. They used chess as a test to see if there were improvements in its learning algorithm.

Stil1

Oct 10, 2021

True. They've said as much, elsewhere. And it'll be fascinating to see where DeepMind goes from here.
