Objectively Speaking, Is Magnus a Patzer Compared to Stockfish and AlphaZero?

SeniorPatzer
SmyslovFan wrote:

World class GMs, professionals who use Stockfish every day, are absolutely in awe of the depth and beauty of Alpha Zero's games. The computer didn't just destroy Stockfish, it did it in style, rewriting some chess theory in the process! 

 

Some of those games were spectacular!

 

SmyslovFan,

 

What do you think this portends for the near future of competitive OTB chess by humans, and for interest in it?

 

Increase, same, or decrease?

HobbyPIayer
Lyudmil_Tsvetkov wrote:

I will stop arguing here, because it is meaningless.

In order to operate, each and every program should have its code base, do you understand that?

You want to tell me that, some Alpha just arrived from somewhere, installed itself on the TPU and started improving its play?

There is code guiding its actions, that is so obvious, code, written by humans.

Whether it fulfills its task on a single or multiple levels is fully irrelevant: it still does so following the instructions of the initial code base.

What do you think they are doing when Alpha reaches an optimum and cannot improve any more? Do they just leave it to get things straight by self-learning?

Of course, they are changing the code base, trying to optimise it.

 

If it does not have instructions that winning is good, how could it then evaluate whether a position is good or bad? Of course, it knows winning is good and that is WRITTEN in the primary code by a human.

You think it does not have instructions to learn where the pieces land? Of course it does. If it cannot make a distinction between different board squares, how can it then optimise its algorithms? So, it checks the squares where the pieces have landed during the game and, depending on the result, increases or decreases their values. This is still done according to the instruction that winning is good and that psqt should be increased in case of a win. That second instruction has also been written by a human.

 

So, it is humans who wrote the primary code base and are constantly changing/optimising it, while the computer just follows those instructions. Even the instruction that after each game colours should be reversed is written by a human. Is that not obvious?

So, basically, Alpha just follows instructions, both during play and self-training.

Obviously the rules of the game are inputted, and the objective (winning). Yes, it starts with that. But beyond that . . . ? The neural network figures out the rest.

And yes, obviously there's a lot of computing involved to create Alpha Zero. But it's also been directed to self-learn and create its own knowledge—which is the key breakthrough.

The purpose of AlphaGo Zero is really to test the proficiency of self-learning AI. The future goal of DeepMind is to apply this technology to find advances in medicine that humans haven't been able to find on their own.

Chess is just a rules-based game that provides a nice opportunity to test the AI's ability to self-learn within a closed system.

Here's an interesting article about it: https://medium.com/intuitionmachine/the-strange-loop-in-alphago-zeros-self-play-6e3274fcdd9f

Elroch
Lyudmil_Tsvetkov wrote:

I will stop arguing here, because it is meaningless.

It would indeed be a great idea if you were to go away and first of all learn something about the topic. Then your posts might even be correct as well as becoming meaningful.

In order to operate, each and every program should have its code base, do you understand that?

Most of the code of AlphaZero is common to all the two-player finite deterministic games of perfect information it might learn to play. In each it is necessary to add code to represent a position in the game and generate a list of legal moves for a position. Some types of AI would require tuning of hyperparameters, but as the AlphaZero paper says: "In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning. The sole exception is the noise that is added to the prior policy to ensure exploration; this is scaled in proportion to the typical number of legal moves for that game type."

You want to tell me that, some Alpha just arrived from somewhere, installed itself on the TPU and started improving its play?

The algorithms that implement neural networks are general, with nothing specific to chess. The algorithms that implement reinforcement learning are general to all finite two-player deterministic games of perfect information. Both had to be designed. The former are widely used, and the latter build on general techniques (with no relationship to chess) developed by Richard Sutton and others, specifically model-based Q-learning, I believe.

There is code guiding its actions, that is so obvious, code, written by humans.

No. Read the previous paragraph of mine.

Whether it fulfills its task on a single or multiple levels is fully irrelevant: it still does so following the instructions of the initial code base.

So, what you do is of no significance because you are merely following the instructions of your DNA?

What do you think they are doing when Alpha reaches an optimum and cannot improve any more? Do they just leave it to get things straight by self-learning?

Of course, they are changing the code base, trying to optimise it.

Time and time again, you wildly guess. Try learning more about the subject instead.

No. That is not what happened. They ran the self-learning algorithms and it got really good at chess. 

If it does not have instructions that winning is good, how could it then evaluate whether a position is good or bad? Of course, it knows winning is good and that is WRITTEN in the primary code by a human. You think it does not have instructions to learn where the pieces land?

So you are saying now that any player who has been taught the rules and the objective of the game is getting an unfair advantage?

As it happens, it would be possible to write an AI to learn the rules and objectives of chess from examples, but this is not as challenging a task as to produce the strongest chess player, so they didn't bother with it.

Of course it does. If it cannot make a distinction between different board squares, how can it then optimise its algorithms? So, it checks the squares where the pieces have landed during the game and, depending on the result, increases or decreases their values. This is still done according to the instruction that winning is good and that psqt should be increased in case of a win. That second instruction has also been written by a human.

No. Only the representation of the position, the legal moves and the different ways a game can terminate, with their scores.

You probably need a course on reinforcement learning. One gentle introduction is this 10 lecture course by David Silver of the DeepMind team. Very enjoyable and informative:

https://www.youtube.com/playlist?list=PLzuuYNsE1EZAXYR4FJ75jcJseBmo4KQ9-
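To make the point above concrete (only the position representation, the legal moves and the ways a game can terminate with their scores are game-specific), here is a minimal sketch in Python. The names and the interface are illustrative, not DeepMind's code; the generic self-play loop never needs to know which game it is playing.

```python
import random
from abc import ABC, abstractmethod

class Game(ABC):
    """Everything game-specific lives behind these four methods."""

    @abstractmethod
    def initial_position(self): ...

    @abstractmethod
    def legal_moves(self, position): ...      # list of legal moves

    @abstractmethod
    def apply(self, position, move): ...      # position after the move

    @abstractmethod
    def terminal_score(self, position): ...   # None if running, else +1/0/-1

def random_selfplay(game):
    """The generic loop: works unchanged for chess, Go or shogi."""
    pos, history = game.initial_position(), []
    while game.terminal_score(pos) is None:
        move = random.choice(game.legal_moves(pos))
        history.append((pos, move))
        pos = game.apply(pos, move)
    return history, game.terminal_score(pos)
```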

So, it is humans who wrote the primary code base and are constantly changing/optimising it,

No. The computer went from pathetic beginner to the best chess player in the world by self-learning without the slightest interference from humans. They did tell it enough to be a pathetic beginner and how to learn.

while the computer just follows those instructions. Even the instruction that after each game colours should be reversed is written by a human. Is that not obvious?

So, basically, Alpha just follows instructions, both during play and self-training.

Just like you and your DNA.

 

SeniorPatzer

Elroch:  "The computer went from pathetic beginner to the best chess player in the world by self-learning without the slightest interference from humans."

 

BoggleMeBrains (from another thread):  "I think all the quibbling over what restrictions were placed on Stockfish is beside the point.  Forget Stockfish.  The real story about AlphaZero isn't that it's better than Stockfish, it's that it represents the point at which machines truly surpassed humans at chess.  Conventional chess engines need to have their entire evaluation function provided for them by humans.  They may play better than humans, but are really nothing more than human knowledge running on fast hardware.  AlphaZero surpassed human chess knowledge entirely by itself.  It designed its own evaluation function.  The achievement is already beyond impressive even if Stockfish running on a supercomputer with opening books and tablebases might have proved stronger.  People fixating on whether the games were fair are kind of missing the true marvel of technology we've witnessed here.  Something with no human-provided knowledge can out-Karpov Karpov."

 

What does this portend for the near future of competitive OTB chess by humans?

 

Deep Blue did not kill it.  But will AlphaZero?

Elroch

IMO, tablebases would increase the strength of AlphaZero, but you would need a large database of computer chess games to make an opening book that would have any chance of competing with what AlphaZero has learnt to play. The problem is that a hundred games by weaker players could be refuted by innovations in AlphaZero's on-the-fly analysis (which is itself influenced by its own experience in a large number of self-play games, many at a very high standard).

jminkler
Pawn_Checkmate wrote:

My ears perked up because of the timing of going from reading a thread about it to now hearing it live on ChessTV. This is called the Baader-Meinhof phenomenon, a frequency illusion. It's like when you think about buying a new car and all of a sudden you see the same type of car all over the city.

 

AlphaZero was running on a supercomputer, while SF was on a computer that's worse than mine. If one SF were on a supercomputer and another SF on a regular computer, it would be 100-0.

It wasn't playing on a supercomputer lol. It was playing on what you can pick up at Best Buy, looking at a measly 80k positions per second while SF was calculating 70+ million. Smh. People read nothing...
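(For scale, taking jminkler's figures at face value: 70,000,000 ÷ 80,000 ≈ 875, so Stockfish examined nearly nine hundred positions for every one AlphaZero evaluated; AlphaZero's search compensates by being far more selective.)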

 

The engines of today are bound by their human handlers and rely on brute force alone to find moves rather than breaking out of their constraints and actually learning chess. 

 

These are some of the dumbest comments I have seen on chess.com ever.

Elroch

It used 4 of Google's tensor processing units (TPUs), which might be equivalent to about 1000 Intel cores (say 14 of the latest 72-core Xeons) for the purpose of neural network computations.

It's fair to say this is a supercomputer, although not on the scale of the really big ones used for big science.
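(Spelling out the arithmetic behind that estimate: 14 Xeons × 72 cores = 1,008 ≈ 1,000 cores, or roughly 250 core-equivalents per TPU, and only for the neural-network arithmetic.)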

Lyudmil_Tsvetkov
SmyslovFan wrote:

World class GMs, professionals who use Stockfish every day, are absolutely in awe of the depth and beauty of Alpha Zero's games. The computer didn't just destroy Stockfish, it did it in style, rewriting some chess theory in the process! 

 

Some of those games were spectacular!

The games were spectacular, but Smyslov would not quite have thought this way.

Lyudmil_Tsvetkov
Elroch wrote:
Lyudmil_Tsvetkov wrote:

I will stop arguing here, because it is meaningless.

It would indeed be a great idea if you were to go away and first of all learn something about the topic. Then your posts might even be correct as well as becoming meaningful.

In order to operate, each and every program should have its code base, do you understand that?

Most of the code of AlphaZero is common to all the two-player finite deterministic games of perfect information it might learn to play. In each it is necessary to add code to represent a position in the game and generate a list of legal moves for a position. Some types of AI would require tuning of hyperparameters, but as the AlphaZero paper says: "In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning. The sole exception is the noise that is added to the prior policy to ensure exploration; this is scaled in proportion to the typical number of legal moves for that game type."

You want to tell me that, some Alpha just arrived from somewhere, installed itself on the TPU and started improving its play?

The algorithms that implement neural networks are general, with nothing specific to chess. The algorithms that implement reinforcement learning are general to all finite two-player deterministic games of perfect information. Both had to be designed. The former are widely used, and the latter build on general techniques (with no relationship to chess) developed by Richard Sutton and others, specifically model-based Q-learning, I believe.

There is code guiding its actions, that is so obvious, code, written by humans.

No. Read the previous paragraph of mine.

Whether it fulfills its task on a single or multiple levels is fully irrelevant: it still does so following the instructions of the initial code base.

So, what you do is of no significance because you are merely following the instructions of your DNA?

What do you think they are doing when Alpha reaches an optimum and cannot improve any more? Do they just leave it to get things straight by self-learning?

Of course, they are changing the code base, trying to optimise it.

Time and time again, you wildly guess. Try learning more about the subject instead.

No. That is not what happened. They ran the self-learning algorithms and it got really good at chess. 

If it does not have instructions that winning is good, how could it then evaluate whether a position is good or bad? Of course, it knows winning is good and that is WRITTEN in the primary code by a human. You think it does not have instructions to learn where the pieces land?

So you are saying now that any player who has been taught the rules and the objective of the game is getting an unfair advantage?

As it happens, it would be possible to write an AI to learn the rules and objectives of chess from examples, but this is not as challenging a task as to produce the strongest chess player, so they didn't bother with it.

Of course it does. If it cannot make a distinction between different board squares, how can it then optimise its algorithms? So, it checks the squares where the pieces have landed during the game and, depending on the result, increases or decreases their values. This is still done according to the instruction that winning is good and that psqt should be increased in case of a win. That second instruction has also been written by a human.

No. Only the representation of the position, the legal moves and the different ways a game can terminate, with their scores.

You probably need a course on reinforcement learning. One gentle introduction is this 10 lecture course by David Silver of the DeepMind team. Very enjoyable and informative:

https://www.youtube.com/playlist?list=PLzuuYNsE1EZAXYR4FJ75jcJseBmo4KQ9-

So, it is humans who wrote the primary code base and are constantly changing/optimising it,

No. The computer went from pathetic beginner to the best chess player in the world by self-learning without the slightest interference from humans. They did tell it enough to be a pathetic beginner and how to learn.

while the computer just follows those instructions. Even the instruction that after each game colours should be reversed is written by a human. Is that not obvious?

So, basically, Alpha just follows instructions, both during play and self-training.

Just like you and your DNA.

 

"...it would be possible to write an AI..."

 

Well, you said it yourself, SOMEONE has written/is writing the AI.

AI is a non-concept. Human beings can adapt; that is the big difference. They adapt to good or bad surrounding conditions. There is no intelligence without pain and happiness. Computers don't have intelligence.

 

Lyudmil_Tsvetkov
jminkler wrote:
Pawn_Checkmate wrote:

My ears perked up because of the timing of going from reading a thread about it to now hearing it live on ChessTV. This is called the Baader-Meinhof phenomenon, a frequency illusion. It's like when you think about buying a new car and all of a sudden you see the same type of car all over the city.

 

AlphaZero was running on a supercomputer, while SF was on a computer that's worse than mine. If one SF were on a supercomputer and another SF on a regular computer, it would be 100-0.

It wasn't playing on a supercomputer lol. It was playing on what you can pick up at Best Buy, looking at a measly 80k positions per second while SF was calculating 70+ million. Smh. People read nothing...

 

The engines of today are bound by their human handlers and rely on brute force alone to find moves rather than breaking out of their constraints and actually learning chess. 

 

These are some of the dumbest comments I have seen on chess.com ever.

Those TPUs cost half a million bucks.

The number of positions calculated is completely irrelevant.

It is much slower, as it evaluates many more things on a single node, but that does not mean it does so in a great way, only that it does so. One can significantly decrease speed even by doing various kinds of unnecessary calculations. Most probably, it has been checking its Value[piece][from][to] array throughout nodes.
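For readers unfamiliar with the jargon, here is a toy sketch of the kind of hand-written Value[piece][from][to] / piece-square-table scheme Lyudmil describes, simplified to value[piece][square]. This illustrates the conventional-engine approach only; nothing in the published AlphaZero design contains such a table, which is the point Elroch argues above.

```python
# Toy piece-square-table evaluation (illustrative names and values).
PIECES, SQUARES = 6, 64
value = [[0] * SQUARES for _ in range(PIECES)]   # centipawn bonuses
value[1][27] = 20        # e.g. a knight (index 1 here) on d4 gets +20

def psqt_eval(board):
    """board: list of (piece, square) pairs for one side."""
    return sum(value[p][sq] for p, sq in board)

# Hand-tuning means a human editing these numbers and re-testing;
# that human loop is exactly what self-play training replaces.
```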

Lyudmil_Tsvetkov
Elroch wrote:

It used 4 of Google's tensor processing units (TPUs), which might be equivalent to about 1000 Intel cores (say 14 of the latest 72-core Xeons) for the purpose of neural network computations.

It's fair to say this is a supercomputer, although not on the scale of the really big ones used for big science.

That is right: this was a supercomputer versus a PC.

So the achievement in terms of AI was rather small; definitely much smaller than SF's.

SmyslovFan
Lyudmil_Tsvetkov wrote:
...

AI is a non-concept. Human beings can adapt; that is the big difference. They adapt to good or bad surrounding conditions. There is no intelligence without pain and happiness. Computers don't have intelligence.

 

This is the crux of your argument, a logical fallacy:

Artificial Intelligence is the ability to adapt.

Computers cannot have artificial intelligence, therefore they cannot adapt. 

AlphaZero has adaptive learning. 

Therefore, AlphaZero must be a fraud. 

 

Q.E.D.

Facts don't matter in such a construct.

Godeka
Elroch wrote:

It used 4 of Google's tensor processing units (TPUs), which might be equivalent to about 1000 Intel cores (say 14 of the latest 72-core Xeons) for the purpose of neural network computations.

It's fair to say this is a supercomputer, although not on the scale of the really big ones used for big science.

 

It is far from being a supercomputer by any definition. It is a single machine that, by the way, is very energy-efficient.

 

NN computations work very well on TPUs, better than on GPUs and much better than on CPUs. For this type of computation a TPU is about 15 to 30 times faster (as Google said about TPUv2), so remove the last zero from your 1000 to be more realistic. Because of recursion and branching, TPUs are useless for conventional chess engines, which run much better on CPUs. And 64 cores is not an insignificant amount either.
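(Spelling out that correction: knocking the last zero off the earlier 1,000 gives 4 TPUs ≈ 100 CPU-core-equivalents, i.e. about 25 per TPU. Both figures are order-of-magnitude guesses.)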

 

It would be possible to optimize both sides: tune the Stockfish setup, or specialize the generic approach of AlphaZero. But showing that Stockfish could be beaten was not the point of the work. It's about a completely self-learning NN that can reach top level, something that was doubted before because of the very tactical nature of chess.

 

It's nice to see that human opening theory was reconfirmed by an algorithm without any human input. Despite that, look at the games and you will see something special happened there. (Or to be more precise: DeepMind surprised the Go world one or two years before, and now they are doing it again in the chess and shogi worlds.)

 

It would be nice to see some of AlphaZero's shogi games, and some chess games played during different learning phases. We have seen before that DeepMind released a lot of data to the Go world, so it is not unlikely that we will see a bit more of AlphaZero's chess games in the future. But it will be up to the chess programmers to create something you could call StockfishZero or the like. In Go there is already a project called LeelaZero: http://zero.sjeng.org/

 

Seeing numbers like 5,000,000 games and three days of computation (for Go) is one thing, but you get a feeling for how much computation is necessary to create games and train a good NN if you try to do it yourself. And even 4 hours of computation on 5,064 TPUs is an awful lot, that's for sure.

lose-Loser

AlphaZero neither played in nor won a TCEC championship. Houdini overtook Komodo as champion, with Stockfish coming third.

HobbyPIayer
lose-Loser wrote:

Alphazero neither played in nor won a TCEC championship.

AlphaZero isn't a chess engine. It's a self-learning neural network.

You can't have AlphaZero compete fairly in the TCEC because it keeps rewriting itself, learning and becoming stronger with every game.

Chess engines don't have this ability. They're just stuck playing at their maximum strength, with no ability to improve beyond their programming.

ChastityMoon

And we ain't seen nothin' yet. Come back in 100 years. Well... aside from the fact most of us will be well into our dirtnaps... probably nothing to come back to but radioactive ruin as far as the eye could see, if there was any eye. Human eye, that is. One of AlphaZero's great-great-////-great grandkids' eyes, maybe.

admkoz
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:

What I am curious about is whether it "figures out" things like "don't give up a free queen", or does it really just have to figure that out again every time such an option presents itself?  

 

From there its experience improves these networks and after a while it would learn that positions where there was a queen missing tended not to have as good an expected result. Well, actually it would get a general idea that more material is better[...]

I have put this crudely, but basically a big neural network learns to encapsulate concepts that can be very sophisticated[...]

So you're saying it DOES figure out that "more material is better" meaning that it can evaluate positions it has never seen before on that basis.  

 

You and I can glance at a board, see that there are no immediate threats, see that Black is up a rook, and figure Black has it in the bag, even if an actual mate is 30+ moves away. We'll be right 999,999 times out of a million. Can AlphaZero do that?

We would not be right that often.

But yes, based on my understanding of the technology, its positional evaluation network would be so good that without any explicit analysis at all it would play quite good chess. I am not sure how good it would be in this mode, but I do know it needs to do analysis to play at better than 2900 Elo (as it achieved near this level using about 1/30 of a second per move and got better as the time increased).

So what percentage of the time DO you think being up a rook in an otherwise normal position, in a game between > 1500 players, is a win? That is just a quibble.

 

OK, so AZ would do pretty well even if it was not allowed to do any further analysis.  That implies that AZ can evaluate any position, and it learned to do this solely by playing (initially) random games. 

 

I guess it may be that this is the kind of question that can't be answered in a blog post, but what I am trying to figure out is the form of that evaluation method and how it gets built.

Elroch
Lyudmil_Tsvetkov wrote:
Elroch wrote:

It used 4 of Google's tensor processing units (TPUs), which might be equivalent to about 1000 Intel cores (say 14 of the latest 72-core Xeons) for the purpose of neural network computations.

It's fair to say this is a supercomputer, although not on the scale of the really big ones used for big science.

That is right: this was a supercomputer versus a PC.

So the achievement in terms of AI was rather small; definitely much smaller than SF's.

The last slight is ridiculous for more than one reason. One of them is that Stockfish is not an AI; it is strong because its designers are experts in the design of fast, strong chess engines. The DeepMind team include no experts on chess or engines, because they didn't design one. AlphaZero did.

The computational demands of AlphaZero are entirely because the networks are large and deep. These involve sizeable matrix operations at every step of training and of application in a game. The programmers gave these networks no chess knowledge: rather this is where AlphaZero stores the knowledge about chess that it derives from its experience.
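(A rough sense of scale, with illustrative numbers: a single fully connected layer mapping 1,024 features to 1,024 features costs 1,024 × 1,024 ≈ one million multiply-adds per position evaluated, and a deep network stacks many such layers. That, rather than any chess logic, is the workload the TPUs handle.)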

Elroch
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:

What I am curious about is whether it "figures out" things like "don't give up a free queen", or does it really just have to figure that out again every time such an option presents itself?  

 

From there its experience improves these networks and after a while it would learn that positions where there was a queen missing tended not to have as good an expected result. Well, actually it would get a general idea that more material is better[...]

I have put this crudely, but basically a big neural network learns to encapsulate concepts that can be very sophisticated[...]

So you're saying it DOES figure out that "more material is better" meaning that it can evaluate positions it has never seen before on that basis.  

 

You and I can glance at a board, see that there are no immediate threats, see that Black is up a rook, and figure Black has it in the bag, even if an actual mate is 30+ moves away. We'll be right 999,999 times out of a million. Can AlphaZero do that?

We would not be right that often.

But yes, based on my understanding of the technology, its positional evaluation network would be so good that without any explicit analysis at all it would play quite good chess. I am not sure how good it would be in this mode, but I do know it needs to do analysis to play at better than 2900 Elo (as it achieved near this level using about 1/30 of a second per move and got better as the time increased).

So what percentage of the time DO you think being up a rook in an otherwise normal position, in a game between > 1500 players, is a win?   That is just a quibble. 

 

OK, so AZ would do pretty well even if it was not allowed to do any further analysis.  That implies that AZ can evaluate any position, and it learned to do this solely by playing (initially) random games. 

 

I guess it may be that this is the kind of question that can't be answered in a blog post, but what I am trying to figure out is the form of that evaluation method and how it gets built.

The nature of the evaluation method is quite simple. There is some representation of the position as an array of numbers which are the inputs to the neural network - the network doesn't know what they mean; it has to work this out from the way they relate to the results of games and to their values on other moves - and a large, deep neural network with thousands (not sure how many thousands) of nodes in many layers, which takes the representation of the board and outputs a number: the expected score from the position. [I hope I haven't missed some published detail.]
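For concreteness, here is a minimal sketch in Python of that kind of function: an encoded position goes in, an expected score comes out. The sizes, the 12-plane encoding and the single hidden layer are assumptions for illustration; AlphaZero's actual network is a far deeper residual net over board planes.

```python
import numpy as np

rng = np.random.default_rng(0)
IN, HID = 12 * 64, 256                     # e.g. 12 piece planes over 64 squares
W1, b1 = rng.normal(0, 0.05, (HID, IN)), np.zeros(HID)
W2, b2 = rng.normal(0, 0.05, (1, HID)), np.zeros(1)

def evaluate(x):
    """x: the position as a flat 0/1 vector of length IN. The network has
    no idea what the numbers mean; training alone gives them meaning."""
    h = np.maximum(0.0, W1 @ x + b1)       # hidden layer with ReLU
    return np.tanh(W2 @ h + b2).item()     # expected score in (-1, 1)
```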

How the evaluation method gets built comprises two parts (if my understanding is correct - I am supplementing what is published with general ideas about deep reinforcement learning). The obvious one is when a game ends: the exact value of the position is available, and that can be used to adjust the neural network to improve its evaluations of earlier positions in the direction of the right result. The second one is that when it evaluates a position, if this evaluation is a surprise compared to the evaluation of previous positions, the network is tweaked to make the evaluations of previous positions a bit more in agreement with the later evaluation.

The first form of feedback is basically making the evaluation compatible with the absolute value of clear positions. The second form of feedback is basically making the evaluation compatible with the legal continuations in a position: the reason is that the perfect evaluation of a position is the same as that of a later position reached by perfect play.
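Those two feedback signals, sketched in the same hedged spirit (the bootstrapping form is standard temporal-difference learning, supplementing what the paper states; the function names are made up):

```python
def final_result_targets(positions, result):
    """Signal 1: once a game ends, its exact value is known, so every
    earlier position is nudged toward that result (+1, 0 or -1)."""
    return [(pos, result) for pos in positions]

def bootstrap_targets(positions, evaluate):
    """Signal 2: each position is nudged toward the evaluation of the
    position that followed it, since a perfect evaluation would agree
    with the evaluation after the best continuation."""
    return [(positions[i], evaluate(positions[i + 1]))
            for i in range(len(positions) - 1)]
```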

[Technical points: (1) This adjustment takes the form of nudging every parameter in the direction that would achieve the desired change, in proportion to how strongly it does so. This is called "gradient descent". (2) It actually takes a minibatch of 4096 positions and combines the desired adjustments to make one change to the entire neural network. The reason for this is that it allows more parallel computation to speed up the process; otherwise it would be just as good to do one position at a time.]
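The bracketed points as a generic sketch (the 4096 batch size is from the post; the learning rate and names are illustrative):

```python
import numpy as np

def minibatch_sgd_step(params, grad, batch, lr=0.01):
    """One gradient-descent update: average the per-position gradients
    over the minibatch, then nudge every parameter against that average."""
    per_example = [grad(params, pos, target) for pos, target in batch]  # parallelisable
    mean_grads = [np.mean(g, axis=0) for g in zip(*per_example)]
    return [p - lr * g for p, g in zip(params, mean_grads)]
```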

[I should correct one careless thing I have said sometimes: there is actually only one neural network which takes a board as input - this provides both the expected score and an array of probabilities that each of the legal moves is best. This is better than two networks, because some parts of the network can be used for both purposes].
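And a sketch of that single two-headed network: a shared trunk, one head for the expected score and one for the move probabilities. The 8 x 8 x 73 = 4,672 move encoding is the one described in the AlphaZero preprint; the trunk size and single shared layer are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
IN, TRUNK, MOVES = 12 * 64, 256, 8 * 8 * 73    # 4,672 possible move codes
Wt = rng.normal(0, 0.05, (TRUNK, IN))          # shared trunk
Wv = rng.normal(0, 0.05, (1, TRUNK))           # value head
Wp = rng.normal(0, 0.05, (MOVES, TRUNK))       # policy head

def forward(x):
    trunk = np.maximum(0.0, Wt @ x)            # features shared by both heads
    value = np.tanh(Wv @ trunk).item()         # expected score in (-1, 1)
    logits = Wp @ trunk
    policy = np.exp(logits - logits.max())     # softmax over all move codes
    return value, policy / policy.sum()
```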

Lyudmil_Tsvetkov
Elroch wrote:
Lyudmil_Tsvetkov wrote:
Elroch wrote:

It used 4 of Google's tensor processing units (TPUs), which might be equivalent to about 1000 Intel cores (say 14 of the latest 72-core Xeons) for the purpose of neural network computations.

It's fair to say this is a supercomputer, although not on the scale of the really big ones used for big science.

That is right: this was a supercomputer versus a PC.

So the achievement in terms of AI was rather small; definitely much smaller than SF's.

The last slight is ridiculous for more than one reason. One of them is that Stockfish is not an AI; it is strong because its designers are experts in the design of fast, strong chess engines. The DeepMind team include no experts on chess or engines, because they didn't design one. AlphaZero did.

The computational demands of AlphaZero are entirely because the networks are large and deep. These involve sizeable matrix operations at every step of training and of application in a game. The programmers gave these networks no chess knowledge: rather this is where AlphaZero stores the knowledge about chess that it derives from its experience.

Here you are factually wrong.

The team includes at least 3 chess programmers. Matthew Lai, the author of Giraffe and a Talkchess member, is one of them. It is perhaps not by chance that Giraffe, following the very same approach as Alpha, is rated only around 2400 on a single core.

It is the huge hardware that made the difference and not the approach.

Self-learning, self-learning; what do you mean, self-learning and AI?

It is just tuning itself on multiple levels, while still following the primary human code, that is all.

And the primary human code included at least that winning is good and that, in case of a win, certain terms/psqt should be given bonuses, i.e. larger values.

What is working throughout the iterations is the human code, including by how much those values should be increased. Alpha is just executing the code and returning its findings.