how does a machine learn to make moves

pdve

I'm trying to understand how a machine that learns selects moves. Basically, a machine has learned a hypothesis when it can map a given input (domain) to a given output (range). If the input is the location of a house and the output is its price, then the machine has learned to predict house prices once it has a hypothesis function h(theta) that maps each input vector xi to the corresponding output yi. The way it does so is by experience. The experience consists of what is known as training data: the locations of houses and their corresponding prices, the questions and the answers it is trained on. Once it has learned from this training data, it is ready to make predictions.

Now, how does it learn? First of all, we assume the hypothesis function is of a certain form and contains some coefficients which are initially unknown. Next we apply a penalty for error and minimize the resulting cost function to obtain the coefficients, so that the eventual error is minimized. The penalty is the square of the difference between the actual value (which is known) and the predicted value (according to our present hypothesis function). So for any given hypothesis, the cost function has a value, and we have to minimize that value. For this we use various techniques whose details are not important here; suffice it to say that they come from calculus and linear algebra, and that we trace the steepest descent until we find the lowest point of the curve.

This maps real numbers to real numbers, but what if the situation is like chess, where we are speaking of discrete quantities? A knight does not move 2.333 squares; here our range is discrete. For simplicity, let's restrict ourselves to 0 and 1: either our move is 0 or it is 1. We again have to construct a cost function. This is in fact called logistic regression, and the cost function involves logarithmic functions in order to map from real numbers (such as the pixel values when identifying cats or tigers from a relatively large domain) to a value between 0 and 1. The closer the value is to 1, the higher we call its probability of being 1; the closer it is to 0, the higher its probability of being 0.

So what AlphaZero is probably doing is creating a vector of board states and, using gradient descent or a more sophisticated/efficient algorithm, computing the probability that a move will turn out to be the correct move in the given position by minimizing the cost function of the hypothesis function. AlphaZero's training data is the various games it plays against itself, whose outcomes it observes. The board position at any point is the input vector, and the hypothesis function gives the probabilities with which the various moves should be played (all legal moves of the various pieces). This hypothesis function has a cost function associated with it which must be minimized, and that is what training against the final outcome of the games accomplishes. In this sense the machine is said to have learned the hypothesis function on its own, i.e. it knows with what probability to make various moves in various kinds of positions.
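The two cost functions described above can be sketched in a few lines of Python. The data, learning rate, and iteration count here are made up for illustration; this is only a minimal sketch of the idea, not how any real engine is implemented:

```python
import math

# Toy training data: a single feature x (say, a house-location index)
# and a price y. The numbers are invented; the true relationship is y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def cost(theta0, theta1):
    """Mean squared error of the hypothesis h(x) = theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Gradient descent: repeatedly step against the gradient of the cost.
theta0, theta1 = 0.0, 0.0
lr = 0.05
for _ in range(5000):
    d0 = sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / len(xs)
    d1 = sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta0 -= lr * d0
    theta1 -= lr * d1
# theta0 and theta1 converge toward 1 and 2, minimizing cost().

# For a discrete 0/1 range, logistic regression squashes a real-valued
# score into (0, 1) via the sigmoid, read as the probability of label 1,
# and penalises errors with a logarithmic loss.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(p, y):
    """Cost for one example: predicted probability p, true label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

Note how the log loss blows up as a confident prediction turns out wrong, which is exactly the "penalty for error" the post describes.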

pdve

So let's say there is a particular hypothesis function that likes to give all its material away, and another that likes to walk the king out early. Needless to say, these will end in losses and their coefficients will get driven down. The correct coefficients will come from the most harmonious play of the pieces.

pdve

Not only is it logic, it is actually intelligent logic. Why would you question whether it is logic? What would it be if not logic?

pdve

one day computers will be our teachers.

drmrboss

Which engine are you talking about?

There are many types of engines:

1. Alpha-beta search engines (Stockfish, Komodo, Houdini), the most popular kind

2. MCTS engines (Komodo MCTS, Scorpio MCTS)

3. Neural-network engines with self-learning (Leela Zero)

4. Brute-force engines (Deep Blue)

5. Neural-network engines with supervised learning (DeusX)

6. Hybrid engines (Leela/Stockfish hybrid)

 

For Cat 1 (Stockfish):

Stockfish never learns: https://hxim.github.io/Stockfish-Evaluation-Guide/

Stockfish searches millions of positions and evaluates each one with this evaluation table plus a set of hand-tuned pattern terms (e.g. a 0.3 bonus for connected pawns, a -0.3 penalty for backward pawns, etc.)
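A hand-written evaluation of that kind can be sketched roughly as follows. The piece values and the ±0.3 pawn-structure terms here are illustrative placeholders, not Stockfish's actual numbers (those are in the evaluation guide linked above):

```python
# Hypothetical piece values in pawn units.
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}

def evaluate(white_pieces, black_pieces,
             white_connected_pawns=0, white_backward_pawns=0,
             black_connected_pawns=0, black_backward_pawns=0):
    """Static evaluation from White's point of view: material plus
    hand-tuned pattern bonuses/penalties. No learning involved."""
    score = sum(PIECE_VALUES[p] for p in white_pieces)
    score -= sum(PIECE_VALUES[p] for p in black_pieces)
    score += 0.3 * white_connected_pawns - 0.3 * white_backward_pawns
    score -= 0.3 * black_connected_pawns - 0.3 * black_backward_pawns
    return score

# A search engine calls a function like this at millions of leaf positions:
print(evaluate("PPPPNNR", "PPPPBR", white_connected_pawns=2))
```

The search supplies the lookahead; the evaluation itself is a fixed formula written and tuned by the developers.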

 

madratter7

The well known chess playing programs that learn (lc0, alpha zero, etc.) are neural networks. They don't work on the basis of hypothesis. Instead they learn on the basis of reward. When a neural network does a good job of selecting a move, it is rewarded by strengthening the connections in the network. When it does a bad job, the strength of the network connections is allowed to decay. Over time, this leads to a network good at the behavior you are training.

This is quite simplified, but is the gist of it.
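That reward-driven strengthening can be caricatured in a few lines, assuming a softmax over per-move weights and a made-up ±1 game outcome (real engines use deep networks over board features; all names here are illustrative):

```python
import math
import random

random.seed(0)

# A tiny "network": one weight per candidate move.
weights = {"good_move": 0.0, "bad_move": 0.0}

def probs():
    """Softmax: turn weights into move probabilities."""
    exps = {m: math.exp(w) for m, w in weights.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

def reward(move):
    # Stand-in for a game outcome: +1 if the move worked out, -1 if not.
    return 1.0 if move == "good_move" else -1.0

lr = 0.1
for _ in range(500):
    p = probs()
    move = random.choices(list(p), weights=list(p.values()))[0]
    r = reward(move)
    # Strengthen the chosen move's weight when rewarded,
    # weaken it when penalised (a REINFORCE-style update).
    for m in weights:
        grad = (1.0 if m == move else 0.0) - p[m]
        weights[m] += lr * r * grad

# After training, "good_move" ends up with almost all the probability mass.
```

The connections that lead to rewarded moves get stronger, the others decay, which is the gist of the description above.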

WSama
madratter7 wrote:

The well known chess playing programs that learn (lc0, alpha zero, etc.) are neural networks. They don't work on the basis of hypothesis. Instead they learn on the basis of reward. When a neural network does a good job of selecting a move, it is rewarded by strengthening the connections in the network. When it does a bad job, the strength of the network connections is allowed to decay. Over time, this leads to a network good at the behavior you are training.

This is quite simplified, but is the gist of it.

 

What he/she said. A neural network, in this case, will learn by trial and error. Theory can be input into the network through a training program designed especially for teaching neural networks.

Neural networks are extremely complex works of art: not necessarily the design, but the network that forms. Teaching a game as complex as chess only makes things more complicated.

To create an efficient neural network, you must hard-code certain concepts into it that are specific to its learning field, just as the human brain consists of regions responsible solely for certain tasks. So chess neural networks are slightly different from others by design.

To create a neural network capable of excelling in multiple learning fields, one must integrate the specialised networks noted above, which is itself no easy task.

 

pdve

Well, actually I don't know that much about neural networks; I'm just getting into the subject. What madratter7 said about it is right. I just read something about machine learning and thought of posting here.

blueemu
ghost_of_pushwood wrote:
Mrmerbs57 wrote:

 

I only read first 20 words God it's a long book this!!!!???

 

Yeah, that one droned on for days.

The proper Internet meme-speak for this sentiment is:

 tl;dr

... which means "too long; didn't read". 

drmrboss

If you would like to read about it, this is how AlphaZero works:

https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0

But if you would like to know how Leela works, you need to actively follow the Leela Discord.

Leela in 2019 is now a little stronger than the 2017 AlphaZero, due to additional features like

1. Tablebase (TB) access

2. SE nets

3. and other features.

johnfenic
Beats me. Robots are complicated machines, like humans. Good question.
pdve

Actually, I have enrolled in a machine learning course and an AI course on Coursera; that's why I was so excited about this subject.

pdve

Coursera is an online training and certification company. I do not know who Mr. Ed is

blueemu
pdve wrote:

i do not know who mr. ed is

Kids nowadays... Mr Ed is a Horse.

https://www.youtube.com/watch?v=X0XcH6d-1Ms

 

pdve

Well, maybe it is a cultural reference I did not get, philistine that I am.

dave_smith354
pdve wrote:

well maybe it is a cultural reference i did not get philistine that i am

Culture and age. It was broadcast in the early-to-mid 1960s and probably never made it to India (that is an Indian flag by your name, isn't it?)

pdve

Haha, yeah, it probably never made it to India. Yes, I'm from India.

jjupiter6

pdve wrote:

well maybe it is a cultural reference i did not get philistine that i am

You didn't miss anything. It was an American comedy from 50 years ago that has dated horribly. There is no reason that you would know it.

WSama

An interesting fact is that humans are subject to programming as well. We actually grew up being programmed at home, before we could even understand television or the words on the radio.

What's that? Did you just ask your friend to do something? That's a friendly example of human programming. There are of course more intrusive methods, such as hypnosis...

WSama

This of course raises the question: what about those programs that run in the background and never terminate?

"Wake up Mr green"
