Notes on "Some Studies in Machine Learning Using the Game of Checkers" - A. L. Samuel
Learns in 10 hrs of playing. Starts with:
- rules of the game
- a sense of direction
- a list of parameters
Uses two different machine learning procedures:
1. Neural-net approach: a randomly connected switching net that is rewarded/punished, resulting in learned behavior.
2. Highly organized network designed to learn specific things. (more efficient)
Checkers has:
- no known algorithm to guarantee a win/draw
- exploring every path would require about 10^40 moves, which is infeasible
Performance Evaluation:
- inability of one side or the other to move
- piece ratio - look ahead until one side has gained a piece advantage
- positional advantages - numerical measures of board positions added together, each with a coefficient according to importance (sketch below)
Other things to consider in performance:
- usually advantageous to trade pieces when ahead, in order to avoid trades when behind
- kings weighted more heavily than ordinary men - 3:2
- will trade three men for two kings, or two kings for three men, to gain a positional advantage
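The scoring polynomial described above is a weighted linear sum. Below is a minimal Python sketch, assuming a precomputed board summary; aside from the 3:2 king/man ratio from the notes, the positional term and coefficients are illustrative, not Samuel's actual parameters.

```python
from dataclasses import dataclass

# Hypothetical board summary; Samuel's program computed its terms
# from the live board representation, not from cached counts.
@dataclass
class BoardStats:
    my_men: int
    my_kings: int
    opp_men: int
    opp_kings: int
    my_center: int    # center squares held (illustrative positional term)
    opp_center: int

KING_WEIGHT, MAN_WEIGHT = 3, 2   # kings : men weighted 3:2, per the notes

def evaluate(b: BoardStats) -> int:
    """Linear scoring polynomial: weighted piece count plus
    positional terms, each scaled by an importance coefficient."""
    material = (KING_WEIGHT * (b.my_kings - b.opp_kings)
                + MAN_WEIGHT * (b.my_men - b.opp_men))
    positional = 4 * (b.my_center - b.opp_center)  # coefficient 4 is made up
    return material + positional
```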
Parameter adjustment options:
1. rote learning
2. generalized: the program itself selects and adjusts parameters and coefficients (a sketch of one possible update rule follows below)
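One way to picture the generalized option is an error-driven update that moves each coefficient toward agreement between the polynomial's estimate and a deeper backed-up search score. This is a simple delta-rule style illustration, not Samuel's actual correlation-based procedure; all names here are hypothetical.

```python
def adjust_coefficients(coeffs, terms, board, backed_up_score, rate=0.01):
    """Nudge each coefficient to shrink the gap between the
    polynomial's one-ply estimate and the look-ahead score.

    coeffs[i] weights terms[i](board); `terms` are functions like
    the positional measures sketched above.
    """
    predicted = sum(c * t(board) for c, t in zip(coeffs, terms))
    error = backed_up_score - predicted
    # move each weight in proportion to its term's contribution
    return [c + rate * error * t(board) for c, t in zip(coeffs, terms)]
```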
Search:
- looks ahead a minimum distance of 3
- will evaluate the board if the next move is a jump, the last move was a jump, or an exchange offer is possible
- will continue looking ahead
- stops at 20 look-aheads regardless of condition
(see "ply" section for details)
How to deal with two paths that both lead to a win when one is more direct:
- "carry effective ply along with score"
- choose the low-ply path if winning and the high-ply path if losing
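A hedged sketch of how "carry effective ply along with score" might look inside the look-ahead: a plain negamax with a ply tiebreak, assuming hypothetical `moves()`/`apply()` helpers and a side-to-move `evaluate` like the sketch above. Samuel's actual routine had different stopping conditions.

```python
import math

def search(pos, depth):
    """Negamax look-ahead returning (score, ply) for the side to move.

    Ties on score are broken by ply: the quicker path is preferred
    when winning (score > 0), the slower one when losing.
    """
    moves = pos.moves()                      # hypothetical helper
    if depth == 0 or not moves:
        return evaluate(pos), 0              # score for the side to move

    def key(score, ply):
        # low ply ranks higher when winning, high ply when losing
        return (score, -ply if score > 0 else ply)

    best = (-math.inf, 0)
    for m in moves:
        score, ply = search(pos.apply(m), depth - 1)
        cand = (-score, ply + 1)             # negate: opponent's best is our worst
        if key(*cand) > key(*best):
            best = cand
    return best
```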
Rote learning:
- saves all board positions encountered during play and their scores
- references this memory
- tested using an arbitrarily picked scoring polynomial and playing against itself
- requires a lot of storage
- learned to imitate master play during opening moves
- poor during mid-game play
- avoided opening traps
- good for specialized cases
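A minimal sketch of rote learning as a lookup table, assuming a hypothetical canonical board key. Because a stored score was itself produced by an earlier search, a memory hit effectively extends the look-ahead distance over time.

```python
rote_memory = {}   # canonical board key -> saved score

def score_with_memory(pos, depth):
    """Consult the rote memory before searching; on a miss,
    search and remember the result.

    `pos.canonical_key()` is a hypothetical helper that maps
    equivalent board positions to a single key.
    """
    key = pos.canonical_key()
    if key in rote_memory:
        return rote_memory[key]
    score, _ply = search(pos, depth)   # the search sketch above
    rote_memory[key] = score
    return score
```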
Generalized learning:
- good when the number of condition permutations is large
- good when the consequences of any action come soon after
Combining the two:
- save only a limited amount of info during the early stages of learning
- increase that amount when the evaluation coefficients are more stable
- or use generalization until things are more stable, then introduce rote learning
Saving time:
- catalog saved boards, grouped into records by: # of pieces, presence of a piece advantage and which side holds it, whether there are kings on the board, and the diagonal axes; within a record, catalog by board position
- limit saved board positions by frequency of use and by ply
- use an age term carried with the score: set it to an arbitrary value when first saved; each time a board is referenced, age = age/2; at memory-merge times, each board position is automatically aged by two; a board is "forgotten" when its age reaches some maximum (sketch below)
- restrict the max size of any one record; when the limit is reached, the lowest-ply board positions are removed to bring the size down
- delete redundancies
- remove board positions that are not of much value
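A sketch of the aging scheme under stated assumptions: the initial age and maximum are arbitrary stand-ins, and "aged by two" at merge time is read here as adding two (doubling would also fit the wording).

```python
INITIAL_AGE = 16    # "arbitrary value when first saved"
MAX_AGE = 64        # past this, a board is "forgotten"

class SavedBoard:
    def __init__(self, key, score):
        self.key, self.score = key, score
        self.age = INITIAL_AGE

    def on_reference(self):
        self.age //= 2          # each reference halves the age

def merge_and_forget(record):
    """At memory-merge time every saved board ages by two;
    boards whose age exceeds the maximum are dropped."""
    for b in record:
        b.age += 2              # reading "aged by two" as +2
    return [b for b in record if b.age <= MAX_AGE]
```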
Polynomial coefficients:
- if the correlation coefficient ratio is greater than integer n but less than n+1, set the ratio to 2^n (sketch below)
- some functions for computing weights ... (page 219)
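A one-function sketch of the power-of-two rule above; the behavior at exact integer ratios is an assumption the notes leave open.

```python
import math

def snap_to_power_of_two(ratio):
    """For n < ratio < n+1 (n a positive integer), return 2**n."""
    n = math.floor(ratio)
    return 2 ** n

# e.g. a ratio of 3.4 becomes 2**3 = 8
```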