Channel: User darkcanuck - Stack Overflow

Answer by darkcanuck for TD(λ) in Delphi/Pascal (Temporal Difference Learning)

If you're serious about making this work, it would help a lot to understand TD-lambda first. Sutton and Barto's book, "Reinforcement Learning: An Introduction", is available for free in HTML format and covers the algorithm in detail. Basically, TD-lambda learns a mapping from a game state to the expected reward at the game's end. As more games are played, states that are more likely to lead to a win end up with higher expected reward values.
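
To make the idea of that mapping concrete, here is a rough Python sketch of a tabular TD-lambda update applied to one finished game. It is only an illustration under my own assumptions, not code from the question: the function name `td_lambda_episode`, the parameter defaults, the use of accumulating eligibility traces, and the convention that the only reward arrives at the end of the game are all choices made for this example.

```python
def td_lambda_episode(values, states, final_reward, alpha=0.1, gamma=1.0, lam=0.8):
    """Update the state -> expected-reward table `values` from one finished game.

    values:       dict mapping a hashable state to its current estimate
    states:       the states visited during the game, in order
    final_reward: reward received when the game ends (assumed: all others are 0)
    """
    traces = {}  # eligibility trace per visited state
    last = len(states) - 1
    for t, state in enumerate(states):
        # Assumption: the step after the last listed state is terminal (value 0).
        reward = final_reward if t == last else 0.0
        next_value = 0.0 if t == last else values.get(states[t + 1], 0.0)

        # TD error: how far the current estimate is from the one-step target.
        delta = reward + gamma * next_value - values.get(state, 0.0)

        # Mark the current state as recently visited.
        traces[state] = traces.get(state, 0.0) + 1.0

        # Share the error with every recently visited state, then decay traces.
        for s in traces:
            values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return values
```

Calling `td_lambda_episode({}, ["a", "b", "c"], 1.0)` after a won game nudges all three states upward, with the state closest to the win moving the most. That is the "states that lead to wins get higher values" effect in miniature.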

For a simple game like tic-tac-toe, you're better off starting with a tabular mapping (just track an expected reward value for every possible game state). Once you've got that working, you can try using a neural network (NN) for the mapping instead. But I would suggest trying a separate, simpler NN project first...
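
A tabular mapping for tic-tac-toe can be as small as a dictionary keyed by the board, since there are at most 3^9 = 19,683 ways to fill nine cells. The Python sketch below is a hypothetical layout, not the asker's code: the board-as-tuple encoding, the helper names (`legal_moves`, `place`, `choose_move`), the default value of 0.5 for unseen positions, and the epsilon-greedy exploration are all assumptions made for illustration.

```python
import random

EMPTY, X, O = " ", "X", "O"

def legal_moves(board):
    """Indices of the empty cells on a 9-cell board tuple."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]

def place(board, index, mark):
    """Return a new board tuple with `mark` placed at `index`."""
    return board[:index] + (mark,) + board[index + 1:]

def choose_move(table, board, mark, epsilon=0.1, default=0.5, rng=random):
    """Pick a move using the value table.

    table maps a board tuple to its expected reward; positions that have
    never been seen fall back to `default` (an assumed neutral value).
    """
    moves = legal_moves(board)
    if rng.random() < epsilon:
        # Occasionally explore so that new positions get visited.
        return rng.choice(moves)
    # Otherwise take the move whose resulting position has the highest value.
    return max(moves, key=lambda m: table.get(place(board, m, mark), default))
```

Tuples are hashable, so each board can be used directly as a dictionary key and no separate state-numbering scheme is needed. If you later switch to an NN, only the `table.get(...)` lookup has to change.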

