Introduction
A typical reinforcement learning problem consists of an agent and an environment. At each time step, the agent receives information from the environment about its current state (s) and uses that information to choose an action (a) according to a policy (π). After taking the action, the agent receives a reward (r). Learning takes place over multiple episodes; an episode ends when the agent achieves its goal. In the Maze Environment problem, the state corresponds to the agent's location in the grid, and the actions correspond to moving to any of the neighboring cells. The agent receives a reward of 10 when it reaches the goal and a penalty of 0.01 when it attempts an illegal move, such as moving outside the grid or stepping onto a wall.
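The loop described above can be sketched as a minimal maze environment. This is a hypothetical illustration, not the implementation used on this page: the class name `MazeEnv`, the grid encoding (`.` open, `#` wall, `G` goal), and the `step`/`reset` interface are all assumptions; only the goal reward of 10 and the 0.01 penalty for illegal moves come from the text.

```python
# Hypothetical sketch of the maze environment described above.
# Assumed: grid encoding and method names; from the text: reward of 10
# at the goal, penalty of 0.01 for illegal moves (off-grid or into a wall).

GOAL_REWARD = 10.0
ILLEGAL_PENALTY = -0.01

class MazeEnv:
    """Grid maze: '.' = open cell, '#' = wall, 'G' = goal."""
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, grid, start):
        self.grid = grid
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.state
        nr, nc = r + dr, c + dc
        # Illegal move: off the grid or into a wall -> small penalty, stay put.
        if not (0 <= nr < len(self.grid) and 0 <= nc < len(self.grid[0])) \
                or self.grid[nr][nc] == "#":
            return self.state, ILLEGAL_PENALTY, False
        self.state = (nr, nc)
        if self.grid[nr][nc] == "G":
            return self.state, GOAL_REWARD, True  # episode ends at the goal
        return self.state, 0.0, False

env = MazeEnv(["..#", ".#G", "..."], start=(0, 0))
env.reset()
state, reward, done = env.step("up")  # off the grid: penalty, agent stays put
```

An agent's policy would then map each `state` to one of the four actions, and a learning algorithm would update that policy from the rewards returned by `step` across episodes.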
Hyperparameters
Acknowledgements
This project was inspired by the TensorFlow Playground project and the Coursera Reinforcement Learning Specialization, which is based on Richard Sutton's Reinforcement Learning: An Introduction textbook. Many of the UI elements on this page come from the TensorFlow Playground project. The agent implementations are based on homework assignments from the Coursera Reinforcement Learning Specialization.