Reinforcement Learning Playground

Maze Environment

The environment has 54 states, corresponding to the cells of the grid, and 4 actions (up, down, left, right). The agent receives a reward of -0.01 if it tries to step outside the grid or onto a wall, and a reward of +10 if it reaches the goal.
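The rules above can be sketched as a small environment class. This is a minimal sketch, not the playground's actual implementation: the 6×9 grid (giving 54 states) and the wall, start, and goal positions are assumptions chosen for illustration.

```python
class MazeEnv:
    # Action indices map to (row, col) offsets: up, down, left, right.
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self):
        self.rows, self.cols = 6, 9            # 6 * 9 = 54 states (assumed layout)
        self.walls = {(1, 2), (2, 2), (3, 2)}  # hypothetical wall cells
        self.start = (5, 0)                    # hypothetical start cell
        self.goal = (0, 8)                     # hypothetical goal cell
        self.pos = self.start

    def reset(self):
        """Start a new episode and return the initial state."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Illegal move (outside the grid or onto a wall): -0.01, agent stays put.
        if not (0 <= r < self.rows and 0 <= c < self.cols) or (r, c) in self.walls:
            return self.pos, -0.01, False
        self.pos = (r, c)
        if self.pos == self.goal:
            return self.pos, 10.0, True        # reaching the goal: +10, episode ends
        return self.pos, 0.0, False
```

The `reset`/`step` interface mirrors common RL environment APIs, so any tabular agent can be plugged in on top of it.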

Introduction

A typical Reinforcement Learning problem consists of an Agent and an Environment. At each time step, the agent receives information from the environment about its current state (s) and uses that information to choose an action (a) based on a policy (π). After taking the action, the agent receives a reward (r). Learning takes place over multiple episodes; an episode ends when the agent achieves its goal. In the Maze Environment, the state corresponds to the agent’s location in the grid and the actions correspond to moving to one of the neighboring cells. The agent receives a reward of +10 when it reaches the goal and a penalty of 0.01 when it attempts an illegal move, such as moving outside the grid or stepping onto a wall.
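The agent–environment loop described above can be sketched in code. This is a generic illustration, not the playground's actual agent: it assumes a tabular Q-learning agent with an ε-greedy policy, an environment exposing `reset()` and `step(action) -> (state, reward, done)`, and a `q` table mapping `(state, action)` pairs to values; the function and parameter names are illustrative.

```python
import random
from collections import defaultdict

def run_episode(env, q, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Run one episode of the agent-environment loop and return the total reward."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        # Policy: explore with probability epsilon, otherwise act greedily on q.
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # Q-learning update: move q toward the reward plus the discounted
        # value of the best action in the next state (zero if the episode ended).
        best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
        total_reward += reward
    return total_reward
```

Calling `run_episode` repeatedly with a shared `q = defaultdict(float)` lets the value estimates accumulate across episodes, which is how learning proceeds over the multiple episodes described above.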

Hyperparameters

Acknowledgements

This project was inspired by the TensorFlow Playground project and the Coursera Reinforcement Learning Specialization, which is based on Richard Sutton’s Reinforcement Learning: An Introduction textbook. Many of the UI elements on this page come from the TensorFlow Playground project. The agent implementations are based on homework assignments from the Coursera Reinforcement Learning Specialization.