Load Predefined Grid World Environments
Reinforcement Learning Toolbox™ software provides several predefined grid world environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:
Learn reinforcement learning concepts.
Gain familiarity with Reinforcement Learning Toolbox software features.
Test your own reinforcement learning agents.
You can load the following predefined MATLAB® grid world environments using the rlPredefinedEnv function.
Environment | Agent Task |
---|---|
Basic grid world | Move from a starting location to a target location on a two-dimensional grid by selecting moves from the discrete action space {N,S,E,W}. |
Waterfall grid world | Move from a starting location to a target location on a larger two-dimensional grid with unknown deterministic or stochastic dynamics. |
For more information on the properties of grid world environments, see Create Custom Grid World Environments.
You can also load predefined MATLAB control system environments. For more information, see Load Predefined Control System Environments.
Basic Grid World
The basic grid world environment is a two-dimensional 5-by-5 grid with a starting location, terminal location, and obstacles. The environment also contains a special jump from state [2,4] to state [4,4]. The goal of the agent is to move from the starting location to the terminal location while avoiding obstacles and maximizing the total reward.
To create a basic grid world environment, use the rlPredefinedEnv function. This function creates an rlMDPEnv object representing the grid world.
env = rlPredefinedEnv('BasicGridWorld');
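As a quick check of what the function returns, the following sketch queries the environment specifications and the underlying grid model. It assumes Reinforcement Learning Toolbox is installed; the variable names obsInfo and actInfo are illustrative only.

```matlab
% Create the basic grid world and inspect its specifications.
env = rlPredefinedEnv('BasicGridWorld');

obsInfo = getObservationInfo(env);   % discrete observation specification
actInfo = getActionInfo(env);        % discrete action specification

% The Model property holds the grid world object behind the environment.
disp(env.Model.GridSize)             % grid dimensions as [rows columns]
disp(numel(actInfo.Elements))        % number of discrete actions
```

For the environment described here, the grid size should correspond to the 5-by-5 grid and the action count to the four move directions.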
You can visualize the grid world environment using the plot function.
The agent location is a red circle. By default, the agent starts in state [1,1].
The terminal location is a blue square.
The obstacles are black squares.
plot(env)
Actions
The agent can move in one of four possible directions (north, south, east, or west).
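The following sketch shows one way to apply a single move and read back the result. It assumes that the entries of the action specification line up with the direction labels stored in env.Model.Actions, and it fixes the initial state to index 1 (state [1,1]) through ResetFcn so that the outcome does not depend on how reset chooses a start state. Both choices are assumptions made for illustration.

```matlab
env = rlPredefinedEnv('BasicGridWorld');

% Assumption: start every episode in state index 1, that is, state [1,1].
env.ResetFcn = @() 1;
reset(env);

% Look up the position of the east label instead of hard-coding an index.
actInfo = getActionInfo(env);
eastIdx = find(env.Model.Actions == "E");

% Assumption: action values are ordered like the labels in env.Model.Actions.
action = actInfo.Elements(eastIdx);

% Apply the move and collect the next observation, reward, and done flag.
[nextObs,reward,isDone] = step(env,action);
```

Here nextObs is the index of the state the agent lands in, reward is the scalar reward for the move, and isDone indicates whether a terminal state was reached.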
Rewards
The agent receives the following rewards or penalties:
+10 reward for reaching the terminal state at [5,5]
+5 reward for jumping from state [2,4] to state [4,4]
-1 penalty for every other action
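These values can be read back from the reward array of the grid world model. The sketch below assumes that the array is indexed as R(s,s',a), with s the current state, s' the next state, and a the action, which is the convention used for custom grid worlds. The variable names are illustrative.

```matlab
env = rlPredefinedEnv('BasicGridWorld');
GW = env.Model;

% Convert state labels to the linear indices used by the model arrays.
sJumpFrom = state2idx(GW,"[2,4]");
sJumpTo   = state2idx(GW,"[4,4]");
sGoal     = state2idx(GW,"[5,5]");

% Reward for the special jump, collected over all actions.
jumpReward = unique(squeeze(GW.R(sJumpFrom,sJumpTo,:)));

% Reward for entering the terminal state from any state under any action.
goalReward = unique(GW.R(:,sGoal,:));

disp(jumpReward)
disp(goalReward)
```

If the model follows the reward list above, the two displayed values should match the jump reward and the terminal reward, respectively.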
Deterministic Waterfall Grid Worlds
The deterministic waterfall grid world environment is a two-dimensional 8-by-7 grid with a starting location and terminal location. The environment includes a waterfall that pushes the agent toward the bottom of the grid. The goal of the agent is to move from the starting location to the terminal location while maximizing the total reward.
To create a deterministic waterfall grid world, use the rlPredefinedEnv function. This function creates an rlMDPEnv object representing the grid world.
env = rlPredefinedEnv('WaterFallGridWorld-Deterministic');
As with the basic grid world, you can visualize the environment, where the agent is a red circle and the terminal location is a blue square.
plot(env)
Actions
The agent can move in one of four possible directions (north, south, east, or west).
Rewards
The agent receives the following rewards or penalties:
+10 reward for reaching the terminal state at [4,5]
-1 penalty for every other action
Waterfall Dynamics
In this environment, a waterfall pushes the agent toward the bottom of the grid.
The intensity of the waterfall varies between the columns, as shown at the top of the preceding figure. When the agent moves into a column with a nonzero intensity, the waterfall pushes it downward by the indicated number of squares. For example, if the agent goes east from state [5,2], it reaches state [7,3].
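The push rule can be sketched in plain MATLAB without the toolbox. In the sketch below, the intensity vector is hypothetical: only the value 2 for column 3 is implied by the example in the text, and the remaining entries are placeholders to replace with the values from the figure. Clamping at the bottom row is also an assumption, and the sketch does not check grid boundaries for the move itself.

```matlab
% Hypothetical per-column intensities for the 7 columns.
% Only column 3 = 2 follows from the example; other entries are placeholders.
intensity = [0 0 2 0 0 0 0];
numRows = 8;

% Moves as [row column] offsets; row 1 is the top of the grid.
moves = struct('N',[-1 0],'S',[1 0],'E',[0 1],'W',[0 -1]);

state = [5 2];
next = state + moves.E;              % step east into column 3

% Push the agent down by the intensity of the column it entered.
% Assumption: the agent cannot be pushed past the bottom row.
next(1) = min(next(1) + intensity(next(2)),numRows);

disp(next)                           % [7 3], matching the example above
```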
Stochastic Waterfall Grid Worlds
The stochastic waterfall grid world environment is a two-dimensional 8-by-7 grid with a starting location and terminal locations. The environment includes a waterfall that pushes the agent toward the bottom of the grid with a stochastic intensity. The goal of the agent is to move from the starting location to the target terminal location while avoiding the penalty terminal states along the bottom of the grid and maximizing the total reward.
To create a stochastic waterfall grid world, use the rlPredefinedEnv function. This function creates an rlMDPEnv object representing the grid world.
env = rlPredefinedEnv('WaterFallGridWorld-Stochastic');
As with the basic grid world, you can visualize the environment, where the agent is a red circle and the terminal location is a blue square.
plot(env)
Actions
The agent can move in one of four possible directions (north, south, east, or west).
Rewards
The agent receives the following rewards or penalties:
+10 reward for reaching the terminal state at [4,5]
-10 penalty for reaching any terminal state in the bottom row of the grid
-1 penalty for every other action
Waterfall Dynamics
In this environment, a waterfall pushes the agent toward the bottom of the grid with a stochastic intensity. The baseline intensity matches the intensity of the deterministic waterfall environment. However, in the stochastic waterfall case, the agent has an equal chance of experiencing the indicated intensity, one level above that intensity, or one level below that intensity. For example, if the agent goes east from state [5,2], it has an equal chance of reaching state [6,3], [7,3], or [8,3].
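One way to examine these probabilities is to read the state transition array of the grid world model. The sketch below assumes that the array is indexed as T(s,s',a), the same convention used for custom grid worlds, and it looks up the east action by its label rather than by a fixed index.

```matlab
env = rlPredefinedEnv('WaterFallGridWorld-Stochastic');
GW = env.Model;

% Index of the start state of the example and of the east action.
s = state2idx(GW,"[5,2]");
east = find(GW.Actions == "E");

% Transition probabilities from state s to every state under the east action.
p = squeeze(GW.T(s,:,east));
reachable = find(p > 0);

% List each reachable next state with its probability.
for k = reachable
    fprintf('%s  p = %.3f\n',GW.States(k),p(k));
end
```

If the model matches the description above, the loop should list the three states from the example with equal probabilities.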