Create Custom Grid World Environments
A grid world is a two-dimensional, cell-based environment in which the agent starts from one cell and moves toward a terminal cell while collecting as much reward as possible. Grid world environments are useful for applying reinforcement learning algorithms to discover optimal paths and policies that bring an agent to the terminal goal in the fewest moves.
Reinforcement Learning Toolbox™ lets you create custom MATLAB® grid world environments for your own applications. To create a custom grid world environment:
1. Create the grid world model.
2. Configure the grid world model.
3. Use the grid world model to create your own grid world environment.
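As a minimal sketch of these three steps, assuming Reinforcement Learning Toolbox is installed, the following creates a 5-by-5 model, marks one terminal cell, and wraps the model in an environment. The grid size, the terminal cell, and the variable names GW and env are illustrative choices, not requirements.

```matlab
% Step 1: create a 5-by-5 grid world model (size chosen for illustration).
GW = createGridWorld(5,5);

% Step 2: configure the model, here by marking a single terminal cell.
GW.TerminalStates = "[5,5]";

% Step 3: wrap the configured model in an MDP environment.
env = rlMDPEnv(GW);
```

A real application would usually also configure obstacles, transitions, and rewards in step 2 before creating the environment.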
Grid World Models
You can create your own grid world model using the createGridWorld function. Specify the grid size when creating the GridWorld object.
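For example, the sketch below creates two small models and inspects a few of their properties. The 2-by-3 size and the variable names are arbitrary; the optional third argument selects the move set, where 'Kings' adds diagonal moves to the four compass directions.

```matlab
% Create a 2-by-3 model with the default move set.
GW = createGridWorld(2,3);
disp(GW.GridSize)   % grid dimensions as [rows columns]
disp(GW.States)     % state names, one per cell
disp(GW.Actions)    % action names for the default move set

% Same grid size, but with diagonal moves allowed as well.
GWKings = createGridWorld(2,3,'Kings');
disp(GWKings.Actions)
```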
The GridWorld object has the following properties.
Property | Read-Only | Description
---|---|---
GridSize | Yes | Dimensions of the grid world, displayed as an m-by-n array. Here, m is the number of grid rows and n is the number of grid columns.
CurrentState | No | Name of the current state of the agent, specified as a string. You can use this property to set the initial state of the agent. By default, the agent starts from cell [1,1].
States | Yes | A string vector containing the state names of the grid world. For instance, for a 2-by-2 grid world model GW: GW.States = ["[1,1]"; "[2,1]"; "[1,2]"; "[2,2]"];
Actions | Yes | A string vector containing the list of possible actions that the agent can use. You can set the actions when you create the grid world model by using the moves argument: GW = createGridWorld(m,n,moves). Specify moves as either 'Standard' or 'Kings'.
T | No | State transition matrix, specified as a 3-D array. T(s,s',a) is the probability that the agent moves from current state s to next state s' by taking action a. For instance, consider a 5-by-5 deterministic grid world object GW and its state transition matrix for the north direction, northStateTransition = GW.T(:,:,1). The value of northStateTransition(3,2) is 1, because moving north from cell [3,1] (state 3) always takes the agent to cell [2,1] (state 2).
R | No | Reward transition matrix, specified as a 3-D array of the same size as T. R(s,s',a) is the reward the agent receives for moving from state s to state s' by taking action a. Set up R so that the agent receives a reward after every action.
ObstacleStates | No | States that the agent cannot enter, specified as a string vector. In a plot of the grid world, the black cells are obstacle states. You can specify them using the following syntax: GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"]; For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.
TerminalStates | No | Final states of the grid world, specified as a string vector. For example: GW.TerminalStates = "[5,5]"; For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.
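The properties above can be combined into one configuration pass. The following sketch assumes a 5-by-5 model with illustrative reward values (-1 per move, 10 for reaching the terminal state); those numbers and the variable names are assumptions, not recommended settings. It uses the toolbox functions state2idx and updateStateTranstionForObstacles (the function name is spelled this way in the toolbox).

```matlab
% Illustrative 5-by-5 model; sizes, cells, and reward values are assumptions.
GW = createGridWorld(5,5);

% Default transitions are deterministic: action 1 is 'N', so from
% cell [3,1] (state 3) the agent moves to cell [2,1] (state 2).
northStateTransition = GW.T(:,:,1);
pNorth = northStateTransition(3,2);

% Mark the terminal and obstacle cells, then update T so that
% no transition leads into an obstacle.
GW.TerminalStates = "[5,5]";
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(GW)

% R has the same size as T: a penalty of -1 for every transition,
% and a reward of 10 for any transition into the terminal state.
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
```

Assigning the uniform penalty first and the terminal reward second keeps the indexing simple, because the later assignment overwrites only the column of R that corresponds to the terminal state.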
Grid World Environments
After configuring the grid world model from the previous step, create a Markov decision process (MDP) environment from it using rlMDPEnv. An MDP is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. The agent uses the grid world environment object rlMDPEnv to interact with the grid world model object GridWorld.
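A minimal sketch of this step follows, assuming a 5-by-5 model with one terminal cell. The starting cell [2,1] and the variable names are illustrative. The reset function assigned to ResetFcn is expected to return the index of the initial state, which state2idx computes from the state name.

```matlab
% Build and configure an illustrative grid world model.
GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";

% Create the MDP environment from the model.
env = rlMDPEnv(GW);

% Start every episode from cell [2,1]; the reset function
% returns the index of the initial state.
env.ResetFcn = @() state2idx(GW,"[2,1]");

% Inspect the observation and action specifications, then reset.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
initialObs = reset(env);
```

The observation and action specifications returned here are what you would pass on when creating an agent for this environment.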
For more information, see rlMDPEnv and Train Reinforcement Learning Agent in Basic Grid World.