
createGridWorld

Create a two-dimensional grid world for reinforcement learning

Since R2019a

Description


gw = createGridWorld(m,n) creates a grid world gw of size m-by-n with the default actions ['N';'S';'E';'W'].

gw = createGridWorld(m,n,moves) creates a grid world gw of size m-by-n with the actions specified by moves.

Examples

For this example, consider a 5-by-5 grid world with the following rules:

  1. A 5-by-5 grid world bounded by borders, with four possible actions (North = 1, South = 2, East = 3, West = 4).

  2. The agent begins from cell [2,1] (second row, first column).

  3. The agent receives a reward of +10 if it reaches the terminal state at cell [5,5] (blue).

  4. The environment contains a special jump from cell [2,4] to cell [4,4] with a reward of +5.

  5. The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).

  6. All other actions result in a reward of -1.

First, create a GridWorld object using the createGridWorld function.

gw = createGridWorld(5,5)
gw = 
  GridWorld with properties:
                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16

Now, set the initial, terminal, and obstacle states.

gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
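State names such as "[2,1]" are strings of the form "[row,column]". The short sketch below, which uses only functions that appear on this page, shows that state2idx returns the position of a name in gw.States; that position is the index used along the first two dimensions of T and R in the steps that follow.

```matlab
gw = createGridWorld(5,5);

% Look up the linear index of the start cell and map it back to its name.
startIdx = state2idx(gw,"[2,1]");
startName = gw.States(startIdx)
```

The round trip gw.States(state2idx(gw,name)) returns the original name, which is why the rest of the example can address T and R by cell name.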

Update the state transition matrix for the obstacle states, and set the jump rule over the obstacle states.

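% Block moves into the obstacle cells
updateStateTranstionForObstacles(gw)
% Clear every transition out of [2,4], then send all actions to [4,4]
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;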
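To see what the two steps above do to T, the following sketch rebuilds the grid world with the same commands and inspects individual rows. The jump check follows directly from the assignments. The obstacle check assumes that updateStateTranstionForObstacles gives zero probability to any move into an obstacle cell, here moving east (action 3) from [3,2] into [3,3].

```matlab
% Rebuild the grid world from the commands above so this check runs on its own.
gw = createGridWorld(5,5);
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;

% Jump rule: from [2,4], every action has exactly one possible next state, [4,4].
sJump = state2idx(gw,"[2,4]");
sLand = state2idx(gw,"[4,4]");
for a = 1:numel(gw.Actions)
    fprintf("%s from [2,4] -> %s\n", gw.Actions(a), gw.States(find(gw.T(sJump,:,a))))
end

% Obstacle rule (assumed): moving east from [3,2] never enters [3,3].
pIntoObstacle = gw.T(state2idx(gw,"[3,2]"),state2idx(gw,"[3,3]"),3)
```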

Next, define the rewards in the reward transition matrix.

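nS = numel(gw.States);
nA = numel(gw.Actions);
% Every transition costs -1 by default
gw.R = -1*ones(nS,nS,nA);
% The jump from [2,4] to [4,4] earns +5; entering the terminal state earns +10
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;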
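Because R is indexed as R(s,s',a), the reward for a given move can be read off directly. The sketch below repeats the reward assignments on a fresh grid world and looks up three representative entries: the jump, a step into the terminal cell, and an ordinary move. The choice of cells is illustrative only.

```matlab
% Recreate the reward matrix exactly as in the example.
gw = createGridWorld(5,5);
gw.TerminalStates = '[5,5]';
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;

% R(s,s',a) for three moves (action 3 = East).
jumpReward = gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),1)   % 5 for any action
goalReward = gw.R(state2idx(gw,"[5,4]"),state2idx(gw,"[5,5]"),3)   % 10: east into [5,5]
stepReward = gw.R(state2idx(gw,"[2,1]"),state2idx(gw,"[2,2]"),3)   % -1: ordinary move
```

Since the terminal-state assignment overwrites a whole column of R, any transition that ends in [5,5] receives +10, regardless of the starting state or action.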

Now, use rlMDPEnv to create a grid world environment using the GridWorld object gw.

env = rlMDPEnv(gw)
env = 
  rlMDPEnv with properties:
       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
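One way to check the finished environment is to take a single step by hand. The sketch below rebuilds env with the commands from this example, then sets ResetFcn so that every episode starts at [2,1]. It assumes that the ResetFcn returns a state index, that observations returned by reset and step are state indices, and that actions are passed as their numeric index (3 = East).

```matlab
% Rebuild the environment from the example so this sketch runs on its own.
gw = createGridWorld(5,5);
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
env = rlMDPEnv(gw);

% Assumption: start every episode at [2,1] by returning its state index.
env.ResetFcn = @() state2idx(gw,"[2,1]");
obs0 = reset(env);

% Take one step east (action 3) and report the transition.
[obs1,reward,isDone] = step(env,3);
fprintf("%s -> %s, reward %g, done %d\n", gw.States(obs0), gw.States(obs1), reward, isDone)
```

Under the rules above, this step should move the agent from [2,1] to [2,2] with a reward of -1 without ending the episode.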

You can visualize the grid world environment using the plot function.

plot(env)

Input Arguments

m

Number of rows of the grid world, specified as a scalar.

n

Number of columns of the grid world, specified as a scalar.

moves

Action names, specified as either 'Standard' or 'Kings'. When moves is set to:

  • 'Standard', the actions are ['N';'S';'E';'W'].

  • 'Kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].
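The effect of moves can be seen by comparing the Actions and T properties of two grid worlds. The sketch below uses an arbitrary 3-by-4 grid, so the number of states is K = 3*4 = 12.

```matlab
% Default ('Standard') moves versus king's moves on a 3-by-4 grid.
gwStd  = createGridWorld(3,4);
gwKing = createGridWorld(3,4,'Kings');

% Transpose the string column vectors for a compact one-line display.
stdActions  = gwStd.Actions'
kingActions = gwKing.Actions'

% One K-by-K slice of T per action: 12-by-12-by-8 for king's moves.
sizeT = size(gwKing.T)
```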

Output Arguments

gw

Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.

GridSize

Size of the grid world, specified as an [m,n] vector.

CurrentState

Name of the current state, specified as a string.

States

State names, specified as a string vector of length m*n.

Actions

Action names, specified as a string vector whose length is determined by the moves argument. Actions is a string vector of length:

  • four, if moves is specified as 'Standard'.

  • eight, if moves is specified as 'Kings'.

T

State transition matrix, specified as a 3-D array that determines the possible movements of the agent in the environment. T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by

T(s,s',a) = probability(s'|s,a).

T is:

  • a K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.

  • a K-by-K-by-8 array, if moves is specified as 'Kings'.
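Because T(s,s',a) is a conditional probability, each slice T(s,:,a) should be a probability distribution over next states. The sketch below checks this on a fresh 5-by-5 grid world. Comparing against the ProbabilityTolerance property is this sketch's choice of threshold, and the single lookup assumes the default action order, in which action 3 is East.

```matlab
gw = createGridWorld(5,5);

% Sum over next states s' (dimension 2): one value per (s,a) pair.
rowSums = sum(gw.T,2);                       % 25-by-1-by-4
maxDeviation = max(abs(rowSums(:) - 1))

% Probability of moving from [1,1] to [1,2] when taking action 3 (East).
pEast = gw.T(state2idx(gw,"[1,1]"),state2idx(gw,"[1,2]"),3)
```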

R

Reward transition matrix, specified as a 3-D array that determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward r that the agent receives for moving from state s to state s' by performing action a is given by

r = R(s,s',a).

R is:

  • a K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.

  • a K-by-K-by-8 array, if moves is specified as 'Kings'.

ObstacleStates

State names that cannot be reached in the grid world, specified as a string vector.

TerminalStates

Terminal state names in the grid world, specified as a string vector.

Version History

Introduced in R2019a
