Create Markov decision process environment for reinforcement learning

Since R2019a

Description

A Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. MDPs are useful for studying optimization problems solved using reinforcement learning. Use rlMDPEnv to create a Markov decision process environment for reinforcement learning in MATLAB®.
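In the usual notation, the transition and reward models of an MDP can be written as follows, using the same index order (current state, next state, action) as the `T` and `R` arrays in the grid world example on this page. This is standard MDP notation offered as a reading aid, not a definition taken from the toolbox.

```latex
\begin{aligned}
T(s, s', a) &= \Pr\left(s_{t+1} = s' \mid s_t = s,\ a_t = a\right),
  \qquad \sum_{s' \in \mathcal{S}} T(s, s', a) = 1 \ \text{for each non-terminal } s \text{ and each } a, \\
R(s, s', a) &= \text{reward received when action } a \text{ moves the agent from } s \text{ to } s'.
\end{aligned}
```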

Creation

Syntax

env = rlMDPEnv(MDP)

Description

env = rlMDPEnv(MDP) creates a reinforcement learning environment env with the specified MDP model.

Input Arguments

`MDP` — Markov decision process model
`GridWorld` object | `GenericMDP` object

Markov decision process model, specified as one of the following:

GridWorld object created using createGridWorld.
GenericMDP object created using createMDP.
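For the second option, a minimal sketch of a `GenericMDP` model is shown below. The three-state chain, the action names `"stay"` and `"go"`, and the reward values are hypothetical and chosen only for illustration; the sketch assumes that `createMDP` initializes `T` and `R` to zero and names the states `"s1"`, `"s2"`, and so on.

```matlab
% Hypothetical 3-state chain s1 -> s2 -> s3, where s3 is terminal.
% Assumption: createMDP(3,actions) names the states "s1".."s3" and
% initializes T and R as 3-by-3-by-2 arrays of zeros.
MDP = createMDP(3,["stay";"go"]);

% T(s,s',a) is the probability of moving from s to s' under action a.
% Action 1 ("stay") keeps the agent where it is, at a cost of -1.
MDP.T(1,1,1) = 1;  MDP.R(1,1,1) = -1;
MDP.T(2,2,1) = 1;  MDP.R(2,2,1) = -1;

% Action 2 ("go") moves the agent one state to the right.
MDP.T(1,2,2) = 1;  MDP.R(1,2,2) = 0;
MDP.T(2,3,2) = 1;  MDP.R(2,3,2) = 10;

% Episodes end when the agent reaches s3.
MDP.TerminalStates = "s3";

% Wrap the model in a reinforcement learning environment.
env = rlMDPEnv(MDP)
```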

Properties

`Model` — Markov decision process model
`GridWorld` object | `GenericMDP` object

Markov decision process model, specified as a GridWorld object or GenericMDP object.

`ResetFcn` — Reset function
function handle

Reset function, specified as a function handle.
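As a sketch of how a reset function might be used, the following code starts every episode in cell [2,1] of a default grid world. It assumes, following the convention of the MathWorks grid world examples, that the function takes no inputs and returns the index of the initial state; the variable name `obs0` is illustrative.

```matlab
% Build the default 5-by-5 grid world and wrap it in an environment.
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);

% Assumed convention: the reset function takes no inputs and returns the
% index of the state in which each episode starts (here, cell [2,1]).
env.ResetFcn = @() state2idx(GW,"[2,1]");

% reset calls ResetFcn and returns the initial observation (a state index).
obs0 = reset(env)
```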

Object Functions

`getActionInfo`	Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
`getObservationInfo`	Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
`sim`	Simulate trained reinforcement learning agents within specified environment
`train`	Train reinforcement learning agents within a specified environment
`validateEnvironment`	Validate custom reinforcement learning environment
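The following sketch applies three of these functions to an environment built from a default 5-by-5 grid world. It assumes that both specifications are finite sets whose `Elements` enumerate the 25 state indices and the 4 action indices, respectively.

```matlab
% Create an environment from the default 5-by-5 grid world.
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);

% Observations are state indices and actions are action indices, so both
% specifications are expected to be finite sets.
obsInfo = getObservationInfo(env)
actInfo = getActionInfo(env)

% Check that the environment behaves consistently with its specifications.
validateEnvironment(env)
```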

Examples

Create Grid World Environment

For this example, consider a 5-by-5 grid world with the following rules:

A 5-by-5 grid world bounded by borders, with four possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives a reward of +10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with a reward of +5.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a reward of -1.

First, create a GridWorld object using the createGridWorld function.

GW = createGridWorld(5,5)

GW = 
  GridWorld with properties:
                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
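Cells are referred to by name (for example `"[2,4]"`), while `T` and `R` are indexed by number. The sketch below converts the cells used in this example with `state2idx`. The commented arithmetic assumes that states are numbered in column-major order, the way MATLAB numbers the elements of a 5-by-5 array; `sub2ind` is included only for comparison.

```matlab
GW = createGridWorld(5,5);

% State names have the form "[row,col]". Assuming column-major numbering,
% cell [r,c] of a 5-by-5 grid has index (c-1)*5 + r.
idxJumpFrom = state2idx(GW,"[2,4]")   % (4-1)*5 + 2 = 17 under that assumption
idxJumpTo   = state2idx(GW,"[4,4]")   % (4-1)*5 + 4 = 19 under that assumption
idxGoal     = state2idx(GW,"[5,5]")   % (5-1)*5 + 5 = 25 under that assumption

% MATLAB's own linear index of element (2,4) in a 5-by-5 array, for comparison.
linIdx = sub2ind([5 5],2,4)

% Action names, in the order used for the third dimension of T and R.
GW.Actions
```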

Now, set the initial, terminal, and obstacle states.

GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

Update the state transition matrix for the obstacle states and set the jump rule over the obstacle states.

updateStateTranstionForObstacles(GW)
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;
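To confirm that the jump rule leaves a valid transition model, you can check that the row of `T` for cell [2,4] still sums to 1 for every action. The following sketch rebuilds the grid world up to this point and performs that check; the variable names `s`, `sp`, `rowSums`, and `jumpProb` are illustrative.

```matlab
% Rebuild the grid world up to the jump rule.
GW = createGridWorld(5,5);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[5,5]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(GW)
s  = state2idx(GW,"[2,4]");
sp = state2idx(GW,"[4,4]");
GW.T(s,:,:) = 0;
GW.T(s,sp,:) = 1;

% Row s of T(:,:,a) is the next-state distribution for action a, so it
% must sum to 1. Every action now moves the agent from [2,4] to [4,4].
rowSums  = squeeze(sum(GW.T(s,:,:),2))   % one sum per action (4x1)
jumpProb = squeeze(GW.T(s,sp,:))         % jump probability per action (4x1)
```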

Next, define the rewards in the reward transition matrix.

nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
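As a quick check of the reward matrix, the sketch below rebuilds `R` as above and reads back three entries. Action 3 corresponds to east and action 1 to north, per the rules listed at the start of this example; the variable names are illustrative.

```matlab
% Rebuild the rewards from the example (transitions are not needed here).
GW = createGridWorld(5,5);
GW.TerminalStates = '[5,5]';
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;

% R(s,s',a) is the reward for moving from s to s' with action a.
rGoal  = GW.R(state2idx(GW,"[5,4]"),state2idx(GW,"[5,5]"),3)  % east into the goal
rJump  = GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),1)  % the special jump
rPlain = GW.R(state2idx(GW,"[2,1]"),state2idx(GW,"[2,2]"),3)  % an ordinary move east
```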

Now, use rlMDPEnv to create a grid world environment using the GridWorld object GW.

env = rlMDPEnv(GW)

env = 
  rlMDPEnv with properties:
       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
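Before training an agent, you can step through the environment by hand. The following sketch uses a default grid world, so the reward it returns depends on that grid world's default `R` rather than on the rewards defined above. It assumes that `step` returns the next state index, the reward, and a termination flag, in that order.

```matlab
% Build a default 5-by-5 grid world environment whose episodes start in [2,1].
GW = createGridWorld(5,5);
env = rlMDPEnv(GW);
env.ResetFcn = @() state2idx(GW,"[2,1]");
obs = reset(env);

% Take one step east (action 3). The default transitions are deterministic,
% so the agent should move from [2,1] to [2,2].
[nextObs,reward,isDone] = step(env,3)
```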

You can visualize the grid world environment using the plot function.

plot(env)

Version History

Introduced in R2019a
