Create a two-dimensional grid world for reinforcement learning
Since R2019a
Description
Examples
Create Grid World Environment
For this example, consider a 5-by-5 grid world with the following rules:
The grid world is bounded by borders and has four possible actions (north = 1, south = 2, east = 3, west = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives a reward of 10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with a reward of 5.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a reward of -1.
First, create a GridWorld object using the createGridWorld function.
gw = createGridWorld(5,5)
gw = 
  GridWorld with properties:

                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
Now, set the initial, terminal, and obstacle states.
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states, then set the jump rule so that every action taken in cell [2,4] moves the agent over the obstacles to cell [4,4].
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;
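The jump rule above can be sanity-checked with a short sketch. It assumes Reinforcement Learning Toolbox is installed and rebuilds the grid world so that it runs on its own; the variable names from, to, and rowSums are illustrative only.

```matlab
% Rebuild the example grid world so this snippet is self-contained.
gw = createGridWorld(5,5);
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gw)

% Apply the jump rule: every action taken in [2,4] leads to [4,4].
from = state2idx(gw,"[2,4]");
to   = state2idx(gw,"[4,4]");
gw.T(from,:,:) = 0;
gw.T(from,to,:) = 1;

% For each action, the outgoing probabilities of [2,4] should sum to 1.
rowSums = squeeze(sum(gw.T(from,:,:),2));
disp(rowSums')
```

If any entry of rowSums differs from 1, the transition probabilities out of cell [2,4] no longer form a valid distribution for that action.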
Next, define the rewards in the reward transition matrix.
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
Now, use rlMDPEnv to create a grid world environment using the GridWorld object gw.
env = rlMDPEnv(gw)
env = 
  rlMDPEnv with properties:

       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
You can visualize the grid world environment using the plot function.
plot(env)
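Once the environment exists, you can interact with it directly. The following is a minimal sketch, assuming Reinforcement Learning Toolbox is installed; it sets the ResetFcn property (shown empty in the display above) so that each episode starts in cell [2,1], and it assumes that actions are passed to step as the indices given in the rules (east = 3). The variable names are illustrative.

```matlab
% Build a small environment so this snippet is self-contained.
gw = createGridWorld(5,5);
gw.TerminalStates = "[5,5]";
env = rlMDPEnv(gw);

% Start every episode in cell [2,1] by returning its state index.
startIdx = state2idx(gw,"[2,1]");
env.ResetFcn = @() startIdx;

% Reset the environment, then take one step east (action index 3).
obs = reset(env);
[nextObs,reward,isDone] = step(env,3);
```

Here obs and nextObs are state indices, which you can compare against state2idx(gw,...) to see which cell the agent occupies.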
Input Arguments
m — Number of rows of the grid world
scalar
Number of rows of the grid world, specified as a scalar.
n — Number of columns of the grid world
scalar
Number of columns of the grid world, specified as a scalar.
moves — Action names
'standard' (default) | 'kings'
Action names, specified as either 'standard' or 'kings'. When moves is set to:
'standard', the actions are ['N';'S';'E';'W'].
'kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].
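As a quick illustration of the moves argument, the sketch below (assuming Reinforcement Learning Toolbox is installed; the 3-by-4 grid size and variable names are arbitrary choices) compares the two settings.

```matlab
% Create a 3-by-4 grid world with each move set.
gwStd   = createGridWorld(3,4);           % default: 'standard'
gwKings = createGridWorld(3,4,'kings');

% Number of available actions for each move set.
nStd   = numel(gwStd.Actions);
nKings = numel(gwKings.Actions);

% The third dimension of T matches the number of actions.
sizeStd   = size(gwStd.T);
sizeKings = size(gwKings.T);
fprintf('standard: %d actions, kings: %d actions\n',nStd,nKings);
```

With 'kings', the diagonal moves double the action count, so the transition and reward arrays grow along their third dimension accordingly.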
Output Arguments
gw — Two-dimensional grid world
GridWorld object
Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.
GridSize — Size of the grid world
[m,n] vector
Size of the grid world, specified as an [m,n] vector.
CurrentState — Name of the current state
string
Name of the current state, specified as a string.
Actions — Action names
string vector
Action names, specified as a string vector. The length of the Actions vector is determined by the moves argument. Actions is a string vector of length:
Four, if moves is specified as 'standard'.
Eight, if moves is specified as 'kings'.
T — State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by
T(s,s',a) = probability(s'|s,a).
T is:
A K-by-K-by-4 array, if moves is specified as 'standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'kings'.
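Because T holds probabilities, each row of every action slice should sum to 1. The sketch below checks this on a freshly created grid world; it assumes Reinforcement Learning Toolbox is installed and that the default transition matrix is a valid probability matrix. The 4-by-3 grid size and the variable names are arbitrary.

```matlab
% Create a small grid world with the default ('standard') moves.
gw = createGridWorld(4,3);
K = prod(gw.GridSize);          % number of states, K = m*n

% Sum over next states s' for every (s,a) pair: result is K-by-1-by-4.
rowSums = sum(gw.T,2);

% Largest deviation from 1 across all states and actions.
maxDeviation = max(abs(rowSums(:) - 1));
fprintf('T is %d-by-%d-by-%d, max deviation = %g\n', ...
    size(gw.T,1),size(gw.T,2),size(gw.T,3),maxDeviation);
```

A check like this is useful after editing T by hand, since manual changes can leave a row that no longer sums to 1.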
R — Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward r for moving from state s to state s' by performing action a is given by
r = R(s,s',a).
R is:
A K-by-K-by-4 array, if moves is specified as 'standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'kings'.
ObstacleStates — State names that cannot be reached in the grid world
string vector
State names that cannot be reached in the grid world, specified as a string vector.
TerminalStates — Terminal state names in the grid world
string vector
Terminal state names in the grid world, specified as a string vector.
Version History
Introduced in R2019a