createMDP
Create Markov decision process model
Since R2019a
Description

MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.
Examples
Create MDP Model

Create an MDP model with eight states and two possible actions.
mdp = createmdp(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 transition and reward
MDP.T(1,2,1) = 1; MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1; MDP.R(1,3,2) = 1;

% State 2 transition and reward
MDP.T(2,4,1) = 1; MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1; MDP.R(2,5,2) = 1;

% State 3 transition and reward
MDP.T(3,5,1) = 1; MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1; MDP.R(3,6,2) = 4;

% State 4 transition and reward
MDP.T(4,7,1) = 1; MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1; MDP.R(4,8,2) = 2;

% State 5 transition and reward
MDP.T(5,7,1) = 1; MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1; MDP.R(5,8,2) = 9;

% State 6 transition and reward
MDP.T(6,7,1) = 1; MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1; MDP.R(6,8,2) = 1;

% State 7 transition and reward
MDP.T(7,7,1) = 1; MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1; MDP.R(7,7,2) = 0;

% State 8 transition and reward
MDP.T(8,8,1) = 1; MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1; MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
mdp.terminalstates = ["s7";"s8"];
Input Arguments

states — Model states
positive integer | string vector

Model states, specified as one of the following:

- Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.
- String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

actions — Model actions
positive integer | string vector

Model actions, specified as one of the following:

- Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.
- String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
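For instance, a brief sketch of both call styles (the state and action names below are illustrative, not from the original page):

% Specify counts; states get default names "s1" through "s4" and actions "a1","a2".
MDP1 = createMDP(4,2);

% Specify names directly; the vector lengths determine the number of states and actions.
MDP2 = createMDP(["start";"middle";"goal"],["left";"right"]);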
Output Arguments

MDP — MDP model
GenericMDP object

MDP model, returned as a GenericMDP object with the following properties.
CurrentState — Name of the current state
string

Name of the current state, specified as a string.

States — State names
string vector

State names, specified as a string vector with length equal to the number of states.

Actions — Action names
string vector

Action names, specified as a string vector with length equal to the number of actions.
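As an illustrative sketch (not from the original page), these name properties can be inspected directly after creating a model:

% Create a model with eight default-named states and two named actions.
MDP = createMDP(8,["up";"down"]);
MDP.States    % ["s1"; "s2"; ...; "s8"]
MDP.Actions   % ["up"; "down"]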
T — State transition matrix
3-D array

State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

T(s,s',a) = probability(s' | s,a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or 3, use the following:

MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:

MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
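As a quick sanity check, the outgoing probabilities for each nonterminal state-action pair can be summed. The sketch below is illustrative and assumes all transitions of the model have already been assigned.

% Sketch: confirm that each nonterminal state-action pair has outgoing
% transition probabilities that sum to one.
for a = 1:numel(MDP.Actions)
    for s = 1:numel(MDP.States)
        if ~ismember(MDP.States(s), MDP.TerminalStates)
            assert(abs(sum(MDP.T(s,:,a)) - 1) < 1e-10);
        end
    end
end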
R — Reward transition matrix
3-D array

Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

r = R(s,s',a)
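As an illustrative sketch (the reward values below are assumptions, not from the original page), a reward can be paired with each possible next state of a stochastic transition:

% Sketch: state 1, action 4 moves to state 2 or 3 with equal probability,
% with a different reward for each outcome.
MDP.T(1,[2 3],4) = [0.5 0.5];
MDP.R(1,[2 3],4) = [10 -5];   % illustrative reward values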
TerminalStates — Terminal state names in the grid world
string vector

Terminal state names in the grid world, specified as a string vector of state names.
Version History

Introduced in R2019a
See Also

Functions

Objects