

createMDP

Create Markov decision process model

Since R2019a

Description


MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.

Examples

Create an MDP model with eight states and two possible actions.

MDP = createMDP(8,["up";"down"]);
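
Optionally, display the state and action names to confirm the configuration. (The States and Actions properties are described under Output Arguments below.)

MDP.States   % default names "s1" through "s8"
MDP.Actions  % ["up";"down"]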

Specify the state transitions and their associated rewards.

% State 1 transition and reward
MDP.T(1,2,1) = 1;
MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1;
MDP.R(1,3,2) = 1;
% State 2 transition and reward
MDP.T(2,4,1) = 1;
MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1;
MDP.R(2,5,2) = 1;
% State 3 transition and reward
MDP.T(3,5,1) = 1;
MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1;
MDP.R(3,6,2) = 4;
% State 4 transition and reward
MDP.T(4,7,1) = 1;
MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1;
MDP.R(4,8,2) = 2;
% State 5 transition and reward
MDP.T(5,7,1) = 1;
MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1;
MDP.R(5,8,2) = 9;
% State 6 transition and reward
MDP.T(6,7,1) = 1;
MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1;
MDP.R(6,8,2) = 1;
% State 7 transition and reward
MDP.T(7,7,1) = 1;
MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1;
MDP.R(7,7,2) = 0;
% State 8 transition and reward
MDP.T(8,8,1) = 1;
MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1;
MDP.R(8,8,2) = 0;

Specify the terminal states of the model.

MDP.TerminalStates = ["s7";"s8"];
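
With the states, transitions, rewards, and terminal states in place, you typically pass the finished model to rlMDPEnv to create a reinforcement learning environment (shown here as a brief usage sketch):

env = rlMDPEnv(MDP);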

Input Arguments

states — Model states

Model states, specified as one of the following:

  • Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.

  • String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.

actions — Model actions

Model actions, specified as one of the following (example calls showing both forms appear after this list):

  • Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.

  • String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
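
As an illustration of the two input forms, the following calls create equivalent models; the explicit names in the second call match the defaults generated by the first.

MDP1 = createMDP(3,2);                           % default names "s1".."s3" and "a1","a2"
MDP2 = createMDP(["s1";"s2";"s3"],["a1";"a2"]);  % the same model with explicit names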

Output Arguments

MDP — MDP model

MDP model, returned as a GenericMDP object with the following properties.

CurrentState — Name of the current state, specified as a string.

States — State names, specified as a string vector with length equal to the number of states.

Actions — Action names, specified as a string vector with length equal to the number of actions.

T — State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in the environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:

T(s,s',a) = probability(s'|s,a)

The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, specify all stochastic transitions out of a given state at the same time.

For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:

MDP.T(1,[2 3],4) = [0.5 0.5];

You can also specify that, following an action, there is some probability of remaining in the same state. For example:

MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
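
Because the outgoing probabilities for each state-action pair must sum to one, a quick way to validate a hand-built model is to sum T over the next-state dimension (a small sketch, not part of the documented API):

probSums = squeeze(sum(MDP.T,2));   % S-by-A matrix of outgoing probability sums
% Each entry should be 1 once its transitions are specified (0 if not yet specified);
% a small tolerance guards against floating-point accumulation.
assert(all(abs(probSums(:) - 1) < 1e-12 | probSums(:) == 0))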

R — Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:

r = R(s,s',a)
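
Continuing the stochastic transition example above, rewards for the two possible next states are assigned elementwise in the same way (the reward values here are illustrative, not from the original example):

MDP.R(1,[2 3],4) = [5 -1];   % reward 5 for landing in state 2, -1 for state 3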

TerminalStates — Terminal state names, specified as a string vector of state names.

Version History

Introduced in R2019a
