train reinforcement learning agent in basic grid world
this example shows how to solve a grid world environment using reinforcement learning by training q-learning and sarsa agents. for more information on these agents, see the documentation for q-learning agents and sarsa agents.
this grid world environment has the following configuration and rules:
the grid world is 5-by-5 and bounded by borders, with four possible actions (north = 1, south = 2, east = 3, west = 4).
the agent begins from cell [2,1] (second row, first column).
the agent receives a reward of 10 if it reaches the terminal state at cell [5,5] (blue).
the environment contains a special jump from cell [2,4] to cell [4,4] with a reward of 5.
the agent is blocked by obstacles (black cells).
all other actions result in –1 reward.
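the rules above can be sketched in a few lines of base matlab. this is a hypothetical re-implementation for illustration only, not the code behind the predefined environment. obstacles are left out because their positions are not listed in the text, and the sketch assumes that any action taken at cell [2,4] triggers the jump. the variable names (moves, pos, actions, totalReward) are made up for this sketch.

```matlab
% hypothetical sketch of the grid world rules (obstacles omitted).
moves = [-1 0; 1 0; 0 1; 0 -1];   % row/column offsets for north, south, east, west
pos = [2 1];                      % start cell [row, column]
actions = [3 3 3 1 2 3];          % east x3, one action at [2,4], south, east
totalReward = 0;
for a = actions
    if isequal(pos, [2 4])
        pos = [4 4];              % special jump (assumed to fire on any action)
        r = 5;
    else
        % move one cell and clamp to the 5-by-5 border
        pos = min(max(pos + moves(a,:), 1), 5);
        if isequal(pos, [5 5])
            r = 10;               % terminal state reached
        else
            r = -1;               % every other action
        end
    end
    totalReward = totalReward + r;
    if isequal(pos, [5 5])
        break                     % episode ends at the terminal state
    end
end
disp(totalReward)
```

under these assumptions, this six-step route collects -1 -1 -1 +5 -1 +10 = 11. whether the route is free of obstacles in the actual environment is not something the text confirms.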
create grid world environment
create the basic grid world environment.
env = rlPredefinedEnv("BasicGridWorld");
to specify that the initial state of the agent is always [2,1], create a reset function that returns the state number for the initial agent state. this function is called at the start of each training episode and simulation. states are numbered starting at position [1,1]. the state number increases as you move down the first column and then down each subsequent column, so cell [2,1] is state 2. therefore, create an anonymous function handle that sets the initial state to 2.
env.ResetFcn = @() 2;
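the column-by-column numbering described above matches matlab's linear indexing, so sub2ind and ind2sub can convert between cells and state numbers. the following is a minimal sketch. the variable names are illustrative and are not part of the environment object.

```matlab
% convert [row, column] cells of the 5-by-5 grid to state numbers.
gridSize = [5 5];
startState    = sub2ind(gridSize, 2, 1);   % cell [2,1]
jumpFromState = sub2ind(gridSize, 2, 4);   % cell [2,4]
jumpToState   = sub2ind(gridSize, 4, 4);   % cell [4,4]
terminalState = sub2ind(gridSize, 5, 5);   % cell [5,5]

% convert a state number back to its cell.
[row, col] = ind2sub(gridSize, jumpFromState);
fprintf("state %d is cell [%d,%d]\n", jumpFromState, row, col);
```

with this numbering, the start cell is state 2, the jump leads from state 17 to state 19, and the terminal cell is state 25.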
fix the random generator seed for reproducibility.
rng(0)
create q-learning agent
to create a q-learning agent, first create a q table using the observation and action specifications from the grid world environment. the optimizer learning rate of 0.01 is set later, through the agent options.
qTable = rlTable(getObservationInfo(env), ...
    getActionInfo(env));
to approximate the q-value function within the agent, create an rlQValueFunction approximator object, using the table and the environment information.
qFcnAppx = rlQValueFunction(qTable, ...
    getObservationInfo(env), ...
    getActionInfo(env));
next, create a q-learning agent using the q-value function.
qAgent = rlQAgent(qFcnAppx);
configure agent options such as the epsilon-greedy exploration and the learning rate for the function approximator.
qAgent.AgentOptions.EpsilonGreedyExploration.Epsilon = 0.04;
qAgent.AgentOptions.CriticOptimizerOptions.LearnRate = 0.01;
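to make the roles of epsilon and the learning rate concrete, the sketch below applies one epsilon-greedy action selection and one tabular q-learning update in base matlab. it is a hand-written illustration, not the toolbox implementation. the discount factor of 0.99 and the example transition (state 17 jumping to state 19 with reward 5) are assumptions chosen for the sketch.

```matlab
% hypothetical tabular q-learning step (illustration only).
rng(0)                            % reproducible exploration draw
numStates = 25; numActions = 4;
Q = zeros(numStates, numActions); % q table, one row per state
epsilon = 0.04;                   % exploration probability
alpha = 0.01;                     % learning rate
gamma = 0.99;                     % discount factor (assumed)

% epsilon-greedy selection in state s.
s = 17;
if rand < epsilon
    a = randi(numActions);        % explore: random action
else
    [~, a] = max(Q(s,:));         % exploit: greedy action
end

% observed transition (illustrative values).
r = 5; sNext = 19; isTerminal = false;

% q-learning target uses the best action value in the next state.
target = r + gamma * max(Q(sNext,:)) * ~isTerminal;
Q(s,a) = Q(s,a) + alpha * (target - Q(s,a));
```

starting from an all-zero table, the updated entry moves from 0 to 0.01 * 5 = 0.05, which shows how a small learning rate spreads the reward information over many visits.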
for more information on creating q-learning agents, see rlQAgent and rlQAgentOptions.
train q-learning agent
to train the agent, first specify the training options. for this example, use the following options:
train for at most 200 episodes. specify that each episode lasts for at most 50 time steps.
stop training when the agent receives an average cumulative reward greater than 10 over 30 consecutive episodes.
for more information, see rlTrainingOptions.
trainOpts = rlTrainingOptions;
trainOpts.MaxStepsPerEpisode = 50;
trainOpts.MaxEpisodes = 200;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 11;
trainOpts.ScoreAveragingWindowLength = 30;
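the stopping rule configured above can be pictured as a trailing average over the most recent episodes. the sketch below reproduces that idea in base matlab on made-up episode rewards. it assumes that the check starts once a full 30-episode window is available and that training stops when the average reaches the stop value, which may differ in detail from how the toolbox handles the first episodes.

```matlab
% hypothetical stopping check on illustrative episode rewards.
episodeReward = [-50*ones(1,20), 11*ones(1,40)];  % made-up training history
windowLength = 30;                                % averaging window
stopValue = 11;                                   % stop threshold

stopEpisode = NaN;
for k = windowLength:numel(episodeReward)
    % average cumulative reward over the last windowLength episodes
    avgReward = mean(episodeReward(k-windowLength+1:k));
    if avgReward >= stopValue
        stopEpisode = k;
        break
    end
end
disp(stopEpisode)
```

in this made-up history the agent first earns 11 in episode 21, but the window only fills with such episodes at episode 50, so a long averaging window delays the stop until performance is consistent.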
train the q-learning agent using the train function. training can take several minutes to complete. to save time while running this example, load a pretrained agent by setting doTraining to false. to train the agent yourself, set doTraining to true.
doTraining = false;
if doTraining
    % train the agent.
    trainingStats = train(qAgent,env,trainOpts);
else
    % load the pretrained agent for the example.
    load("basicGWQAgent.mat","qAgent")
end
the episode manager window opens and displays the training progress.
validate q-learning results
to validate the training results, simulate the agent in the training environment.
before running the simulation, visualize the environment and configure the visualization to maintain a trace of the agent states.
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
simulate the agent in the environment using the sim function.
sim(qAgent,env)
the agent trace shows that the agent successfully finds the jump from cell [2,4] to cell [4,4].
create and train sarsa agent
to create a sarsa agent, use the same q-value function and epsilon-greedy configuration as for the q-learning agent. for more information on creating sarsa agents, see rlSARSAAgent and rlSARSAAgentOptions.
sarsaAgent = rlSARSAAgent(qFcnAppx);
sarsaAgent.AgentOptions.EpsilonGreedyExploration.Epsilon = 0.04;
sarsaAgent.AgentOptions.CriticOptimizerOptions.LearnRate = 0.01;
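the two agents share the same table and options but differ in the update target. sarsa is on-policy and bootstraps from the action actually taken in the next state, whereas q-learning bootstraps from the greedy action. the sketch below contrasts the two targets in base matlab. it is an illustration, not the toolbox implementation, and the table values, the transition, and the discount factor of 0.99 are assumptions.

```matlab
% hypothetical comparison of sarsa and q-learning targets.
numStates = 25; numActions = 4;
Q = zeros(numStates, numActions);
Q(19,:) = [0 2 1 0];     % illustrative action values for state 19
alpha = 0.01;            % learning rate
gamma = 0.99;            % discount factor (assumed)

% observed transition and the next action picked by the behavior policy.
s = 17; a = 2; r = 5; sNext = 19;
aNext = 3;               % e.g. an exploratory, non-greedy choice

sarsaTarget     = r + gamma * Q(sNext, aNext);     % on-policy target
qLearningTarget = r + gamma * max(Q(sNext,:));     % off-policy target

% sarsa update of the visited state-action pair.
Q(s,a) = Q(s,a) + alpha * (sarsaTarget - Q(s,a));
```

the targets differ only when the next action is not greedy, which with a small epsilon such as 0.04 is rare.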
train the sarsa agent using the train function. training can take several minutes to complete. to save time while running this example, load a pretrained agent by setting doTraining to false. to train the agent yourself, set doTraining to true.
doTraining = false;
if doTraining
    % train the agent.
    trainingStats = train(sarsaAgent,env,trainOpts);
else
    % load the pretrained agent for the example.
    load("basicGWSarsaAgent.mat","sarsaAgent")
end
validate sarsa training
to validate the training results, simulate the agent in the training environment.
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
simulate the agent in the environment.
sim(sarsaAgent,env)
the sarsa agent finds the same grid world solution as the q-learning agent.
see also
functions
createGridWorld | sim | train
objects
rlMDPEnv | rlTrainingOptions