train reinforcement learning agent in basic grid world

this example shows how to solve a grid world environment using reinforcement learning by training q-learning and sarsa agents. for more information on these agents, see the q-learning agent and sarsa agent documentation.

this grid world environment has the following configuration and rules:

  1. the grid world is 5-by-5 and bounded by borders, with four possible actions (north = 1, south = 2, east = 3, west = 4).

  2. the agent begins from cell [2,1] (second row, first column).

  3. the agent receives a reward of 10 if it reaches the terminal state at cell [5,5] (blue).

  4. the environment contains a special jump from cell [2,4] to cell [4,4] with a reward of 5.

  5. the agent is blocked by obstacles (black cells).

  6. all other actions result in a reward of –1.

create grid world environment

create the basic grid world environment.

env = rlPredefinedEnv("BasicGridWorld");

to specify that the initial state of the agent is always [2,1], create a reset function that returns the state number for the initial agent state. this function is called at the start of each training episode and simulation. states are numbered starting at position [1,1]. the state number increases as you move down the first column and then down each subsequent column. therefore, create an anonymous function handle that sets the initial state to 2.

env.ResetFcn = @() 2;
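the column-by-column numbering described above matches matlab linear indexing, so the state number of any cell can be checked with sub2ind. the following is a minimal sketch, not part of the original example; the variable names are chosen here for illustration.

```matlab
% Sketch: map [row, column] cells of the 5-by-5 grid to state numbers.
% States are numbered down each column in turn, which is the same
% ordering that sub2ind uses for a 5-by-5 array.
gridSize = [5 5];
startState    = sub2ind(gridSize, 2, 1);   % cell [2,1]
terminalState = sub2ind(gridSize, 5, 5);   % cell [5,5]
jumpFrom      = sub2ind(gridSize, 2, 4);   % cell [2,4]
jumpTo        = sub2ind(gridSize, 4, 4);   % cell [4,4]

% Convert a state number back to its [row, column] cell.
[row, col] = ind2sub(gridSize, jumpFrom);

fprintf("start=%d terminal=%d jump=%d->%d\n", ...
    startState, terminalState, jumpFrom, jumpTo);
```

the start state evaluates to 2, which is the value returned by the reset function.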

fix the random generator seed for reproducibility.

rng(0)

create q-learning agent

to create a q-learning agent, first create a q table using the observation and action specifications from the grid world environment. the learning rate of the optimizer is set to 0.01 later, through the agent options.

qtable = rlTable(getObservationInfo(env), ...
    getActionInfo(env));

to approximate the q-value function within the agent, create an rlQValueFunction approximator object, using the table and the environment information.

qfcnappx = rlQValueFunction(qtable, ...
    getObservationInfo(env), ...
    getActionInfo(env));

next, create a q-learning agent using the q-value function.

qagent = rlQAgent(qfcnappx);

configure agent options such as the epsilon-greedy exploration and the learning rate for the function approximator.

qagent.AgentOptions.EpsilonGreedyExploration.Epsilon = 0.04;
qagent.AgentOptions.CriticOptimizerOptions.LearnRate = 0.01;
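to make the roles of epsilon and the learning rate concrete, the following is a minimal sketch of one textbook tabular q-learning step. it is not the toolbox implementation: the toolbox applies the learning rate through its optimizer, so its numbers can differ. the discount factor gamma and the sample transition are assumptions made for illustration.

```matlab
% Sketch of one textbook tabular Q-learning step (not toolbox internals).
numStates  = 25;                     % 5-by-5 grid
numActions = 4;                      % north, south, east, west
Q = zeros(numStates, numActions);    % table initialized to zero

epsilon = 0.04;   % exploration probability, as in the agent options
alpha   = 0.01;   % learning rate, as in the agent options
gamma   = 0.99;   % ASSUMED discount factor; not stated in this example

s = 2;            % current state (cell [2,1])

% Epsilon-greedy action selection.
if rand < epsilon
    a = randi(numActions);           % explore: random action
else
    [~, a] = max(Q(s, :));           % exploit: greedy action
end

% Hypothetical environment response, not tied to the grid dynamics.
r = -1;
sNext = 7;

% The Q-learning target uses the best action value in the next state.
target  = r + gamma * max(Q(sNext, :));
Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));
```

with a small epsilon such as 0.04 the agent acts greedily most of the time, and with a small learning rate each table entry moves only slightly toward its target on every step.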

for more information on creating q-learning agents, see rlQAgent and rlQAgentOptions.

train q-learning agent

to train the agent, first specify the training options. for this example, use the following options:

  • train for at most 200 episodes. specify that each episode lasts for at most 50 time steps.

  • stop training when the average cumulative reward over 30 consecutive episodes reaches the stop value of 11 specified in the options.

for more information, see rlTrainingOptions.

trainopts = rlTrainingOptions;
trainopts.MaxStepsPerEpisode = 50;
trainopts.MaxEpisodes = 200;
trainopts.StopTrainingCriteria = "AverageReward";
trainopts.StopTrainingValue = 11;
trainopts.ScoreAveragingWindowLength = 30;
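the stopping rule can be pictured as a moving average compared against a threshold. the following is a minimal sketch under that assumption, not the toolbox code; the episode rewards are made-up data, and the handling of the first episodes (averaging over fewer than 30 values) is also an assumption.

```matlab
% Sketch of an averaging stop rule (assumed behavior, not toolbox code).
windowLength = 30;     % averaging window length from the training options
stopValue    = 11;     % stop training value from the training options
maxEpisodes  = 200;    % maximum number of episodes

% Hypothetical episode rewards: poor for 40 episodes, then 11 per episode.
episodeReward = [-50 * ones(1, 40), 11 * ones(1, 160)];

stopEpisode = maxEpisodes;           % default: run all episodes
for k = 1:maxEpisodes
    firstIdx  = max(1, k - windowLength + 1);
    avgReward = mean(episodeReward(firstIdx:k));
    if avgReward >= stopValue        % average reached the stop value
        stopEpisode = k;
        break
    end
end
disp(stopEpisode)
```

in this made-up run, training would stop only once the whole 30-episode window contains rewards of 11, which is 30 episodes after the rewards first improve.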

train the q-learning agent using the train function. training can take several minutes to complete. to save time while running this example, load a pretrained agent by setting dotraining to false. to train the agent yourself, set dotraining to true.

dotraining = false;
if dotraining
    % train the agent.
    trainingstats = train(qagent,env,trainopts);
else
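    % load the pretrained agent for the example. the MAT-file stores
    % the agent under the variable name qAgent.
    pretrained = load("basicGWQAgent.mat","qAgent");
    qagent = pretrained.qAgent;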
end

the episode manager window opens and displays the training progress.

validate q-learning results

to validate the training results, simulate the agent in the training environment.

before running the simulation, visualize the environment and configure the visualization to maintain a trace of the agent states.

plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;

simulate the agent in the environment using the sim function.

sim(qagent,env)

the agent trace shows that the agent successfully finds the jump from cell [2,4] to cell [4,4].

create and train sarsa agent

to create a sarsa agent, use the same q-value function and epsilon-greedy configuration as for the q-learning agent. for more information on creating sarsa agents, see rlSARSAAgent and rlSARSAAgentOptions.

sarsaagent = rlSARSAAgent(qfcnappx);
sarsaagent.AgentOptions.EpsilonGreedyExploration.Epsilon = 0.04;
sarsaagent.AgentOptions.CriticOptimizerOptions.LearnRate = 0.01;
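although the two agents share the same configuration here, their update targets differ. the following is a minimal textbook sketch of that difference, not the toolbox implementation; the table values, the sample transition, and the discount factor gamma are assumptions made for illustration.

```matlab
% Sketch contrasting SARSA and Q-learning targets (textbook form).
alpha = 0.01;                  % learning rate, as in the agent options
gamma = 0.99;                  % ASSUMED discount factor
Q = zeros(25, 4);              % 25 states, 4 actions
Q(7, :) = [1 4 2 3];           % hypothetical values for next state 7

% Hypothetical transition: action 3 (east) in state 2, reward -1.
s = 2;  a = 3;  r = -1;
sNext = 7;
aNext = 1;                     % action the policy actually takes next

% SARSA is on-policy: the target uses the action actually selected.
sarsaTarget = r + gamma * Q(sNext, aNext);

% Q-learning is off-policy: the target uses the greedy next action.
qTarget = r + gamma * max(Q(sNext, :));

% Apply the SARSA update to the table.
Q(s, a) = Q(s, a) + alpha * (sarsaTarget - Q(s, a));
```

the two targets coincide whenever the next action is the greedy one, which happens most of the time with a small epsilon.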

train the sarsa agent using the train function. training can take several minutes to complete. to save time while running this example, load a pretrained agent by setting dotraining to false. to train the agent yourself, set dotraining to true.

dotraining = false;
if dotraining
    % train the agent.
    trainingstats = train(sarsaagent,env,trainopts);
else
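    % load the pretrained agent for the example. the MAT-file stores
    % the agent under the variable name sarsaAgent.
    pretrained = load("basicGWSarsaAgent.mat","sarsaAgent");
    sarsaagent = pretrained.sarsaAgent;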
end

validate sarsa training

to validate the training results, simulate the agent in the training environment.

plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;

simulate the agent in the environment.

sim(sarsaagent,env)

the sarsa agent finds the same grid world solution as the q-learning agent.
