trainWithEvolutionStrategy
Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment
Since R2023b
Description

trainStats = trainWithEvolutionStrategy(agent,env,estOpts) trains agent within the environment env, using the evolution strategy training options object estOpts. Note that agent is a handle object and is updated during training, despite being an input argument. For more information on the training algorithm, see Train Agent with Evolution Strategy.
Examples

Train Agent Using an Evolutionary Strategy

This example shows how to train a DDPG agent using an evolutionary strategy.

Load the predefined environment object representing a cart-pole system with a continuous action space. For more information on this environment, see Load Predefined Control System Environments.

env = rlPredefinedEnv("CartPole-Continuous");

The agent networks are initialized randomly. Ensure reproducibility by fixing the seed of the random generator.

rng(0)

Create a DDPG agent with default networks.

agent = rlDDPGAgent(getObservationInfo(env),getActionInfo(env));

To create an evolution strategy options object, use rlEvolutionStrategyTrainingOptions.

estOpts = rlEvolutionStrategyTrainingOptions(...
    PopulationSize=10, ...
    ReturnedPolicy="BestPolicy", ...
    StopTrainingCriteria="EpisodeCount", ...
    StopTrainingValue=100);

To train the agent, use trainWithEvolutionStrategy.

trainStats = trainWithEvolutionStrategy(agent,env,estOpts);

Display the reward accumulated during the last episode.

trainStats.EpisodeReward(end)

ans = 496.2431

This value means that the agent is able to balance the cart-pole system for the whole episode.
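To check the trained agent further, you can simulate it in the environment and look at the collected reward. The following is a minimal sketch: sim and rlSimulationOptions are standard Reinforcement Learning Toolbox calls, but the 500-step limit is an arbitrary choice for illustration, and the code assumes that the logged Reward field is a timeseries, as it is for MATLAB environments.

% Simulate the trained agent for at most 500 steps (illustrative limit).
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env,agent,simOpts);

% Total reward collected during the simulation (Reward is logged as a timeseries).
totalSimReward = sum(experience.Reward.Data)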
Input Arguments

agent — DDPG, TD3 or SAC agent
rlDDPGAgent object | rlTD3Agent object | rlSACAgent object

Agent to train, specified as an rlDDPGAgent, rlTD3Agent, or rlSACAgent object.
Note

trainWithEvolutionStrategy updates the agent as training progresses. For more information on how to preserve the original agent, how to save an agent during training, and on the state of agent after training, see the notes and the Tips section in train. For more information about handle objects, see Handle Object Behavior.

For more information about how to create and configure agents for reinforcement learning, see Reinforcement Learning Agents.
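trainWithEvolutionStrategy accepts any of the three supported agent types. As a minimal sketch (reusing the env object from the example above and relying on default agent options and default networks), you could create a TD3 or SAC agent instead of a DDPG agent:

% Extract observation and action specifications from the environment.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% TD3 agent with default networks.
td3Agent = rlTD3Agent(obsInfo,actInfo);

% SAC agent with default networks.
sacAgent = rlSACAgent(obsInfo,actInfo);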
env — Environment
reinforcement learning environment object

Environment in which the agent acts, specified as one of the following kinds of reinforcement learning environment object:

A predefined MATLAB® or Simulink® environment created using rlPredefinedEnv.

A custom MATLAB environment you create with functions such as rlFunctionEnv or rlCreateEnvTemplate.

A custom Simulink environment you create using rlSimulinkEnv.
Note

Multiagent environments do not support training agents with an evolution strategy.

For more information about creating and configuring environments, see Create MATLAB Reinforcement Learning Environments and Create Simulink Reinforcement Learning Environments.

When env is a Simulink environment, calling trainWithEvolutionStrategy compiles and simulates the model associated with the environment.
estOpts — Parameters and options for training using an evolution strategy
rlEvolutionStrategyTrainingOptions object

Parameters and options for training using an evolution strategy, specified as an rlEvolutionStrategyTrainingOptions object. Use this argument to specify parameters and options such as:

Population size

Population update method

Number of training epochs

Criteria for saving candidate agents

How to display training progress

Note

trainWithEvolutionStrategy does not support parallel computing.

For details, see rlEvolutionStrategyTrainingOptions.
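To see the full set of available options and their default values, you can create a default options object and display it, then override individual properties. This is a minimal sketch; PopulationSize is one of the documented properties, and the value 25 is only an illustrative choice.

% Create an options object with default settings and display its properties.
estOpts = rlEvolutionStrategyTrainingOptions;
disp(estOpts)

% Properties can also be set after creation.
estOpts.PopulationSize = 25;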
Output Arguments

trainStats — Training episode data
rlTrainingResult object

Training episode data, returned as an rlTrainingResult object. The following properties pertain to the rlTrainingResult object:

EpisodeIndex — Episode numbers
[1;2;…;N]

Episode numbers, returned as the column vector [1;2;…;N], where N is the number of episodes in the training run. This vector is useful if you want to plot the evolution of other quantities from episode to episode.
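For example, a minimal sketch that plots the per-episode reward (described below) against the episode index:

% Plot the per-episode reward versus the episode number.
plot(trainStats.EpisodeIndex,trainStats.EpisodeReward)
xlabel("Episode")
ylabel("Episode reward")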
EpisodeReward — Reward for each episode
column vector

Reward for each episode, returned as a column vector of length N. Each entry contains the reward for the corresponding episode.

EpisodeSteps — Number of steps in each episode
column vector

Number of steps in each episode, returned as a column vector of length N. Each entry contains the number of steps in the corresponding episode.

AverageReward — Average reward over the averaging window
column vector

Average reward over the averaging window specified in estOpts, returned as a column vector of length N. Each entry contains the average reward computed at the end of the corresponding episode.
TotalAgentSteps — Total number of steps
column vector

Total number of agent steps in training, returned as a column vector of length N. Each entry contains the cumulative sum of the entries in EpisodeSteps up to that point.
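For example, a quick consistency check between the two fields (a minimal sketch, assuming the trainStats object returned by the example above):

% TotalAgentSteps is the running total of EpisodeSteps.
isequal(trainStats.TotalAgentSteps,cumsum(trainStats.EpisodeSteps))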
EpisodeQ0 — Critic estimate of expected discounted cumulative long-term reward at the beginning of each episode
column vector

Critic estimate of the expected discounted cumulative long-term reward, computed using the current agent and the environment initial conditions, returned as a column vector of length N. Each entry is the critic estimate (Q0) for the agent at the beginning of the corresponding episode. This field is present only for agents that have critics, such as the DDPG, TD3, and SAC agents trained by this function.
SimulationInfo — Information collected during simulation
structure | vector of Simulink.SimulationOutput objects

Information collected during the simulations performed for training, returned as:

For training in MATLAB environments, a structure containing the field SimulationError. This field is a column vector with one entry per episode. When the StopOnError option of rlTrainingOptions is "off", each entry contains any errors that occurred during the corresponding episode. Otherwise, the field contains an empty array.

For training in Simulink environments, a vector of Simulink.SimulationOutput objects containing simulation data recorded during the corresponding episode. Recorded data for an episode includes any signals and states that the model is configured to log, simulation metadata, and any errors that occurred during the corresponding episode.
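For example, after training in a MATLAB environment you can inspect the recorded errors directly. A minimal sketch, assuming the trainStats object from the example above and the structure form described here:

% Display the per-episode error entries recorded during training
% (empty entries indicate episodes that completed without error).
trainStats.SimulationInfo.SimulationError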
EvaluationStatistic — Evaluation statistic for each episode
column vector

Evaluation statistic for each episode, returned as a column vector with as many elements as the number of episodes. Since trainWithEvolutionStrategy does not support evaluator objects, each element of this vector is NaN. For more information, see rlEvaluator and rlCustomEvaluator.
TrainingOptions — Training options set
rlEvolutionStrategyTrainingOptions object

Training options set, returned as an rlEvolutionStrategyTrainingOptions object.

Version History

Introduced in R2023b

See Also

Functions

Objects