Optimization options for actors and critics
Since R2022a
Description
Use an rlOptimizerOptions object to specify an optimization option set for actors and critics.
Creation
Description
optOpts = rlOptimizerOptions creates a default optimizer option set. You can use it as the CriticOptimizerOptions or ActorOptimizerOptions property of an agent option object, or as the last argument of rlOptimizer to create an optimizer object. You can modify the object properties using dot notation.
optOpts = rlOptimizerOptions(Name=Value) creates an option set with the specified properties using one or more name-value arguments.
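The following is a minimal sketch of both creation syntaxes. It assumes Reinforcement Learning Toolbox R2022a or later is installed. The values 0.005 and 1 are arbitrary, and the variable names optOpts2 and optimizer are hypothetical.

```matlab
% Requires Reinforcement Learning Toolbox (R2022a or later).
% Syntax 1: default option set, then modify a property with dot notation.
optOpts = rlOptimizerOptions;
optOpts.LearnRate = 0.005;

% Syntax 2: set properties at creation using name-value arguments.
optOpts2 = rlOptimizerOptions(LearnRate=0.005, GradientThreshold=1);

% Pass an option set to rlOptimizer to create an optimizer object.
optimizer = rlOptimizer(optOpts2);
```

Both syntaxes produce the same kind of object. The name-value form is more compact when you know the values in advance. Dot notation is convenient for adjusting a default set after creation.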
Properties
LearnRate: Learning rate used in training the actor or critic function approximator
0.01 (default) | positive scalar
Learning rate used in training the actor or critic function approximator, specified as a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.
Example: LearnRate=0.025
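The following toy sketch illustrates the learning-rate trade-off in plain MATLAB. It does not reflect any toolbox internals. It runs gradient descent on f(x) = x^2, where each update x = x - alpha*2*x equals (1 - 2*alpha)*x. The learning rates and step count are hypothetical values chosen for illustration.

```matlab
% Toy example (not toolbox code): gradient descent on f(x) = x^2.
% Each update multiplies x by (1 - 2*alpha), so the iterate converges
% toward 0 for 0 < alpha < 1 and diverges for alpha > 1.
gradF = @(x) 2*x;                  % derivative of f(x) = x^2
alphas = [0.01 0.025 0.4 1.1];     % hypothetical learning rates
numSteps = 50;
finalX = zeros(size(alphas));
for k = 1:numel(alphas)
    x = 1;                         % same starting point for every run
    for step = 1:numSteps
        x = x - alphas(k)*gradF(x);
    end
    finalX(k) = x;
    fprintf("LearnRate %.3f: |x| after %d steps = %g\n", alphas(k), numSteps, abs(x));
end
```

After 50 steps, the smallest rate is still far from the minimum and a moderate rate has essentially reached it. The largest rate has blown up.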
GradientThreshold: Gradient threshold value for the training of the actor or critic function approximator
Inf (default) | positive scalar
Gradient threshold value used in training the actor or critic function approximator, specified as Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option. Clipping the gradient limits how much the network parameters can change in a training iteration.
Example: GradientThreshold=1
GradientThresholdMethod: Gradient threshold method used in training the actor or critic function approximator
"l2norm" (default) | "global-l2norm" | "absolute-value"
Gradient threshold method used in training the actor or critic function approximator. This is the method used to clip gradient values that exceed the gradient threshold. Specify it as one of the following values.
"l2norm": If the L2 norm of the vector g_lyr, containing the gradient components related to the weights or biases of a layer, is larger than GradientThreshold, then this option scales g_lyr by a factor of GradientThreshold/L, where L is the L2 norm of g_lyr. When you use this option, the L2 norm of g_lyr in the returned gradient cannot exceed GradientThreshold. For example, a fully connected layer has two parameter arrays, Weights and Bias. The threshold is applied separately to the L2 norm of the gradient components related to Weights and to those related to Bias.
"global-l2norm": If the L2 norm of the gradient g (with respect to all learnable network parameters) is larger than GradientThreshold, then this option scales g by a factor of GradientThreshold/L, where L is the L2 norm of g. When you use this option, the L2 norm of the returned gradient cannot exceed GradientThreshold.
"absolute-value": If the absolute value of an individual (scalar) partial derivative in the gradient g (with respect to all learnable network parameters) is larger than GradientThreshold, then this option scales that partial derivative. The corresponding component in the returned gradient then has magnitude equal to GradientThreshold and the same sign as the original partial derivative. When you use this option, the absolute value of any component of the returned gradient cannot exceed GradientThreshold.
For more information, see the Algorithms section of the relevant page in Deep Learning Toolbox™.
Example: GradientThresholdMethod="absolute-value"
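The three clipping rules can be sketched in plain MATLAB as follows. This illustrates the documented behavior on assumed inputs: a hypothetical layer with one weight gradient array and one bias gradient array. It is not the toolbox implementation.

```matlab
% Illustration of the documented clipping rules (not toolbox code).
gW  = [3 4; 0 0];    % hypothetical weight gradient of one layer, L2 norm 5
gB  = [0.6; 0.8];    % hypothetical bias gradient of the same layer, L2 norm 1
thr = 2;             % plays the role of GradientThreshold

% Scale factor: 1 when the norm is within the threshold, thr/L otherwise.
% With thr = Inf (the default), the factor is always 1 (no clipping).
scaleFor = @(L) min(1, thr / L);

% "l2norm": clip each parameter array separately by its own L2 norm.
wL2 = gW * scaleFor(norm(gW(:)));
bL2 = gB * scaleFor(norm(gB(:)));

% "global-l2norm": one L2 norm computed over all learnable parameters.
globalL = sqrt(sum(gW(:).^2) + sum(gB(:).^2));
wGlobal = gW * scaleFor(globalL);
bGlobal = gB * scaleFor(globalL);

% "absolute-value": clip each partial derivative to [-thr, thr], keeping its sign.
clipAbs = @(g) max(min(g, thr), -thr);
wAbs = clipAbs(gW);
bAbs = clipAbs(gB);
```

With "l2norm", only the weight gradient is rescaled because the bias gradient is already within the threshold. With "global-l2norm", both arrays shrink by the same factor. With "absolute-value", the direction of the weight gradient changes, because 3 and 4 are both clipped to 2.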
L2RegularizationFactor: Factor for L2 regularization used in training the actor or critic function approximator
0.0001 (default) | nonnegative scalar
Factor for L2 regularization (weight decay) used in training the actor or critic function approximator, specified as a nonnegative scalar. For more information, see the Algorithms section of the relevant page in Deep Learning Toolbox.
To avoid overfitting when using a representation with many parameters, consider increasing the L2RegularizationFactor option.
Example: L2RegularizationFactor=0.0005
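The following minimal sketch applies weight decay on one plain gradient step. It assumes the common formulation in which the regularized loss is E(w) + (lambda/2)*(w'*w), so regularization adds lambda*w to the gradient. All values are hypothetical. The toolbox may treat biases or per-layer factors differently.

```matlab
% Sketch of L2 regularization (weight decay) on one gradient step.
% Assumed formulation: regularized loss E(w) + (lambda/2)*(w'*w),
% whose gradient is the data gradient plus lambda*w.
lambda = 1e-4;            % plays the role of L2RegularizationFactor
alpha  = 0.01;            % plays the role of LearnRate
w      = [0.5; -2.0];     % hypothetical weights
gData  = [0; 0];          % zero data gradient isolates the decay effect

gReg  = gData + lambda*w; % regularized gradient
wNext = w - alpha*gReg;   % one gradient step

% With no data gradient, the step multiplies every weight by (1 - alpha*lambda).
disp(wNext ./ w)
```

A larger factor pulls the weights toward zero more strongly at every step. This is why increasing it can counter overfitting in representations with many parameters.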
Algorithm: Algorithm used for training the actor or critic function approximator
"adam" (default) | "sgdm" | "rmsprop"
Algorithm used for training the actor or critic function approximator, specified as one of the following values.
"adam": Use the Adam (adaptive moment estimation) algorithm. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.
"sgdm": Use the stochastic gradient descent with momentum (SGDM) algorithm. You can specify the momentum value using the Momentum field of the OptimizerParameters option.
"rmsprop": Use the RMSProp algorithm. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor field of the OptimizerParameters option.
For more information about these algorithms, see the Algorithms section of the relevant page in Deep Learning Toolbox.
Example: Algorithm="sgdm"
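The following plain-MATLAB sketch shows one update step for each Algorithm value. It uses the standard textbook forms of SGDM, RMSProp, and Adam, including Adam's bias correction. These forms are an assumption for illustration, and the toolbox implementation may differ in details. The comments map each symbol to the OptimizerParameters field it corresponds to. All numeric values are hypothetical.

```matlab
% One update step per algorithm (textbook forms, not toolbox code).
alpha     = 0.01;         % LearnRate
theta     = [1; -1];      % current parameters
thetaPrev = [1; -1];      % parameters from the previous step (used by sgdm)
g         = [0.2; -0.4];  % current gradient

% "sgdm": gradient step plus a fraction gamma of the previous step.
gamma = 0.9;                              % Momentum
thetaSgdm = theta - alpha*g + gamma*(theta - thetaPrev);

% "rmsprop": divide by the root of a moving average of squared gradients.
beta2   = 0.999;                          % SquaredGradientDecayFactor
epsilon = 1e-8;                           % Epsilon
v = zeros(size(theta));                   % moving-average state, initially zero
v = beta2*v + (1 - beta2)*g.^2;
thetaRms = theta - alpha*g ./ (sqrt(v) + epsilon);

% "adam": moving averages of the gradient and squared gradient,
% with bias correction (first step, t = 1).
beta1 = 0.9;                              % GradientDecayFactor
m = zeros(size(theta));
s = zeros(size(theta));
t = 1;
m = beta1*m + (1 - beta1)*g;
s = beta2*s + (1 - beta2)*g.^2;
mHat = m / (1 - beta1^t);
sHat = s / (1 - beta2^t);
thetaAdam = theta - alpha*mHat ./ (sqrt(sHat) + epsilon);
```

Epsilon only appears where a division occurs, which is why it is relevant to "adam" and "rmsprop" but not to "sgdm".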
OptimizerParameters: Parameters for the training algorithm used for training the actor or critic function approximator
OptimizerParameters object
Parameters for the training algorithm used for training the actor or critic function approximator, specified as an OptimizerParameters object with the following parameters.
Parameter | Description |
---|---|
Momentum | Contribution of the previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution. This parameter applies only when Algorithm is "sgdm". |
Epsilon | Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. This parameter applies only when Algorithm is "adam" or "rmsprop". |
GradientDecayFactor | Decay rate of the gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Algorithm is "adam". |
SquaredGradientDecayFactor | Decay rate of the squared gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Algorithm is "adam" or "rmsprop". |
When a particular property of OptimizerParameters is not applicable to the optimizer type specified in Algorithm, that property is set to "Not applicable".
To change property values, create an rlOptimizerOptions object and use dot notation to access and change the properties of OptimizerParameters.
optOpts = rlOptimizerOptions;
optOpts.OptimizerParameters.GradientDecayFactor = 0.95;
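The next sketch selects RMSProp at creation and then tunes the OptimizerParameters fields that apply to it. It assumes Reinforcement Learning Toolbox R2022a or later. The values 0.99 and 1e-6 are arbitrary.

```matlab
% Requires Reinforcement Learning Toolbox (R2022a or later).
optOpts = rlOptimizerOptions(Algorithm="rmsprop");

% Fields that apply to "rmsprop": SquaredGradientDecayFactor and Epsilon.
optOpts.OptimizerParameters.SquaredGradientDecayFactor = 0.99;
optOpts.OptimizerParameters.Epsilon = 1e-6;

% Display the parameter object. Fields that do not apply to the selected
% algorithm are reported as not applicable.
disp(optOpts.OptimizerParameters)
```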
Object functions
Function | Description |
---|---|
 | Options for Q-learning agent |
 | Options for SARSA agent |
 | Options for DQN agent |
 | Options for PG agent |
 | Options for DDPG agent |
rlTD3AgentOptions | Options for TD3 agent |
rlACAgentOptions | Options for AC agent |
 | Options for PPO agent |
 | Options for TRPO agent |
 | Options for SAC agent |
rlOptimizer | Creates an optimizer object for actors and critics |
Examples
Create optimizer options object
Use rlOptimizerOptions to create a default optimizer option object to use for the training of a critic function approximator.
myCriticOpts = rlOptimizerOptions
myCriticOpts = 
  rlOptimizerOptions with properties:

                  LearnRate: 0.0100
          GradientThreshold: Inf
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  Algorithm: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
Using dot notation, change the training algorithm to stochastic gradient descent with momentum, and set the value of the momentum parameter to 0.6.
myCriticOpts.Algorithm = "sgdm";
myCriticOpts.OptimizerParameters.Momentum = 0.6;
Create an AC agent option object, and set its CriticOptimizerOptions property to myCriticOpts.
myAgentOpt = rlACAgentOptions;
myAgentOpt.CriticOptimizerOptions = myCriticOpts;
You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.
Create optimizer options object specifying property values
Use rlOptimizerOptions to create an optimizer option object to use for the training of an actor function approximator. Specify a learning rate of 0.2 and set GradientThresholdMethod to "absolute-value".
myActorOpts = rlOptimizerOptions(LearnRate=0.2, ...
    GradientThresholdMethod="absolute-value")
myActorOpts = 
  rlOptimizerOptions with properties:

                  LearnRate: 0.2000
          GradientThreshold: Inf
    GradientThresholdMethod: "absolute-value"
     L2RegularizationFactor: 1.0000e-04
                  Algorithm: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]
Using dot notation, change GradientThreshold to 10.
myActorOpts.GradientThreshold = 10;
Create an AC agent option object, and set its ActorOptimizerOptions property to myActorOpts.
myAgentOpt = rlACAgentOptions( ...
    ActorOptimizerOptions=myActorOpts);
You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.
Version history
Introduced in R2022a