optimization options for actors and critics

since r2022a

description

use an rloptimizeroptions object to specify an optimization options set for actors and critics.

creation

syntax

optopts = rloptimizeroptions

optopts = rloptimizeroptions(name=value)

description

optopts = rloptimizeroptions creates a default optimizer option set to use as a criticoptimizeroptions or actoroptimizeroptions property of an agent option object, or as the last argument of rloptimizer to create an optimizer object. you can modify the object properties using dot notation.

optopts = rloptimizeroptions(name=value) creates an options set with the specified properties using one or more name-value arguments.

properties

`learnrate` — learning rate used in training the actor or critic function approximator
`0.01` (default) | positive scalar

learning rate used in training the actor or critic function approximator, specified as a positive scalar. if the learning rate is too low, then training takes a long time. if the learning rate is too high, then training might reach a suboptimal result or diverge.

example: learnrate=0.025
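
the effect of the learning rate can be seen on a toy problem. the following is a minimal sketch, not toolbox code: it runs plain gradient descent on f(w) = w^2 with three illustrative learning rates. the function, the starting point, and the rate values are assumptions chosen only for this illustration.

```matlab
% minimize f(w) = w^2 (gradient 2*w) with plain gradient descent
gradFcn = @(w) 2*w;
learnRates = [0.01 0.4 1.1];   % too low, reasonable, too high (illustrative values)
numSteps = 50;

wFinal = zeros(size(learnRates));
for k = 1:numel(learnRates)
    w = 1;                     % same starting point for every run
    for step = 1:numSteps
        w = w - learnRates(k)*gradFcn(w);
    end
    wFinal(k) = w;
end
disp(wFinal)
```

with these values, the lowest rate is still far from the minimum at w = 0 after 50 steps, the middle rate is essentially at the minimum, and the highest rate moves farther away at every step.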

`gradientthreshold` — gradient threshold value for the training of the actor or critic function approximator
`inf` (default) | positive scalar

gradient threshold value used in training the actor or critic function approximator, specified as inf or a positive scalar. if the gradient exceeds this value, the gradient is clipped as specified by the gradientthresholdmethod option. clipping the gradient limits how much the network parameters can change in a training iteration.

example: gradientthreshold=1

`gradientthresholdmethod` — gradient threshold method used in training the actor or critic function approximator
`"l2norm"` (default) | `"global-l2norm"` | `"absolute-value"`

gradient threshold method used in training the actor or critic function approximator, that is, the method used to clip gradient values that exceed the gradient threshold, specified as one of the following values.

"l2norm" — if the l₂ norm of the vector g_lyr containing the gradient components related to the weights or biases of a layer is larger than gradientthreshold, then this option scales g_lyr by a factor of gradientthreshold/l, where l is the l₂ norm of g_lyr. when you use this option, the l₂ norm of g_lyr in the returned gradient cannot exceed gradientthreshold. for example, a fully connected layer has two parameter arrays, weights and bias. the threshold is applied to the l₂ norm of the gradient components related to weights and bias separately.
"global-l2norm" — if the l₂ norm of the gradient g (with respect to all learnable network parameters) is larger than gradientthreshold, then this option scales g by a factor of gradientthreshold/l, where l is the l₂ norm of g. when you use this option, the l₂ norm of the returned gradient cannot exceed gradientthreshold.
"absolute-value" — if the absolute value of an individual (scalar) partial derivative in the gradient g (with respect to all learnable network parameters) is larger than gradientthreshold, then this option scales the partial derivative so that the corresponding component in the returned gradient has magnitude equal to gradientthreshold and the same sign as the original partial derivative. when you use this option, the absolute value of any component of the returned gradient cannot exceed gradientthreshold.

for more information, see gradient clipping in the algorithms section of trainingoptions in deep learning toolbox™.

example: gradientthresholdmethod="absolute-value"
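
the three clipping methods described above can be sketched in plain matlab as follows. this is an illustration of the arithmetic under stated assumptions, not the toolbox implementation: the gradient arrays for a single fully connected layer, the threshold value, and the helper clipNorm are all hypothetical.

```matlab
% hypothetical gradients for one fully connected layer
gradWeights = [3 4; 0 0];     % l2 norm of all entries = 5
gradBias    = [0.3; 0.4];     % l2 norm = 0.5
threshold   = 1;              % plays the role of gradientthreshold

% scale an array down only when its l2 norm exceeds the threshold
clipNorm = @(g, t) g * min(1, t / norm(g(:)));

% "l2norm": each parameter array is clipped on its own
wL2 = clipNorm(gradWeights, threshold);
bL2 = clipNorm(gradBias, threshold);

% "global-l2norm": one norm over all parameters, one common scale factor
gAll = [gradWeights(:); gradBias(:)];
scale = min(1, threshold / norm(gAll));
wGlobal = scale * gradWeights;
bGlobal = scale * gradBias;

% "absolute-value": each component is limited to [-threshold, threshold]
wAbs = sign(gradWeights) .* min(abs(gradWeights), threshold);
bAbs = sign(gradBias) .* min(abs(gradBias), threshold);
```

in this sketch, "l2norm" rescales the weight gradient to norm 1 and leaves the bias gradient untouched, "global-l2norm" shrinks both arrays by the same factor, and "absolute-value" changes only the two weight components whose magnitude exceeds 1.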

`l2regularizationfactor` — factor for l₂ regularization used in training the actor or critic function approximator
`0.0001` (default) | nonnegative scalar

factor for l₂ regularization (weight decay) used in training the actor or critic function approximator, specified as a nonnegative scalar. for more information, see l₂ regularization in the algorithms section of trainingoptions in deep learning toolbox.

to avoid overfitting when using a function approximator with many parameters, consider increasing the l2regularizationfactor option.

example: l2regularizationfactor=0.0005
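
as a rough illustration of what the regularization factor does, the sketch below adds the gradient of a penalty term 0.5*lambda*(w'*w) to a loss gradient. it assumes the common textbook form of weight decay. the weights, the loss gradient, and the variable names are hypothetical, and the toolbox may treat biases and weights differently.

```matlab
lambda = 1e-4;            % plays the role of l2regularizationfactor
w = [2; -1];              % hypothetical weights
gradLoss = [0.5; 0.25];   % hypothetical gradient of the unregularized loss

% the penalty 0.5*lambda*(w'*w) has gradient lambda*w
gradTotal = gradLoss + lambda*w;

% with a zero loss gradient, one descent step only shrinks the weights
learnRate = 0.01;
wDecayed = w - learnRate*(lambda*w);
```

a larger factor pulls the weights toward zero more strongly at every step, which is why increasing it can reduce overfitting.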

`algorithm` — algorithm used for training actor or critic function approximator
`"adam"` (default) | `"sgdm"` | `"rmsprop"`

algorithm used for training the actor or critic function approximator, specified as one of the following values.

"adam" — use the adam (adaptive moment estimation) algorithm. you can specify the decay rates of the gradient and squared gradient moving averages using the gradientdecayfactor and squaredgradientdecayfactor fields of the optimizerparameters option.
"sgdm" — use the stochastic gradient descent with momentum (sgdm) algorithm. you can specify the momentum value using the momentum field of the optimizerparameters option.
"rmsprop" — use the rmsprop algorithm. you can specify the decay rate of the squared gradient moving average using the squaredgradientdecayfactor field of the optimizerparameters option.

for more information about these algorithms, see the algorithms section of trainingoptions in deep learning toolbox.

example: algorithm="sgdm"
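
to give a feel for the momentum term used by "sgdm", the following sketch compares plain gradient descent with a momentum update on f(w) = w^2. it uses the common velocity form of the update. the objective, the step count, and the variable names are assumptions for illustration, and the toolbox internals may differ.

```matlab
gradFcn = @(w) 2*w;       % gradient of f(w) = w^2
learnRate = 0.01;
momentum = 0.9;           % contribution of the previous step
numSteps = 50;

wPlain = 1;               % update without momentum
wMom = 1;                 % update with momentum
velocity = 0;
for step = 1:numSteps
    wPlain = wPlain - learnRate*gradFcn(wPlain);
    % the previous step, scaled by momentum, is added to the new gradient step
    velocity = momentum*velocity - learnRate*gradFcn(wMom);
    wMom = wMom + velocity;
end
```

on this problem and with this small learning rate, the run with momentum ends closer to the minimum at w = 0 than the run without it.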

`optimizerparameters` — parameters for the training algorithm used for training the actor or critic function approximator
`optimizerparameters` object

parameters for the training algorithm used for training the actor or critic function approximator, specified as an optimizerparameters object with the following parameters.

parameter	description
`momentum`	contribution of previous step, specified as a scalar from 0 to 1. a value of 0 means no contribution from the previous step. a value of 1 means maximal contribution. this parameter applies only when `algorithm` is `"sgdm"`. in that case, the default value is 0.9. this default value works well for most problems.
`epsilon`	denominator offset, specified as a positive scalar. the optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. this parameter applies only when `algorithm` is `"adam"` or `"rmsprop"`. in that case, the default value is 1e-8. this default value works well for most problems.
`gradientdecayfactor`	decay rate of gradient moving average, specified as a positive scalar from 0 to 1. this parameter applies only when `algorithm` is `"adam"`. in that case, the default value is 0.9. this default value works well for most problems.
`squaredgradientdecayfactor`	decay rate of squared gradient moving average, specified as a positive scalar from 0 to 1. this parameter applies only when `algorithm` is `"adam"` or `"rmsprop"`. in that case, the default value is 0.999. this default value works well for most problems.

when a particular property of optimizerparameters is not applicable to the optimizer type specified in algorithm, that property is set to "not applicable".
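
the roles of epsilon, gradientdecayfactor, and squaredgradientdecayfactor can be sketched with a single adam-style update. this is a simplified textbook form without bias correction, written under the assumption that it is close enough for illustration. the weights, the gradient, and the variable names are hypothetical, and the toolbox implementation may differ.

```matlab
learnRate   = 0.01;
gradDecay   = 0.9;     % plays the role of gradientdecayfactor
sqGradDecay = 0.999;   % plays the role of squaredgradientdecayfactor
epsilon     = 1e-8;    % denominator offset

w = [1; 1; 1];         % hypothetical parameters
g = [0.5; -2; 0];      % hypothetical gradient, third component is zero
m = zeros(size(w));    % moving average of the gradient
v = zeros(size(w));    % moving average of the squared gradient

m = gradDecay*m + (1 - gradDecay)*g;
v = sqGradDecay*v + (1 - sqGradDecay)*g.^2;

% epsilon keeps the division finite where v is zero
stepSize = learnRate * m ./ (sqrt(v) + epsilon);
wNew = w - stepSize;
```

the two nonzero components move by almost the same amount even though their gradients differ by a factor of four, and the zero-gradient component stays unchanged without producing nan because of the offset.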

to change property values, create an rloptimizeroptions object and use dot notation to access and change the properties of optimizerparameters.

optopts = rloptimizeroptions;
optopts.optimizerparameters.gradientdecayfactor = 0.95;

object functions

	options for q-learning agent
	options for sarsa agent
	options for dqn agent
	options for pg agent
	options for ddpg agent
`rltd3agentoptions`	options for td3 agent
`rlacagentoptions`	options for ac agent
	options for ppo agent
	options for trpo agent
	options for sac agent
`rloptimizer`	creates an optimizer object for actors and critics

examples

create optimizer options object

use rloptimizeroptions to create a default optimizer option object to use for the training of a critic function approximator.

mycriticopts = rloptimizeroptions

mycriticopts = 
  rloptimizeroptions with properties:
                  learnrate: 0.0100
          gradientthreshold: inf
    gradientthresholdmethod: "l2norm"
     l2regularizationfactor: 1.0000e-04
                  algorithm: "adam"
        optimizerparameters: [1x1 rl.option.optimizerparameters]

using dot notation, change the training algorithm to stochastic gradient descent with momentum and set the value of the momentum parameter to 0.6.

mycriticopts.algorithm = "sgdm";
mycriticopts.optimizerparameters.momentum = 0.6;

create an ac agent option object, and set its criticoptimizeroptions property to mycriticopts.

myagentopt = rlacagentoptions;
myagentopt.criticoptimizeroptions = mycriticopts;

you can now use myagentopt as the last input argument to rlacagent when creating your ac agent.

create optimizer options object specifying property values

use rloptimizeroptions to create an optimizer option object to use for the training of an actor function approximator. specify a learning rate of 0.2 and set the gradientthresholdmethod to "absolute-value".

myactoropts = rloptimizeroptions(learnrate=0.2, ...
    gradientthresholdmethod="absolute-value")

myactoropts = 
  rloptimizeroptions with properties:
                  learnrate: 0.2000
          gradientthreshold: inf
    gradientthresholdmethod: "absolute-value"
     l2regularizationfactor: 1.0000e-04
                  algorithm: "adam"
        optimizerparameters: [1x1 rl.option.optimizerparameters]

using dot notation, change the gradientthreshold to 10.

myactoropts.gradientthreshold = 10;

create an ac agent option object and set its actoroptimizeroptions property to myactoropts.

myagentopt = rlacagentoptions( ...
    actoroptimizeroptions=myactoropts);

you can now use myagentopt as the last input argument to rlacagent when creating your ac agent.

version history

introduced in r2022a
