
Optimization options for actors and critics

Since R2022a

Description

Use an rlOptimizerOptions object to specify an optimization options set for actors and critics.

Creation

Description

optOpts = rlOptimizerOptions creates a default optimizer option set to use as a CriticOptimizerOptions or ActorOptimizerOptions property of an agent option object, or as the last argument of rlOptimizer to create an optimizer object. You can modify the object properties using dot notation.

optOpts = rlOptimizerOptions(Name=Value) creates an options set with the specified properties using one or more name-value arguments.

Properties

Learning rate used in training the actor or critic function approximator, specified as a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.
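To make this trade-off concrete, the following plain MATLAB sketch (no toolbox required) runs gradient descent on the hypothetical one-dimensional loss f(x) = x^2, whose gradient is 2*x. Each step multiplies x by (1 - 2*rate), so a very small rate converges slowly, a moderate rate converges quickly, and any rate above 1 makes the iterates grow without bound. The loss and the three rates are illustrative assumptions, not an actual actor or critic loss or toolbox defaults.

```matlab
% Illustrative only: gradient descent on f(x) = x^2 (gradient 2*x),
% which stands in for a real actor or critic loss.
rates  = [0.01 0.4 1.1];   % too low, reasonable, too high
nSteps = 20;
finalX = zeros(size(rates));
for k = 1:numel(rates)
    x = 1;                          % same starting point for every rate
    for step = 1:nSteps
        x = x - rates(k) * 2 * x;   % plain gradient step with learning rate rates(k)
    end
    finalX(k) = x;                  % distance from the minimum at x = 0
end
disp(finalX)
```

After 20 steps the first rate has barely moved x toward zero, the second has essentially reached the minimum, and the third has overshot further on every step.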

Example: LearnRate=0.025

Gradient threshold value used in training the actor or critic function approximator, specified as Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option. Clipping the gradient limits how much the network parameters can change in a training iteration.

Example: GradientThreshold=1

Gradient threshold method used in training the actor or critic function approximator. This is the specific method used to clip gradient values that exceed the gradient threshold, specified as one of the following values.

  • "l2norm": If the L2 norm of the vector g_lyr containing the gradient components related to the weights or biases of a layer is larger than GradientThreshold, then this option scales g_lyr by a factor of GradientThreshold/L, where L is the L2 norm of g_lyr. When you use this option, the L2 norm of g_lyr in the returned gradient cannot exceed GradientThreshold. For example, a fully connected layer has two parameter arrays, weights and bias. The threshold is applied to the L2 norm of the weight gradient and to the L2 norm of the bias gradient separately.

  • "global-l2norm": If the L2 norm of the gradient g with respect to all learnable network parameters is larger than GradientThreshold, then this option scales g by a factor of GradientThreshold/L, where L is the L2 norm of g. When you use this option, the L2 norm of the returned gradient cannot exceed GradientThreshold.

  • "absolute-value": If the absolute value of an individual (scalar) partial derivative in the gradient g with respect to all learnable network parameters is larger than GradientThreshold, then this option scales that partial derivative so that the corresponding component of the returned gradient has magnitude equal to GradientThreshold and the same sign as the original partial derivative. When you use this option, the absolute value of any component of the returned gradient cannot exceed GradientThreshold.
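The three clipping rules can be sketched in plain MATLAB as follows. The arrays gW and gB are hypothetical weight and bias gradients of one fully connected layer, and thr plays the role of GradientThreshold. This illustrates the rules as described above; it is not the toolbox's internal implementation.

```matlab
% Hypothetical gradients of one fully connected layer (weights and bias)
gW  = [3 -4; 0 12];        % L2 norm = 13
gB  = [6; -8];             % L2 norm = 10
thr = 5;                   % plays the role of GradientThreshold

% "l2norm": rescale each parameter array separately if its norm exceeds thr
clipL2 = @(g) g * min(1, thr / norm(g(:)));
gW_l2  = clipL2(gW);
gB_l2  = clipL2(gB);

% "global-l2norm": one norm over all learnable parameters, one common scale
grads      = {gW, gB};
globalNorm = sqrt(sum(cellfun(@(g) sum(g(:).^2), grads)));
scale      = min(1, thr / globalNorm);
gGlobal    = cellfun(@(g) g * scale, grads, 'UniformOutput', false);

% "absolute-value": clip each partial derivative to [-thr, thr], keeping its sign
clipAbs = @(g) max(min(g, thr), -thr);
gW_abs  = clipAbs(gW);
gB_abs  = clipAbs(gB);
```

All three results are bounded by thr, but the shrinkage differs: "l2norm" scales the weight and bias gradients by different factors (5/13 and 5/10), "global-l2norm" uses one factor for both, and "absolute-value" changes only the components whose magnitude exceeds thr, which can change the direction of the gradient.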

For more information, see in the Algorithms section of in Deep Learning Toolbox™.

Example: GradientThresholdMethod="absolute-value"

Factor for L2 regularization (weight decay) used in training the actor or critic function approximator, specified as a nonnegative scalar. For more information, see in the Algorithms section of in Deep Learning Toolbox.

To avoid overfitting when using a function approximator with many parameters, consider increasing the L2RegularizationFactor option.
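The effect of the L2 penalty on a gradient can be sketched in plain MATLAB, assuming the standard form in which the regularized loss is E(w) + (lambda/2)*sum(w.^2), so that the penalty adds lambda*w to the gradient. The vectors w and g and the value lambda are hypothetical, with lambda standing in for L2RegularizationFactor.

```matlab
% Hypothetical weights, loss gradient, and regularization factor
w      = [0.5; -2; 3];
g      = [0.1; 0.0; -0.2];      % gradient of the unregularized loss
lambda = 1e-4;                  % stands in for L2RegularizationFactor

% Gradient of E(w) + (lambda/2)*sum(w.^2): the penalty term adds lambda*w,
% which pulls every weight toward zero (hence "weight decay")
gReg = g + lambda * w;
```

Because the added term is proportional to each weight, larger weights are pulled toward zero more strongly, which is why a larger regularization factor tends to keep the parameters small and reduce overfitting.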

Example: L2RegularizationFactor=0.0005

Algorithm used for training the actor or critic function approximator, specified as one of the following values.

  • "adam": Use the Adam (adaptive moment estimation) algorithm. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.

  • "sgdm": Use the stochastic gradient descent with momentum (SGDM) algorithm. You can specify the momentum value using the Momentum field of the OptimizerParameters option.

  • "rmsprop": Use the RMSProp algorithm. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor field of the OptimizerParameters option.
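As a rough guide to what the three choices do, the following plain MATLAB sketch applies one update step of each rule to a hypothetical parameter vector. It follows the commonly published forms of SGDM (velocity form), RMSProp, and Adam with bias correction; that the toolbox uses exactly these forms is an assumption. The variables alpha, momentum, beta1, beta2, and epsilon stand in for LearnRate and the Momentum, GradientDecayFactor, SquaredGradientDecayFactor, and Epsilon fields of OptimizerParameters.

```matlab
% Hypothetical parameters and gradient for a single update step
theta = [1; -2];
g     = [0.5; -1];
alpha = 0.01;                      % stands in for LearnRate

% "sgdm": velocity form, v <- momentum*v - alpha*g, theta <- theta + v
momentum  = 0.9;                   % stands in for Momentum
v         = zeros(size(theta));    % previous step, zero at the start
v         = momentum*v - alpha*g;
thetaSGDM = theta + v;

% "rmsprop": divide the step by the root of a moving average of squared gradients
beta2    = 0.999;                  % stands in for SquaredGradientDecayFactor
epsilon  = 1e-8;                   % stands in for Epsilon
sqAvg    = zeros(size(theta));
sqAvg    = beta2*sqAvg + (1 - beta2)*g.^2;
thetaRMS = theta - alpha*g ./ (sqrt(sqAvg) + epsilon);

% "adam": moving averages of the gradient and squared gradient,
% with bias correction for their zero initialization (step t = 1)
beta1     = 0.9;                   % stands in for GradientDecayFactor
t         = 1;
m         = zeros(size(theta));
s         = zeros(size(theta));
m         = beta1*m + (1 - beta1)*g;
s         = beta2*s + (1 - beta2)*g.^2;
mHat      = m / (1 - beta1^t);
sHat      = s / (1 - beta2^t);
thetaAdam = theta - alpha*mHat ./ (sqrt(sHat) + epsilon);
```

Because the moving averages start at zero, the bias-corrected Adam step here is about alpha per component, while the uncorrected RMSProp step is about alpha/sqrt(1 - beta2), roughly 0.32, for the same gradient.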

For more information about these algorithms, see the Algorithms section of in Deep Learning Toolbox.

Example: Algorithm="sgdm"

Parameters for the training algorithm used for training the actor or critic function approximator, specified as an OptimizerParameters object with the following parameters.

  • Momentum: Contribution of the previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution. This parameter applies only when Algorithm is "sgdm". In that case, the default value is 0.9, which works well for most problems.

  • Epsilon: Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. This parameter applies only when Algorithm is "adam" or "rmsprop". In that case, the default value is 1e-8, which works well for most problems.

  • GradientDecayFactor: Decay rate of the gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Algorithm is "adam". In that case, the default value is 0.9, which works well for most problems.

  • SquaredGradientDecayFactor: Decay rate of the squared gradient moving average, specified as a positive scalar from 0 to 1. This parameter applies only when Algorithm is "adam" or "rmsprop". In that case, the default value is 0.999, which works well for most problems.

When a particular property of OptimizerParameters is not applicable to the optimizer type specified in Algorithm, that property is set to "Not applicable".

To change property values, create an rlOptimizerOptions object and use dot notation to access and change the properties of OptimizerParameters.

optOpts = rlOptimizerOptions;
optOpts.OptimizerParameters.GradientDecayFactor = 0.95;

Object Functions

Options for Q-learning agent
Options for SARSA agent
Options for DQN agent
Options for PG agent
Options for DDPG agent
rlTD3AgentOptions: Options for TD3 agent
rlACAgentOptions: Options for AC agent
Options for PPO agent
Options for TRPO agent
Options for SAC agent
rlOptimizer: Creates an optimizer object for actors and critics

Examples

Use rlOptimizerOptions to create a default optimizer option object to use for the training of a critic function approximator.

myCriticOpts = rlOptimizerOptions
myCriticOpts = 
  rlOptimizerOptions with properties:
                  LearnRate: 0.0100
          GradientThreshold: Inf
    GradientThresholdMethod: "l2norm"
     L2RegularizationFactor: 1.0000e-04
                  Algorithm: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]

Using dot notation, change the training algorithm to stochastic gradient descent with momentum and set the value of the momentum parameter to 0.6.

myCriticOpts.Algorithm = "sgdm";
myCriticOpts.OptimizerParameters.Momentum = 0.6;

Create an AC agent option object and set its CriticOptimizerOptions property to myCriticOpts.

myAgentOpt = rlACAgentOptions;
myAgentOpt.CriticOptimizerOptions = myCriticOpts;

You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.

Use rlOptimizerOptions to create an optimizer option object to use for the training of an actor function approximator. Specify a learning rate of 0.2 and set GradientThresholdMethod to "absolute-value".

myActorOpts = rlOptimizerOptions(LearnRate=0.2, ...
    GradientThresholdMethod="absolute-value")
myActorOpts = 
  rlOptimizerOptions with properties:
                  LearnRate: 0.2000
          GradientThreshold: Inf
    GradientThresholdMethod: "absolute-value"
     L2RegularizationFactor: 1.0000e-04
                  Algorithm: "adam"
        OptimizerParameters: [1x1 rl.option.OptimizerParameters]

Using dot notation, change GradientThreshold to 10.

myActorOpts.GradientThreshold = 10;

Create an AC agent option object and set its ActorOptimizerOptions property to myActorOpts.

myAgentOpt = rlACAgentOptions( ...
    ActorOptimizerOptions=myActorOpts);

You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.

Version History

Introduced in R2022a