(not recommended) options set for reinforcement learning agent representations (critics and actors)

since r2019a

rlrepresentationoptions is not recommended. use an rloptimizeroptions object within an agent options object instead. for more information, see "r2022a: rlrepresentationoptions is not recommended" under version history.

description

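use an rlrepresentationoptions object to specify an options set for critics (rlvaluerepresentation, rlqvaluerepresentation) and actors (rldeterministicactorrepresentation, rlstochasticactorrepresentation).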

creation

syntax

repopts = rlrepresentationoptions

repopts = rlrepresentationoptions(name,value)

description

repopts = rlrepresentationoptions creates a default option set to use as a last argument when creating a reinforcement learning actor or critic. you can modify the object properties using dot notation.

repopts = rlrepresentationoptions(name,value) creates an options set with the specified properties using one or more name-value pair arguments.

properties

`learnrate` — learning rate for the representation
`0.01` (default) | positive scalar

learning rate for the representation, specified as a positive scalar. if the learning rate is too low, then training takes a long time. if the learning rate is too high, then training might reach a suboptimal result or diverge.

example: 'learnrate',0.025
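
the effect of the learning rate can be seen on a toy problem. the following is a minimal sketch, not the toolbox training loop: it runs plain gradient descent on f(x) = x^2 with three hypothetical rates (`rates`, `nsteps`, and `xfinal` are illustrative names, not toolbox options).

```matlab
% plain gradient descent on f(x) = x^2, whose gradient is 2*x.
% illustrative only: this is not how the toolbox trains a representation.
rates  = [0.001 0.1 1.1];       % too low, reasonable, too high
nsteps = 50;
xfinal = zeros(size(rates));
for k = 1:numel(rates)
    x = 1;                      % same starting point for every rate
    for t = 1:nsteps
        x = x - rates(k)*2*x;   % one gradient step
    end
    xfinal(k) = x;
end
disp(abs(xfinal))               % distance from the minimizer x = 0
```

with the lowest rate the iterate has barely moved after 50 steps, with the middle rate it is close to the minimizer, and with the highest rate each step overshoots and the iterate moves away from the minimizer.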

`optimizer` — optimizer for representation
`"adam"` (default) | `"sgdm"` | `"rmsprop"`

optimizer for training the network of the representation, specified as one of the following values.

"adam" — use the adam optimizer. you can specify the decay rates of the gradient and squared gradient moving averages using the gradientdecayfactor and squaredgradientdecayfactor fields of the optimizerparameters option.
"sgdm" — use the stochastic gradient descent with momentum (sgdm) optimizer. you can specify the momentum value using the momentum field of the optimizerparameters option.
"rmsprop" — use the rmsprop optimizer. you can specify the decay rate of the squared gradient moving average using the squaredgradientdecayfactor field of the optimizerparameters option.

for more information about these optimizers, see the algorithms section of trainingoptions in deep learning toolbox™.

example: 'optimizer',"sgdm"
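
to make the difference between the three optimizers concrete, the following sketch applies textbook forms of their update rules to f(theta) = theta^2. this is an assumption-laden illustration, not the toolbox implementation, which may differ in detail (for example, in how it handles bias correction for adam). all variable names are hypothetical.

```matlab
% textbook update rules on f(theta) = theta^2 (gradient 2*theta).
lr    = 0.01;     % learning rate
mom   = 0.9;      % momentum (sgdm)
beta1 = 0.9;      % gradient decay factor (adam)
beta2 = 0.999;    % squared gradient decay factor (adam, rmsprop)
epsln = 1e-8;     % denominator offset (adam, rmsprop)
theta = 1;        % common starting point

% sgdm: gradient step plus a fraction of the previous step (two steps)
thetasgdm = theta;
prevstep  = 0;                          % no previous step at the start
for t = 1:2
    g = 2*thetasgdm;
    step = -lr*g + mom*prevstep;
    thetasgdm = thetasgdm + step;
    prevstep  = step;
end

% rmsprop: scale the step by a moving average of squared gradients
g = 2*theta;
v = 0;
v = beta2*v + (1 - beta2)*g^2;
thetarms = theta - lr*g/(sqrt(v) + epsln);

% adam: moving averages of both the gradient and the squared gradient
m = 0;
v = 0;
m = beta1*m + (1 - beta1)*g;
v = beta2*v + (1 - beta2)*g^2;
thetaadam = theta - lr*m/(sqrt(v) + epsln);
```

in the sgdm loop, the second step is larger than a plain gradient step because it adds `mom` times the first step. in the rmsprop and adam updates, the step is divided by the square root of the squared-gradient average, which is why `epsln` appears in the denominator.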

`optimizerparameters` — applicable parameters for optimizer
`optimizerparameters` object

applicable parameters for the optimizer, specified as an optimizerparameters object with the following parameters.

parameter	description
`momentum`	contribution of previous step, specified as a scalar from 0 to 1. a value of 0 means no contribution from the previous step. a value of 1 means maximal contribution. this parameter applies only when `optimizer` is `"sgdm"`. in that case, the default value is 0.9. this default value works well for most problems.
`epsilon`	denominator offset, specified as a positive scalar. the optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero. this parameter applies only when `optimizer` is `"adam"` or `"rmsprop"`. in that case, the default value is 10^–8. this default value works well for most problems.
`gradientdecayfactor`	decay rate of gradient moving average, specified as a positive scalar from 0 to 1. this parameter applies only when `optimizer` is `"adam"`. in that case, the default value is 0.9. this default value works well for most problems.
`squaredgradientdecayfactor`	decay rate of squared gradient moving average, specified as a positive scalar from 0 to 1. this parameter applies only when `optimizer` is `"adam"` or `"rmsprop"`. in that case, the default value is 0.999. this default value works well for most problems.

when a particular property of optimizerparameters is not applicable to the optimizer type specified in the optimizer option, that property is set to "not applicable".

to change the default values, create an rlrepresentationoptions set and use dot notation to access and change the properties of optimizerparameters.

repopts = rlrepresentationoptions;
repopts.optimizerparameters.gradientdecayfactor = 0.95;
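
the role of the epsilon offset described above can be checked with a few lines of base matlab. this is only a sketch of the arithmetic, assuming a textbook update of the form step = lr*g/(sqrt(v) + epsilon); the variable names are hypothetical.

```matlab
% when both the gradient and its squared moving average are zero,
% the update without an offset is 0/0.
lr = 0.01;
g  = 0;                                 % gradient
v  = 0;                                 % squared gradient moving average
stepnooffset   = lr*g/sqrt(v);          % 0/0, which is NaN
stepwithoffset = lr*g/(sqrt(v) + 1e-8); % well defined, equals 0
```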

`gradientthreshold` — threshold value for gradient
`inf` (default) | positive scalar

threshold value for the representation gradient, specified as inf or a positive scalar. if the gradient exceeds this value, the gradient is clipped as specified by the gradientthresholdmethod option. clipping the gradient limits how much the network parameters change in a training iteration.

example: 'gradientthreshold',1

`gradientthresholdmethod` — gradient threshold method
`"l2norm"` (default) | `"global-l2norm"` | `"absolute-value"`

gradient threshold method used to clip gradient values that exceed the gradient threshold, specified as one of the following values.

"l2norm" — if the l₂ norm of the gradient of a learnable parameter is larger than gradientthreshold, then scale the gradient so that the l₂ norm equals gradientthreshold.
"global-l2norm" — if the global l₂ norm, l, is larger than gradientthreshold, then scale all gradients by a factor of gradientthreshold/l. the global l₂ norm considers all learnable parameters.
"absolute-value" — if the absolute value of an individual partial derivative in the gradient of a learnable parameter is larger than gradientthreshold, then scale the partial derivative to have magnitude equal to gradientthreshold and retain the sign of the partial derivative.

for more information, see the algorithms section of trainingoptions in deep learning toolbox.

example: 'gradientthresholdmethod',"absolute-value"
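
the three clipping methods can be compared side by side. the following sketch implements the descriptions above in base matlab on made-up gradients for two learnable parameters. it is an illustration under those assumptions, not the toolbox code, and `grads` and `thr` are hypothetical names.

```matlab
% gradients of two learnable parameters (hypothetical values)
grads = {[3 4], [0.3 -6]};
thr   = 1;                       % plays the role of gradientthreshold

% "l2norm": rescale each parameter's gradient separately
l2clipped = grads;
for k = 1:numel(grads)
    n = norm(grads{k});
    if n > thr
        l2clipped{k} = grads{k}*(thr/n);
    end
end

% "global-l2norm": one norm over all parameters, one common scale factor
L = norm([grads{:}]);
globalclipped = grads;
if L > thr
    globalclipped = cellfun(@(g) g*(thr/L), grads, 'UniformOutput', false);
end

% "absolute-value": clamp each partial derivative, keeping its sign
absclipped = cellfun(@(g) sign(g).*min(abs(g), thr), grads, ...
    'UniformOutput', false);
```

note that "l2norm" and "global-l2norm" preserve the direction of the gradient they scale, while "absolute-value" can change the direction because it clamps each entry independently.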

`l2regularizationfactor` — factor for l₂ regularization
`0.0001` (default) | nonnegative scalar

factor for l₂ regularization (weight decay), specified as a nonnegative scalar. for more information, see the algorithms section of trainingoptions in deep learning toolbox.

to avoid overfitting when using a representation with many parameters, consider increasing the l2regularizationfactor option.

example: 'l2regularizationfactor',0.0005
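
to see why l₂ regularization is also called weight decay, consider the following sketch. it assumes the common penalty form lambda*(w'*w)/2 added to the loss, so the penalty contributes lambda*w to the gradient. the toolbox may apply the penalty to only some parameters (for example, weights but not biases), and all names here are hypothetical.

```matlab
lambda = 1e-4;   % plays the role of l2regularizationfactor
lr     = 0.01;   % learning rate
w      = [2; -3];

datagrad = [0; 0];               % pretend the loss gradient is zero here
reggrad  = datagrad + lambda*w;  % gradient of loss + lambda*(w'*w)/2
wnew     = w - lr*reggrad;       % same as w*(1 - lr*lambda)
```

even with no contribution from the data, each step shrinks the weights by the factor 1 - lr*lambda, and a larger `lambda` pulls the weights toward zero faster.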

`usedevice` — computation device for training
`"cpu"` (default) | `"gpu"`

computation device used to perform deep neural network operations such as gradient computation, parameter update, and prediction during training, specified as either "cpu" or "gpu".

the "gpu" option requires both parallel computing toolbox™ software and a cuda®-enabled nvidia® gpu. for more information on supported gpus, see gpu computing requirements (parallel computing toolbox).

you can use gpudevice (parallel computing toolbox) to query or select a local gpu device to be used with matlab®.

note

training or simulating an agent on a gpu involves device-specific numerical round-off errors. these errors can produce different results compared to performing the same operations on a cpu.

if you want to use parallel processing to speed up training, you do not need to set usedevice. instead, when training your agent, use an rltrainingoptions object in which the useparallel option is set to true. for more information about training using multicore processors and gpus, see train agents using parallel computing and gpus.

example: 'usedevice',"gpu"

object functions

rlvaluerepresentation	(not recommended) value function critic representation for reinforcement learning agents
rlqvaluerepresentation	(not recommended) q-value function critic representation for reinforcement learning agents
rldeterministicactorrepresentation	(not recommended) deterministic actor representation for reinforcement learning agents
rlstochasticactorrepresentation	(not recommended) stochastic actor representation for reinforcement learning agents

examples

configure options for creating representation

create an options set for creating a critic or actor representation for a reinforcement learning agent. set the learning rate for the representation to 0.05, and set the gradient threshold to 1. you can set the options using name,value pairs when you create the options set. any options that you do not explicitly set have their default values.

repopts = rlrepresentationoptions(learnrate=5e-2,...
                                  gradientthreshold=1)

repopts = 
  rlrepresentationoptions with properties:
                  learnrate: 0.0500
          gradientthreshold: 1
    gradientthresholdmethod: "l2norm"
     l2regularizationfactor: 1.0000e-04
                  usedevice: "cpu"
                  optimizer: "adam"
        optimizerparameters: [1x1 rl.option.optimizerparameters]

alternatively, create a default options set and use dot notation to change some of the values.

repopts = rlrepresentationoptions;
repopts.learnrate = 5e-2;
repopts.gradientthreshold = 1

repopts = 
  rlrepresentationoptions with properties:
                  learnrate: 0.0500
          gradientthreshold: 1
    gradientthresholdmethod: "l2norm"
     l2regularizationfactor: 1.0000e-04
                  usedevice: "cpu"
                  optimizer: "adam"
        optimizerparameters: [1x1 rl.option.optimizerparameters]

if you want to change the properties of the optimizerparameters option, use dot notation to access them.

repopts.optimizerparameters.epsilon = 1e-7;
repopts.optimizerparameters

ans = 
  optimizerparameters with properties:
                      momentum: "not applicable"
                       epsilon: 1.0000e-07
           gradientdecayfactor: 0.9000
    squaredgradientdecayfactor: 0.9990

version history

introduced in r2019a

r2022a: `rlrepresentationoptions` is not recommended

rlrepresentationoptions objects are no longer recommended. to specify optimization options for actors and critics, use rloptimizeroptions objects instead.

specifically, you can create an agent options object and set its criticoptimizeroptions and actoroptimizeroptions properties to suitable rloptimizeroptions objects. then you pass the agent options object to the function that creates the agent. this workflow is shown in the following comparison.

`rlrepresentationoptions`: not recommended

crtopts = rlrepresentationoptions(...
    'gradientthreshold',1);
critic = rlvaluerepresentation(...
    net,obsinfo,'observation',{'obs'},crtopts)

`rloptimizeroptions`: recommended

criticopts = rloptimizeroptions(...
    'gradientthreshold',1);
agentopts = rlacagentoptions(...
    'criticoptimizeroptions',criticopts);
agent = rlacagent(actor,critic,agentopts)

alternatively, you can create the agent and then use dot notation to access the optimization options for the agent actor and critic, for example: agent.agentoptions.actoroptimizeroptions.gradientthreshold = 1;.
