options to train reinforcement learning agents using existing data

since r2023a

description

use an rltrainingfromdataoptions object to specify options to train an off-policy agent from existing data. training options include the maximum number of epochs to train, criteria for stopping training and criteria for saving agents. to train the agent using the specified options, pass this object to trainfromdata.

for more information on training agents, see train reinforcement learning agents.

creation

syntax

tfdopts = rltrainingfromdataoptions

tfdopts = rltrainingfromdataoptions(name=value)

description

tfdopts = rltrainingfromdataoptions returns a default options set to train an off-policy agent offline, from existing data.

tfdopts = rltrainingfromdataoptions(name=value) creates the training option set tfdopts and sets its properties using one or more name-value arguments.

properties

`maxepochs` — maximum number of epochs to train the agent
`1000` (default) | positive integer

maximum number of epochs to train the agent, specified as a positive integer. each epoch has a fixed number of learning steps specified by numstepsperepoch. regardless of other criteria for termination, training terminates after maxepochs epochs.

example: maxepochs=500

`numstepsperepoch` — number of steps to run per epoch
`500` (default) | positive integer

number of steps to run per epoch, specified as a positive integer.

example: numstepsperepoch=1000
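
because each epoch runs a fixed number of learning steps, these two options together bound the total number of learning steps. the following is a minimal sketch of that arithmetic, using the example values above. identifiers are written with the capitalization matlab expects, since function and property names are case-sensitive.

```matlab
% upper bound on learning steps implied by the epoch settings
tfdOpts = rlTrainingFromDataOptions(MaxEpochs=500, NumStepsPerEpoch=1000);

% training runs at most MaxEpochs epochs of NumStepsPerEpoch steps each
maxTotalSteps = tfdOpts.MaxEpochs * tfdOpts.NumStepsPerEpoch   % 500000
```

a stopping criterion can end training earlier, so this value is a ceiling rather than a prediction.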

`experiencebufferupdatefrequency` — buffer update period
`1` (default) | positive integer

buffer update period, specified as a positive integer. for example, if the value of this option is 1 (default), the buffer updates every epoch; if it is 2, the buffer updates every other epoch; and so on. the experience buffer is not updated if it already contains all the available data.

example: experiencebufferupdatefrequency=2
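
to make the update period concrete, the sketch below lists the epochs on which an update would occur for a period of 2. it assumes updates fall on epochs that are integer multiples of the period; the page only says "every other epoch", so the exact starting epoch used by the software is an assumption here.

```matlab
% sketch only: assumes updates fall on epochs that are multiples of the period
updateFrequency = 2;      % plays the role of ExperienceBufferUpdateFrequency
epochs = 1:6;
updateEpochs = epochs(mod(epochs, updateFrequency) == 0)   % 2 4 6
```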

`numexperiencesperexperiencebufferupdate` — number of experiences appended per buffer update
`[]` (default) | positive integer

number of experiences appended per buffer update, specified as a positive integer or empty matrix. if the value of this option is left empty (default) then, at training time, it is automatically set to half the length of the experience buffer used by the agent.

example: numexperiencesperexperiencebufferupdate=5e5
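
the default behavior described above can be sketched as follows. `bufferLength` is a hypothetical stand-in for the length of the agent's experience buffer and is not read from a real agent here.

```matlab
% hypothetical buffer length; in practice this comes from the agent's experience buffer
bufferLength = 1e6;

tfdOpts = rlTrainingFromDataOptions;   % option is [] by default

if isempty(tfdOpts.NumExperiencesPerExperienceBufferUpdate)
    % default rule stated above: half the length of the experience buffer
    numAppended = bufferLength/2;
else
    numAppended = tfdOpts.NumExperiencesPerExperienceBufferUpdate;
end
```

with these values, `numAppended` matches the 5e5 used in the example line above.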

`qvalueobservations` — batch of observations used to compute q values
`[]` (default) | cell array

batch of observations used to compute q values, specified as a 1-by-n cell array, where n is the number of observation channels. each cell must contain a batch of observations, along the batch dimension, for the corresponding observation channel. for example, if you have two observation channels carrying a 3-by-1 vector and a scalar, a batch of 10 random observations is {rand(3,1,10),rand(1,1,10)}.

if the value of this option is left empty (default) then, at training time, it is automatically set to a cell array in which each element corresponding to an observation channel is an array of zeros having the same dimensions as the observation, without any batch dimension.

example: qvalueobservations={rand(3,1,10),rand(1,1,10)}
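
the sketch below builds the two-channel batch from the example above and checks that the batch runs along the last dimension of each cell. the variable names are illustrative.

```matlab
% two observation channels: a 3-by-1 vector and a scalar
batchSize = 10;
obsBatch = {rand(3,1,batchSize), rand(1,1,batchSize)};

% the third dimension of each cell is the batch dimension
size(obsBatch{1})   % 3 1 10
size(obsBatch{2})   % 1 1 10

% pass the batch to the options object
tfdOpts = rlTrainingFromDataOptions(QValueObservations=obsBatch);
```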

`scoreaveragingwindowlength` — window length for averaging q-values
`5` (default) | positive integer scalar

window length for averaging q-values, specified as a positive integer scalar. one termination option and one saving option are expressed in terms of average q-values. for these options, the average is calculated over the last scoreaveragingwindowlength epochs.

example: scoreaveragingwindowlength=10

`stoptrainingcriteria` — training termination condition
`"none"` (default) | `"qvalue"` | ...

training termination condition, specified as one of the following strings:

"none" — stop training after the agent is trained for the number of epochs specified in maxepochs.
"qvalue" — stop training when the average q-value (computed using the current critic and the observations specified in qvalueobservations) over the last scoreaveragingwindowlength epochs equals or exceeds the value specified in the stoptrainingvalue option.

example: stoptrainingcriteria="qvalue"

`stoptrainingvalue` — critical value of training termination condition
`"none"` (default) | scalar

critical value of the training termination condition, specified as a scalar. training ends when the quantity monitored by the stoptrainingcriteria option equals or exceeds this value.

for instance, if stoptrainingcriteria is "qvalue" and stoptrainingvalue is 50, then training terminates when the moving average q-value (computed using the current critic and the observations specified in qvalueobservations) over the number of epochs specified in scoreaveragingwindowlength equals or exceeds 50.

example: stoptrainingvalue=50
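
the termination rule can be illustrated with a small moving-average check. the q-values below are made up for illustration, and averaging over fewer epochs while the window is still filling is an assumption, not documented behavior.

```matlab
% options that request the rule described above
tfdOpts = rlTrainingFromDataOptions( ...
    StopTrainingCriteria="QValue", ...
    StopTrainingValue=50, ...
    ScoreAveragingWindowLength=5);

% hypothetical per-epoch q-values, for illustration only
qPerEpoch = [10 20 35 48 52 55 58 60];

stopEpoch = [];
for epoch = 1:numel(qPerEpoch)
    % average over the last ScoreAveragingWindowLength epochs (fewer at the start)
    firstIdx = max(1, epoch - tfdOpts.ScoreAveragingWindowLength + 1);
    avgQ = mean(qPerEpoch(firstIdx:epoch));
    if avgQ >= tfdOpts.StopTrainingValue
        stopEpoch = epoch;
        break
    end
end
stopEpoch   % 8
```

at epoch 7 the five-epoch average is 49.6, just under the threshold; at epoch 8 it is 54.6, so the check first succeeds there.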

`saveagentcriteria` — condition for saving agent during training
`"none"` (default) | `"epochfrequency"` | `"qvalue"` | ...

condition for saving the agent during training, specified as one of the following strings:

"none" — do not save any agents during training.
"epochfrequency" — save the agent when the number of epochs is an integer multiple of the value specified in the saveagentvalue option.
"qvalue" — save the agent when the average q-value (computed using the current critic and the observations specified in qvalueobservations) over the last scoreaveragingwindowlength epochs equals or exceeds the value specified in saveagentvalue.

set this option to store candidate agents that perform well in terms of q-value, or simply to save the agent at a fixed rate. for instance, if saveagentcriteria is "epochfrequency" and saveagentvalue is 5, then the agent is saved every five epochs.

example: saveagentcriteria="epochfrequency"

`saveagentvalue` — critical value of condition for saving agent
`"none"` (default) | scalar

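critical value of the condition for saving the agent, specified as a scalar. its meaning depends on saveagentcriteria: it is an epoch interval when saveagentcriteria is "epochfrequency" and an average q-value threshold when saveagentcriteria is "qvalue".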

example: saveagentvalue=10

`saveagentdirectory` — folder name for saved agents
`"savedagents"` (default) | string | character vector

folder name for saved agents, specified as a string or character vector. the folder name can contain a full or relative path. when an epoch occurs in which the condition specified by the saveagentcriteria and saveagentvalue options is satisfied, the software saves the agent in a mat-file in this folder. if the folder does not exist, trainfromdata creates it. when saveagentcriteria is "none", this option is ignored and trainfromdata does not create a folder.

example: saveagentdirectory=pwd + "\run1\agents"
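
the three saving options are usually set together. the sketch below saves the agent every five epochs into a subfolder of the current folder; it uses `fullfile` to build the path instead of hard-coded backslashes, so the same code works on any platform. the folder names `run1` and `agents` are only illustrative.

```matlab
% save the agent every 5 epochs under <current folder>/run1/agents
tfdOpts = rlTrainingFromDataOptions( ...
    SaveAgentCriteria="EpochFrequency", ...
    SaveAgentValue=5, ...
    SaveAgentDirectory=fullfile(pwd,"run1","agents"));
```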

`verbose` — option to display training progress at the command line
`false` or `0` (default) | `true` or `1`

option to display training progress at the command line, specified as a numeric or logical 0 (false) or 1 (true). set to true to write information from each training epoch to the matlab® command line during training.

example: verbose=false

`plots` — option to display training progress with episode manager
`"training-progress"` (default) | `"none"`

option to display training progress with episode manager, specified as "training-progress" or "none". by default, calling trainfromdata opens the reinforcement learning episode manager, which graphically and numerically displays information about the training progress, such as the reward for each epoch, average reward, number of epochs, and total number of steps. to turn off this display, set this option to "none". for more information, see train.

example: plots="none"

object functions

trainfromdata train off-policy reinforcement learning agent using existing data

examples

configure options to train agent from data

create an options set to train a reinforcement learning agent offline, from an existing dataset.

set the maximum number of epochs to 2000 and the number of steps per epoch to 1000. do not set any criteria to stop the training before 2000 epochs. also, display training progress on the command line instead of using the episode manager.

tfdOpts = rlTrainingFromDataOptions(...
    MaxEpochs=2000,...
    NumStepsPerEpoch=1000,...
    Verbose=true,...
    Plots="none")

tfdOpts = 
  rlTrainingFromDataOptions with properties:
                                  MaxEpochs: 2000
                           NumStepsPerEpoch: 1000
            ExperienceBufferUpdateFrequency: 1
    NumExperiencesPerExperienceBufferUpdate: []
                         QValueObservations: []
                 ScoreAveragingWindowLength: 5
                       StopTrainingCriteria: "none"
                          StopTrainingValue: "none"
                          SaveAgentCriteria: "none"
                             SaveAgentValue: "none"
                         SaveAgentDirectory: "savedAgents"
                                    Verbose: 1
                                      Plots: "none"

alternatively, create a default options set and use dot notation to change some of the values.

trainOpts = rlTrainingFromDataOptions;
trainOpts.MaxEpochs = 2000;
trainOpts.NumStepsPerEpoch = 1000;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts

trainOpts = 
  rlTrainingFromDataOptions with properties:
                                  MaxEpochs: 2000
                           NumStepsPerEpoch: 1000
            ExperienceBufferUpdateFrequency: 1
    NumExperiencesPerExperienceBufferUpdate: []
                         QValueObservations: []
                 ScoreAveragingWindowLength: 5
                       StopTrainingCriteria: "none"
                          StopTrainingValue: "none"
                          SaveAgentCriteria: "none"
                             SaveAgentValue: "none"
                         SaveAgentDirectory: "savedAgents"
                                    Verbose: 1
                                      Plots: "training-progress"

you can now use trainopts as an input argument to the trainfromdata command.
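
a minimal sketch of that call follows. it assumes that `agent` (an off-policy agent) and `dataStore` (a datastore of recorded experiences) already exist in the workspace; neither is created here, and the argument order shown is an assumption to check against the trainfromdata reference page.

```matlab
% options as configured in this example
trainOpts = rlTrainingFromDataOptions(MaxEpochs=2000, NumStepsPerEpoch=1000, Verbose=true);

% agent and dataStore are assumed to exist already; skip the call otherwise
if exist("agent","var") && exist("dataStore","var")
    % assumed argument order: agent, data, options
    tfdStats = trainFromData(agent, dataStore, trainOpts);
end
```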

version history

introduced in r2023a
