set up reinforcement learning environment or initialize data logger object

since r2022a

syntax

setup(env)

setup(env,name=value)

setup(lgr)

description

when you define a custom training loop for reinforcement learning, you can simulate an agent or policy against an environment using the runepisode function. use the setup function to configure the environment for running simulations using multiple calls to runepisode.

also use setup to initialize a filelogger or monitorlogger object before logging data within a custom training loop.

environment objects

setup(env) sets up the specified reinforcement learning environment for running multiple simulations using runepisode.

setup(env,name=value) specifies nondefault configuration options using one or more name-value pair arguments.

data logger objects

setup(lgr) sets up the specified data logger object. setup tasks can include setting up a visualization or creating directories for logging to file.
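a minimal sketch of the same setup/cleanup sequence for a monitor logger, which sends logged data to a training progress monitor instead of to disk. it assumes that rldatalogger accepts a trainingprogressmonitor object (deep learning toolbox) and returns a monitorlogger; the variable names are illustrative.

```matlab
% Sketch: set up a monitor logger that displays data in a training
% progress monitor. Assumes rlDataLogger(monitor) returns a MonitorLogger.
monitor = trainingProgressMonitor();
mlgr = rlDataLogger(monitor);

% Initialize the logger, for example to set up its visualization.
setup(mlgr);

% ... store data and write it to the monitor inside the training loop ...

% Release the logger when the custom training loop finishes.
cleanup(mlgr);
```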

examples

simulate environment and agent

create a reinforcement learning environment and extract its observation and action specifications.

env = rlPredefinedEnv("CartPole-Discrete");
obsinfo = getObservationInfo(env);
actinfo = getActionInfo(env);

to approximate the policy within the actor, use a neural network. create the network as an array of layer objects.

net = [...
    featureInputLayer(obsinfo.Dimension(1))
    fullyConnectedLayer(24)
    reluLayer
    fullyConnectedLayer(24)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer];

convert the network to a dlnetwork object and display the number of learnable parameters (weights).

net = dlnetwork(net);
summary(net)

   initialized: true
   number of learnables: 770
   inputs:
      1   'input'   4 features
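the count of 770 learnables can be checked by hand: each fully connected layer contributes one weight per input-output pair plus one bias per output, while the relu and softmax layers have no learnable parameters. the sketch below assumes the layer sizes shown above (4 observation features, two hidden layers of 24, and 2 outputs).

```matlab
% Count learnable parameters of the fully connected layers:
% weights (nOut x nIn) plus one bias per output.
layerSizes = [4 24 24 2];   % input features, hidden sizes, outputs
numLearnables = 0;
for k = 1:numel(layerSizes)-1
    nIn  = layerSizes(k);
    nOut = layerSizes(k+1);
    numLearnables = numLearnables + nIn*nOut + nOut;
end
disp(numLearnables)   % shows 770 (120 + 600 + 50)
```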

create a discrete categorical actor using the network.

actor = rlDiscreteCategoricalActor(net,obsinfo,actinfo);

check your actor with a random observation.

act = getAction(actor,{rand(obsinfo.Dimension)})

act = 1x1 cell array
    {[-10]}

create a policy object from the actor.

policy = rlStochasticActorPolicy(actor);

create an experience buffer.

buffer = rlReplayMemory(obsinfo,actinfo);

set up the environment for running multiple simulations. for this example, configure the environment to log any errors rather than display them in the command window.

setup(env,StopOnError="off")

simulate multiple episodes using the environment and policy. after each episode, append the experiences to the buffer. for this example, run 100 episodes.

for i = 1:100
    output = runEpisode(env,policy,MaxSteps=300);
    append(buffer,output.AgentData.Experiences)
end

clean up the environment.

cleanup(env)

sample a mini-batch of experiences from the buffer. for this example, sample 10 experiences.

batch = sample(buffer,10);

you can then learn from the sampled experiences and update the policy and actor.
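before writing a learning step, it can help to inspect the mini-batch. this sketch assumes that the experience structure returned by sample has the fields reward and isdone, with one entry per sampled experience; it only summarizes the batch and does not update the actor.

```matlab
% Sketch: summarize the sampled mini-batch.
% Assumes batch has fields Reward and IsDone (one entry per experience).
rewards = batch.Reward;
isDone  = batch.IsDone;

meanReward  = mean(rewards,"all");   % average reward in the mini-batch
numTerminal = nnz(isDone);           % experiences flagged as episode ends
fprintf("mean reward: %g, terminal experiences: %d\n",meanReward,numTerminal)
```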

log data to disk in a custom training loop

this example shows how to log data to disk when training an agent using a custom training loop.

create a filelogger object using rldatalogger.

flgr = rlDataLogger();

set up the logger object. this operation initializes the object by performing setup tasks such as creating the directory in which to save the data files.

setup(flgr);

within a custom training loop, you can now store data in the logger object's memory and write data to file.

for this example, store random numbers in the file logger object, grouping them in the variables context1 and context2. when you call write, a mat-file corresponding to an iteration and containing both variables is saved with the name specified in flgr.LoggingOptions.FileNameRule, in the folder specified by flgr.LoggingOptions.LoggingDirectory.

for iter = 1:10
    % store three random numbers in memory 
    % as elements of the variable "context1"
    for ct = 1:3
        store(flgr, "context1", rand, iter);
    end
    % store a random number in memory 
    % as the variable "context2"
    store(flgr, "context2", rand, iter);
    % write data to file every 4 iterations
    if mod(iter,4)==0
        write(flgr);
    end
end
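with write called every 4 iterations of a 10-iteration loop, only iterations 4 and 8 trigger a write. assuming each write saves everything stored since the previous write, data from iterations 9 and 10 is still in memory when the loop ends, which is why the cleanup step that follows matters. the arithmetic in plain matlab:

```matlab
% Which iterations of the 10-iteration loop call write?
iters = 1:10;
writeIters = iters(mod(iters,4) == 0);            % [4 8]

% Iterations stored after the last write; cleanup(flgr) saves these.
pendingIters = iters(iters > max(writeIters));    % [9 10]
disp(writeIters), disp(pendingIters)
```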

clean up the logger object. this operation performs cleanup tasks such as writing any data still in memory to file.

cleanup(flgr);
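to check what the logger produced, you can list the mat-files in the logging directory and load one. this is a sketch that uses only standard matlab file functions together with the flgr.LoggingOptions.LoggingDirectory property mentioned above; per the description above, each file is expected to contain the variables context1 and context2.

```matlab
% Sketch: list the MAT-files the logger wrote and load the first one.
logDir = flgr.LoggingOptions.LoggingDirectory;
files  = dir(fullfile(logDir,"*.mat"));
disp({files.name}')

if ~isempty(files)
    % Expected to contain the logged variables context1 and context2.
    data = load(fullfile(logDir,files(1).name));
    disp(fieldnames(data))
end
```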

input arguments

`env` — reinforcement learning environment
`rlfunctionenv` object | `simulinkenvwithagent` object | `rlneuralnetworkenvironment` object | `rlmdpenv` object | ...

reinforcement learning environment, specified as one of the following objects.

rlfunctionenv — environment defined using custom functions
simulinkenvwithagent — simulink environment created using rlsimulinkenv or createintegratedenv
rlmdpenv — markov decision process environment
rlneuralnetworkenvironment — environment with deep neural network transition models
predefined environment created using rlpredefinedenv
custom environment created from a template (rlcreateenvtemplate)

`lgr` — data logger object
`filelogger` object | `monitorlogger` object | ...

data logger object, specified as either a filelogger or a monitorlogger object.

name-value arguments

specify optional pairs of arguments as name1=value1,...,namen=valuen, where name is the argument name and value is the corresponding value. name-value arguments must appear after other arguments, but the order of the pairs does not matter.

example: stoponerror="on"

`stoponerror` — option to stop episode when error occurs
`"on"` (default) | `"off"`

option to stop an episode when an error occurs, specified as one of the following:

"on" — stop the episode when an error occurs and generate an error message in the matlab® command window.
"off" — log errors in the simulationinfo output of runepisode.
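a sketch of checking for a logged error after an episode run with stoponerror="off". it reuses env and policy from the example above and assumes that, for a matlab environment, simulationinfo is a structure with a simulationerror field that is empty when no error occurred; the isfield guard keeps the check safe if that assumption does not hold.

```matlab
% Sketch: run one episode without stopping on errors, then look for a
% logged error. Assumes SimulationInfo has a SimulationError field.
setup(env,StopOnError="off")
output = runEpisode(env,policy,MaxSteps=300);
cleanup(env)

info = output.SimulationInfo;
if isstruct(info) && isfield(info,"SimulationError") && ~isempty(info.SimulationError)
    disp(info.SimulationError)
else
    disp("no logged simulation error found")
end
```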

`useparallel` — option for using parallel simulations
`false` (default) | `true`

option for using parallel simulations, specified as a logical value. parallel computing lets you use multiple cores, processors, computer clusters, or cloud resources to speed up simulation.

when you set useparallel to true, the output of a subsequent call to runepisode is an rl.env.future object, which supports deferred evaluation of the simulation.
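a sketch of deferred evaluation with parallel simulation, reusing env, policy, and buffer from the example above. it requires parallel computing toolbox and assumes that an array of rl.env.future objects supports fetchnext in the same way as parallel computing toolbox futures; verify this against the runepisode reference page before relying on it.

```matlab
% Sketch: launch episodes in parallel, then collect them as they finish.
% Assumes fetchNext works on rl.env.Future arrays.
setup(env,UseParallel=true)

numEpisodes = 8;
for i = 1:numEpisodes
    % Each call returns immediately with a future object.
    futures(i) = runEpisode(env,policy,MaxSteps=300);
end

for i = 1:numEpisodes
    % Wait for the next finished episode and store its experiences.
    [~,output] = fetchNext(futures);
    append(buffer,output.AgentData.Experiences)
end

cleanup(env)
```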

`setupfcn` — function to run on each worker before running an episode
`[]` (default) | function handle

function to run on each worker before running an episode, specified as a handle to a function with no input arguments. use this function to perform any preprocessing required before running an episode.

`cleanupfcn` — function to run on each worker when cleaning up environment
`[]` (default) | function handle

function to run on each worker when cleaning up the environment, specified as a handle to a function with no input arguments. use this function to clean up the workspace or perform other processing after calling runepisode.
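both handles must take no input arguments. as an arbitrary illustration (not a recommended setting), this sketch silences a matlab warning on each worker before its episodes and restores it when the environment is cleaned up; the choice of warning id is only an example.

```matlab
% Sketch: per-worker setup and cleanup handles with no input arguments.
% The warning ID is an illustrative choice only.
setupFcn   = @() warning("off","MATLAB:singularMatrix");  % runs before episodes
cleanupFcn = @() warning("on","MATLAB:singularMatrix");   % runs at cleanup

setup(env,UseParallel=true,SetupFcn=setupFcn,CleanupFcn=cleanupFcn)
% ... call runEpisode as usual ...
cleanup(env)
```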

`transferbaseworkspacevariables` — option to send model and workspace variables to parallel workers
`"on"` (default) | `"off"`

option to send model and workspace variables to parallel workers, specified as "on" or "off". when the option is "on", the client sends variables used in models and defined in the base matlab workspace to the workers.

`attachedfiles` — additional files to attach to parallel pool
string | string array

additional files to attach to the parallel pool before running an episode, specified as a string or string array.
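when an environment depends on helper files, a sketch like the following disables the transfer of base workspace variables and attaches the needed files explicitly. the file names are hypothetical placeholders for whatever your environment actually calls.

```matlab
% Sketch: control what is sent to the parallel workers.
% Both file names are hypothetical placeholders.
extraFiles = ["myRewardHelper.m","myParams.mat"];

setup(env, ...
    UseParallel=true, ...
    TransferBaseWorkspaceVariables="off", ...  % do not copy base workspace variables
    AttachedFiles=extraFiles)
```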

`workerrandomseeds` — worker random seeds
`-1` (default) | vector

worker random seeds, specified as one of the following:

-1 — set the random seed of each worker to the worker id.
vector with length equal to the number of workers — specify the random seed for each worker.
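because the seed vector must have one entry per worker, a sketch can read the worker count from the current parallel pool (parallel computing toolbox). the offset of 100 is an arbitrary example value.

```matlab
% Sketch: give each worker an explicit, reproducible random seed.
pool  = gcp;                          % get (or start) the current pool
seeds = 100 + (1:pool.NumWorkers);    % one seed per worker; offset is arbitrary

setup(env,UseParallel=true,WorkerRandomSeeds=seeds)
```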

version history

introduced in r2022a
