Create Custom Simulink Environments
To create a custom Simulink® environment, first create a Simulink environment model that represents the world as seen from the agent. Such a system is often referred to as the plant or open-loop system, while the whole (integrated) system that includes both agent and environment is often referred to as the closed-loop system.
Your environment model must have an input signal, the action, which influences (through some discrete, continuous, or mixed dynamics) its next internal state and its outputs, which are the observation, the reward, and the is-done signals. The is-done signal is a scalar that indicates the termination of an episode, causing the simulation to stop when its value is true.
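For instance, assuming your environment model is saved as myEnvModel.slx (a placeholder name), you can list its root-level input and output ports to confirm that the action input and the observation, reward, and is-done outputs are exposed:

% Placeholder model name; replace with the name of your environment model.
mdl = "myEnvModel";
load_system(mdl)

% Root-level inports (the action signal) and outports
% (the observation, reward, and is-done signals).
find_system(mdl,"SearchDepth",1,"BlockType","Inport")
find_system(mdl,"SearchDepth",1,"BlockType","Outport")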
Note

The reward signal at time t must be the one corresponding to the transition between the observation output at time t-1 and the observation output at time t. Therefore, the environment output signal corresponds to the signal called next observation in the agent-environment illustration presented in Reinforcement Learning Environments.
If your observation contains multiple channels, group the signals carried by the channels into a single observation bus. For more information, see the Simulink documentation on bus signals.
For critical considerations on defining reward and observation signals in custom environments, see Define Reward and Observation Signals in Custom Environments.
Once you have created the Simulink model that represents the environment, you must add the RL Agent block to it. You can do so automatically or manually.
To automatically create a new closed-loop Simulink model that contains an RL Agent block and references your environment model from its environment block, use createIntegratedEnv. You can specify as input arguments the names of the action, observation, is-done, and reward ports in your environment model. If your action or observation space is finite, you can also specify its possible values (otherwise, the signals are assumed to be continuous). This function returns an environment object as well as the block path of the agent and the environment observation and action specifications. For more information on model referencing, see the Simulink documentation on model references.
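For example, a minimal sketch showing the syntax for specifying the port names (the model names below are placeholders, not taken from a shipped example):

% Placeholder names for the environment model and the new closed-loop model.
[env,agentBlk,obsInfo,actInfo] = createIntegratedEnv( ...
    "myEnvModel","myIntegratedModel", ...
    "ActionPortName","action", ...
    "ObservationPortName","observation", ...
    "RewardPortName","reward", ...
    "IsDonePortName","isdone");

Here obsInfo and actInfo are the returned observation and action specification objects, and agentBlk is the path of the RL Agent block inside the new integrated model.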
To manually add the agent to your model, drag and drop the RL Agent block from the Reinforcement Learning Simulink library. Connect the action, observation, reward, and is-done signals to the appropriate output and input ports of the block.
Unless you already have an agent object for this environment in the MATLAB® workspace, you must create specification objects for the action and observation signals using rlNumericSpec (for continuous signals) or rlFiniteSetSpec (for discrete signals). For bus signals, create specifications using bus2RLSpec.
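As a sketch, assuming a continuous 3-by-1 observation signal and a scalar action that can take one of three values (these dimensions and values are illustrative only):

% Continuous observation channel, for example a 3-by-1 vector signal.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = "observations";

% Discrete action channel, for example a scalar with three possible values.
actInfo = rlFiniteSetSpec([-1 0 1]);
actInfo.Name = "action";

% For an observation bus, define a Simulink.Bus object in the workspace
% (here named obsBus, a placeholder) and convert it into specifications.
% obsInfo = bus2RLSpec("obsBus");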
Once you connect the blocks, create an environment object using rlSimulinkEnv, specifying the model file name, the block path to the RL Agent block within the model, and the specification objects for the observation and action channels, respectively. If your agent block already references an agent object in the MATLAB workspace, you do not need to supply the specification objects as input arguments. For an example, see Water Tank Reinforcement Learning Environment Model.
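A minimal sketch, assuming the closed-loop model is named myClosedLoopModel and the agent block is named RL Agent at the top level (both placeholders):

mdl = "myClosedLoopModel";        % placeholder model name
agentBlk = mdl + "/RL Agent";     % block path to the RL Agent block

env = rlSimulinkEnv(mdl,agentBlk,obsInfo,actInfo);

% Optionally, set a reset function that randomizes the initial conditions
% at the start of each episode (localResetFcn is a hypothetical function).
% env.ResetFcn = @(in) localResetFcn(in);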
Both rlSimulinkEnv and createIntegratedEnv return a custom Simulink environment as a SimulinkEnvWithAgent object. This environment object acts as an interface so that when you call sim or train, these functions in turn call the (compiled) Simulink model associated with the object to generate experiences for the agents. You can use this object to train and simulate agents in the same way as with any other environment.
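For example, assuming an agent object named agent already exists in the workspace, a typical training and simulation call looks like this (the option values are illustrative):

% Train the agent against the Simulink environment.
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",500, ...
    "MaxStepsPerEpisode",200, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",800);
trainStats = train(agent,env,trainOpts);

% Simulate the trained agent in the environment.
simOpts = rlSimulationOptions("MaxSteps",200);
experience = sim(env,agent,simOpts);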
You can also create a multiagent Simulink environment. To do so, create a Simulink model that has one action input and one set of outputs (observation, reward, and is-done) for every agent. Then manually add an RL Agent block for each agent. Once you connect the blocks, create an environment object using rlSimulinkEnv. Unless each agent block already references an agent object in the MATLAB workspace, you must supply to rlSimulinkEnv two cell arrays containing the observation and action specification objects, respectively, as input arguments. For an example, see Train Multiple Agents to Perform Collaborative Task.
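A sketch of the multiagent case, assuming two agent blocks named Agent A and Agent B in a model named myMultiAgentModel, with per-agent specification objects created beforehand (all names are placeholders):

mdl = "myMultiAgentModel";
agentBlks = [mdl + "/Agent A", mdl + "/Agent B"];

% One observation and one action specification object per agent,
% passed as cell arrays in the same order as the agent blocks.
obsInfos = {obsInfoA,obsInfoB};
actInfos = {actInfoA,actInfoB};

env = rlSimulinkEnv(mdl,agentBlks,obsInfos,actInfos);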
Your environment can also include third-party functionality. For more information, see Integrate with Existing Simulation or Environment (Simulink).
Algebraic Loops Between Environment and Agent
To avoid (potentially unsolvable) algebraic loops, you must avoid any direct feedthrough (that is, any direct dependency in the same time step) from the action to any of the environment output signals. This is because Simulink treats the agent block as having direct feedthrough from all its inputs (that is, the action output at a given time step is considered to be directly dependent on the observation, reward, and is-done inputs at the same time step).
Additionally, for models created using createIntegratedEnv, the environment block is a referenced subsystem. Therefore, the environment block is also normally treated as a direct feedthrough block unless the Minimize algebraic loop occurrences parameter is enabled.
In general, adding a Delay (Simulink) or Memory (Simulink) block to the action signal between the agent block and the environment block removes the algebraic loop. Alternatively, you can add delay or memory blocks to all the environment output signals after the environment block. For more information on algebraic loops and how to remove some of them, see the Simulink documentation on algebraic loops.
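For instance, a hedged sketch that programmatically adds a Delay block to a hypothetical closed-loop model; the block still needs to be wired into the action line (for example with add_line) or connected manually:

mdl = 'myClosedLoopModel';   % placeholder model name
load_system(mdl)

% Add a one-step Delay block intended for the action signal between
% the RL Agent block and the environment block.
add_block('simulink/Discrete/Delay',[mdl '/ActionDelay'])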