
Reinforcement Learning Agent

Since R2019a

  • RL Agent block

Libraries:
Reinforcement Learning Toolbox

Description

Use the RL Agent block to simulate and train a reinforcement learning agent in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. You connect the block so that it receives an observation and a computed reward. For instance, consider the following block diagram of the rlSimplePendulumModel model.

The observation input port of the RL Agent block receives a signal that is derived from the instantaneous angle and angular velocity of the pendulum. The reward port receives a reward calculated from the same two values and the applied action. You configure the observation and reward computations that are appropriate to your system.

The block uses the agent to generate an action based on the observation and reward you provide. Connect the action output port to the appropriate input for your system. For instance, in rlSimplePendulumModel, the action output port is a torque applied to the pendulum system. For more information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.

To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that environment. When you call train using the environment, train simulates the model and updates the agent associated with the block.
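For example, a minimal sketch of this workflow, assuming the agent object agentObj is already in the MATLAB workspace; the observation specification, action specification, and training options shown here are illustrative placeholders, not values taken from this page:

mdl = "rlSimplePendulumModel";
open_system(mdl)

% observation and action specifications (dimensions and values assumed for illustration)
obsInfo = rlNumericSpec([3 1]);
actInfo = rlFiniteSetSpec([-2 0 2]);

% create an environment interface that points at the RL Agent block in the model
env = rlSimulinkEnv(mdl, mdl + "/RL Agent", obsInfo, actInfo);

% train the agent associated with the block
trainOpts = rlTrainingOptions(MaxEpisodes=500, MaxStepsPerEpisode=500);
trainStats = train(agentObj, env, trainOpts);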

Ports

Input

The observation input port receives observation signals from the environment. Observation signals represent measurements or other instantaneous system data. If you have multiple observations, you can use a Mux block to combine them into a vector signal. To use a nonvirtual bus signal, use bus2RLSpec.
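For instance, a minimal sketch of creating observation specifications from a bus object; the bus name "obsBus" is an assumption for illustration:

% create specifications from a nonvirtual bus object defined in the MATLAB workspace
obsInfo = bus2RLSpec("obsBus");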

The reward input port receives the reward signal, which you compute based on the observation data. The reward signal is used during agent training to maximize the expectation of the long-term reward.

The isdone input port receives a signal specifying the conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine the conditions for episode termination. One application is to terminate an episode that is clearly going well or going poorly. For instance, you can terminate an episode when the agent reaches its goal or strays irrecoverably far from it.
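For example, a minimal sketch of termination logic for the pendulum, such as the body of a MATLAB Function block; the signal names and thresholds are illustrative assumptions:

function isdone = computeIsDone(theta, thetaDot)
% terminate the episode when the pendulum leaves the recoverable range
% (thresholds chosen for illustration only)
isdone = abs(theta) > pi/2 || abs(thetaDot) > 10;
end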

The external action input port provides an external action to the block. This signal can be a control action from a human expert, which can be used for safe or imitation learning applications. When the value of the use external action signal is 1, the block passes the external action signal to the environment through the action block output. The block also uses the external action to update the agent policy based on the resulting observations and rewards.

Dependencies

To enable this port, select the External action inputs parameter.

For some applications, the action applied to the environment can differ from the action output by the RL Agent block. For example, the Simulink model can contain a Saturation block on the action output signal.

In such cases, to improve learning results, you can enable the last action input port and connect it to the actual action signal that is applied to the environment.

Note

The last action port should be used only with off-policy agents; otherwise, training can produce unexpected results.

Dependencies

To enable this port, select the Last action input parameter.

The use external action input port receives a signal that specifies whether to pass the external action signal to the environment.

When the value of the use external action signal is 1, the block passes the external action signal to the environment. The block also uses the external action to update the agent policy.

When the value of the use external action signal is 0, the block does not pass the external action signal to the environment and does not update the policy using the external action. Instead, the action output of the block comes from the agent policy.

Dependencies

To enable this port, select the External action inputs parameter.

Output

The action output port provides the action computed by the agent based on the observation and reward inputs. Connect this port to the inputs of your system. To use a nonvirtual bus signal, use bus2RLSpec.

Note

Continuous action-space agents such as rlACAgent, rlPGAgent, or rlPPOAgent (those that use an rlContinuousGaussianActor object) do not enforce the constraints set by the action specification. In these cases, you must enforce action space constraints within the environment.
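As a minimal sketch, the limits below are declared in the action specification, but with a continuous action-space agent you still clamp the action inside the environment model, for example with a Saturation block; the dimensions and limits are illustrative assumptions:

% limits in the specification are not enforced by these agents;
% the environment itself (for example, a Saturation block) must enforce them
actInfo = rlNumericSpec([1 1], LowerLimit=-2, UpperLimit=2);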

The cumulative reward output port provides the cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.

Dependencies

To enable this port, select the Cumulative reward output parameter.

Parameters

In the Agent object parameter, enter the name of an agent object stored in the MATLAB workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. For information about agent objects, see Reinforcement Learning Agents.

If the RL Agent block is within a conditionally executed subsystem, such as a Triggered Subsystem or a Function-Call Subsystem, you must specify the sample time of the agent object as -1 so that the block can inherit the sample time of its parent subsystem.
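For example, a minimal sketch of setting an inherited sample time on an existing agent object, assuming agentObj is in the MATLAB workspace:

% inherit the sample time of the conditionally executed parent subsystem
agentObj.AgentOptions.SampleTime = -1;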

Programmatic Use

Block parameter: Agent
Type: string, character vector
Default: "agentObj"
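For example, a minimal sketch of setting this parameter from the command line; the block path is an assumption for illustration:

% point the RL Agent block at an agent object stored in the MATLAB workspace
set_param("rlSimplePendulumModel/RL Agent", "Agent", "agentObj");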

Generate a Policy block that implements a greedy policy for the agent specified in Agent object by calling the generatePolicyBlock function. To generate a greedy policy, the block sets the UseExplorationPolicy property of the agent to false before generating the policy block.

The generated block is added to a new Simulink model, and the policy data is saved in a MAT-file in the current working folder.
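A minimal sketch of the equivalent command-line call, assuming agentObj is in the MATLAB workspace and your toolbox version provides generatePolicyBlock:

% generate a Policy block for the agent; the policy data is saved to a MAT-file
generatePolicyBlock(agentObj)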

Select the External action inputs parameter to enable the external action and use external action block input ports.

Programmatic Use

Block parameter: ExternalActionAsInput
Type: string, character vector
Values: "off" | "on"
Default: "off"

Select the Last action input parameter to enable the last action block input port.

Programmatic Use

Block parameter: ProvideLastAction
Type: string, character vector
Values: "off" | "on"
Default: "off"

Select the Cumulative reward output parameter to enable the cumulative reward block output port.

Programmatic Use

Block parameter: ProvideCumRwd
Type: string, character vector
Values: "off" | "on"
Default: "off"

Select the Use strict observation data types parameter to enforce the observation data types. In this case, if the data type of the signal connected to the observation input port does not match the data type in the ObservationInfo property of the agent, the block attempts to cast the signal to the correct data type. If casting the data type is not possible, the block generates an error.

Enforcing strict data types:

  • Lets you validate that the block is getting the correct data types.

  • Allows other blocks to inherit their data type from the observation port.

Programmatic Use

Block parameter: UseStrictObservationDataTypes
Type: string, character vector
Values: "off" | "on"
Default: "off"
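For example, a minimal sketch of declaring the expected observation data type in the observation specification; the dimensions and data type are illustrative assumptions:

% the block checks the connected observation signal against this specification
obsInfo = rlNumericSpec([3 1], DataType="single");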

Version History

Introduced in R2019a
