Create custom multiagent reinforcement learning environment

Since R2023b

Description

Use rlMultiAgentFunctionEnv to create a custom multiagent reinforcement learning environment in which all agents execute in the same step. To create your custom environment, you supply the observation and action specifications as well as your own reset and step MATLAB® functions. To verify the operation of your environment, rlMultiAgentFunctionEnv automatically calls validateEnvironment after creating the environment.

Creation

Description

env = rlMultiAgentFunctionEnv(observationInfo,actionInfo,stepFcn,resetFcn) creates a multiagent environment in which all agents execute in the same step. The arguments are the observation and action specifications and the custom step and reset functions. The cell arrays observationInfo and actionInfo contain the observation and action specifications, respectively, for each agent. The stepFcn and resetFcn arguments are the names of your step and reset MATLAB functions, respectively, and they are used to set the StepFcn and ResetFcn properties of env.

Input Arguments

observationInfo: Observation specifications, specified as a cell array with as many elements as the number of agents. Every element of the cell array must contain the observation specifications for the corresponding agent. The observation specification for an agent must be an rlFiniteSetSpec or rlNumericSpec object or a vector containing a mix of such objects (in which case every element of the vector defines the properties of a specific observation channel for the agent).

actionInfo: Action specifications, specified as a cell array with as many elements as the number of agents. Every element of the cell array must contain the action specification for the corresponding agent. The action specification for an agent must be an rlFiniteSetSpec (for discrete action spaces) or rlNumericSpec (for continuous action spaces) object. This object defines the properties of the action channel for the agent.

Note

Only one action channel is allowed.
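
As a minimal sketch of how these two cell arrays might be assembled, the following code defines specifications for three hypothetical agents. The number of agents, the dimensions, the limits, and the action sets are all illustrative assumptions, not values required by rlMultiAgentFunctionEnv.

```matlab
% Hypothetical three-agent setup; all sizes and limits are illustrative.
% Each cell array has one element per agent, in the same agent order.
observationInfo = { ...
    rlNumericSpec([3 1]), ...                             % agent 1: one continuous channel
    rlNumericSpec([2 1]), ...                             % agent 2: one continuous channel
    [rlNumericSpec([2 1]) rlFiniteSetSpec([0 1])] };      % agent 3: two observation channels

actionInfo = { ...
    rlFiniteSetSpec([-1 0 1]), ...                        % agent 1: discrete action
    rlNumericSpec([1 1],LowerLimit=-1,UpperLimit=1), ...  % agent 2: bounded continuous action
    rlFiniteSetSpec([0 1]) };                             % agent 3: discrete action

% Both cell arrays must describe the same number of agents.
assert(numel(observationInfo) == numel(actionInfo))
```

Each agent gets exactly one action specification object, while an observation entry can be a vector of specification objects when the agent has several observation channels.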

Properties

StepFcn: Environment step function, specified as a function name, function handle, or handle to an anonymous function. The sim and train functions call StepFcn to update the environment at every simulation or training step.

This function must have two inputs and four outputs, as illustrated by the following signature.

[NextObservation,Reward,IsDone,UpdatedInfo] = myStepFunction(Action,Info)

For a given action input, the step function returns the values of the next observation and reward, a logical value indicating whether the episode is terminated, and an updated environment information variable.

Specifically, the required input and output arguments are:

  • Action: Cell array containing the current actions from the agents. This cell array must contain as many elements as the number of agents, in the order specified in actionInfo. Each element must match the dimensions and data type specified in the corresponding element of the actionInfo cell array.

  • Info and UpdatedInfo: Any data that you want to pass from one step to the next. This can be the environment state or a structure containing state and parameters. The simulation and training functions (train and sim) handle this variable by:

    1. Initializing Info using the second output argument returned by ResetFcn, at the beginning of the episode

    2. Passing Info as the second input argument to StepFcn at each training or simulation step

    3. Updating Info using the fourth output argument returned by StepFcn, UpdatedInfo

  • NextObservation: Cell array containing the next observations for all the agents. These are the observations related to the next state (the transition to the next state is caused by the current actions contained in Action). Therefore, NextObservation must contain as many elements as the number of agents, and each element must match the dimensions and data types specified in the corresponding element of the observationInfo cell array.

  • Reward: Vector containing the rewards for all the agents. These are the rewards generated by the transition from the current state to the next one. Each element of the vector must be a numeric scalar.

  • IsDone: Logical value indicating whether to end the simulation or training episode.
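
The argument list above can be sketched as a small step function. This is only an illustration under stated assumptions: two agents with scalar actions, a shared 2-by-1 state that both agents observe, and an information structure with the hypothetical fields State and StepCount. The dynamics, rewards, and termination rule are invented for the sketch.

```matlab
function [nextObservation, reward, isDone, updatedInfo] = myStepFunction(action, info)
% Illustrative step function for two agents (all dynamics are assumptions).
% action is a 1-by-2 cell array holding one scalar action per agent.
% info is a structure with the hypothetical fields State and StepCount.
u = [action{1}; action{2}];                 % stack the two scalar actions
updatedInfo = info;
updatedInfo.State = info.State + 0.1*u;     % simple integrator update
updatedInfo.StepCount = info.StepCount + 1;

% One observation per agent; here both agents observe the full state.
nextObservation = {updatedInfo.State, updatedInfo.State};

% One numeric reward per agent, returned as a 1-by-2 vector.
reward = [-abs(updatedInfo.State(1)), -abs(updatedInfo.State(2))];

% End the episode after 50 steps or if the state leaves the allowed range.
isDone = updatedInfo.StepCount >= 50 || any(abs(updatedInfo.State) > 5);
end
```

The function has exactly two inputs and four outputs, and everything it needs to remember between calls travels through the information structure rather than through persistent or global variables.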

To use additional input arguments beyond the allowed two, define your additional arguments in the MATLAB workspace, then specify StepFcn as an anonymous function that in turn calls your custom function with the additional arguments defined in the workspace, as shown in the example Create Custom Environment Using Step and Reset Functions.

Example: StepFcn="myStepFcn"
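
As a hedged sketch of the anonymous-function approach, the following code wraps a four-input step function so that it exposes the required two-input signature. The function paramStepFcn, the parameters gain and maxSteps, and the fields State and StepCount are hypothetical names introduced for this illustration.

```matlab
% Hypothetical parameters defined in the workspace.
gain = 0.1;
maxSteps = 50;

% The anonymous function captures the current values of gain and maxSteps
% and exposes the two-input signature required of the step function.
stepHandle = @(action,info) paramStepFcn(action,info,gain,maxSteps);

% Direct call, for illustration only.
info0 = struct("State",[0;0],"StepCount",0);
[nextObs,reward,isDone,info1] = stepHandle({1,-1},info0);

function [nextObs,reward,isDone,updatedInfo] = paramStepFcn(action,info,gain,maxSteps)
% Illustrative step function that takes two extra parameters.
updatedInfo = info;
updatedInfo.State = info.State + gain*[action{1}; action{2}];
updatedInfo.StepCount = info.StepCount + 1;
nextObs = {updatedInfo.State, updatedInfo.State};   % one observation per agent
reward = -abs(updatedInfo.State)';                  % 1-by-2 reward vector
isDone = updatedInfo.StepCount >= maxSteps;
end
```

Because an anonymous function stores the values of captured variables when it is created, changing gain or maxSteps afterwards does not affect stepHandle; create a new handle to use new values.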

ResetFcn: Environment reset function, specified as a function name, function handle, or handle to an anonymous function. The sim function calls your reset function to reset the environment at the start of each simulation, and the train function calls it at the start of each training episode.

The reset function that you provide must have no inputs and two outputs, as illustrated by the following signature.

[InitialObservation,Info] = myResetFunction()

The reset function sets the environment to an initial state and computes the initial value of the observation. For example, you can create a reset function that randomizes certain state values such that each training episode begins from different initial conditions. InitialObservation must be a cell array containing the initial observations for all the agents. Therefore, InitialObservation must contain as many elements as the number of agents, and each element must match the dimensions and data types specified in the corresponding element of the observationInfo cell array.

The Info output of ResetFcn initializes the Info property of your environment and contains any data that you want to pass from one step to the next. This can be the environment state or a structure containing state and parameters. The simulation or training function (train or sim) supplies the current value of Info as the second input argument of StepFcn, then uses the fourth output argument returned by StepFcn to update the value of Info.
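
A reset function with this signature might look like the following sketch. It assumes two agents that both observe a shared 2-by-1 state; the field names State and StepCount and the randomization range are assumptions made for the illustration.

```matlab
function [initialObservation, info] = myResetFunction()
% Illustrative reset function for two agents (all values are assumptions).
info.State = 2*rand(2,1) - 1;    % random initial state, each entry in (-1,1)
info.StepCount = 0;              % step counter carried from step to step

% 1-by-2 cell array with one initial observation per agent.
initialObservation = {info.State, info.State};
end
```

Randomizing the state in the reset function is what makes each training episode start from different initial conditions.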

The reset function itself takes no input arguments. To pass arguments to it, define them in the MATLAB workspace, then specify ResetFcn as an anonymous function that in turn calls your custom function with the arguments defined in the workspace, as shown in the example Create Custom Environment Using Step and Reset Functions.

Example: ResetFcn="myResetFcn"
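
The same wrapping idea can be sketched for a reset function. Here paramResetFcn and stateRange are hypothetical names, and the two-agent, shared-state layout is an assumption of the sketch.

```matlab
% Hypothetical range for the random initial state, defined in the workspace.
stateRange = 0.5;

% The anonymous function takes no inputs, as required of the reset function,
% and forwards the captured value of stateRange.
resetHandle = @() paramResetFcn(stateRange);

% Direct call, for illustration only.
[initialObs,info] = resetHandle();

function [initialObservation,info] = paramResetFcn(stateRange)
% Illustrative reset function that takes one extra parameter.
info.State = stateRange*(2*rand(2,1) - 1);   % each entry in (-stateRange,stateRange)
info.StepCount = 0;
initialObservation = {info.State, info.State};
end
```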

Info: Information to pass to the next step, specified as any MATLAB data type. This can be the environment state or a structure containing state and parameters. When ResetFcn is called, whatever you define as the Info output of ResetFcn initializes this property. When a step occurs, the simulation or training function (train or sim) uses the current value of Info as the second input argument for StepFcn. Once StepFcn completes, the simulation or training function then updates the current value of Info using the fourth output argument returned by StepFcn.

Example: Info.State=[-1.1 0 2.2]
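
To make this life cycle concrete, the following loop threads the information variable by hand through a reset call and several step calls. It is a conceptual illustration only and does not reproduce how sim or train are implemented. The functions demoResetFcn and demoStepFcn, the fixed actions, and the five-step termination rule are assumptions of the sketch.

```matlab
% Conceptual illustration only: sim and train manage Info internally.
[obs,info] = demoResetFcn();          % reset initializes info
for k = 1:10
    action = {1, -1};                 % placeholder actions for two agents
    % Pass the current info in, then replace it with the fourth output.
    [obs,reward,isDone,info] = demoStepFcn(action,info);
    if isDone
        break
    end
end
disp(info.StepCount)                  % displays 5

function [obs,info] = demoResetFcn()
info.State = [0;0];
info.StepCount = 0;
obs = {info.State, info.State};
end

function [obs,reward,isDone,info] = demoStepFcn(action,info)
info.State = info.State + 0.1*[action{1}; action{2}];
info.StepCount = info.StepCount + 1;
obs = {info.State, info.State};
reward = -abs(info.State)';           % 1-by-2 reward vector
isDone = info.StepCount >= 5;         % stop after five steps
end
```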

Object Functions

getActionInfo: Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
getObservationInfo: Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
train: Train reinforcement learning agents within a specified environment
sim: Simulate trained reinforcement learning agents within specified environment
validateEnvironment: Validate custom reinforcement learning environment

Examples

Create a custom multiagent environment by supplying custom MATLAB® functions. Using rlMultiAgentFunctionEnv, you can create a custom MATLAB reinforcement learning environment with universal sample time, that is, an environment in which all agents execute in the same step. To create your custom multiagent environment, you must define observation specifications, action specifications, and step and reset functions.

For this example, consider an environment containing two agents. The first agent receives an observation belonging to a four-dimensional continuous space and returns an action that can have two values, -1 and 1.

The second agent receives an observation belonging to a mixed observation space with two channels. The first channel carries a two-dimensional continuous vector and the second channel carries a value that is either 0 or 1. The action returned by the second agent is a continuous scalar.

To define the observation and action spaces of the two agents, use cell arrays.

obsinfo = { rlNumericSpec([4 1]) , ...
           [rlNumericSpec([2 1]) rlFiniteSetSpec([0 1])] };
actinfo = {rlFiniteSetSpec([-1 1]), rlNumericSpec([1 1])};

Next, specify your step and reset functions. For this example, use the functions resetfcn and stepfcn defined at the end of the example.

Note that while the custom reset and step functions that you pass to rlMultiAgentFunctionEnv must have exactly zero and two arguments, respectively, you can avoid this limitation by using anonymous functions. For an example on how to do this, see Create Custom Environment Using Step and Reset Functions.

To create the custom multiagent function environment, use rlMultiAgentFunctionEnv.

env = rlMultiAgentFunctionEnv( ...
    obsinfo,actinfo, ...
    @stepfcn,@resetfcn)
env = 
  rlMultiAgentFunctionEnv with properties:
     StepFcn: @stepfcn
    ResetFcn: @resetfcn
        Info: {[4x1 double]  {1x2 cell}}

You can now create agents for env and train or simulate them as you would for any other environment.

Environment Functions

Environment reset function.

function [initialobs, info] = resetfcn()
% RESETFCN sets the default state of the environment.
%
% - initialobs is a 1xN cell array (N is the total number of agents).
% - info contains any data that you want to pass between steps.
%
% To pass information from one step to the next, such as the environment 
% state, use info.
% For this example, initialize the agent observations randomly 
% (but set to 1 the value carried by the second observation channel
%  of the second agent).
initialobs = {rand(4,1), {rand(2,1) 1} };
% Set the info argument equal to the observation cell. 
info = initialobs;
end

Environment step function.

function [nextobs, reward, isdone, info] = stepfcn(action, info)
% STEPFCN specifies how the environment advances to the next state given
% the actions from all the agents. 
% 
% If N is the total number of agents, then the arguments are as follows.
% - nextobs is a 1xN cell array.
% - action is a 1xN cell array.
% - reward is a 1xN numeric array.
% - isdone is a logical or numeric scalar.
% - info contains any data that you want to pass between steps.
%   In this example, info is returned unchanged.
% For this example, just return to each agent a random observation 
% multiplied by the norm of its respective action. The second observation 
% channel of the second agent carries a value that can only be 0 or 1.
nextobs = {  rand([4 1])*norm(action{1}) , ...
             {rand([2 1])*norm(action{2}) 0} };
% Return a random 1x2 reward vector and a false is-done value.
reward = rand(1,2);
isdone = false;
end

Version History

Introduced in R2023b
