simulate reinforcement learning environment against policy or agent
since r2022a
syntax
description
output = runepisode(___,name=value) specifies nondefault simulation options using one or more name-value arguments.
examples
simulate environment and agent
create a reinforcement learning environment and extract its observation and action specifications.
env = rlPredefinedEnv("CartPole-Discrete");
obsinfo = getObservationInfo(env);
actinfo = getActionInfo(env);
to approximate the policy within the actor, use a neural network. create a network as an array of layer objects.
net = [...
    featureInputLayer(obsinfo.Dimension(1))
    fullyConnectedLayer(24)
    reluLayer
    fullyConnectedLayer(24)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer];
convert the network to a dlnetwork object and display the number of learnable parameters (weights).
net = dlnetwork(net);
summary(net)
   initialized: true
   number of learnables: 770
   inputs:
      1   'input'   4 features
create a discrete categorical actor using the network.
actor = rlDiscreteCategoricalActor(net,obsinfo,actinfo);
check your actor with a random observation.
act = getAction(actor,{rand(obsinfo.Dimension)})
act = 1x1 cell array
{[-10]}
create a policy object from the actor.
policy = rlStochasticActorPolicy(actor);
create an experience buffer.
buffer = rlReplayMemory(obsinfo,actinfo);
set up the environment for running multiple simulations. for this example, configure the environment to log any errors rather than send them to the command window.
setup(env,StopOnError="off")
simulate multiple episodes using the environment and policy. after each episode, append the experiences to the buffer. for this example, run 100 episodes.
for i = 1:100
    output = runEpisode(env,policy,MaxSteps=300);
    append(buffer,output.AgentData.Experiences)
end
clean up the environment.
cleanup(env)
sample a mini-batch of experiences from the buffer. for this example, sample 10 experiences.
batch = sample(buffer,10);
you can then learn from the sampled experiences and update the policy and actor.
input arguments
env
— reinforcement learning environment
environment object | ...
reinforcement learning environment, specified as one of the following objects.
- rlfunctionenv: environment defined using custom functions
- simulinkenvwithagent: simulink® environment created using rlsimulinkenv or createintegratedenv
- rlmdpenv: markov decision process environment
- rlneuralnetworkenvironment: environment with deep neural network transition models
- predefined environment created using rlpredefinedenv
- custom environment created from a template (rlcreateenvtemplate)
policy
— policy
policy object | array of policy objects
policy object, specified as one of the following objects.
rldeterministicactorpolicy
rladditivenoisepolicy
rlepsilongreedypolicy
rlmaxqpolicy
rlstochasticactorpolicy
if env is a simulink environment configured for multi-agent training, specify policy as an array of policy objects. the order of the policies in the array must match the agent order used to create env.
for more information on a policy object, at the matlab® command line, type help followed by the policy object name.
agent
— reinforcement learning agent
agent object | array of agent objects
reinforcement learning agent, specified as one of the following objects.
custom agent
if env is a simulink environment configured for multi-agent training, specify agent as an array of agent objects. the order of the agents in the array must match the agent order used to create env.
name-value arguments
specify optional pairs of arguments as name1=value1,...,namen=valuen, where name is the argument name and value is the corresponding value. name-value arguments must appear after other arguments, but the order of the pairs does not matter.
example: maxsteps=1000
maxsteps
— maximum simulation steps
500 (default) | positive integer
maximum simulation steps, specified as a positive integer.
processexperiencefcn
— function for processing experiences
function handle | cell array of function handles
function for processing experiences and updating the policy or agent based on each experience as it occurs during the simulation, specified as a function handle with the following signature.
[updatedpolicy,updateddata] = myfcn(experience,episodeinfo,policy,data)
here:
- experience is a structure that contains a single experience. for more information on the structure fields, see output.experiences.
- episodeinfo contains data about the current episode and corresponds to output.episodeinfo.
- policy is the policy or agent object being simulated.
- data contains experience processing data. for more information, see processexperiencedata.
- updatedpolicy is the updated policy or agent.
- updateddata is the updated experience processing data, which is used as the data input when processing the next experience.
if env is a simulink environment configured for multi-agent training, specify processexperiencefcn as a cell array of function handles. the order of the function handles in the array must match the agent order used to create env.
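as an illustration of the signature above, the following is a minimal sketch of an experience-processing function that leaves the policy unchanged and only accumulates bookkeeping data. the function name accumulateReward and the data fields TotalReward and NumSteps are hypothetical, and the sketch assumes that each experience structure carries a scalar Reward field. the function is exercised directly with a mock experience so that the sketch runs without a simulation; the commented call shows how it might be passed to runEpisode.

```matlab
% initial processing data; the field names are hypothetical
data0 = struct("TotalReward",0,"NumSteps",0);

% exercise the function directly with a mock experience
% (assumption: each experience has a scalar Reward field)
mockexp = struct("Reward",1);
[p1,data1] = accumulateReward(mockexp,[],[],data0);
[p2,data2] = accumulateReward(mockexp,[],p1,data1);
fprintf("steps: %d, total reward: %g\n",data2.NumSteps,data2.TotalReward)

% with an environment and policy available, the same function
% would be passed to the simulation as follows:
% output = runEpisode(env,policy, ...
%     ProcessExperienceFcn=@accumulateReward, ...
%     ProcessExperienceData=data0);

function [updatedPolicy,updatedData] = accumulateReward(experience,episodeInfo,policy,data)
    % leave the policy unchanged; only update the bookkeeping data
    updatedPolicy = policy;
    updatedData = data;
    updatedData.TotalReward = data.TotalReward + experience.Reward;
    updatedData.NumSteps = data.NumSteps + 1;
end
```

because the updated data returned for one experience becomes the data input for the next, the counters carry across the whole episode.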
processexperiencedata
— experience processing data
any matlab data type | cell array
experience processing data, specified as any matlab data, such as an array or structure. use this data to pass additional parameters or information to the experience processing function.
you can also update this data within the experience processing function to use different parameters when processing the next experience. the data values that you specify when you call runepisode are used to process the first experience in the simulation.
if env is a simulink environment configured for multi-agent training, specify processexperiencedata as a cell array. the order of the array elements must match the agent order used to create env.
cleanuppostsim
— option to clean up environment
true (default) | false
option to clean up the environment after the simulation, specified as true or false. when cleanuppostsim is true, runepisode calls cleanup(env) when the simulation ends.
to run multiple episodes without cleaning up the environment, set cleanuppostsim to false. you can then call cleanup(env) after running your simulations.
if env is a simulinkenvwithagent object and the associated simulink model is configured to use fast restart, then the model remains in a compiled state between simulations when cleanuppostsim is false.
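the pattern described above (set up once, run several episodes without cleanup, then clean up once) can be sketched as follows. this is only an illustration under stated assumptions: it uses a default dqn agent created with rlDQNAgent purely for brevity, and the episode count and step limit are arbitrary.

```matlab
% create a predefined environment and a default agent to simulate
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDQNAgent(obsInfo,actInfo);   % default agent, chosen only for brevity

% set up once, then run several episodes without cleaning up in between
setup(env)
numEpisodes = 3;
outputs = cell(1,numEpisodes);
for k = 1:numEpisodes
    outputs{k} = runEpisode(env,agent,MaxSteps=50,CleanupPostSim=false);
end

% clean up once after all simulations have finished
cleanup(env)
```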
logexperiences
— option to log experiences
true (default) | false
option to log experiences for each policy or agent, specified as true or false. when logexperiences is true, the experiences of the policy or agent are logged in output.experiences.
output arguments
output
— simulation output
structure | future object
simulation output, returned as a structure with the fields agentdata and simulationinfo.
the agentdata field is a structure array containing data for each agent or policy. each agentdata structure has the following fields.
field | description |
---|---|
experiences | logged experience of the policy or agent, returned as a structure array. |
time | simulation times of experiences, returned as a vector. |
episodeinfo | episode information, returned as a structure. |
processexperiencedata | experience processing data |
agent | policy or agent used in the simulation |
the simulationinfo field is one of the following:
- for matlab environments: structure containing the field simulationerror. this structure contains any errors that occurred during simulation.
- for simulink environments: simulink.simulationoutput object containing simulation data. recorded data includes any signals and states that the model is configured to log, simulation metadata, and any errors that occurred.
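as a rough illustration of reading the output structure for a matlab environment, the sketch below runs one short episode and inspects the logged experiences and the simulation error field. it assumes a default dqn agent (used only to keep the example short) and assumes that each logged experience stores a scalar reward in a field named Reward; check the field names in your release before relying on them.

```matlab
% environment and a default agent, used here only for illustration
env = rlPredefinedEnv("CartPole-Discrete");
agent = rlDQNAgent(getObservationInfo(env),getActionInfo(env));

% set up the environment, then simulate a single short episode
setup(env)
output = runEpisode(env,agent,MaxSteps=50);

% experiences logged for the single agent in this simulation
exps = output.AgentData.Experiences;
% assumption: each experience has a scalar field named Reward
totalReward = sum([exps.Reward]);
fprintf("logged %d experiences, total reward %g\n",numel(exps),totalReward)

% for a matlab environment, SimulationInfo holds any simulation error
simErr = output.SimulationInfo.SimulationError;
if ~isempty(simErr)
    disp(simErr)
end
```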
if env is configured to run simulations on parallel workers, then output is a future object, which supports deferred outputs for environment simulations that run on workers.
tips
you can speed up episode simulation by using parallel computing. to do so, use the setup function and set the useparallel argument to true.
setup(env,UseParallel=true)
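a possible way to combine this tip with the deferred output is sketched below. it requires parallel computing toolbox, uses a default dqn agent only for brevity, and assumes that the returned future object supports fetchOutputs to wait for and retrieve the simulation result; treat that call as an assumption and confirm it against the future object documentation for your release.

```matlab
% environment and a default agent, used here only for illustration
env = rlPredefinedEnv("CartPole-Discrete");
agent = rlDQNAgent(getObservationInfo(env),getActionInfo(env));

% configure the environment to simulate on parallel workers
setup(env,UseParallel=true)

% with parallel simulation enabled, the call returns a future object
fut = runEpisode(env,agent,MaxSteps=50,CleanupPostSim=false);

% assumption: fetchOutputs blocks until the simulation finishes
% and returns the usual output structure
result = fetchOutputs(fut);

% clean up once the deferred output has been retrieved
cleanup(env)
```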
version history
introduced in r2022a