Train Agent or Tune Environment Parameters Using Parameter Sweeping
This example shows how to train a reinforcement learning agent with the water tank reinforcement learning Simulink® environment by sweeping parameters. You can use this example as a template for tuning parameters when training reinforcement learning agents.
Open a preconfigured project, which has all required files added as project dependencies. Opening the project also launches the Experiment Manager app.
TrainAgentUsingParameterSweepingStart
Note that it is best practice to add any Simulink models and supporting files as dependencies to your project.
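If you adapt this example to your own project, you can add such files as dependencies programmatically. The following is a minimal sketch that assumes a project is already open and that the file names match the ones used in this example.

% Minimal sketch: add the model and parameter script to the open project.
proj = currentProject;                 % handle to the currently open project
addFile(proj,"rlwatertank.slx");       % add the Simulink model
addFile(proj,"loadWaterTankParams.m"); % add the supporting parameter script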
Tune Agent Parameters Using Parameter Sweeping
In this section, you tune the agent parameters to search for an optimal training policy.
Open Experiment
In the Experiment Browser pane, double-click the name of the experiment (TuneAgentParametersExperiment). This opens a tab for the experiment.

The Hyperparameters section contains the hyperparameters to tune for this experiment. A set of hyperparameters has been added for this experiment. To add a new parameter, click Add and specify a name and an array of values for the hyperparameter. When you run the experiment, Experiment Manager runs the training using every combination of parameter values specified in the hyperparameter table.

Verify that Strategy is set to Exhaustive Sweep.

Under Training Function, click Edit. The MATLAB Editor opens to show code for the training function TuneAgentParametersTraining. The training function creates the environment and agent objects and runs the training using one combination of the specified hyperparameters.
function output = TuneAgentParametersTraining(params,monitor)

% Set the random seed generator
rng(0);

% Load the Simulink model
mdl = "rlwatertank";
load_system(mdl);

% Create variables in the base workspace. When running on a parallel
% worker this will also create variables in the worker's base workspace.
evalin("base","loadWaterTankParams");
Ts = evalin("base","Ts");
Tf = evalin("base","Tf");

% Create a reinforcement learning environment
actionInfo = rlNumericSpec([1 1]);
observationInfo = rlNumericSpec([3 1],...
    LowerLimit=[-inf -inf 0  ]',...
    UpperLimit=[ inf  inf inf]');
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl, blk, observationInfo, actionInfo);

% Specify a reset function for the environment
env.ResetFcn = @localResetFcn;

% Create options for the reinforcement learning agent. You can assign
% values from the params structure for sweeping parameters.
agentOpts = rlDDPGAgentOptions();
agentOpts.MiniBatchSize = 64;
agentOpts.TargetSmoothFactor = 1e-3;
agentOpts.SampleTime = Ts;
agentOpts.DiscountFactor = params.DiscountFactor;
agentOpts.ActorOptimizerOptions.LearnRate = params.ActorLearnRate;
agentOpts.CriticOptimizerOptions.LearnRate = params.CriticLearnRate;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.CriticOptimizerOptions.GradientThreshold = 1;
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

% Create the reinforcement learning agent. You can modify the
% localCreateActorAndCritic function to edit the agent model.
[actor, critic] = localCreateActorAndCritic(observationInfo, actionInfo);
agent = rlDDPGAgent(actor, critic, agentOpts);

maxEpisodes = 200;
maxSteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxEpisodes, ...
    MaxStepsPerEpisode=maxSteps, ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="none",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=800);

% Create a data logger for logging data to the monitor object
logger = rlDataLogger(monitor);

% Run the training
result = train(agent, env, trainOpts, Logger=logger);

% Export experiment results
output.Agent = agent;
output.Environment = env;
output.TrainingResult = result;
output.Parameters = params;
end
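An exhaustive sweep produces one trial per combination of the values in the Hyperparameters table. The values below are hypothetical and only illustrate how the trial count grows with the swept values; the actual values for this experiment are defined in the Hyperparameters table in the app.

% Hypothetical hyperparameter values, for illustration only.
DiscountFactor  = [0.95 0.99];
ActorLearnRate  = [1e-3 5e-4];
CriticLearnRate = [1e-3 5e-4];

% Experiment Manager forms every combination automatically; this just counts them.
[D,A,C] = ndgrid(DiscountFactor,ActorLearnRate,CriticLearnRate);
numTrials = numel(D)   % 2*2*2 = 8 trials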
Run Experiment
When you run the experiment, Experiment Manager executes the training function multiple times. Each trial uses one combination of hyperparameter values. By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox™, you can run multiple trials at the same time or offload your experiment as a batch job in a cluster.
Under Mode, select Sequential, and click Run to run the experiment one trial at a time.

To run multiple trials simultaneously, under Mode, select Simultaneous, and click Run. This mode requires a Parallel Computing Toolbox license.

To offload the experiment as a batch job, under Mode, select Batch Sequential or Batch Simultaneous, specify your Cluster and Pool Size, and click Run. You must configure the cluster with the files necessary for this example. This mode also requires a Parallel Computing Toolbox license.
Note that your cluster needs to be configured with the files necessary for this experiment when running in the Batch Sequential or Batch Simultaneous modes. For more information on the Cluster Profile Manager, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox). To configure your cluster:
Open the Cluster Profile Manager and, under Properties, click Edit.

Under the AttachedFiles option, click Add and specify the files rlwatertank.slx and loadWaterTankParams.m.

Click Done.
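As a separate check outside of the batch modes, if you open a parallel pool yourself you can attach the same files programmatically. This sketch is optional and is not a required step for this example.

% Optional sketch: attach the example files to a manually opened pool.
pool = parpool("Processes");
addAttachedFiles(pool,{'rlwatertank.slx','loadWaterTankParams.m'});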
While the experiment is running, select a trial row from the table of results and, in the toolstrip, click Training Plot. This shows the episode and average reward plots for that trial.
After the experiment is finished:

Select the row corresponding to trial 7, which has the average reward of 817.5, and, in the toolstrip, click Export. This action exports the results of the trial to a base workspace variable.

Name the variable agentParamSweepTrainingOutput.
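After the export, you can inspect the trial results from the MATLAB command line. The field names below match the output structure created by the training function shown earlier.

% Inspect the exported results (variable name as entered in the export dialog).
agentParamSweepTrainingOutput.Parameters       % hyperparameter combination used by this trial
agentParamSweepTrainingOutput.TrainingResult   % episode statistics returned by train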
Tune Environment Parameters Using Parameter Sweeping
In this section, you tune the environment's reward function parameters to search for an optimal training policy.
Open Experiment
In the Experiment Browser pane, open TuneEnvironmentParametersExperiment. Verify, as with the agent tuning experiment, that Strategy is set to Exhaustive Sweep. View the code for the training function TuneEnvironmentParametersTraining as before.
function output = TuneEnvironmentParametersTraining(params,monitor)

% Set the random seed generator
rng(0);

% Load the Simulink model
mdl = "rlwatertank";
load_system(mdl);

% Create variables in the base workspace. When running on a parallel
% worker this will also create variables in the worker's base workspace.
evalin("base","loadWaterTankParams");
Ts = evalin("base","Ts");
Tf = evalin("base","Tf");

% Create a reinforcement learning environment
actionInfo = rlNumericSpec([1 1]);
observationInfo = rlNumericSpec([3 1],...
    LowerLimit=[-inf -inf 0  ]',...
    UpperLimit=[ inf  inf inf]');
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl, blk, observationInfo, actionInfo);

% Specify a reset function for the environment. You can tune environment
% parameters such as reward or initial condition within this function.
env.ResetFcn = @(in) localResetFcn(in, params);

% Create options for the reinforcement learning agent. You can assign
% values from the params structure for sweeping parameters.
agentOpts = rlDDPGAgentOptions();
agentOpts.MiniBatchSize = 64;
agentOpts.TargetSmoothFactor = 1e-3;
agentOpts.SampleTime = Ts;
agentOpts.DiscountFactor = 0.99;
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.CriticOptimizerOptions.GradientThreshold = 1;
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

% Create the reinforcement learning agent. You can modify the
% localCreateActorAndCritic function to edit the agent model.
[actor, critic] = localCreateActorAndCritic(observationInfo, actionInfo);
agent = rlDDPGAgent(actor, critic, agentOpts);

maxEpisodes = 200;
maxSteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxEpisodes, ...
    MaxStepsPerEpisode=maxSteps, ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="none",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=800);

% Create a data logger for logging data to the monitor object
logger = rlDataLogger(monitor);

% Run the training
result = train(agent, env, trainOpts, Logger=logger);

% Export experiment results
output.Agent = agent;
output.Environment = env;
output.TrainingResult = result;
output.Parameters = params;
end

%% Environment reset function
function in = localResetFcn(in, params)
% Randomize reference signal
blk = sprintf("rlwatertank/Desired \nWater Level");
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
in = setBlockParameter(in,blk,"Value",num2str(h));

% Randomize initial height
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
blk = "rlwatertank/Water-Tank System/H";
in = setBlockParameter(in,blk,"InitialCondition",num2str(h));

% Tune the reward parameters
in = setBlockParameter(in, ...
    "rlwatertank/calculate reward/Gain", ...
    "Gain",num2str(params.RewardGain));
in = setBlockParameter(in, ...
    "rlwatertank/calculate reward/Gain2", ...
    "Gain",num2str(params.ExceedsBoundsPenalty));
end
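If you want to exercise the reset function outside of training, one option is to save a copy of localResetFcn as its own function file and call it on a Simulink.SimulationInput object. The parameter values below are hypothetical placeholders for the swept reward parameters.

% Sketch only: requires localResetFcn saved as its own function file on the path.
params.RewardGain = 10;               % hypothetical sweep value
params.ExceedsBoundsPenalty = -100;   % hypothetical sweep value
in = Simulink.SimulationInput("rlwatertank");
in = localResetFcn(in,params);
in.BlockParameters                    % lists the randomized and tuned block parameters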
Run Experiment
Run the experiment using the same settings you used for the agent tuning experiment.
After the experiment is finished:

Select the row corresponding to trial 4, which has the maximum average reward, and export the result to a base workspace variable.

Name the variable envParamSweepTrainingOutput.
Evaluate Agent Performance
Execute the following code in MATLAB after exporting the agents from the above experiments. This code simulates the agents with their environments and displays the performance in the Scope blocks.
open_system("rlwatertank"); simopts = rlsimulationoptions(maxsteps=200); % evaluate the agent exported from % tuneagentparametersexperiment experience = sim(agentparamsweeptrainingoutput.agent, ... agentparamsweeptrainingoutput.environment, ... simopts); % evaluate the agent exported from % tuneenvironmentparametersexperiment experience = sim(envparamsweeptrainingoutput.agent, ... envparamsweeptrainingoutput.environment, simopts);
The agent is able to track the desired water level.
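To quantify the result rather than reading it from the Scope blocks, you can sum the reward signal returned by sim; for a Simulink environment, the experience structure stores the reward as a timeseries.

% Total reward accumulated over the simulation for the last agent evaluated.
totalReward = sum(experience.Reward.Data)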
Close the project.
close(prj);