Log Training Data to Disk
This example shows how to log custom data to disk when training agents using the Reinforcement Learning Toolbox™ train function.
Overview

The general steps for data logging are:

1. Create a data logger object using the rlDataLogger function.
2. Configure the data logger object with callback functions to specify the data to log at different stages of the training process.
3. Specify the logger object as a name-value input argument in the train function.
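As a quick illustration of how these steps fit together, the following minimal sketch creates a logger, attaches a simple episode callback, and passes the logger to train. It assumes an agent, environment, and training options already exist, and the logged field name EpisodeReward is an arbitrary choice; the rest of this example builds the workflow step by step.

% Minimal sketch of the logging workflow (assumes env, agent, and
% trainOpts already exist; the field name EpisodeReward is arbitrary).
logger = rlDataLogger();
logger.EpisodeFinishedFcn = @(data) ...
    struct("EpisodeReward",data.EpisodeInfo.CumulativeReward);
result = train(agent,env,trainOpts,Logger=logger);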
Create Data Logger

Create a file data logger object using the rlDataLogger function.
fileLogger = rlDataLogger()

fileLogger = 
  FileLogger with properties:

           LoggingOptions: [1x1 rl.logging.option.MATFileLoggingOptions]
       EpisodeFinishedFcn: []
     AgentStepFinishedFcn: []
    AgentLearnFinishedFcn: []
Specify options to log data, such as the logging directory and the frequency (in number of episodes) at which the data logger writes data to disk. This step is optional.
% Specify a logging directory. You must have write
% access for this directory.
logDir = fullfile(pwd,"myDataLog");
fileLogger.LoggingOptions.LoggingDirectory = logDir;

% Specify a naming rule for files. The naming rule episode<id>
% saves files as episode001.mat, episode002.mat, and so on.
fileLogger.LoggingOptions.FileNameRule = "episode<id>";

% Set the frequency (in number of episodes) at which
% the data logger writes data to disk.
fileLogger.LoggingOptions.DataWriteFrequency = 1;
Configure Data Logging
Training data of interest is generated at different stages of the training loop; for example, experience data is available after the completion of an episode. You can configure the logger object with callback functions to log data at these stages. The functions must return either a structure containing the data to log, or an empty array if no data needs to be logged at that stage.
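For example, a callback can decide at run time whether to return data. The following sketch (not part of this example) logs the episode reward only every tenth episode and returns an empty array otherwise; the field name EpisodeReward is an arbitrary choice.

function dataToLog = logEveryTenEpisodes(data)
% Log the cumulative reward only every 10th episode; otherwise
% return [] so nothing is written for this stage.
if mod(data.EpisodeCount,10) == 0
    dataToLog.EpisodeReward = data.EpisodeInfo.CumulativeReward;
else
    dataToLog = [];
end
end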
The callback functions are:
EpisodeFinishedFcn - Callback function to log data such as experiences, logged Simulink signals, or initial observations. The training loop executes this function after the completion of a training episode. The following is an example of the function.
function dataToLog = myEpisodeLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:   The current episode number.
% Environment:    Environment object.
% Agent:          Agent object.
% Experience:     A structure containing the experiences
%                 from the current episode.
% EpisodeInfo:    A structure containing the fields
%                 CumulativeReward, StepsTaken, and
%                 InitialObservation.
% SimulationInfo: Contains simulation information for the
%                 current episode.
%                 For MATLAB environments this is a structure
%                 with the field "SimulationError".
%                 For Simulink environments this is a
%                 Simulink.SimulationOutput object.
%
% dataToLog is a structure containing the data to be logged
% to disk.

% Write your code to log data to disk. For example,
% dataToLog.Experience = data.Experience;

dataToLog.Experience = data.Experience;
dataToLog.EpisodeReward = data.EpisodeInfo.CumulativeReward;
if data.EpisodeInfo.StepsTaken > 0
    dataToLog.EpisodeQ0 = evaluateQ0(data.Agent, ...
        data.EpisodeInfo.InitialObservation);
else
    dataToLog.EpisodeQ0 = 0;
end
end
AgentStepFinishedFcn - Callback function to log data such as the state of exploration. The training loop executes this function after the completion of an agent step within an episode. The following is an example of the function.
function dataToLog = myAgentStepLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:   The current episode number.
% AgentStepCount: The cumulative number of steps taken by
%                 the agent.
% SimulationTime: The current simulation time in the
%                 environment.
% Agent:          Agent object.
%
% dataToLog is a structure containing the data to be logged
% to disk.

% Write your code to log data to disk. For example,
% noiseState = getState(getExplorationPolicy(data.Agent));
% dataToLog.NoiseState = noiseState;

policy = getExplorationPolicy(data.Agent);
if isprop(policy,"NoiseType")
    state = getState(policy);
    if strcmp(policy.NoiseType,"ou")
        dataToLog.OUNoise = state.Noise{1};
        dataToLog.StandardDeviation = state.StandardDeviation{1};
    elseif strcmp(policy.NoiseType,"gaussian")
        dataToLog.StandardDeviation = state.StandardDeviation{1};
    end
else
    dataToLog = [];
end
end
AgentLearnFinishedFcn - Callback function to log data such as the actor and critic training losses. The training loop executes this function after the completion of the learning subroutine. The following is an example of the function.
function dataToLog = myAgentLearnLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:         The current episode number.
% AgentStepCount:       The cumulative number of steps taken by
%                       the agent.
% AgentLearnCount:      The cumulative number of learning steps
%                       taken by the agent.
% EnvModelTrainingInfo: A structure containing the fields
%                       TransitionFcnLoss, RewardFcnLoss, and
%                       IsDoneFcnLoss. This is applicable for
%                       model-based agent training.
% Agent:                Agent object.
% ActorLoss:            Training loss of the actor.
% CriticLoss:           Training loss of the critic.
%
% dataToLog is a structure containing the data to be logged
% to disk.

% Write your code to log data to disk. For example,
% dataToLog.ActorLoss = data.ActorLoss;

dataToLog.ActorLoss  = data.ActorLoss;
dataToLog.CriticLoss = data.CriticLoss;
end
For this example, configure only the AgentLearnFinishedFcn callback. The function logTrainingLoss logs the actor and critic training losses and is provided at the end of this example.
fileLogger.AgentLearnFinishedFcn = @logTrainingLoss;
Run Training
Create a predefined CartPole-Continuous environment and a deep deterministic policy gradient (DDPG) agent for training.
% Set the random seed to facilitate reproducibility.
rng(0);

% Create a CartPole-Continuous environment.
env = rlPredefinedEnv("CartPole-Continuous");

% Create a DDPG agent.
agent = rlDDPGAgent(getObservationInfo(env), getActionInfo(env));
agent.AgentOptions.NoiseOptions.StandardDeviationDecayRate = 0.001;
Specify options to train the agent for 100 episodes and to plot the training progress in the Episode Manager.
Note that you can still use the SaveAgentCriteria, SaveAgentValue, and SaveAgentDirectory options of the rlTrainingOptions object to save the agent during training. These options do not affect (and are not affected by) any usage of FileLogger or MonitorLogger objects.
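As an illustration, a sketch of a training options object that combines agent saving with data logging is shown below. The criterion, threshold, and directory are arbitrary choices, and this object is not used in this example.

% Illustrative only: agent-saving options coexist with a logger.
% The criterion, threshold, and directory below are arbitrary choices.
optsWithSaving = rlTrainingOptions( ...
    MaxEpisodes=100, ...
    SaveAgentCriteria="EpisodeReward", ...
    SaveAgentValue=400, ...
    SaveAgentDirectory="savedAgents");
% result = train(agent,env,optsWithSaving,Logger=fileLogger);

The training options actually used in this example are created next.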
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=100, ...
    Plots="training-progress");
Train the agent using the train function. Specify the file logger object in the Logger name-value option.
result = train(agent, env, trainOpts, Logger=fileLogger);
The logged data is saved within the directory specified by logDir.
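If you want to inspect the logged files programmatically, you can list and load the MAT-files from the logging directory. The following is a minimal sketch; the variable names stored inside each file depend on your logging configuration, so the sketch lists the file contents instead of assuming specific names.

% List the MAT-files written by the logger and inspect the
% variables stored in the first one.
logFiles = dir(fullfile(logDir,"*.mat"));
disp({logFiles.name}')
firstFile = fullfile(logDir,logFiles(1).name);
whos("-file",firstFile)
loggedData = load(firstFile);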
Visualize Logged Data
You can visualize data logged to disk using the interactive Reinforcement Learning Data Viewer graphical user interface. To open the visualization, click View Logged Data in the Reinforcement Learning Episode Manager window.
To create plots in the Reinforcement Learning Data Viewer, select the data from the Data panel and a plot type from the toolstrip. The following image shows a plot of the ActorLoss data generated using the Trend plot type. The plot shows logged data points and a moving average line.
On the toolstrip, navigate to the Trend tab to configure plot options. Set the window length for averaging data to 50. The plot updates with the new configuration.
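If you prefer to reproduce such a trend curve programmatically, you can compute a moving average of the logged loss values with the movmean function. The sketch below uses a placeholder vector in place of the logged ActorLoss values; replace it with data loaded from the logged MAT-files.

% Placeholder for the logged ActorLoss values (one value per learning
% step); replace with data loaded from the logged MAT-files.
actorLoss = randn(500,1);

windowLength = 50;                            % window used by the Trend plot
smoothedLoss = movmean(actorLoss,windowLength);

plot(actorLoss,"."), hold on
plot(smoothedLoss,"LineWidth",1.5), hold off
legend("Logged data","Moving average")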
Local Functions
function dataToLog = logTrainingLoss(data)
% Function to log the actor and critic training losses.
dataToLog.ActorLoss  = data.ActorLoss;
dataToLog.CriticLoss = data.CriticLoss;
end
See Also

Functions

rlDataLogger | train | sim