Log Training Data to Disk

This example shows how to log custom data to disk when training agents using the Reinforcement Learning Toolbox™ train function.

Overview

The general steps for data logging are as follows. A minimal sketch of the complete workflow appears after the list.

  1. Create a data logger object using the rlDataLogger function.

  2. Configure the data logger object with callback functions to specify the data to log at different stages of the training process.

  3. Specify the logger object as a name-value input argument in the train function.
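
The following is a minimal sketch of how these three steps fit together. The agent, environment, and training options referenced in the commented-out train call are created later in this example.

% 1. Create the data logger object.
logger = rlDataLogger();
% 2. Configure a callback that returns a structure of data to log
%    (here, the actor and critic losses after each learning step).
logger.AgentLearnFinishedFcn = @(data) struct( ...
    "ActorLoss",data.ActorLoss, ...
    "CriticLoss",data.CriticLoss);
% 3. Pass the logger object to train as a name-value argument, for example:
% result = train(agent,env,trainOpts,Logger=logger);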

Create Data Logger

Create a file data logger object using the rlDataLogger function.

fileLogger = rlDataLogger()
fileLogger = 
  FileLogger with properties:

           LoggingOptions: [1x1 rl.logging.option.MATFileLoggingOptions]
       EpisodeFinishedFcn: []
     AgentStepFinishedFcn: []
    AgentLearnFinishedFcn: []

Specify options to log data, such as the logging directory and the frequency (in number of episodes) at which the data logger writes data to disk. This step is optional.

% Specify a logging directory. You must have write 
% access for this directory.
logDir = fullfile(pwd,"myDataLog");
fileLogger.LoggingOptions.LoggingDirectory = logDir;
% Specify a naming rule for files. The naming rule "episode"
% saves files as episode001.mat, episode002.mat, and so on.
fileLogger.LoggingOptions.FileNameRule = "episode";
% Set the frequency (in number of episodes) at which 
% the data logger writes data to disk.
fileLogger.LoggingOptions.DataWriteFrequency = 1;

Configure Data Logging

Training data of interest is generated at different stages of the training loop. For example, experience data is available after the completion of an episode. You can configure the logger object with callback functions to log data at these stages. The functions must return either a structure containing the data to log, or an empty array if no data needs to be logged at that stage.

The callback functions are:

  • EpisodeFinishedFcn - callback function to log data such as experiences, logged Simulink signals, or initial observations. The training loop executes this function after the completion of a training episode. The following is an example of this function.

function dataToLog = myEpisodeLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:   the current episode number.
% Environment:    environment object.
% Agent:          agent object.
% Experience:     a structure containing the experiences 
%                 from the current episode.
% EpisodeInfo:    a structure containing the fields 
%                 CumulativeReward, StepsTaken, and 
%                 InitialObservation.
% SimulationInfo: contains simulation information for the 
%                 current episode.
%                 For MATLAB environments this is a structure 
%                 with the field "SimulationError".
%                 For Simulink environments this is a 
%                 Simulink.SimulationOutput object.
%
% dataToLog is a structure containing the data to be logged 
% to disk.

% Write your code to log data to disk. For example, 
% dataToLog.Experience = data.Experience;
dataToLog.Experience = data.Experience;
dataToLog.EpisodeReward = data.EpisodeInfo.CumulativeReward;
if data.EpisodeInfo.StepsTaken > 0
    dataToLog.EpisodeQ0 = evaluateQ0(data.Agent, ...
        data.EpisodeInfo.InitialObservation);
else
    dataToLog.EpisodeQ0 = 0;
end
end

  • AgentStepFinishedFcn - callback function to log data such as the state of exploration. The training loop executes this function after the completion of an agent step within an episode. The following is an example of this function.

function dataToLog = myAgentStepLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:   the current episode number.
% AgentStepCount: the cumulative number of steps taken by 
%                 the agent.
% SimulationTime: the current simulation time in the 
%                 environment.
% Agent:          agent object.
%
% dataToLog is a structure containing the data to be logged 
% to disk.

% Write your code to log data to disk. For example, 
% noiseState = getState(getExplorationPolicy(data.Agent));
% dataToLog.NoiseState = noiseState;
policy = getExplorationPolicy(data.Agent);
if isprop(policy,"NoiseType")
    state = getState(policy);
    if strcmp(policy.NoiseType,"ou")
        dataToLog.OUNoise = state.Noise{1};
        dataToLog.StandardDeviation = state.StandardDeviation{1};
    elseif strcmp(policy.NoiseType,"gaussian")
        dataToLog.StandardDeviation = state.StandardDeviation{1};
    end
else
    dataToLog = [];
end
end

  • AgentLearnFinishedFcn - callback function to log data such as the actor and critic training losses. The training loop executes this function after the completion of the learning subroutine. The following is an example of this function.

function dataToLog = myAgentLearnLoggingFcn(data)
% data is a structure that contains the following fields:
%
% EpisodeCount:    the current episode number.
% AgentStepCount:  the cumulative number of steps taken by 
%                  the agent.
% AgentLearnCount: the cumulative number of learning steps 
%                  taken by the agent.
% EnvModelTrainingInfo: a structure containing the fields:
%                       a. TransitionFcnLoss
%                       b. RewardFcnLoss
%                       c. IsDoneFcnLoss
%                       This is applicable for model-based 
%                       agent training.
% Agent:      agent object.
% ActorLoss:  training loss of the actor.
% CriticLoss: training loss of the critic.
%
% dataToLog is a structure containing the data to be logged 
% to disk.

% Write your code to log data to disk. For example, 
% dataToLog.ActorLoss = data.ActorLoss;
dataToLog.ActorLoss  = data.ActorLoss;
dataToLog.CriticLoss = data.CriticLoss;
end

For this example, configure only the AgentLearnFinishedFcn callback. The function logTrainingLoss logs the actor and critic training losses and is provided at the end of this example.

fileLogger.AgentLearnFinishedFcn = @logTrainingLoss;

Run Training

Create a predefined CartPole-Continuous environment and a deep deterministic policy gradient (DDPG) agent for training.

% Set the random seed to facilitate reproducibility.
rng(0);
% Create a CartPole-Continuous environment.
env = rlPredefinedEnv("CartPole-Continuous");
% Create a DDPG agent.
agent = rlDDPGAgent(getObservationInfo(env), getActionInfo(env));
agent.AgentOptions.NoiseOptions.StandardDeviationDecayRate = 0.001;

Specify options to train the agent for 100 episodes and display the training progress in the Episode Manager.

Note that you can still use the SaveAgentCriteria, SaveAgentValue, and SaveAgentDirectory options of the rlTrainingOptions object to save the agent during training. These options do not affect (and are not affected by) any usage of FileLogger or MonitorLogger objects.

trainOpts = rlTrainingOptions( ...
    MaxEpisodes=100, ...
    Plots="training-progress");

Train the agent using the train function. Specify the file logger object in the Logger name-value argument.

result = train(agent, env, trainOpts, Logger=fileLogger);

The logged data is saved within the directory specified by logDir.
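
To check what was written, you can inspect the saved files from the command line. The following is a minimal sketch; the variables stored in each MAT file correspond to the fields of the structures returned by your logging callbacks, so adapt the names to your configuration.

% List the MAT files written by the logger. With the naming rule 
% "episode", there is one file per episode: episode001.mat, 
% episode002.mat, and so on.
loggedFiles = dir(fullfile(logDir,"episode*.mat"));
disp({loggedFiles.name}')
% Load the first file and list the variables it contains.
episodeData = load(fullfile(logDir,loggedFiles(1).name));
disp(fieldnames(episodeData))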

Visualize Logged Data

You can visualize data logged to disk using the interactive Reinforcement Learning Data Viewer graphical user interface. To open the visualization, click View Logged Data in the Reinforcement Learning Episode Manager window.

To create plots in the Reinforcement Learning Data Viewer, select data in the data panel and a plot type from the toolstrip. The following image shows a plot of the ActorLoss data generated using the Trend plot type. The plot shows the logged data points and a moving-average line.

On the toolstrip, navigate to the Trend tab to configure plot options. Set the window length for averaging data to 50. The plot updates with the new configuration.
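
If you prefer to work programmatically rather than in the Data Viewer, you can build a similar trend plot from the logged files. This is a rough sketch that assumes each saved MAT file exposes the ActorLoss values returned by logTrainingLoss as a variable of that name; if your files use a different layout, adjust the loading code accordingly.

files = dir(fullfile(logDir,"episode*.mat"));
actorLoss = [];
for k = 1:numel(files)
    s = load(fullfile(logDir,files(k).name));
    if isfield(s,"ActorLoss")
        % Append the logged loss values from this file (assumed layout).
        actorLoss = [actorLoss; s.ActorLoss(:)]; %#ok<AGROW>
    end
end
% Plot the raw data points and a 50-sample moving average,
% similar to the Trend plot with a window length of 50.
plot(actorLoss,".")
hold on
plot(movmean(actorLoss,50),LineWidth=1.5)
hold off
xlabel("Learning step")
ylabel("ActorLoss")
legend("Logged data","Moving average")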

Local Functions

function dataToLog = logTrainingLoss(data)
% Log the actor and critic training losses.
dataToLog.ActorLoss  = data.ActorLoss;
dataToLog.CriticLoss = data.CriticLoss;
end
