This example shows how to generate a reinforcement learning reward function from a Simulink Design Optimization model verification block.
For this example, open the Simulink model levelcheckblock.slx, which contains a Check Step Response Characteristics block named "level check".
Generate the reward function code from the specifications in the level check block using generateRewardFunction. The code is displayed in the MATLAB Editor.
generateRewardFunction("levelcheckblock/level check")
For this example, the code is saved in the MATLAB function file myblockrewardfcn.m.
Display the generated reward function.
function reward = myblockrewardfcn(x,t)
% myblockrewardfcn generates rewards from Simulink block specifications.
%
% x : Input of levelcheckblock/level check
% t : Simulation time (s)
% Reinforcement Learning Toolbox
% 27-May-2021 16:45:27

%#codegen

%% Specifications from levelcheckblock/level check
block1_initialvalue = 1;
block1_finalvalue = 2;
block1_steptime = 0;
block1_steprange = block1_finalvalue - block1_initialvalue;
block1_minrise = block1_initialvalue + block1_steprange * 80/100;
block1_maxsettling = block1_initialvalue + block1_steprange * (1+2/100);
block1_minsettling = block1_initialvalue + block1_steprange * (1-2/100);
block1_maxovershoot = block1_initialvalue + block1_steprange * (1+10/100);
block1_minundershoot = block1_initialvalue - block1_steprange * 5/100;

if t >= block1_steptime
    if block1_initialvalue <= block1_finalvalue
        % Increasing step
        block1_upperboundtimes = [0,5; 5,max(5+1,t+1)];
        block1_upperboundamplitudes = [block1_maxovershoot,block1_maxovershoot;
                                       block1_maxsettling,block1_maxsettling];
        block1_lowerboundtimes = [0,2; 2,5; 5,max(5+1,t+1)];
        block1_lowerboundamplitudes = [block1_minundershoot,block1_minundershoot;
                                       block1_minrise,block1_minrise;
                                       block1_minsettling,block1_minsettling];
    else
        % Decreasing step: the upper and lower envelopes swap roles
        block1_upperboundtimes = [0,2; 2,5; 5,max(5+1,t+1)];
        block1_upperboundamplitudes = [block1_minundershoot,block1_minundershoot;
                                       block1_minrise,block1_minrise;
                                       block1_minsettling,block1_minsettling];
        block1_lowerboundtimes = [0,5; 5,max(5+1,t+1)];
        block1_lowerboundamplitudes = [block1_maxovershoot,block1_maxovershoot;
                                       block1_maxsettling,block1_maxsettling];
    end

    % Evaluate each upper-bound segment at time t (NaN outside the segment)
    block1_xmax = zeros(1,size(block1_upperboundtimes,1));
    for idx = 1:numel(block1_xmax)
        tseg = block1_upperboundtimes(idx,:);
        xseg = block1_upperboundamplitudes(idx,:);
        block1_xmax(idx) = interp1(tseg,xseg,t,'linear',NaN);
    end
    if all(isnan(block1_xmax))
        block1_xmax = Inf;
    else
        block1_xmax = max(block1_xmax,[],'omitnan');
    end

    % Evaluate each lower-bound segment at time t (NaN outside the segment)
    block1_xmin = zeros(1,size(block1_lowerboundtimes,1));
    for idx = 1:numel(block1_xmin)
        tseg = block1_lowerboundtimes(idx,:);
        xseg = block1_lowerboundamplitudes(idx,:);
        block1_xmin(idx) = interp1(tseg,xseg,t,'linear',NaN);
    end
    if all(isnan(block1_xmin))
        block1_xmin = -Inf;
    else
        block1_xmin = max(block1_xmin,[],'omitnan');
    end
else
    % Before the step time, the signal is unconstrained
    block1_xmin = -Inf;
    block1_xmax = Inf;
end

%% Penalty function weight (specify nonnegative)
weight = 1;

%% Compute penalty
% Penalty is computed for violation of linear bound constraints.
%
% To compute exterior bound penalty, use the exteriorPenalty function and
% specify the penalty method as 'step' or 'quadratic'.
%
% Alternatively, use the hyperbolicPenalty or barrierPenalty function for
% computing hyperbolic and barrier penalties.
%
% For more information, see help for these functions.
penalty = sum(exteriorPenalty(x,block1_xmin,block1_xmax,'step'));

%% Compute reward
reward = -weight * penalty;
end
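The specification section at the top of the generated function is plain arithmetic on the block parameters. The following sketch reproduces that arithmetic in a standalone script so you can check the resulting limits: with an initial value of 1 and a final value of 2, the rise level is 1.8, the settling band is 1.98 to 2.02, the overshoot limit is 2.1, and the undershoot limit is 0.95. The 80, 2, 10, and 5 percent values are the ones hard-coded in this particular generated file; the camel-case variable names are illustrative only.

```matlab
% Block parameters, copied from the generated specification section
initialValue = 1;
finalValue   = 2;
stepRange    = finalValue - initialValue;

% Same formulas as the generated code (80% rise, 2% settling band,
% 10% overshoot, 5% undershoot)
minRise       = initialValue + stepRange * 80/100;
maxSettling   = initialValue + stepRange * (1 + 2/100);
minSettling   = initialValue + stepRange * (1 - 2/100);
maxOvershoot  = initialValue + stepRange * (1 + 10/100);
minUndershoot = initialValue - stepRange * 5/100;

fprintf('Rise level:       %.2f\n', minRise);
fprintf('Settling band:    [%.2f, %.2f]\n', minSettling, maxSettling);
fprintf('Overshoot limit:  %.2f\n', maxOvershoot);
fprintf('Undershoot limit: %.2f\n', minUndershoot);
```

For a decreasing step (final value below initial value), stepRange is negative, so the same formulas place the rise, settling, and undershoot limits on the opposite side of the response.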
The generated reward function takes two input arguments: the current values of the verification block input signals, x, and the simulation time, t. The reward is the negative of a weighted penalty, and this penalty is nonzero whenever the current block input signals violate the linear bound constraints defined in the verification block.
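To see how the time-varying bounds and the penalty combine, the following sketch re-evaluates the envelope logic of the generated function at a single time instant, without Simulink or Reinforcement Learning Toolbox. It assumes an increasing step with step time 0 (so the t < step time branch and the all-NaN case are skipped), copies the segment amplitudes from the generated code, and replaces exteriorPenalty with a hypothetical stand-in, stepPenalty, which assumes the 'step' method returns 1 for a signal outside its bounds and 0 otherwise. The names segVals, xminAt, xmaxAt, and rewardAt are illustrative.

```matlab
% Segment times and amplitudes for an increasing step, copied from the
% generated code (overshoot/settling above; undershoot/rise/settling below)
upperTimes = @(t) [0,5; 5,max(5+1,t+1)];
upperAmps  = [2.1, 2.1; 2.02, 2.02];
lowerTimes = @(t) [0,2; 2,5; 5,max(5+1,t+1)];
lowerAmps  = [0.95, 0.95; 1.8, 1.8; 1.98, 1.98];

% Evaluate every segment at time t; interp1 returns NaN outside a segment
segVals = @(T, A, t) arrayfun(@(k) interp1(T(k,:), A(k,:), t, 'linear', NaN), 1:size(T,1));

% As in the generated code, take the max over the active segments
% (max skips NaN values by default)
xmaxAt = @(t) max(segVals(upperTimes(t), upperAmps, t));
xminAt = @(t) max(segVals(lowerTimes(t), lowerAmps, t));

% Hypothetical stand-in for exteriorPenalty(x,xmin,xmax,'step'):
% 1 for a signal outside [xmin, xmax], 0 otherwise
stepPenalty = @(x, xmin, xmax) double(x < xmin | x > xmax);

weight = 1;
rewardAt = @(x, t) -weight * sum(stepPenalty(x, xminAt(t), xmaxAt(t)));

t = 3;   % inside the rise window (2 s to 5 s)
fprintf('Bounds at t = %g s: [%.2f, %.2f]\n', t, xminAt(t), xmaxAt(t));
disp(rewardAt(1.5, t))   % signal below the rise level: bound violated
disp(rewardAt(1.9, t))   % signal inside the bounds: no penalty
```

At t = 3 s the active lower bound is the 80 percent rise level (1.8) and the active upper bound is the overshoot limit (2.1), so a signal value of 1.5 earns a reward of -1 and a value of 1.9 earns no penalty.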
The generated reward function is a starting point for reward design. You can tune the penalty weight or use a different penalty function to define a more appropriate reward for your reinforcement learning agent.
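As an illustration of that tuning, the sketch below compares a step-shaped and a quadratic exterior penalty on a few hypothetical signal values, using the bounds that apply at t = 3 s in this example and a weight of 10. Both penalty functions are illustrative stand-ins written for this sketch, not the toolbox implementations of exteriorPenalty; the quadratic form assumes the penalty is the squared distance outside the bounds, which may differ in scaling from the toolbox function.

```matlab
% Illustrative stand-ins for two exterior penalty shapes (not toolbox code)
stepPenalty      = @(x, xmin, xmax) double(x < xmin | x > xmax);
quadraticPenalty = @(x, xmin, xmax) max(0, xmin - x).^2 + max(0, x - xmax).^2;

xmin = 1.8;  xmax = 2.1;          % bounds at t = 3 s in this example
x = [1.5, 1.75, 1.9, 2.2];        % hypothetical signal values

weight = 10;                      % example nonnegative weight
rStep = -weight * stepPenalty(x, xmin, xmax);
rQuad = -weight * quadraticPenalty(x, xmin, xmax);

% Rows: signal value, step-penalty reward, quadratic-penalty reward
disp([x; rStep; rQuad])
```

The step penalty assigns the same reward to every violation, whereas the quadratic penalty grows with the distance outside the bounds, so it distinguishes a small violation (x = 1.75) from a large one (x = 1.5).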