
Water Tank Reinforcement Learning Environment Model

This example shows how to create a water tank reinforcement learning Simulink® environment that contains an RL Agent block in place of a controller for the water level in a tank. To simulate this environment, you must create an agent and specify that agent in the RL Agent block. For an example that trains an agent using this environment, see Create Simulink Environment and Train Agent.

mdl = "rlwatertank";
open_system(mdl)

This model already contains an RL Agent block, which connects to the following signals:

  • Scalar action output signal

  • Vector of observation input signals

  • Scalar reward input signal

  • Logical input signal for stopping the simulation

Actions and Observations

A reinforcement learning environment receives action signals from the agent and generates observation signals in response to these actions. To create and train an agent, you must create action and observation specification objects.

The action signal for this environment is the flow rate control signal that is sent to the plant. To create a specification object for an action channel carrying a continuous signal, use the rlNumericSpec function.

actionInfo = rlNumericSpec([1 1]);
actionInfo.Name = "flow";

If the action signal takes one of a discrete set of possible values, create the specification using the rlFiniteSetSpec function, as in the sketch below.
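The following is a minimal sketch, assuming a hypothetical discrete set of three flow-rate values (this model uses a continuous action, so these values are illustrative only):

actionInfoDiscrete = rlFiniteSetSpec([-1 0 1]);
actionInfoDiscrete.Name = "flow";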

For this environment, there are three observation signals sent to the agent, specified as a vector signal. The observation vector is [∫e dt, e, h]ᵀ, where:

  • h is the height of the water in the tank.

  • e = r − h, where r is the reference value for the water height.

Compute the observation signals in the generate observations subsystem.

open_system(mdl + "/generate observations")

Create a three-element vector of observation specifications. Specify a lower bound of 0 for the water height, leaving the other observation signals unbounded.

observationInfo = rlNumericSpec([3 1],...
    LowerLimit=[-inf -inf 0  ]',...
    UpperLimit=[ inf  inf inf]');
observationInfo.Name = "observations";
observationInfo.Description = "integrated error, error, and measured height";

If the actions or observations are represented by bus signals, create specifications using the bus2RLSpec function.
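For example, a minimal sketch assuming a bus object named obsBus has been defined in the MATLAB workspace (no such bus exists in this model):

obsBusSpec = bus2RLSpec("obsBus");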

Reward Signal

Construct a scalar reward signal. For this example, specify the following reward.

reward = 10 (|e| < 0.1) − 1 (|e| ≥ 0.1) − 100 (h ≤ 0 || h ≥ 20)

The reward is positive when the magnitude of the error is below 0.1 and negative otherwise. The reward also includes a large penalty when the water height is outside the range from 0 to 20.
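For reference, the same logic expressed as a single MATLAB statement, assuming scalar values e and h:

reward = 10*(abs(e) < 0.1) - 1*(abs(e) >= 0.1) - 100*(h <= 0 || h >= 20);

In the model, this expression is built from comparison, logical operator, and gain blocks rather than MATLAB code.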

Construct this reward in the calculate reward subsystem.

open_system(mdl + "/calculate reward")

Stop Signal

To terminate training episodes and simulations, specify a logical signal to the isdone input port of the block. For this example, terminate the episode if h ≤ 0 or h ≥ 20.
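Expressed in MATLAB, again assuming a scalar height h, the termination condition is:

isdone = (h <= 0) || (h >= 20);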

Compute this signal in the stop simulation subsystem.

open_system(mdl + "/stop simulation")

Create Environment Object

Create an environment object for the Simulink model.

env = rlSimulinkEnv(mdl,mdl + "/RL Agent",observationInfo,actionInfo);
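To confirm that the environment carries the expected channel specifications, you can query it with getObservationInfo and getActionInfo:

getObservationInfo(env)
getActionInfo(env)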

Reset Function

You can also create a custom reset function that randomizes parameters, variables, or states of the model. In this example, the reset function randomizes the reference signal and the initial water height and sets the corresponding block parameters.

env.ResetFcn = @(in)localResetFcn(in);
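Simulating the environment requires an agent. As a quick check, the following sketch creates a default agent from the specifications and runs a short simulation; it assumes the RL Agent block in the model is configured to read a workspace variable named agent:

agent = rlDDPGAgent(observationInfo,actionInfo);  % default agent built from the specifications
simOpts = rlSimulationOptions(MaxSteps=200);
experience = sim(env,agent,simOpts);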

Local Function

function in = localResetFcn(in)
% Randomize the reference signal
% (the block name contains a newline, so build the path with sprintf)
blk = sprintf("rlwatertank/Desired \nWater Level");
h = 3*randn + 10;
while h <= 0 || h >= 20   % resample until 0 < h < 20
    h = 3*randn + 10;
end
in = setBlockParameter(in,blk,Value=num2str(h));
% Randomize the initial water height
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
in = setBlockParameter(in, ...
    "rlwatertank/Water-Tank System/H", ...
    InitialCondition=num2str(h));
end
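To spot-check the reset function outside of training, you can apply it to a Simulink.SimulationInput object directly; this is a quick sanity check, not part of the original example:

in = Simulink.SimulationInput(mdl);
in = localResetFcn(in);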
