Water Tank Reinforcement Learning Environment Model
This example shows how to create a water tank reinforcement learning Simulink® environment that contains an RL Agent block in place of a controller for the water level in a tank. To simulate this environment, you must create an agent and specify that agent in the RL Agent block. For an example that trains an agent using this environment, see Create Simulink Environment and Train Agent.
mdl = "rlwatertank";
open_system(mdl)
This model already contains an RL Agent block, which connects to the following signals:
Scalar action output signal
Vector of observation input signals
Scalar reward input signal
Logical input signal for stopping the simulation
Actions and Observations
A reinforcement learning environment receives action signals from the agent and generates observation signals in response to these actions. To create and train an agent, you must create action and observation specification objects.
The action signal for this environment is the flow rate control signal that is sent to the plant. To create a specification object for an action channel carrying a continuous signal, use the rlNumericSpec function.
actionInfo = rlNumericSpec([1 1]);
actionInfo.Name = "flow";
If the action signal takes one of a discrete set of possible values, create the specification using the rlFiniteSetSpec function.
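For instance, a minimal sketch of a discrete specification (the three flow-rate values here are hypothetical and not part of this model):

% Hypothetical discrete action set: the flow rate command takes
% one of three values instead of a continuous range.
discreteActionInfo = rlFiniteSetSpec([-1 0 1]);
discreteActionInfo.Name = "flow";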
For this environment, there are three observation signals sent to the agent, specified as a vector signal. The observation vector is [∫e dt  e  h]ᵀ, where:
h is the height of the water in the tank.
e = r − h, where r is the reference value for the water height.
Compute the observation signals in the generate observations subsystem.
open_system(mdl + "/generate observations")
Create a three-element vector of observation specifications. Specify a lower bound of 0 for the water height, leaving the other observation signals unbounded.
observationInfo = rlNumericSpec([3 1],...
    LowerLimit=[-inf -inf 0  ]',...
    UpperLimit=[ inf  inf inf]');
observationInfo.Name = "observations";
observationInfo.Description = "integrated error, error, and measured height";
If the actions or observations are represented by bus signals, create specifications using the bus2RLSpec function.
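For reference, a minimal sketch, assuming a bus object created in the MATLAB workspace (this model does not use bus signals, and the element name below is hypothetical):

% Hypothetical bus object with a single element named "height".
obsBus = Simulink.Bus;
obsBus.Elements(1) = Simulink.BusElement;
obsBus.Elements(1).Name = "height";
% Create specification objects from the bus elements.
busObservationInfo = bus2RLSpec("obsBus");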
Reward Signal
Construct a scalar reward signal. For this example, specify the following reward.

reward = 10(|e| < 0.1) − 1(|e| ≥ 0.1) − 100(h ≤ 0 ∨ h ≥ 20)

The reward is positive when the error e is below 0.1 and negative otherwise. Also, there is a large reward penalty when the water height h is outside the 0 to 20 range.
Construct this reward in the calculate reward subsystem.
open_system(mdl + "/calculate reward")
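The subsystem builds this expression from Simulink blocks. As a reference sketch only, the equivalent per-step MATLAB logic is:

% Reference sketch of the reward logic; the model computes this
% with Simulink blocks, not MATLAB code.
reward = 10*(abs(e) < 0.1) - 1*(abs(e) >= 0.1) - 100*(h <= 0 || h >= 20);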
Stop Signal
To terminate training episodes and simulations, specify a logical signal to the isdone input port of the block. For this example, terminate the episode if h ≤ 0 or h ≥ 20.
Compute this signal in the stop simulation subsystem.
open_system(mdl + "/stop simulation")
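Again, the subsystem implements this condition with Simulink blocks; the equivalent MATLAB logic is a one-line sketch:

% Reference sketch of the termination condition.
isdone = (h <= 0) || (h >= 20);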
Create Environment Object
Create an environment object for the Simulink model.
env = rlSimulinkEnv(mdl,mdl + "/RL Agent",observationInfo,actionInfo);
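As an optional sanity check, not part of the original example, once you have created an agent and specified it in the RL Agent block you can validate the environment. The validateEnvironment function runs a brief simulation and errors if the model signals do not match the observation and action specifications:

% Optional check of the environment interface.
validateEnvironment(env)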
Reset Function
You can also create a custom reset function that randomizes parameters, variables, or states of the model. In this example, the reset function randomizes the reference signal and the initial water height and sets the corresponding block parameters.
env.ResetFcn = @(in)localResetFcn(in);
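As an optional check, not part of the original example, you can apply the reset function to a fresh Simulink.SimulationInput object and inspect the block parameters it sets (localResetFcn is defined in the next section):

% Optional check: run the reset function once and display the
% randomized block parameter values it assigns.
in = Simulink.SimulationInput(mdl);
in = localResetFcn(in);
in.BlockParameters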
Local Function
function in = localResetFcn(in)

% Randomize the reference signal (desired water level)
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
in = setBlockParameter(in, ...
    "rlwatertank/Desired \nWater Level", ...
    Value=num2str(h));

% Randomize the initial water height
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
in = setBlockParameter(in, ...
    "rlwatertank/Water-Tank System/H", ...
    InitialCondition=num2str(h));

end