
Design and Train Agent Using Reinforcement Learning Designer

This example shows how to design and train a DQN agent for an environment with a discrete action space using Reinforcement Learning Designer.

Open the Reinforcement Learning Designer App

Open the Reinforcement Learning Designer app.

reinforcementLearningDesigner

[Figure: Initial Reinforcement Learning Designer window. The left panes show no loaded agents, environments, results, or previews.]

Initially, no agents or environments are loaded in the app.

Import Cart-Pole Environment

When using Reinforcement Learning Designer, you can import an environment from the MATLAB® workspace or create a predefined environment. For more information, see and .

For this example, use the predefined discrete cart-pole MATLAB environment. To import this environment, on the Reinforcement Learning tab, in the Environments section, select New > Discrete Cart-Pole.

[Figure: Reinforcement Learning Designer window with expanded ...]

In the Environments pane, the app adds the imported Discrete CartPole environment. To rename the environment, click the environment text. You can also import multiple environments in the session.

To view the dimensions of the observation and action spaces, click the environment text. The app shows the dimensions in the Preview pane.

[Figure: The Preview pane shows the dimensions of the state and action spaces as [4 1] and [1 1], respectively.]

This environment has a continuous four-dimensional observation space (the positions and velocities of both the cart and pole) and a discrete one-dimensional action space consisting of two possible forces, –10 N or 10 N. This environment is used in the Train DQN Agent to Balance Cart-Pole System example. For more information on predefined control system environments, see .
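The same specifications can be queried at the MATLAB command line. The following is a minimal sketch, assuming Reinforcement Learning Toolbox is installed; the variable names obsInfo and actInfo are illustrative and are not names the app creates.

```matlab
% Create the predefined discrete cart-pole environment
env = rlPredefinedEnv("CartPole-Discrete");

% Query the observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Observation: continuous 4-by-1 vector
% (cart position, cart velocity, pole angle, pole angular velocity)
disp(obsInfo.Dimension)

% Action: scalar chosen from a finite set of forces
disp(actInfo.Dimension)
disp(actInfo.Elements)
```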

Create DQN Agent for Imported Environment

To create an agent, on the Reinforcement Learning tab, in the Agent section, click New. In the Create agent dialog box, specify the agent name, the environment, and the training algorithm. The default agent configuration uses the imported environment and the DQN algorithm. For this example, change the number of hidden units from 256 to 20. For more information on creating agents, see .
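At the command line, a comparable agent can be sketched with rlAgentInitializationOptions and rlDQNAgent. This assumes the command-line default critic is built the same way as the app's default, which may not hold exactly; the variable name agent1 is illustrative.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Default critic structure, but with 20 hidden units instead of 256
initOpts = rlAgentInitializationOptions('NumHiddenUnit',20);

% DQN agent whose default critic is derived from the specifications
agent1 = rlDQNAgent(obsInfo,actInfo,initOpts);
```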

[Figure: Create agent dialog box]

Click OK.

The app adds the new agent to the Agents pane and opens a corresponding agent1 document.

In the Hyperparameter section, under Critic Optimizer Options, set Learn Rate to 0.0001.
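Setting the critic learning rate programmatically might look like the following. The sketch assumes a release (R2022a or later) in which DQN agent options expose CriticOptimizerOptions; agentOpts and agent1 are illustrative names.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Agent options with a critic learning rate of 1e-4, as set in the app
agentOpts = rlDQNAgentOptions;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-4;

% Build the agent with 20 hidden units and the custom options
initOpts = rlAgentInitializationOptions('NumHiddenUnit',20);
agent1 = rlDQNAgent(obsInfo,actInfo,initOpts,agentOpts);
```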

[Figure: Reinforcement Learning Designer with the agent window open]

For a brief summary of DQN agent features and to view the observation and action specifications for the agent, click Overview.

[Figure: Reinforcement Learning Designer with the agent window open, showing the Overview section]

When you create a DQN agent in Reinforcement Learning Designer, the agent uses a default deep neural network structure for its critic. To view the critic network, on the DQN Agent tab, click View Critic Model.

The Deep Learning Network Analyzer opens and displays the critic structure.
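The critic network can also be opened in the analyzer from the command line. A sketch, assuming an agent built with 20 hidden units as above; critic and criticNet are illustrative names.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");
agent1 = rlDQNAgent(getObservationInfo(env),getActionInfo(env), ...
    rlAgentInitializationOptions('NumHiddenUnit',20));

% Extract the critic and the neural network inside it
critic = getCritic(agent1);
criticNet = getModel(critic);

% Display the network in the Deep Learning Network Analyzer
analyzeNetwork(criticNet)
```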

[Figure: Deep Learning Network Analyzer showing the deep neural network used in the critic]

Close the Deep Learning Network Analyzer.

Train Agent

To train your agent, on the Train tab, first specify options for training the agent. For information on specifying training options, see Specify Simulation Options in Reinforcement Learning Designer.

For this example, specify the maximum number of training episodes by setting Max Episodes to 1000. For the other training options, use their default values. By default, training stops when the average number of steps per episode (over the last 5 episodes) is greater than or equal to 500.
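These settings correspond roughly to the following rlTrainingOptions call. The sketch sets MaxStepsPerEpisode, ScoreAveragingWindowLength, StopTrainingCriteria, and StopTrainingValue explicitly, on the assumption that they mirror the app defaults described above.

```matlab
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...                  % changed from the default
    'MaxStepsPerEpisode',500, ...            % episode length cap
    'ScoreAveragingWindowLength',5, ...      % average over last 5 episodes
    'StopTrainingCriteria','AverageSteps', ...
    'StopTrainingValue',500);                % stop once the average reaches 500
```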

[Figure: Reinforcement Learning Designer app showing the Train tab in the toolstrip]

To start training, click Train.
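For comparison, the workflow up to this point can be run as a script. This is a sketch, not code generated by the app; the seed passed to rng and the variable names are arbitrary, and the results will differ from the app's because training is stochastic.

```matlab
rng(0)   % fix the random seed so repeated runs match

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Agent: 20 hidden units, critic learning rate 1e-4
agentOpts = rlDQNAgentOptions;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-4;
agent1 = rlDQNAgent(obsInfo,actInfo, ...
    rlAgentInitializationOptions('NumHiddenUnit',20),agentOpts);

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',500, ...
    'ScoreAveragingWindowLength',5, ...
    'StopTrainingCriteria','AverageSteps', ...
    'StopTrainingValue',500);

% Train the agent (opens the Episode Manager; can take several minutes)
trainingStats = train(agent1,env,trainOpts);

% Trailing 5-episode average of steps per episode, the stopping statistic
avgSteps = movmean(trainingStats.EpisodeSteps,[4 0]);
fprintf("Episodes run: %d, final average steps: %.1f\n", ...
    numel(trainingStats.EpisodeSteps),avgSteps(end));
```

Because agents are handle objects, train updates agent1 in place, so after the call returns agent1 holds the trained policy.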

During training, the app opens the Training Session tab and displays the training progress in the Training Results document.

At any time during training, you can click the Stop or Stop Training button to interrupt training and perform other operations at the command line.

[Figure: Reinforcement Learning Designer with training stopped]

At this point, the Resume, Accept, and Cancel buttons in the Training Session tab let you resume training, accept the training results (which stores the training results and the trained agent in the app), or cancel training altogether.

To resume training, click Resume.

[Figure: Reinforcement Learning Designer after agent training]

Here, training stops when the average number of steps per episode reaches 500.

To accept the training results, click Accept. In the Agents pane, the app adds the trained agent, agent1_Trained.

Simulate Agent and Inspect Simulation Results

To simulate the trained agent, on the Simulate tab, first select agent1_Trained in the Agent drop-down list, then configure the simulation options. For this example, use the default number of episodes (10) and maximum episode length (500). For more information on specifying simulation options, see Specify Simulation Options in Reinforcement Learning Designer.

[Figure: Simulate toolstrip tab]

To simulate the agent, click Simulate.

The app opens the Simulation Session tab. After the simulation is complete, the Simulation Results document shows the reward for each episode as well as the reward mean and standard deviation.
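The per-episode rewards and their statistics can also be computed at the command line. The sketch assumes that agent1_Trained has already been exported to the workspace (see Export Agent and Save Session) and uses the same episode count and length as the app; simOpts, experiences, and episodeRewards are illustrative names.

```matlab
% Assumes agent1_Trained was exported from the app to the workspace
env = rlPredefinedEnv("CartPole-Discrete");
simOpts = rlSimulationOptions('MaxSteps',500,'NumSimulations',10);

% One experience structure per simulated episode
experiences = sim(env,agent1_Trained,simOpts);

% Cumulative reward of each episode (Reward is a timeseries)
episodeRewards = arrayfun(@(x) sum(x.Reward.Data),experiences);

fprintf("Mean reward: %.1f, standard deviation: %.1f\n", ...
    mean(episodeRewards),std(episodeRewards));
```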

[Figure: Simulation Results document showing the reward of each simulation episode, together with their mean and standard deviation]

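For three episodes, the agent was not able to reach the maximum reward of 500. This suggests that the robustness of the trained agent to different initial conditions might be improved. In this case, training the agent longer, for example by selecting an average window length of 10 instead of 5, yields better robustness. Because no episode can last more than 500 steps, an average of 500 over the window is reached only when every episode in the window runs for the full 500 steps, so a window of 10 requires 10 consecutive full-length episodes instead of 5. You can also modify some DQN agent options, such as MiniBatchSize and TargetUpdateFrequency, to promote faster and more robust learning.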
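A sketch of how those changes could be expressed as options objects. The values 128 and 4 are hypothetical, chosen only to show the syntax, not recommended settings; the learning rate and episode limit repeat the values used earlier in this example.

```matlab
% Hypothetical DQN settings for illustration only; tune for your problem
agentOpts = rlDQNAgentOptions( ...
    'MiniBatchSize',128, ...
    'TargetUpdateFrequency',4);
agentOpts.CriticOptimizerOptions.LearnRate = 1e-4;

% Require the 500-step average over 10 episodes instead of 5
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...
    'ScoreAveragingWindowLength',10, ...
    'StopTrainingCriteria','AverageSteps', ...
    'StopTrainingValue',500);
```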

To analyze the simulation results, click Inspect Simulation Data. This opens the Simulation Data Inspector. For more information, see Simulation Data Inspector (Simulink).

You can also clear any data that you might have loaded into the Simulation Data Inspector in a previous session before inspecting the new results. To do so, under Inspect Simulation Data, select Clear and Inspect Simulation Data.

[Figure: Simulate toolstrip tab showing the option to clear and inspect simulation data]

In the Simulation Data Inspector, you can view the saved signals for each simulation episode.

By default, the upper plot area is selected. To show the first state (the cart position) during the first episode, under Run 1: Simulation Result, open the CartPoleStates variable, and select CartPoleStates(1,1). The cart goes outside the boundary after about 390 seconds, causing the simulation to terminate.
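Outside the Simulation Data Inspector, the same cart-position signal can be plotted from the experience structure that sim returns. This sketch assumes agent1_Trained is in the workspace and that the observation timeseries stores its samples as a 4-by-1-by-N array; it looks up the observation field name instead of hard-coding it.

```matlab
% Assumes agent1_Trained was exported from the app to the workspace
env = rlPredefinedEnv("CartPole-Discrete");
xpr = sim(env,agent1_Trained);

% Observation holds one timeseries per observation channel
obsNames = fieldnames(xpr.Observation);
states = xpr.Observation.(obsNames{1});

% Assumed layout: Data is 4-by-1-by-N; row 1 is the cart position
cartPosition = squeeze(states.Data(1,1,:));

plot(states.Time,cartPosition)
xlabel("Time")
ylabel("Cart position")
```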

[Figure: Simulation Data Inspector showing the position of the cart in the first simulation episode]

To also show the reward in the upper plot area, select the Reward variable. Note that the units on the vertical axis change accordingly.

Click the middle plot area, and select the third state (pole angle). Then click the bottom plot area and select the second and fourth states (cart velocity and pole angle derivative).

[Figure: Simulation Data Inspector showing a variety of simulation results in different plot areas]

For a related example, in which a DQN agent is trained on the same environment, see Train DQN Agent to Balance Cart-Pole System.

Close the Simulation Data Inspector.

To accept the simulation results, on the Simulation Session tab, click Accept.

In the Results pane, the app adds the simulation results structure, experience1.

Export Agent and Save Session

To select the trained agent and open the corresponding agent1_Trained document, in the Agents pane, double-click agent1_Trained.

[Figure: Reinforcement Learning Designer window showing how to export the trained agent]

Then, to export the trained agent to the MATLAB workspace, on the Reinforcement Learning tab, under Export, select the trained agent.
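Because the exported agent is an ordinary workspace variable, it can be saved to and restored from a MAT-file. A minimal sketch; the file name cartPoleDQNAgent.mat is hypothetical.

```matlab
% Save the exported agent (file name is illustrative)
save("cartPoleDQNAgent.mat","agent1_Trained")

% Later, possibly in a new MATLAB session, restore it and reuse it
clear agent1_Trained
load("cartPoleDQNAgent.mat","agent1_Trained")

env = rlPredefinedEnv("CartPole-Discrete");
xpr = sim(env,agent1_Trained);
```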


To save the app session, on the Reinforcement Learning tab, click Save Session. In the future, to resume your work where you left off, you can open the session in Reinforcement Learning Designer.

Simulate Agent at the Command Line

To simulate the agent at the MATLAB command line, first load the cart-pole environment.

env = rlPredefinedEnv("CartPole-Discrete");

The cart-pole environment has an environment visualizer that allows you to see how the system behaves during simulation and training.

Plot the environment and perform a simulation using the trained agent that you previously exported from the app.

plot(env)
xpr2 = sim(env,agent1_Trained);

During the simulation, the visualizer shows the movement of the cart and pole. In this simulation, the trained agent is able to stabilize the system.

[Figure: Cart-pole environment visualizer showing the pole stabilized on the cart]

Finally, display the cumulative reward for the simulation.

sum(xpr2.Reward)

ans = 500

As expected, the cumulative reward is 500.

Related Examples

Train DQN Agent to Balance Cart-Pole System
