Agents
A reinforcement learning agent receives observations and a reward from the environment. Using its policy, the agent selects an action based on the observations and reward, and returns the action to the environment. During training, the agent continuously updates the policy parameters based on the action, observations, and reward. Doing so allows the agent to learn the optimal policy for the given environment and reward signal.
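The observe, act, and update cycle described above can be sketched in a few lines of Python. This is a minimal illustration and not the toolbox API: the `ChainEnv` corridor environment, the `QAgent` class, and all hyperparameter values are hypothetical names and choices made for this example, and tabular Q-learning is used only because it is the simplest update rule to show.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1 on reaching state 4."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at both ends).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class QAgent:
    """Tabular Q-learning agent; the Q-table plays the role of the policy parameters."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs, explore=True):
        # Epsilon-greedy policy: mostly pick the best-valued action, sometimes explore.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[obs]))
        best = max(self.q[obs])
        return self.rng.choice([a for a, v in enumerate(self.q[obs]) if v == best])

    def update(self, obs, action, reward, next_obs, done):
        # Move Q(obs, action) toward the one-step bootstrapped target.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def train(episodes=200, max_steps=100):
    env = ChainEnv()
    agent = QAgent(env.n_states, n_actions=2)
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = agent.act(obs)                             # agent selects an action
            next_obs, reward, done = env.step(action)           # environment responds
            agent.update(obs, action, reward, next_obs, done)   # policy parameters change
            obs = next_obs
            if done:
                break
    return agent


agent = train()
greedy = [agent.act(s, explore=False) for s in range(4)]
print(greedy)
```

After training, the greedy action in every non-terminal state should be `1` (move right), since that is the only way to reach the rewarded state.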
Reinforcement Learning Toolbox™ software provides reinforcement learning agents that use several common algorithms, such as SARSA, DQN, DDPG, and PPO. You can also implement other agent algorithms by creating your own custom agents.
For more information, see Reinforcement Learning Agents. For more information on defining policy representations, see .
Apps
Reinforcement Learning Designer | Design, train, and simulate reinforcement learning agents |
Blocks
RL Agent | Reinforcement learning agent |
Functions
Topics
Agent Basics
- Reinforcement Learning Agents
You can create an agent using one of several standard reinforcement learning algorithms or define your own custom agent.
Interactively create or import agents for training using the Reinforcement Learning Designer app.
Agent Types
Create Q-learning agents for reinforcement learning.
Create SARSA agents for reinforcement learning.
Create DQN agents for reinforcement learning.
Create policy gradient agents for reinforcement learning.
Create DDPG agents for reinforcement learning.
- Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agents
Create TD3 agents for reinforcement learning.
Create actor-critic agents for reinforcement learning.
Create PPO agents for reinforcement learning.
Create TRPO agents for reinforcement learning.
Create SAC agents for reinforcement learning.
A model-based (MBPO) reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Custom Agents
- Create Custom Reinforcement Learning Agents
Create agents that use custom reinforcement learning algorithms.