Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects the action to take. The critic learns the value (or Q-value) function that estimates the value of the policy.

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
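
As a minimal sketch, small deep networks can be wrapped in approximator objects such as rlValueFunction and rlContinuousDeterministicActor. The observation and action specifications below (a 4-dimensional observation and a scalar continuous action) are assumptions for illustration only.

    % Assumed example specifications (hypothetical, for illustration).
    obsInfo = rlNumericSpec([4 1]);   % 4-dimensional continuous observation
    actInfo = rlNumericSpec([1 1]);   % scalar continuous action

    % Critic: approximate the state value V(s) with a small deep network.
    criticNet = [
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        ];
    critic = rlValueFunction(dlnetwork(criticNet), obsInfo);

    % Actor: approximate a deterministic policy a = pi(s).
    actorNet = [
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(prod(actInfo.Dimension))
        tanhLayer   % bounds the action output to [-1, 1]
        ];
    actor = rlContinuousDeterministicActor(dlnetwork(actorNet), obsInfo, actInfo);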

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions

rlTable - Value table or Q table (Since R2019a)
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator object for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
getActor - Extract actor from reinforcement learning agent (Since R2019a)
setActor - Set actor of reinforcement learning agent (Since R2019a)
getCritic - Extract critic from reinforcement learning agent (Since R2019a)
setCritic - Set critic of reinforcement learning agent (Since R2019a)
getModel - Get approximation model from function approximator object (Since R2020b)
setModel - Set approximation model in function approximator object (Since R2020b)
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object (Since R2019a)
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object (Since R2019a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations (Since R2020a)
getValue - Obtain estimated value from a critic given environment observations and actions (Since R2020a)
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations (Since R2020a)
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
gradient - Evaluate gradient of function approximator object given observation and action input data (Since R2022a)
accelerate - Option to accelerate computation of gradient for approximator object based on neural network (Since R2022a)
quadraticLayer - Quadratic layer for actor or critic network (Since R2019a)
scalingLayer - Scaling layer for actor or critic network (Since R2019a)
softplusLayer - Softplus layer for actor or critic network (Since R2020a)
featureInputLayer - Feature input layer (Since R2020b)
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer (Since R2019a)
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
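
As a minimal usage sketch of a few of the functions above, reusing the actor and critic from the earlier example (the observation value is arbitrary):

    % Query the approximators created in the earlier sketch.
    obs = {rand(4,1)};               % one observation, wrapped in a cell array
    act = getAction(actor, obs);     % action selected by the actor (cell array)
    val = getValue(critic, obs);     % estimated state value from the critic

    % evaluate works on any function approximator object and returns
    % its outputs as a cell array.
    out = evaluate(critic, obs);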

Topics

  • Create Policies and Value Functions

    Specify policies and value functions using function approximators, such as deep neural networks.


  • Import Neural Network Models Using ONNX

    You can import existing policies from other deep learning frameworks using the ONNX™ model format.
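
A minimal sketch, assuming a hypothetical ONNX file name and the observation and action specifications from the earlier example; importNetworkFromONNX requires the Deep Learning Toolbox™ Converter for ONNX Model Format support package:

    % "savedActor.onnx" is a hypothetical file name, for illustration only.
    net = importNetworkFromONNX("savedActor.onnx");   % returns a dlnetwork
    actor = rlContinuousDeterministicActor(net, obsInfo, actInfo);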
