
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. A value function is a mapping from an environment observation (or observation-action pair) to the value of a policy, that is, the expected cumulative long-term reward. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
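In the standard discounted formulation (the discount factor gamma and the notation below are conventions assumed here, not definitions taken from this page), these mappings can be written as:

    \pi(a \mid s) = \Pr(A_t = a \mid S_t = s)
    V^{\pi}(s) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \mid S_t = s\big]
    Q^{\pi}(s, a) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \mid S_t = s,\ A_t = a\big]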

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
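As a minimal sketch of this workflow (the environment specifications, layer sizes, and network architectures below are illustrative assumptions, not values defined on this page), you can create a neural-network critic and actor for a hypothetical problem with a 4-element continuous observation and three discrete actions, and then query them with getValue and getAction:

    % Observation: 4-element continuous vector; action: one of three discrete values.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    % Critic network: maps the observation to one Q-value per discrete action.
    criticNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(32)
        reluLayer
        fullyConnectedLayer(3)]);
    critic = rlVectorQValueFunction(criticNet, obsInfo, actInfo);

    % Actor network: maps the observation to a probability for each discrete action.
    actorNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(32)
        reluLayer
        fullyConnectedLayer(3)
        softmaxLayer]);
    actor = rlDiscreteCategoricalActor(actorNet, obsInfo, actInfo);

    % Query the approximators for a random observation.
    obs = {rand(4,1)};
    qValues = getValue(critic, obs)   % vector of Q-values, one per action
    action = getAction(actor, obs)    % cell array containing the sampled action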

Blocks

Policy - Reinforcement learning policy

Functions

rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents
rlQValueFunction - Q-value function approximator object for reinforcement learning agents
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get function approximator model from actor or critic
setModel - Set function approximation model for actor or critic
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
rlOptimizerOptions - Optimization options for actors and critics
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (see the sketch after this list)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network
featureInputLayer - Feature input layer
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
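The policy objects and extraction functions listed above support application deployment and custom training loops. The following hedged sketch (the specification values and the use of a default DQN agent are assumptions for illustration) extracts the greedy policy from an agent and queries it for an action:

    % Create a DQN agent with default networks from the specifications (illustrative).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    agent = rlDQNAgent(obsInfo, actInfo);

    % Extract the greedy (deterministic) policy and compute an action for an observation.
    policy = getGreedyPolicy(agent);       % policy object that selects the max-Q action
    act = getAction(policy, {rand(4,1)})   % action with the highest estimated Q-value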

Topics


  • Specify policies and value functions using function approximators, such as deep neural networks.


  • You can import existing policies from other deep learning frameworks using the ONNX™ model format, as sketched below.
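A minimal sketch of the ONNX route, assuming a release that provides importNetworkFromONNX and the Deep Learning Toolbox Converter for ONNX Model Format support package; the file name, specification values, and direct reuse of the imported network are hypothetical, and in practice the imported network's inputs and outputs may need editing to match the specifications:

    % Import a pretrained policy network from ONNX (file name is hypothetical).
    net = importNetworkFromONNX("pretrainedPolicy.onnx");

    % Wrap the imported network as an actor for an environment with matching
    % observation and action specifications (illustrative values).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    actor = rlDiscreteCategoricalActor(net, obsInfo, actInfo);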
