Policies and Value Functions
A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. A value function is a mapping from an environment observation (or observation-action pair) to the value (the expected cumulative long-term reward) of a policy. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
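The two mappings described above can be illustrated with a minimal sketch. This is plain Python rather than the toolbox API, and every name in it (`policy`, `sample_action`, `discounted_return`, the weight values, and the discount factor `gamma`) is a hypothetical choice made for illustration. It assumes a discrete action set, a softmax over linear action preferences, and a discounted sum as the measure of cumulative reward.

```python
import math
import random


def softmax(prefs):
    """Turn raw action preferences into probabilities that sum to 1."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy(observation, weights):
    """Map an observation to a probability distribution over actions.

    weights[a] is a hypothetical parameter vector for action a; these are
    the values an agent would tune during training.
    """
    prefs = [sum(w * x for w, x in zip(w_a, observation)) for w_a in weights]
    return softmax(prefs)


def sample_action(probs, rng):
    """Draw one action index according to the policy's distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def discounted_return(rewards, gamma):
    """Cumulative reward of one trajectory, discounted by gamma (assumed)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


weights = [[0.5, -0.2], [-0.1, 0.3]]  # illustrative values only
obs = [1.0, 2.0]
probs = policy(obs, weights)
action = sample_action(probs, random.Random(0))
ret = discounted_return([1.0, 1.0, 1.0], 0.9)
```

A value function estimates the expectation of a quantity like `ret` over the trajectories that the policy generates from a given observation.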
Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
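To make the difference between approximation models concrete, here is a hedged sketch of two critics: one backed by a look-up table and one backed by a linear combination of basis functions. It is plain Python, not the toolbox's approximator objects. The class names, the quadratic basis, and the gradient-style update rule are all assumptions chosen for illustration.

```python
class TableCritic:
    """Q-value critic for discrete states and actions (look-up table)."""

    def __init__(self, n_states, n_actions):
        # One stored value per observation-action pair.
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def value(self, state, action):
        return self.q[state][action]


class LinearBasisCritic:
    """Value critic computed as a weighted sum of basis functions."""

    def __init__(self, basis, n_features):
        self.basis = basis            # maps an observation to a feature list
        self.w = [0.0] * n_features   # tunable parameters

    def value(self, obs):
        return sum(w * f for w, f in zip(self.w, self.basis(obs)))

    def update(self, obs, target, lr):
        """Move the estimate toward a target value by one gradient step."""
        feats = self.basis(obs)
        err = target - self.value(obs)
        self.w = [w + lr * err * f for w, f in zip(self.w, feats)]


def quadratic_basis(obs):
    """Hypothetical basis: constant, linear, and squared terms."""
    return [1.0, obs[0], obs[0] ** 2]


table = TableCritic(n_states=3, n_actions=2)
critic = LinearBasisCritic(quadratic_basis, n_features=3)
critic.update([2.0], target=5.0, lr=0.01)
estimate = critic.value([2.0])
```

A table stores one independent value per entry, so it suits small discrete spaces. A basis-function or neural-network critic shares parameters across observations, which lets it generalize to continuous observation spaces.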
Blocks
Policy | Reinforcement learning policy
Functions
Topics
Specify policies and value functions using function approximators, such as deep neural networks.
You can import existing policies from other deep learning frameworks using the ONNX™ model format.