Policies and Value Functions
During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects the action to take. The critic learns the value (or Q-value) function that estimates the expected cumulative long-term reward obtained by following the policy.
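To make this division of labor concrete, here is a minimal one-step actor-critic sketch in plain Python. It is not the Reinforcement Learning Toolbox API: the classes `SoftmaxActor` and `TabularCritic`, the one-state task, and the learning rates are all hypothetical. The critic learns a state value V(s) from the temporal-difference error, and the actor uses that same error to shift its softmax policy toward actions that did better than the critic expected.

```python
import math
import random
from typing import List


class TabularCritic:
    """Critic (hypothetical): estimates the state value V(s) of the actor's current policy."""

    def __init__(self, num_states: int, lr: float = 0.1) -> None:
        self.v: List[float] = [0.0] * num_states
        self.lr = lr

    def td_error(self, state: int, reward: float, next_state: int, done: bool, gamma: float) -> float:
        # One-step temporal-difference error: r + gamma * V(s') - V(s).
        bootstrap = 0.0 if done else gamma * self.v[next_state]
        return reward + bootstrap - self.v[state]

    def update(self, state: int, delta: float) -> None:
        self.v[state] += self.lr * delta


class SoftmaxActor:
    """Actor (hypothetical): a stochastic policy pi(a|s) given by a softmax over learned preferences."""

    def __init__(self, num_states: int, num_actions: int, lr: float = 0.1, seed: int = 0) -> None:
        self.h = [[0.0] * num_actions for _ in range(num_states)]
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, state: int) -> List[float]:
        prefs = self.h[state]
        m = max(prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state: int) -> int:
        p = self.probs(state)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def update(self, state: int, action: int, delta: float) -> None:
        # Policy-gradient step: d log pi(a|s) / d h[s][b] = 1{a == b} - pi(b|s).
        p = self.probs(state)
        for b in range(len(p)):
            grad = (1.0 if b == action else 0.0) - p[b]
            self.h[state][b] += self.lr * delta * grad


def train(steps: int = 2000, gamma: float = 0.99):
    # Hypothetical one-state task: action 1 pays 1.0, action 0 pays 0.0,
    # and every episode ends after a single step.
    actor, critic = SoftmaxActor(1, 2), TabularCritic(1)
    for _ in range(steps):
        state = 0
        action = actor.act(state)
        reward = 1.0 if action == 1 else 0.0
        delta = critic.td_error(state, reward, next_state=0, done=True, gamma=gamma)
        critic.update(state, delta)          # critic: improve the value estimate
        actor.update(state, action, delta)   # actor: favor better-than-expected actions
    return actor, critic


actor, critic = train()
print(actor.probs(0), critic.v)
```

After training, the actor should place most of its probability on action 1, and the critic's estimate V(0) should move toward the expected reward of that policy.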
Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.
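The difference between two of these approximation models can be sketched with one small class. In the plain-Python sketch below (hypothetical names, not the toolbox API), a value function is a weighted sum of basis features, V(s) = w · φ(s). A hand-chosen basis such as [1, s] gives a linear-basis approximator over continuous states, while a one-hot basis with one feature per discrete state turns the weight vector into a look-up table. A deep neural network would replace the fixed φ with learned features, which this sketch does not cover.

```python
from typing import Callable, List

Basis = Callable[[float], List[float]]


class LinearBasisValueFunction:
    """Hypothetical approximator: V(s) = w . phi(s) for a fixed basis phi."""

    def __init__(self, basis: Basis, num_features: int, lr: float) -> None:
        self.basis = basis
        self.w = [0.0] * num_features
        self.lr = lr

    def __call__(self, s: float) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, self.basis(s)))

    def update(self, s: float, target: float) -> None:
        # Least-mean-squares step: w <- w + lr * (target - V(s)) * phi(s).
        phi = self.basis(s)
        err = target - self(s)
        self.w = [wi + self.lr * err * fi for wi, fi in zip(self.w, phi)]


def polynomial_basis(s: float) -> List[float]:
    # Degree-1 polynomial features for a continuous state.
    return [1.0, s]


def one_hot_basis(num_states: int) -> Basis:
    # One feature per discrete state, so w is exactly a look-up table of values.
    def phi(s: float) -> List[float]:
        return [1.0 if i == int(s) else 0.0 for i in range(num_states)]
    return phi


# Linear basis: fit hypothetical targets V(s) = 2 - s on a grid of continuous states.
linear = LinearBasisValueFunction(polynomial_basis, num_features=2, lr=0.5)
for _ in range(500):
    for s in [0.0, 0.25, 0.5, 0.75, 1.0]:
        linear.update(s, 2.0 - s)

# Look-up table: with lr = 1.0, one pass writes each target into its table entry.
table = LinearBasisValueFunction(one_hot_basis(3), num_features=3, lr=1.0)
for s, target in [(0, 5.0), (1, -1.0), (2, 0.5)]:
    table.update(s, target)

print(linear.w, table.w)
```

The same least-mean-squares update trains both models; only the basis changes. With a one-hot basis and a learning rate of 1.0, each update simply overwrites the table entry for the visited state.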
Blocks
Policy | Reinforcement learning policy (since R2022b)
Functions
Topics
- Create Policies and Value Functions
Specify policies and value functions using function approximators, such as deep neural networks.
You can import existing policies from other deep learning frameworks using the ONNX™ model format.