Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects the action to take. The critic learns the value (or Q-value) function that estimates the value of the policy.

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
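
As a minimal sketch, small deep networks can be wrapped in approximator objects such as rlValueFunction and rlContinuousDeterministicActor. The observation and action specifications below (a 4-dimensional observation and a scalar continuous action) are assumptions for illustration only.

    % Assumed example specifications (hypothetical, for illustration).
    obsInfo = rlNumericSpec([4 1]);   % 4-dimensional continuous observation
    actInfo = rlNumericSpec([1 1]);   % scalar continuous action

    % Critic: approximate the state value V(s) with a small deep network.
    criticNet = [
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        ];
    critic = rlValueFunction(dlnetwork(criticNet), obsInfo);

    % Actor: approximate a deterministic policy a = pi(s).
    actorNet = [
        featureInputLayer(prod(obsInfo.Dimension))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(prod(actInfo.Dimension))
        tanhLayer   % bounds the action output to [-1, 1]
        ];
    actor = rlContinuousDeterministicActor(dlnetwork(actorNet), obsInfo, actInfo);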

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions

rlTable - Value table or Q table (Since R2019a)
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator object for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
getActor - Extract actor from reinforcement learning agent (Since R2019a)
setActor - Set actor of reinforcement learning agent (Since R2019a)
getCritic - Extract critic from reinforcement learning agent (Since R2019a)
setCritic - Set critic of reinforcement learning agent (Since R2019a)
getModel - Get approximation model from function approximator object (Since R2020b)
setModel - Set approximation model in function approximator object (Since R2020b)
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object (Since R2019a)
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object (Since R2019a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
getAction - Obtain action from agent, actor, or policy object given environment observations (Since R2020a)
getValue - Obtain estimated value from a critic given environment observations and actions (Since R2020a)
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations (Since R2020a)
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
gradient - Evaluate gradient of function approximator object given observation and action input data (Since R2022a)
accelerate - Option to accelerate computation of gradient for approximator object based on neural network (Since R2022a)
quadraticLayer - Quadratic layer for actor or critic network (Since R2019a)
scalingLayer - Scaling layer for actor or critic network (Since R2019a)
softplusLayer - Softplus layer for actor or critic network (Since R2020a)
featureInputLayer - Feature input layer (Since R2020b)
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer (Since R2019a)
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
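
As a minimal usage sketch of a few of the functions above, reusing the actor and critic from the earlier example (the observation value is arbitrary):

    % Query the approximators created in the earlier sketch.
    obs = {rand(4,1)};               % one observation, wrapped in a cell array
    act = getAction(actor, obs);     % action selected by the actor (cell array)
    val = getValue(critic, obs);     % estimated state value from the critic

    % evaluate works on any function approximator object and returns
    % its outputs as a cell array.
    out = evaluate(critic, obs);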

Topics

  • Create Policies and Value Functions

    Specify policies and value functions using function approximators, such as deep neural networks.


  • Import Neural Network Models Using ONNX

    You can import existing policies from other deep learning frameworks using the ONNX™ model format.
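
A minimal sketch, assuming a hypothetical ONNX file name and the observation and action specifications from the earlier example; importNetworkFromONNX requires the Deep Learning Toolbox™ Converter for ONNX Model Format support package:

    % "savedActor.onnx" is a hypothetical file name, for illustration only.
    net = importNetworkFromONNX("savedActor.onnx");   % returns a dlnetwork
    actor = rlContinuousDeterministicActor(net, obsInfo, actInfo);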
