
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. A value function is a mapping from an environment observation (or observation-action pair) to the value of a policy, that is, the expected cumulative long-term reward. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
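In the standard discounted formulation (the discount factor gamma and the notation below are conventions assumed here, not definitions taken from this page), these mappings can be written as:

    \pi(a \mid s) = \Pr(A_t = a \mid S_t = s)
    V^{\pi}(s) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \mid S_t = s\big]
    Q^{\pi}(s, a) = \mathbb{E}_{\pi}\big[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \mid S_t = s,\ A_t = a\big]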

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
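As a minimal sketch of this workflow (the environment specifications, layer sizes, and network architectures below are illustrative assumptions, not values defined on this page), you can create a neural-network critic and actor for a hypothetical problem with a 4-element continuous observation and three discrete actions, and then query them with getValue and getAction:

    % Observation: 4-element continuous vector; action: one of three discrete values.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    % Critic network: maps the observation to one Q-value per discrete action.
    criticNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(32)
        reluLayer
        fullyConnectedLayer(3)]);
    critic = rlVectorQValueFunction(criticNet, obsInfo, actInfo);

    % Actor network: maps the observation to a probability for each discrete action.
    actorNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(32)
        reluLayer
        fullyConnectedLayer(3)
        softmaxLayer]);
    actor = rlDiscreteCategoricalActor(actorNet, obsInfo, actInfo);

    % Query the approximators for a random observation.
    obs = {rand(4,1)};
    qValues = getValue(critic, obs)   % vector of Q-values, one per action
    action = getAction(actor, obs)    % cell array containing the sampled action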

Blocks

Policy - Reinforcement learning policy

Functions

rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents
rlQValueFunction - Q-value function approximator object for reinforcement learning agents
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get function approximator model from actor or critic
setModel - Set function approximation model for actor or critic
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
rlOptimizerOptions - Optimization options for actors and critics
getGreedyPolicy - Extract greedy (deterministic) policy object from agent (see the sketch after this list)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network
featureInputLayer - Feature input layer
reluLayer - Rectified linear unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
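The policy objects and extraction functions listed above support application deployment and custom training loops. The following hedged sketch (the specification values and the use of a default DQN agent are assumptions for illustration) extracts the greedy policy from an agent and queries it for an action:

    % Create a DQN agent with default networks from the specifications (illustrative).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    agent = rlDQNAgent(obsInfo, actInfo);

    % Extract the greedy (deterministic) policy and compute an action for an observation.
    policy = getGreedyPolicy(agent);       % policy object that selects the max-Q action
    act = getAction(policy, {rand(4,1)})   % action with the highest estimated Q-value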

Topics


  • Specify policies and value functions using function approximators, such as deep neural networks.


  • You can import existing policies from other deep learning frameworks using the ONNX™ model format, as sketched below.
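A minimal sketch of the ONNX route, assuming a release that provides importNetworkFromONNX and the Deep Learning Toolbox Converter for ONNX Model Format support package; the file name, specification values, and direct reuse of the imported network are hypothetical, and in practice the imported network's inputs and outputs may need editing to match the specifications:

    % Import a pretrained policy network from ONNX (file name is hypothetical).
    net = importNetworkFromONNX("pretrainedPolicy.onnx");

    % Wrap the imported network as an actor for an environment with matching
    % observation and action specifications (illustrative values).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);
    actor = rlDiscreteCategoricalActor(net, obsInfo, actInfo);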
