rlQValueFunction
Q-value function approximator object for reinforcement learning agents
Since R2022a
Description
This object implements a Q-value function approximator that you can use as a critic for a reinforcement learning agent. A Q-value function (also known as an action-value function) is a mapping from an environment observation-action pair to the value of a policy. Specifically, its output is a scalar that represents the expected discounted cumulative long-term reward when an agent starts from the state corresponding to the given observation, executes the given action, and keeps on taking actions according to the given policy afterwards. A Q-value function critic therefore needs both the environment observation and an action as inputs. After you create an rlQValueFunction critic, use it to create an agent such as rlQAgent, rlSARSAAgent, rlDQNAgent, rlDDPGAgent, rlSACAgent, or rlTD3Agent. For more information on creating representations, see Create Policies and Value Functions.
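In standard notation, the quantity described above is the action-value function of the policy pi, with discount factor gamma and reward r. (This formula is a standard definition, restated here for reference; it does not appear on this page.)

Q^{\pi}(s,a) \;=\; \mathbb{E}\!\left[\left. \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\right|\; s_t = s,\ a_t = a,\ \pi \right]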
Creation
Syntax
Description
critic = rlQValueFunction(net,observationInfo,actionInfo) creates the Q-value function object critic. Here, net is the deep neural network used as an approximation model, and it must have both observation and action as inputs and a single scalar output. The network input layers are automatically associated with the environment observation and action channels according to the dimension specifications in observationInfo and actionInfo. This function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
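For instance, a minimal sketch of this syntax follows. The observation and action dimensions, layer names, and network architecture are illustrative assumptions, not part of this page.

% Example observation and action specifications (assumed for illustration)
obsInfo = rlNumericSpec([4 1]);   % 4-dimensional continuous observation
actInfo = rlNumericSpec([2 1]);   % 2-dimensional continuous action

% Network with one observation input, one action input, and a scalar output
obsPath = [featureInputLayer(4,Name="obsIn") fullyConnectedLayer(16,Name="obsFC")];
actPath = [featureInputLayer(2,Name="actIn") fullyConnectedLayer(16,Name="actFC")];
commonPath = [additionLayer(2,Name="add") reluLayer fullyConnectedLayer(1)];

lg = layerGraph(obsPath);
lg = addLayers(lg,actPath);
lg = addLayers(lg,commonPath);
lg = connectLayers(lg,"obsFC","add/in1");
lg = connectLayers(lg,"actFC","add/in2");
net = dlnetwork(lg);

critic = rlQValueFunction(net,obsInfo,actInfo);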
critic = rlQValueFunction(tab,observationInfo,actionInfo) creates the Q-value function object critic with discrete action and observation spaces from the Q-value table tab. tab is an rlTable object containing a table with as many rows as the number of possible observations and as many columns as the number of possible actions. The function sets the ObservationInfo and ActionInfo properties of critic respectively to the observationInfo and actionInfo input arguments, which in this case must be scalar rlFiniteSetSpec objects.
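As an illustration of this syntax, consider the following minimal sketch; the observation and action sets below are assumed for the example.

% Finite observation and action sets (assumed for illustration)
obsInfo = rlFiniteSetSpec([1 2 3 4]);   % 4 possible observations
actInfo = rlFiniteSetSpec([1 2]);       % 2 possible actions

tab = rlTable(obsInfo,actInfo);         % 4-by-2 Q-value table, initialized to zeros
tab.Table = rand(4,2);                  % optionally seed the table with initial values

critic = rlQValueFunction(tab,obsInfo,actInfo);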
critic = rlQValueFunction({basisFcn,W0},observationInfo,actionInfo) creates a Q-value function object critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight vector W0. Here the basis function must have both observation and action as inputs, and W0 must be a column vector. The function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
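A minimal sketch of this syntax follows; the basis function and channel dimensions are illustrative assumptions. The critic output is the scalar product of the weight vector and the feature vector returned by the basis function.

% Continuous observation and action channels (assumed for illustration)
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

% Custom basis function: must take observation and action as inputs
% and return a column vector of features
basisFcn = @(obs,act) [obs; act; obs.^2; act.^2];

% Initial weights: one element per feature returned by basisFcn
W0 = zeros(8,1);

critic = rlQValueFunction({basisFcn,W0},obsInfo,actInfo);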
critic = rlQValueFunction(___,Name=Value) specifies one or more name-value arguments. You can specify the input and output layer names (to mandate their association with the environment observation and action channels) for deep neural network approximators. For all types of approximators, you can specify the computation device, for example UseDevice="gpu".
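For example, the following sketch reuses the network and specifications from the earlier deep-network sketch, associates the input layers explicitly, and moves computation to the GPU. The layer names and the ObservationInputNames/ActionInputNames argument names are assumptions here; only UseDevice is named in the text above.

critic = rlQValueFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionInputNames="actIn", ...
    UseDevice="gpu");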
Input Arguments
Properties
Object Functions
rlDDPGAgent | Deep deterministic policy gradient (DDPG) reinforcement learning agent
rlTD3Agent | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent
rlDQNAgent | Deep Q-network (DQN) reinforcement learning agent
rlQAgent | Q-learning reinforcement learning agent
rlSARSAAgent | SARSA reinforcement learning agent
rlSACAgent | Soft actor-critic (SAC) reinforcement learning agent
getValue | Obtain estimated value from a critic given environment observations and actions
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate | Evaluate function approximator object given observation (or observation-action) input data
gradient | Evaluate gradient of function approximator object given observation and action input data
accelerate | Option to accelerate computation of gradient for approximator object based on neural network
getLearnableParameters | Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters | Set learnable parameter values of agent, function approximator, or policy object
setModel | Set approximation model in function approximator object
getModel | Get approximation model from function approximator object
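For instance, once a critic exists, you can query its estimated value with getValue. A short sketch, reusing the continuous specifications assumed in the earlier deep-network sketch:

obs = {rand(obsInfo.Dimension)};     % one observation sample, as a cell array
act = {rand(actInfo.Dimension)};     % one action sample, as a cell array
qValue = getValue(critic,obs,act)    % scalar Q-value estimate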
Examples
Version History
Introduced in R2022a
See Also
Functions
evaluate | getObservationInfo | getActionInfo
Objects
rlNumericSpec | rlFiniteSetSpec | rlTD3Agent