rlIsDoneFunction

Is-done function approximator object for neural network-based environment

Since R2022a

Description

When creating a neural network-based environment using rlNeuralNetworkEnvironment, you can specify the is-done function approximator using an rlIsDoneFunction object. Do so when you do not know a ground-truth termination signal for your environment.

The is-done function approximator object uses a deep neural network as its internal approximation model to predict the termination signal for the environment, given one of the following input combinations.

  • Observations, actions, and next observations

  • Observations and actions

  • Actions and next observations

  • Next observations

Creation

Description

isdFcnAppx = rlIsDoneFunction(net,observationInfo,actionInfo,Name=Value) creates the is-done function approximator object isdFcnAppx using the deep neural network net and sets the ObservationInfo and ActionInfo properties.

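When creating an is-done function approximator, you must specify the names of the deep neural network inputs using one of the following combinations of name-value pair arguments.

  • ObservationInputNames, ActionInputNames, and NextObservationInputNames

  • ObservationInputNames and ActionInputNames

  • ActionInputNames and NextObservationInputNames

  • NextObservationInputNames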

You can also specify the UseDeterministicPredict and UseDevice properties using optional name-value pair arguments. For example, to use a GPU for prediction, specify UseDevice="gpu".

Input Arguments

Deep neural network (net) with a scalar output value, specified as a dlnetwork object.

The input layer names for this network must match the input names specified using the ObservationInputNames, ActionInputNames, and NextObservationInputNames arguments. The dimensions of the input layers must match the dimensions of the corresponding observation and action specifications in ObservationInfo and ActionInfo, respectively.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ObservationInputNames="velocity"

Observation input layer names, specified as a string or string array. Specify ObservationInputNames when you expect the termination signal to depend on the current environment observation.

The number of observation input names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.

Action input layer names, specified as a string or string array. Specify ActionInputNames when you expect the termination signal to depend on the current action value.

The number of action input names must match the length of ActionInfo, and the order of the names must match the order of the specifications in ActionInfo.

Next observation input layer names, specified as a string or string array. Specify NextObservationInputNames when you expect the termination signal to depend on the next environment observation.

The number of next observation input names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.
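As an illustration of how the input names tie the network to the specifications, the following is a minimal sketch of an approximator that uses all three inputs. The specification sizes, the layer names ("obsIn", "actIn", "nextObsIn"), and the hidden layer width are assumptions made for this sketch, not values prescribed by the documentation.

```matlab
% Hypothetical specifications: 4-element observation, scalar continuous action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% One input layer per channel. The layer names are illustrative and only
% need to match the names passed to rlIsDoneFunction below.
obsIn     = featureInputLayer(obsInfo.Dimension(1),Name="obsIn");
actIn     = featureInputLayer(actInfo.Dimension(1),Name="actIn");
nextObsIn = featureInputLayer(obsInfo.Dimension(1),Name="nextObsIn");

% Shared path: concatenate the three inputs, then output two class probabilities.
commonPath = [
    concatenationLayer(1,3,Name="concat")
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")];

% Assemble the graph and connect each input to the concatenation layer.
net = layerGraph(obsIn);
net = addLayers(net,actIn);
net = addLayers(net,nextObsIn);
net = addLayers(net,commonPath);
net = connectLayers(net,"obsIn","concat/in1");
net = connectLayers(net,"actIn","concat/in2");
net = connectLayers(net,"nextObsIn","concat/in3");
net = dlnetwork(net);

% Map each input layer to its role using the name-value arguments.
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionInputNames="actIn", ...
    NextObservationInputNames="nextObsIn");

% Predict the termination signal for random inputs.
predIsDone = predict(isdFcnAppx, ...
    {rand(obsInfo.Dimension)},{rand(actInfo.Dimension)},{rand(obsInfo.Dimension)});
```

With an untrained network the predicted value carries no meaning; the point of the sketch is only that each name-value argument names the input layer that receives the corresponding channel.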

Properties

This property is read-only.

Observation specifications (ObservationInfo), specified as an rlNumericSpec object or an array of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlNumericSpec.

This property is read-only.

Action specifications (ActionInfo), specified as an rlFiniteSetSpec or rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

Note

Only one action channel is allowed.

You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.
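For reference, constructing the specifications manually might look like the following sketch. The dimensions, limits, and channel names are hypothetical values chosen for illustration.

```matlab
% Hypothetical observation channel: a 4-by-1 continuous vector.
obsInfo = rlNumericSpec([4 1]);
obsInfo.Name = "cart-pole states";

% Hypothetical continuous action channel: a scalar force limited to [-10, 10].
actInfoCont = rlNumericSpec([1 1],LowerLimit=-10,UpperLimit=10);
actInfoCont.Name = "force";

% Hypothetical discrete alternative: the action is either -10 or 10.
actInfoDisc = rlFiniteSetSpec([-10 10]);
actInfoDisc.Name = "force";
```

Pass either action specification, but only one, because the approximator accepts a single action channel.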

Option to predict the termination signal deterministically (UseDeterministicPredict), specified as one of the following values.

  • true: Use deterministic network prediction.

  • false: Use stochastic network prediction.
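The difference between the two settings can be pictured with plain MATLAB arithmetic. This sketch rests on an assumption about the internal behavior that the text above does not spell out: that deterministic prediction selects the most probable class, while stochastic prediction samples the class from the network's output probabilities. The probability values are hypothetical.

```matlab
% Hypothetical class probabilities from the network: [P(not done); P(done)].
probs = [0.54; 0.46];

% Deterministic prediction (assumed behavior): pick the most probable class.
[~,idx] = max(probs);
detIsDone = idx - 1;          % class 1 maps to 0 (continue), class 2 to 1 (terminate)

% Stochastic prediction (assumed behavior): sample the class from the
% distribution, so termination is returned with probability probs(2).
stoIsDone = double(rand < probs(2));
```

Under this assumption, repeated deterministic predictions for the same input always agree, whereas stochastic predictions can differ from call to call.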

Computation device (UseDevice) used to perform operations such as gradient computation, parameter updates, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating a network on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.
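One way to choose the device defensively is to check for a usable GPU first and fall back to the CPU otherwise. The sketch below uses canUseGPU for the check; the variable name device is hypothetical, and the commented-out constructor call assumes a network, specifications, and input layer name like those in the example on this page.

```matlab
% canUseGPU returns true only when a supported GPU and the required
% toolbox are available.
if canUseGPU
    gpu = gpuDevice;                      % query the currently selected GPU
    fprintf("Found GPU: %s\n",gpu.Name);
    device = "gpu";
else
    device = "cpu";
end
fprintf("Using device: %s\n",device);

% The choice can then be passed at creation, for example:
% isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
%     NextObservationInputNames="nextstate",UseDevice=device);
```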

Object Functions

rlNeuralNetworkEnvironment: Environment model with deep neural network transition models

Examples

Create an environment interface and extract observation and action specifications. Alternatively, you can create specifications using rlNumericSpec and rlFiniteSetSpec.

env = rlPredefinedEnv("CartPole-Continuous");
obsinfo = getObservationInfo(env);
actinfo = getActionInfo(env);

To approximate the is-done function, use a deep neural network. The network has one input channel for the next observations. The single output channel is for the predicted termination signal.

Create the neural network as a vector of layer objects.

% Input layer sized to the observation; two softmax outputs give the
% probabilities of "not done" and "done".
commonpath = [
    featureInputLayer( ...
                obsinfo.Dimension(1), ...
                Name="nextstate")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")];
net = layerGraph(commonpath);
plot(net)

Convert the network to a dlnetwork object and display the number of weights.

net = dlnetwork(net);
summary(net);
   Initialized: true
   Number of learnables: 4.6k
   Inputs:
      1   'nextstate'   4 features

Create an is-done function approximator object.

isdonefcnappx = rlIsDoneFunction(...
    net,obsinfo,actinfo,...
    NextObservationInputNames="nextstate");

Using this is-done function approximator object, you can predict the termination signal based on the next observation. For example, predict the termination signal for a random next observation. Because the termination signal in this example depends only on the next observation, use empty cell arrays for the current observation and action inputs.

nxtobs = rand(obsinfo.Dimension);
predisdone = predict(isdonefcnappx,{},{},{nxtobs})
predisdone = 0

You can obtain the termination probability using evaluate.

predisdoneprob = evaluate(isdonefcnappx,{nxtobs})
predisdoneprob = 1x1 cell array
    {2x1 single}
predisdoneprob{1}
ans = 2x1 single column vector
    0.5405
    0.4595

The first number is the probability of obtaining a 0 (no termination predicted), and the second is the probability of obtaining a 1 (termination predicted).

Version History

Introduced in R2022a
