Supported Networks, Layers, Boards, and Tools

Supported Pretrained Networks

Deep Learning HDL Toolbox™ supports code generation for series convolutional neural networks (CNNs or ConvNets). You can generate code for any trained CNN whose computational layers are supported for code generation. For a full list, see Supported Layers. You can use one of the pretrained networks listed in the table to generate code for your target Intel® or Xilinx® FPGA boards.
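As context for the table that follows, deployment typically uses the compile, deploy, and predict methods of a dlhdl.Workflow object. The following is a minimal sketch, not a definitive recipe; it assumes a Xilinx ZCU102 board reachable over Ethernet, the shipping zcu102_single bitstream, and a preprocessed image in the hypothetical variable inputImage:

```matlab
% Load a supported pretrained network (requires the Deep Learning
% Toolbox Model for ResNet-18 Network support package).
net = resnet18;

% Connect to the target board; JTAG is the other supported interface.
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');

% Pair the network with a shipping bitstream for the board and data type.
hW = dlhdl.Workflow('Network', net, ...
    'Bitstream', 'zcu102_single', ...
    'Target', hTarget);

% Compile the network, program the board, and run a prediction.
% inputImage is a hypothetical preprocessed 224-by-224-by-3 image.
hW.compile;
hW.deploy;
[prediction, speed] = hW.predict(single(inputImage), 'Profile', 'on');
```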
| Network | Network Description | Type | Single Data Type (Shipping Bitstream): ZCU102 | Single: ZC706 | Single: Arria10 SoC | INT8 Data Type (Shipping Bitstream): ZCU102 | INT8: ZC706 | INT8: Arria10 SoC | Application Area |
|---|---|---|---|---|---|---|---|---|---|
| AlexNet | AlexNet convolutional neural network. | Series network | No* | No* | No* | No* | No* | No* | Classification |
| LogoNet | Logo Recognition Network (LogoNet) is a MATLAB® developed logo identification network. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Classification |
| DigitsNet | Digit classification network. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Classification |
| Lane detection | LaneNet convolutional neural network. | Series network | No* | No* | No* | No* | No* | No* | Classification |
| VGG-16 | VGG-16 convolutional neural network. For the pretrained VGG-16 model, see the `vgg16` function. | Series network | No. Network exceeds PL DDR memory size. | No. Network exceeds FC module memory size. | Yes | Yes | No. Network exceeds FC module memory size. | Yes | Classification |
| VGG-19 | VGG-19 convolutional neural network. For the pretrained VGG-19 model, see the `vgg19` function. | Series network | No. Network exceeds PL DDR memory size. | No. Network exceeds FC module memory size. | Yes | Yes | No. Network exceeds FC module memory size. | Yes | Classification |
| DarkNet-19 | DarkNet-19 convolutional neural network. For the pretrained DarkNet-19 model, see the `darknet19` function. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Classification |
| Radar classification | Convolutional neural network that uses micro-Doppler signatures to identify and classify the object. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Classification and software-defined radio (SDR) |
| Defect detection snet_defnet | snet_defnet is a custom AlexNet network used to identify and classify defects. For more information, see the Defect Detection example. | Series network | No* | No* | No* | No* | No* | No* | Classification |
| Defect detection snet_blemdetnet | snet_blemdetnet is a custom convolutional neural network used to identify and classify defects. For more information, see the Defect Detection example. | Series network | No* | No* | No* | No* | No* | No* | Classification |
| DarkNet-53 | DarkNet-53 convolutional neural network. For the pretrained DarkNet-53 model, see the `darknet53` function. | Directed acyclic graph (DAG) network based | Yes | Yes | Yes | Yes | Yes | No | Classification |
| ResNet-18 | ResNet-18 convolutional neural network. For the pretrained ResNet-18 model, see the `resnet18` function. | Directed acyclic graph (DAG) network based | Yes | Yes | Yes | Yes | Yes | Yes | Classification |
| ResNet-50 | ResNet-50 convolutional neural network. For the pretrained ResNet-50 model, see the `resnet50` function. | Directed acyclic graph (DAG) network based | No. Network exceeds PL DDR memory size. | No. Network exceeds PL DDR memory size. | Yes | Yes | Yes | Yes | Classification |
| ResNet-based YOLO v2 | You only look once (YOLO) is an object detector that decodes the predictions from a convolutional neural network and generates bounding boxes around the objects. | Directed acyclic graph (DAG) network based | Yes | Yes | Yes | Yes | Yes | Yes | Object detection |
| MobileNetV2 | MobileNet-v2 convolutional neural network. For the pretrained MobileNet-v2 model, see the `mobilenetv2` function. | Directed acyclic graph (DAG) network based | Yes | Yes | Yes | Yes | Yes | Yes | Classification |
| GoogLeNet | GoogLeNet convolutional neural network. For the pretrained GoogLeNet model, see the `googlenet` function. | Directed acyclic graph (DAG) network based | No* | No* | No* | No* | No* | No* | Classification |
| PoseNet | Human pose estimation network. | Directed acyclic graph (DAG) network based | Yes | Yes | Yes | Yes | Yes | Yes | Segmentation |
| U-Net | U-Net convolutional neural network designed for semantic image segmentation. | Directed acyclic graph (DAG) network based | No. Network exceeds PL DDR memory size. | No. Network exceeds PL DDR memory size. | No. Network exceeds PL DDR memory size. | No. Network exceeds PL DDR memory size. | No. Network exceeds PL DDR memory size. | Yes | Segmentation |
| SqueezeNet-based YOLO v3 | The you-only-look-once (YOLO) v3 object detector is a multi-scale object detection network that uses a feature extraction network and multiple detection heads to make predictions at multiple scales. | dlnetwork object | Yes | Yes | No | No | No | No | Object detection |
| Sequence-to-sequence classification | Classify each time step of sequence data by using a long short-term memory (LSTM) network. | Long short-term memory (LSTM) network | Yes | Yes | No | No | No | No | Sequence data classification |
| Time series forecasting | Forecast time series data by using a long short-term memory (LSTM) network. | Long short-term memory (LSTM) network | Yes | Yes | No | No | No | No | Forecast time series data |
| Word-by-word text generation | Generate text word by word by using a long short-term memory (LSTM) network. | Long short-term memory (LSTM) network | Yes | Yes | No | No | No | No | Sequence data prediction |
| YAMNet | Pretrained audio classification network. See `yamnet` (Audio Toolbox). | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Audio data classification |
| Semantic segmentation using dilated convolutions | Semantic segmentation network that uses dilated convolution layers to increase coverage area without increasing the number of computational parameters. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Segmentation |
| Time series forecasting | Forecast time series data by using a long short-term memory (LSTM) network. | Gated recurrent unit (GRU) layer network | Yes | Yes | No | No | No | No | Forecast time series data |
| Pruned image classification network | Pruned image classification network. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Image classification |
| Very-deep super-resolution (VDSR) network | Create high-resolution images from low-resolution images by using VDSR networks. | Series network | Yes | Yes | Yes | Yes | Yes | Yes | Image processing |

*No with the shipping bitstream. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again, as shown in the sketch after this table.
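The starred entries above refer to rebuilding the bitstream with the LRN block enabled. A minimal sketch using the documented setModuleProperty and dlhdl.buildProcessor calls; the Vivado installation path is an assumption that depends on your machine:

```matlab
% Start from the default deep learning processor configuration.
hPC = dlhdl.ProcessorConfig;

% Enable LRN block generation inside the conv module of the processor.
hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');

% Point to the synthesis tool; this path is an assumption and
% depends on your installation.
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', ...
    'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');

% Regenerate the bitstream with the LRN block included.
dlhdl.buildProcessor(hPC);
```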
Supported Layers

Deep Learning HDL Toolbox supports the layers listed in these tables.

Input Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible |
|---|---|---|---|
| `imageInputLayer` | SW | An image input layer inputs 2-D images to a network and applies data normalization. For hardware implementation of the normalization, see Image Input Layer Normalization Hardware Implementation below. | Yes. Runs as single data type in SW. |
| `featureInputLayer` | SW | A feature input layer inputs feature data to a network and applies data normalization. | No |
| `sequenceInputLayer` | SW | A sequence input layer inputs sequence data to a network. | No |
Convolution and Fully Connected Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `convolution2dLayer` | HW | Convolution (Conv) | A 2-D convolutional layer applies sliding convolutional filters to the input. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `groupedConvolution2dLayer` | HW | Convolution (Conv) | A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution. Code generation is supported for a 2-D grouped convolution layer that has the NumGroups property set as 'channel-wise'. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `transposedConv2dLayer` | HW | Convolution (Conv) | A transposed 2-D convolution layer upsamples feature maps. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `fullyConnectedLayer` | HW | Fully Connected (FC) | A fully connected layer multiplies the input by a weight matrix and then adds a bias vector. Additional limitations apply when generating code for a network that uses this layer. | Yes |
Activation Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `reluLayer` | HW | Layer is fused. | A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero. A ReLU layer is supported only when it is preceded by certain layer types. | Yes |
| `leakyReluLayer` | HW | Layer is fused. | A leaky ReLU layer performs a threshold operation, where any input value less than zero is multiplied by a fixed scalar. A leaky ReLU layer is supported only when it is preceded by certain layer types. | Yes |
| `clippedReluLayer` | HW | Layer is fused. | A clipped ReLU layer performs a threshold operation, where any input value less than zero is set to zero and any value above the clipping ceiling is set to the clipping ceiling value. A clipped ReLU layer is supported only when it is preceded by certain layer types. | Yes |
| `tanhLayer` | HW | Inherit from input | A hyperbolic tangent (tanh) activation layer applies the tanh function to the layer inputs. | No |
Normalization, Dropout, and Cropping Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `batchNormalizationLayer` | HW | Layer is fused. | A batch normalization layer normalizes each input channel across a mini-batch. A batch normalization layer is supported when it is preceded by an image input layer or a convolution layer. | Yes |
| `crossChannelNormalizationLayer` | HW | Convolution (Conv) | A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization. | Yes. Runs as single data type in HW. |
| `dropoutLayer` | No-op on inference | No-op on inference | A dropout layer randomly sets input elements to zero with a given probability. | Yes |
| `resize2dLayer` (Image Processing Toolbox) | HW | Inherit from input | A 2-D resize layer resizes 2-D input by a scale factor, to a specified height and width, or to the size of a reference input feature map. Additional limitations apply when generating code for a network that uses this layer. | No |
Pooling and Unpooling Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `maxPooling2dLayer` | HW | Convolution (Conv) | A max pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the maximum of each region. Additional limitations apply when generating code for a network that uses this layer. | Yes. No when the HasUnpoolingOutputs property is enabled. |
| `maxUnpooling2dLayer` | HW | Convolution (Conv) | A max unpooling layer unpools the output of a max pooling layer. | No |
| `averagePooling2dLayer` | HW | Convolution (Conv) | An average pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the average value of each region. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `globalAveragePooling2dLayer` | HW | Convolution (Conv) | A global average pooling layer performs downsampling by computing the mean of the height and width dimensions of the input. Additional limitations apply when generating code for a network that uses this layer. | Yes |
Combination Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `additionLayer` | HW | Inherit from input. | An addition layer adds inputs from multiple neural network layers element-wise. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `depthConcatenationLayer` | HW | Inherit from input. | A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third (channel) dimension. Additional limitations apply when generating code for a network that uses this layer. | Yes |
| `multiplicationLayer` | HW | Inherit from input | A multiplication layer multiplies inputs from multiple neural network layers element-wise. | No |
Sequence Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible |
|---|---|---|---|
| `lstmLayer` | HW | An LSTM layer learns long-term dependencies between time steps in time-series and sequence data. The layer performs additive interactions, which can help improve gradient flow over long sequences during training. Additional limitations apply when generating code for a network that uses this layer. | No |
| `gruLayer` | HW | A GRU layer is an RNN layer that learns dependencies between time steps in time-series and sequence data. Additional limitations apply when generating code for a network that uses this layer. | No |
Output Layer

| Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible |
|---|---|---|---|
| `softmaxLayer` | SW and HW | A softmax layer applies a softmax function to the input. Additional limitations apply if the softmax layer is implemented in hardware. | Yes. Runs as single data type in SW. |
| `classificationLayer` | SW | A classification layer computes the cross-entropy loss for multiclass classification problems that have mutually exclusive classes. | Yes |
| `regressionLayer` | SW | A regression layer computes the half-mean-squared-error loss for regression problems. | Yes |
| `sigmoidLayer` | SW and HW | A sigmoid layer applies a sigmoid function to the input. | Yes. Runs as single data type in SW. |
Keras and ONNX Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| `nnet.keras.layer.FlattenCStyleLayer` | HW | Layer is fused. | Flattens activations into 1-D, assuming C-style (row-major) order. | Yes |
| `nnet.keras.layer.ZeroPadding2dLayer` | HW | Layer is fused. | Zero padding layer for 2-D input. | Yes |
Custom Layers

| Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible |
|---|---|---|---|---|
| Custom layers | HW | Inherit from input | Custom layers, with or without learnable parameters, that you define for your problem. To learn how to define custom deep learning layers, see the Define Custom Deep Learning Layers documentation. A minimal sketch follows this table. | No |
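As an illustration only, a custom layer without learnable parameters follows the standard Deep Learning Toolbox layer template. The class name myScalingLayer and its Scale property are hypothetical:

```matlab
classdef myScalingLayer < nnet.layer.Layer
    % Hypothetical custom layer that scales its input by a fixed
    % factor. Shown only to illustrate the template; deploying a
    % custom layer to hardware also requires the toolbox's custom
    % layer registration workflow.
    properties
        Scale  % fixed, non-learnable scale factor
    end
    methods
        function layer = myScalingLayer(scale, name)
            layer.Name = name;
            layer.Description = "Element-wise scaling by " + scale;
            layer.Scale = scale;
        end
        function Z = predict(layer, X)
            % Forward pass at inference time.
            Z = layer.Scale .* X;
        end
    end
end
```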
Supported Boards

These boards are supported by Deep Learning HDL Toolbox:

Xilinx Zynq®-7000 ZC706

Intel Arria® 10 SoC

Xilinx Zynq UltraScale+™ MPSoC ZCU102

Custom boards
Third-Party Synthesis Tools and Version Support

Deep Learning HDL Toolbox has been tested with:

Xilinx Vivado® Design Suite 2022.1

Intel Quartus® Prime Standard 21.1
Image Input Layer Normalization Hardware Implementation

To enable hardware implementation of the normalization functions for the image input layer, set the HardwareNormalization argument of the compile method to auto or on. When HardwareNormalization is set to auto, the compile method looks for the presence of addition and multiplication layers to implement the normalization function on hardware. The normalization is implemented on hardware by:

Creating a new constant layer. This layer holds the value that is to be subtracted.

Using existing addition and multiplication layers. The layers to be used depend on the normalization function being implemented. A compile sketch follows this list.
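A minimal sketch of both settings, reusing the hypothetical workflow object hW from the earlier sketch:

```matlab
% Let the compiler decide whether the input-layer normalization
% can be implemented on hardware.
dn = hW.compile('HardwareNormalization', 'auto');

% Or require hardware normalization explicitly. This errors if the
% normalization cannot be mapped to the available addition and
% multiplication layers.
dn = hW.compile('HardwareNormalization', 'on');
```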
Constant Layer Buffer Content

This table describes the value stored in the constant layer buffer.

| Normalization Function | Number of Constants | Constant Layer Buffer Value |
|---|---|---|
| zerocenter | 1 | -Mean |
| zscore | 2 | The first constant value is -Mean. The second constant value is 1/StandardDeviation. |
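To see why these constants suffice, note that zscore normalization (x - Mean)/StandardDeviation can be rewritten as (x + (-Mean)) * (1/StandardDeviation), which maps onto one addition layer and one multiplication layer. A small numeric check with hypothetical statistics:

```matlab
% Hypothetical per-channel statistics, for illustration only.
mu    = 118.4;       % channel mean
sigma = 62.7;        % channel standard deviation
x     = single(200); % one input pixel value

% Constants as stored in the constant layer buffer.
c1 = -mu;        % first constant: -Mean
c2 = 1 / sigma;  % second constant: 1/StandardDeviation

% Hardware realization: one addition and one multiplication.
xNorm = (x + c1) * c2;

% Agrees with the zscore definition (x - mu)/sigma.
assert(abs(xNorm - (x - mu)/sigma) < 1e-5)
```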