Deep Learning Processor IP Core Architecture
Deep Learning HDL Toolbox™ provides a target-independent, generic deep learning processor IP core that you can deploy to any custom platform. You can reuse the deep learning processor IP core and share it to accommodate deep neural networks that have various layer sizes and parameters. Use this deep learning processor IP core to rapidly prototype deep neural networks from MATLAB® and deploy the network to FPGAs.
This image shows the deep learning processor IP core architecture:
To illustrate the deep learning processor IP core architecture, consider an image classification example.
DDR Memory
You can store the input images, weights, and output images in the external DDR memory. The processor includes three AXI4 master interfaces that communicate with the external memory. You can use one of the AXI4 master interfaces to load the input images onto the processing modules. The method generates the weight data. To retrieve the activation data from the DDR, see . You can write the weight data to a deployment file and use the deployment file to initialize the generated deep learning processor. For more information, see .
Memory Access Arbitrator Modules
The activation and weight memory access arbitrator modules use AXI master interfaces to read and write weight and activation data to and from the processing modules. The profiler AXI master interface reads and writes profiler timing data and instructions to the profiler module.
Convolution Kernel
The conv kernel implements layers that have a convolution layer output format. The two AXI4 master interfaces provide the weights and activations for the layer to the conv kernel. The conv kernel then performs the implemented layer operation on the input image. This kernel is generic because it can support tensors and shapes of various sizes. For a list of layers with the conv output format, see Supported Layers. For a list of the conv kernel properties, see .
Top-Level Scheduler Module
The top-level scheduler module determines which instructions to run, what data to read from DDR, and when to read that data. The scheduler module acts as the central computer in a distributed computer architecture, distributing instructions to the processing modules. For example, if the network has a convolution layer, a fully connected layer, and a multiplication layer, the scheduler:
Schedules the processing and data read instructions for the convolution layer and sends them to the conv kernel.
Schedules the processing and data read instructions for the fully connected layer and sends them to the fc kernel.
Schedules the processing and data read instructions for the multiplication layer and sends them to the custom kernel.
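The dispatch behavior described above can be modeled in a few lines of Python. This is a conceptual sketch only, not the toolbox's scheduler or any part of its API: the layer-type strings, the `KERNEL_FOR_LAYER` mapping, the instruction names, and the `schedule` function are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from layer type to the kernel that executes it.
KERNEL_FOR_LAYER = {
    "convolution": "conv",
    "fully_connected": "fc",
    "multiplication": "custom",
}


def schedule(layers):
    """Return, per kernel, the ordered instructions for the given layers.

    For each layer, this model issues a data-read instruction followed by
    a processing instruction and queues both on the layer's kernel.
    """
    queues = defaultdict(list)
    for layer in layers:
        kernel = KERNEL_FOR_LAYER[layer]
        queues[kernel].append(("read_data", layer))
        queues[kernel].append(("process", layer))
    return dict(queues)


if __name__ == "__main__":
    network = ["convolution", "fully_connected", "multiplication"]
    for kernel, instructions in schedule(network).items():
        print(kernel, instructions)
```

Keeping the layer-to-kernel mapping in one table mirrors the idea of a central scheduler: the kernels only see the instructions queued for them, and adding a new layer type means adding one mapping entry.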
Fully Connected Kernel
The fully connected (fc) kernel implements layers that have a fully connected layer output format. The two AXI4 master interfaces provide the weights and activations to the fc kernel. The fc kernel then performs the fully connected layer operation on the input image. This kernel is also generic because it can support tensors and shapes of various sizes. For a list of layers with the fc output format, see Supported Layers. For a list of the fc kernel properties, see .
Custom Kernel
The custom kernel module implements layers that are registered as a custom layer by using the method. To learn how to create, register, and validate your own custom layers, see . For example, layers such as the addition layer, multiplication layer, and resize2dLayer are implemented on the custom kernel module. For a list of layers implemented on this module, see Supported Layers. For a list of the custom kernel properties, see .
Profiler Utilities
When you set the Profiler argument of the predict or predictAndUpdateState methods to on, the profiler module collects information from the kernels, such as the conv kernel start and stop times and the fc kernel start and stop times. The profiler module uses this information to create a profiler table with these results. For more information, see Profile Inference Run.
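To make the idea of a profiler table concrete, the sketch below turns per-kernel start and stop times into rows that include a duration. It is a hypothetical Python model, not the toolbox's profiler output format: the `build_profiler_table` function, the column names, and the cycle counts are all assumptions made for illustration.

```python
def build_profiler_table(events):
    """Build table rows from (kernel, start_cycle, stop_cycle) tuples."""
    rows = []
    for kernel, start, stop in events:
        if stop < start:
            raise ValueError(f"{kernel}: stop time precedes start time")
        rows.append({
            "kernel": kernel,
            "start": start,
            "stop": stop,
            "cycles": stop - start,  # elapsed cycles for this kernel run
        })
    return rows


if __name__ == "__main__":
    # Made-up cycle counts, for illustration only.
    events = [("conv", 0, 1200), ("fc", 1200, 1500)]
    for row in build_profiler_table(events):
        print(f"{row['kernel']:<6} {row['start']:>6} {row['stop']:>6} {row['cycles']:>6}")
```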