Profile Inference Run
This example shows how to retrieve the prediction and profiler results for the ResNet-18 network. View the network prediction and the performance data for the layers, convolution module, and fully connected module in your pretrained deep learning network.
1. Create a workflow object by using the dlhdl.Workflow class.
2. Set a pretrained deep learning network and a bitstream for the workflow object.
3. Create an object of class dlhdl.Target and specify the target vendor and interface.
4. To deploy the network on a specified target FPGA board, call the deploy method for the workflow object.
5. Call the predict function for the workflow object. Provide an array of images as the InputImage parameter. Provide arguments to turn on the profiler. See Classify Images on FPGA Using Quantized Neural Network.

The labels classifying the images are stored in a structure (struct) and displayed on the screen. The performance parameters of speed and latency are returned in a structure (struct).
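Run the following code, which uses the image file zebra.jpeg as input:

% Load the pretrained ResNet-18 network.
snet = resnet18;
% Define the target board and the workflow (network, bitstream, target).
hT = dlhdl.Target('Xilinx','Interface','Ethernet');
hW = dlhdl.Workflow('Network',snet,'Bitstream','zcu102_single','Target',hT);
% Compile and deploy the network to the FPGA board.
hW.deploy;
% Read the input image and resize it to the 224-by-224 network input size.
image = imread('zebra.jpeg');
inputImg = imresize(image, [224, 224]);
imshow(inputImg);
% Run prediction with the profiler turned on.
[prediction, speed] = hW.predict(single(inputImg),'Profile','on');
% Display the class label with the highest score.
[val, idx] = max(prediction);
snet.Layers(end).ClassNames{idx}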
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                      LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                      ------------------------   -------------------------   ---------   -------------   --------
Network                               23659630                     0.10754           1        23659630        9.3
    conv1                              2224115                     0.01011
    pool1                               572867                     0.00260
    res2a_branch2a                      972699                     0.00442
    res2a_branch2b                      972568                     0.00442
    res2a                               209312                     0.00095
    res2b_branch2a                      972733                     0.00442
    res2b_branch2b                      973022                     0.00442
    res2b                               209736                     0.00095
    res3a_branch2a                      747507                     0.00340
    res3a_branch2b                      904291                     0.00411
    res3a_branch1                       538763                     0.00245
    res3a                               104750                     0.00048
    res3b_branch2a                      904389                     0.00411
    res3b_branch2b                      904367                     0.00411
    res3b                               104886                     0.00048
    res4a_branch2a                      485682                     0.00221
    res4a_branch2b                      880001                     0.00400
    res4a_branch1                       486429                     0.00221
    res4a                                52628                     0.00024
    res4b_branch2a                      880053                     0.00400
    res4b_branch2b                      880035                     0.00400
    res4b                                52478                     0.00024
    res5a_branch2a                     1056299                     0.00480
    res5a_branch2b                     2056857                     0.00935
    res5a_branch1                      1056510                     0.00480
    res5a                                26170                     0.00012
    res5b_branch2a                     2057203                     0.00935
    res5b_branch2b                     2057659                     0.00935
    res5b                                26381                     0.00012
    pool5                                71405                     0.00032
    fc1000                              216155                     0.00098
 * The clock frequency of the DL processor is: 220MHz
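One way to read the table is to rank layers by cycle count and see how much of the frame time each one takes. The following is a minimal sketch that uses a handful of rows copied by hand from the output above. It does not read the structure returned by predict, and the variable names are illustrative only.

```matlab
% A few rows copied by hand from the profiler table (illustrative subset).
names  = {'conv1','pool1','res5a_branch2b','res5b_branch2a','res5b_branch2b','fc1000'};
cycles = [2224115 572867 2056857 2057203 2057659 216155];
totalCycles = 23659630;   % LastFrameLatency(cycles) of the Network row

% Sort the selected layers from slowest to fastest.
[sortedCycles, order] = sort(cycles,'descend');

% Share of the whole frame, in percent, taken by each selected layer.
share = 100 * sortedCycles / totalCycles;

% Print the three slowest layers of the subset.
for k = 1:3
    fprintf('%-16s %9d cycles  %5.2f%% of frame\n', ...
        names{order(k)}, sortedCycles(k), share(k));
end
```

In this subset, conv1 is the slowest layer, at roughly 9.4% of the total frame latency.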
The profiler data returns these parameters and their values:

LastFrameLatency(cycles): Total number of clock cycles for the previous frame execution.

Clock frequency: Clock frequency information is retrieved from the bitstream that was used to deploy the network to the target board. For example, the profiler returns "* The clock frequency of the DL processor is: 220MHz". The clock frequency of 220 MHz is retrieved from the zcu102_single bitstream.

LastFrameLatency(seconds): Total number of seconds for the previous frame execution. The total time is calculated as LastFrameLatency(cycles)/Clock frequency. For example, the conv1 LastFrameLatency(seconds) is calculated as 2224115/(220*10^6), which is approximately 0.01011 seconds.

FramesNum: Total number of input frames to the network. This value is used in the calculation of Frames/s.

Total Latency: Total number of clock cycles to execute all the network layers and modules for FramesNum frames.

Frames/s: Number of frames processed in one second by the network. The total Frames/s is calculated as (FramesNum*Clock frequency)/Total Latency. For example, the Frames/s in this example is calculated as (1*220*10^6)/23659630, which is approximately 9.3.
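The latency and throughput formulas can be checked against the profiler output. The following is a minimal sketch that only redoes the arithmetic with numbers copied from the table. The variable names are illustrative and are not fields of the structure returned by predict.

```matlab
% Values copied from the profiler output (not read from the speed struct).
clockFrequency = 220e6;      % Hz, reported for the zcu102_single bitstream
networkCycles  = 23659630;   % Total Latency of the Network row, in cycles
conv1Cycles    = 2224115;    % LastFrameLatency(cycles) of the conv1 row
framesNum      = 1;          % number of input frames

% LastFrameLatency(seconds) = LastFrameLatency(cycles) / Clock frequency
conv1Seconds   = conv1Cycles / clockFrequency;
networkSeconds = networkCycles / clockFrequency;

% Frames/s = (FramesNum * Clock frequency) / Total Latency
framesPerSecond = (framesNum * clockFrequency) / networkCycles;

fprintf('conv1 latency:   %.5f s\n', conv1Seconds);
fprintf('Network latency: %.5f s\n', networkSeconds);
fprintf('Frames/s:        %.1f\n', framesPerSecond);
```

The printed values (0.01011 s, 0.10754 s, and 9.3 frames/s) match the corresponding entries in the profiler table.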
See Also
predict