Profile Inference Run

This example shows how to retrieve the prediction and profiler results for the ResNet-18 network. View the network prediction and performance data for the layers, convolution module, and fully connected modules in your pretrained deep learning network.

  1. Create an object of class Workflow by using the dlhdl.Workflow class.

  2. Set a pretrained deep learning network and bitstream for the workflow object.

  3. Create an object of class dlhdl.Target and specify the target vendor and interface.

  4. To deploy the network on a specified target FPGA board, call the deploy method for the workflow object.

  5. Call the predict function for the workflow object. Provide an array of images as the InputImage parameter. Provide arguments to turn on the profiler. See Classify Images on FPGA Using Quantized Neural Network.

    The labels classifying the images are stored in a struct and displayed on the screen. The performance parameters of speed and latency are returned in a struct.

Use this image to run this code:

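% Load the pretrained ResNet-18 network.
snet = resnet18;
% Target a Xilinx board over Ethernet and build the workflow object.
hT = dlhdl.Target('Xilinx','Interface','Ethernet');
hW = dlhdl.Workflow('Net',snet,'Bitstream','zcu102_single','Target',hT);
% Deploy the network and bitstream to the board.
hW.deploy;
% Read the input image and resize it to the 224-by-224 network input size.
image = imread('zebra.jpeg');
inputImg = imresize(image, [224, 224]);
imshow(inputImg);
% Run prediction with the profiler turned on.
[prediction, speed] = hW.predict(single(inputImg),'Profile','on');
% Display the class label with the highest score.
[val, idx] = max(prediction);
snet.Layers(end).ClassNames{idx}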
### Finished writing input activations.
### Running single input activations.
              Deep Learning Processor Profiler Performance Results
                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   23659630                  0.10754                       1           23659630              9.3
    conv1                  2224115                  0.01011 
    pool1                   572867                  0.00260 
    res2a_branch2a          972699                  0.00442 
    res2a_branch2b          972568                  0.00442 
    res2a                   209312                  0.00095 
    res2b_branch2a          972733                  0.00442 
    res2b_branch2b          973022                  0.00442 
    res2b                   209736                  0.00095 
    res3a_branch2a          747507                  0.00340 
    res3a_branch2b          904291                  0.00411 
    res3a_branch1           538763                  0.00245 
    res3a                   104750                  0.00048 
    res3b_branch2a          904389                  0.00411 
    res3b_branch2b          904367                  0.00411 
    res3b                   104886                  0.00048 
    res4a_branch2a          485682                  0.00221 
    res4a_branch2b          880001                  0.00400 
    res4a_branch1           486429                  0.00221 
    res4a                    52628                  0.00024 
    res4b_branch2a          880053                  0.00400 
    res4b_branch2b          880035                  0.00400 
    res4b                    52478                  0.00024 
    res5a_branch2a         1056299                  0.00480 
    res5a_branch2b         2056857                  0.00935 
    res5a_branch1          1056510                  0.00480 
    res5a                    26170                  0.00012 
    res5b_branch2a         2057203                  0.00935 
    res5b_branch2b         2057659                  0.00935 
    res5b                    26381                  0.00012 
    pool5                    71405                  0.00032 
    fc1000                  216155                  0.00098 
 * The clock frequency of the DL processor is: 220MHz
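
One way to read the table above is to rank layers by their cycle counts and express each as a percentage of the Network row. The sketch below does this for a handful of rows copied by hand from the output. The variable names (layerNames, layerCycles, sharePct) are illustrative and are not part of the toolbox API, and the code does not read the speed output returned by predict.

```matlab
% Cycle counts copied by hand from a few rows of the profiler table above.
layerNames    = {'conv1','pool1','res5a_branch2b','res5b_branch2a','res5b_branch2b','fc1000'};
layerCycles   = [2224115, 572867, 2056857, 2057203, 2057659, 216155];
networkCycles = 23659630;   % LastFrameLatency(cycles) from the Network row

% Percentage of the Network row cycle count taken by each listed layer.
sharePct = 100 * layerCycles / networkCycles;

% Print the listed layers from slowest to fastest.
[~, order] = sort(layerCycles, 'descend');
for k = order
    fprintf('%-16s %9d cycles  %5.2f%%\n', layerNames{k}, layerCycles(k), sharePct(k));
end
```

Among the rows listed here, conv1 has the largest cycle count, followed by the three res5 convolution branches.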
 

The profiler returns these parameters and their values:

  • LastFrameLatency(cycles): Total number of clock cycles for the previous frame execution.

  • Clock frequency: The clock frequency is retrieved from the bitstream that was used to deploy the network to the target board. For example, the profiler returns * The clock frequency of the DL processor is: 220MHz. The clock frequency of 220 MHz is retrieved from the zcu102_single bitstream.

  • LastFrameLatency(seconds): Total number of seconds for the previous frame execution. The total time is calculated as LastFrameLatency(cycles)/clock frequency. For example, the conv1 LastFrameLatency(seconds) is calculated as 2224115/(220*10^6), which is approximately 0.01011 seconds.

  • FramesNum: Total number of input frames to the network. This value is used in the calculation of Frames/s.

  • Total Latency: Total number of clock cycles to execute all the network layers and modules for FramesNum frames.

  • Frames/s: Number of frames processed in one second by the network. The total Frames/s is calculated as (FramesNum*clock frequency)/Total Latency. For example, the Frames/s in this example is calculated as (1*220*10^6)/23659630, which is approximately 9.3.
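
The formulas in the list above can be checked with a few lines of plain MATLAB arithmetic. This is a minimal sketch that uses numbers copied by hand from the profiler output. The variable names are illustrative assumptions and the code does not query the board or the speed output.

```matlab
% Values copied by hand from the profiler output in this example.
clockHz       = 220e6;      % DL processor clock frequency (220 MHz)
networkCycles = 23659630;   % LastFrameLatency(cycles), Network row
conv1Cycles   = 2224115;    % LastFrameLatency(cycles), conv1 row
framesNum     = 1;          % FramesNum
totalLatency  = 23659630;   % Total Latency in cycles for framesNum frames

% LastFrameLatency(seconds) = LastFrameLatency(cycles) / clock frequency
networkSeconds = networkCycles / clockHz;
conv1Seconds   = conv1Cycles / clockHz;

% Frames/s = (FramesNum * clock frequency) / Total Latency
framesPerSecond = (framesNum * clockHz) / totalLatency;

fprintf('Network latency: %.5f s\n', networkSeconds);
fprintf('conv1 latency:   %.5f s\n', conv1Seconds);
fprintf('Frames/s:        %.1f\n', framesPerSecond);
```

The printed values, 0.10754 s, 0.01011 s, and 9.3, agree with the corresponding entries in the profiler table.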
