Lane Detection Optimized with GPU Coder

This example shows how to develop a deep learning lane detection application that runs on NVIDIA® GPUs.

The pretrained lane detection network can detect and output lane marker boundaries from an image and is based on the AlexNet network. The last few layers of the AlexNet network are replaced by a smaller fully connected layer and a regression output layer. The example generates a CUDA executable that runs on a CUDA-enabled GPU on the host machine.

Prerequisites

  • CUDA enabled NVIDIA GPU.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Setting Up the Prerequisite Products.

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Get Pretrained Lane Detection Network

This example uses the trainedLaneNet MAT-file containing the pretrained lane detection network. The file is approximately 143 MB in size. Download the file from the MathWorks website.

lanenetFile = matlab.internal.examples.downloadSupportFile('gpucoder/cnn_models/lane_detection', ...
    'trainedLaneNet.mat');

This network takes an image as input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation y = ax² + bx + c, where y is the lateral offset and x is the longitudinal distance from the vehicle. The network outputs the three parameters a, b, and c per lane. The network architecture is similar to AlexNet, except that the last few layers are replaced by a smaller fully connected layer and a regression output layer.
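
For intuition, the following minimal sketch evaluates one lane boundary from the three parameters. The coefficient values here are hypothetical, not produced by the network, and the sketch ignores the output normalization that laneNetPredict reverses below:

% Hypothetical [a b c] parabola coefficients for one lane boundary.
coeffs = [-0.002 0.05 1.2];
xWorld = 3:30;                    % longitudinal distance from the vehicle (meters)
yWorld = polyval(coeffs,xWorld);  % lateral offset: y = a*x^2 + b*x + c
plot(xWorld,yWorld); xlabel('x (m)'); ylabel('y (m)');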

load(lanenetFile);
disp(laneNet)
  SeriesNetwork with properties:
         Layers: [23×1 nnet.cnn.layer.Layer]
     InputNames: {'data'}
    OutputNames: {'output'}

To view the network architecture, use the analyzeNetwork function.

analyzeNetwork(laneNet)

Download Test Video

To test the model, the example uses a video file from the Caltech Lanes dataset. The file is approximately 8 MB in size. Download the file from the MathWorks website.

videoFile = matlab.internal.examples.downloadSupportFile('gpucoder/media','caltech_cordova1.avi');

Main Entry-Point Function

The detectLanesInVideo.m file is the main entry-point function for code generation. The detectLanesInVideo function uses the vision.VideoFileReader (Computer Vision Toolbox) System object to read frames from the input video, calls the predict method of the LaneNet network object, and draws the detected lanes on the input video. A vision.DeployableVideoPlayer (Computer Vision Toolbox) System object is used to display the lane-detected video output.

type detectLanesInVideo.m
function detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds)
% detectLanesInVideo Entry-point function for the Lane Detection Optimized
% with GPU Coder example
%
% detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds) uses the
% VideoFileReader system object to read frames from the input video, calls
% the predict method of the LaneNet network object, and draws the detected
% lanes on the input video. A DeployableVideoPlayer system object is used
% to display the lane detected video output.

%   Copyright 2022 The MathWorks, Inc.

%#codegen

%% Create Video Reader and Video Player Object
videoFReader   = vision.VideoFileReader(videoFile);
depVideoPlayer = vision.DeployableVideoPlayer(Name='Lane Detection on GPU');

%% Video Frame Processing Loop
while ~isDone(videoFReader)
    videoFrame = videoFReader();
    scaledFrame = 255.*(imresize(videoFrame,[227 227]));

    [laneFound,ltPts,rtPts] = laneNetPredict(net,scaledFrame, ...
        laneCoeffMeans,laneCoeffsStds);
    if(laneFound)
        pts = [reshape(ltPts',1,[]);reshape(rtPts',1,[])];
        videoFrame = insertShape(videoFrame,'Line',pts,'LineWidth',4);
    end
    depVideoPlayer(videoFrame);
end
end

LaneNet Predict Function

The laneNetPredict function computes the right and left lane positions in a single video frame. The LaneNet network computes parameters a, b, and c that describe the parabolic equation for the left and right lane boundaries. From these parameters, compute the x and y coordinates corresponding to the lane positions. The coordinates must be mapped to image coordinates.

type laneNetPredict.m
function [laneFound,ltPts,rtPts] = laneNetPredict(net,frame,means,stds)
% laneNetPredict Predict lane markers on the input image frame using the
% lane detection network
%
%   Copyright 2017-2022 The MathWorks, Inc.

%#codegen

% A persistent object lanenet is used to load the network object. At the
% first call to this function, the persistent object is constructed and
% set up. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.
persistent lanenet;
if isempty(lanenet)
    lanenet = coder.loadDeepLearningNetwork(net, 'lanenet');
end

lanecoeffsNetworkOutput = predict(lanenet,frame);

% Recover original coeffs by reversing the normalization steps.
params = lanecoeffsNetworkOutput .* stds + means;

% 'c' should be more than 0.5 for it to be a lane.
isRightLaneFound = abs(params(6)) > 0.5;
isLeftLaneFound  = abs(params(3)) > 0.5;

% From the network output, compute left and right lane points in the image
% coordinates.
vehicleXPoints = 3:30;
ltPts = coder.nullcopy(zeros(28,2,'single'));
rtPts = coder.nullcopy(zeros(28,2,'single'));

if isRightLaneFound && isLeftLaneFound
    rtBoundary = params(4:6);
    rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints);

    ltBoundary = params(1:3);
    lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints);

    % Visualize lane boundaries of the ego vehicle.
    tform = get_tformToImage;

    % Map vehicle to image coordinates.
    ltPts = tform.transformPointsInverse([vehicleXPoints', lt_y']);
    rtPts = tform.transformPointsInverse([vehicleXPoints', rt_y']);
    laneFound = true;
else
    laneFound = false;
end
end
%% Helper Functions

% Compute boundary model.
function yWorld = computeBoundaryModel(model, xWorld)
yWorld = polyval(model, xWorld);
end

% Compute extrinsics.
function tform = get_tformToImage
% The camera coordinates are described by the Caltech mono
% camera model.
yaw = 0;
pitch = 14; % pitch of the camera in degrees
roll = 0;

translation = translationVector(yaw, pitch, roll);
rotation    = rotationMatrix(yaw, pitch, roll);

% Construct a camera matrix.
focalLength    = [309.4362, 344.2161];
principalPoint = [318.9034, 257.5352];
skew = 0;

camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ...
    skew, principalPoint);

% Turn camMatrix into 2-D homography.
tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z
tform = projective2d(tform2D);
tform = tform.invert();
end
% Translate to image coordinates.
function translation = translationVector(yaw, pitch, roll)
sensorLocation = [0 0];
height = 2.1798;    % mounting height in meters from the ground
rotationMatrix = (...
    rotZ(yaw)*...   % last rotation
    rotX(90-pitch)*...
    rotZ(roll)...   % first rotation
    );

% Adjust for the sensorLocation by adding a translation.
sl = sensorLocation;
translationInWorldUnits = [sl(2), sl(1), height];
translation = translationInWorldUnits*rotationMatrix;
end
% Rotation around X-axis.
function R = rotX(a)
a = deg2rad(a);
R = [...
    1   0        0;
    0   cos(a)  -sin(a);
    0   sin(a)   cos(a)];
end

% Rotation around Y-axis.
function R = rotY(a)
a = deg2rad(a);
R = [...
    cos(a)  0 sin(a);
    0       1 0;
    -sin(a) 0 cos(a)];
end

% Rotation around Z-axis.
function R = rotZ(a)
a = deg2rad(a);
R = [...
    cos(a) -sin(a) 0;
    sin(a)  cos(a) 0;
    0       0      1];
end
% Given the yaw, pitch, and roll, determine the appropriate Euler angles
% and the sequence in which they are applied to align the camera's
% coordinate system with the vehicle coordinate system. The resulting
% matrix is a rotation matrix that together with the translation vector
% defines the extrinsic parameters of the camera.
function rotation = rotationMatrix(yaw, pitch, roll)
rotation = (...
    rotY(180)*...            % last rotation: point Z up
    rotZ(-90)*...            % X-Y swap
    rotZ(yaw)*...            % point the camera forward
    rotX(90-pitch)*...       % "un-pitch"
    rotZ(roll)...            % first rotation: "un-roll"
    );
end

% Intrinsic matrix computation.
function intrinsicMat = intrinsicMatrix(focalLength, skew, principalPoint)
intrinsicMat = ...
    [focalLength(1)   , 0                , 0; ...
     skew             , focalLength(2)   , 0; ...
     principalPoint(1), principalPoint(2), 1];
end
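
Before generating code, you can sanity-check laneNetPredict in MATLAB on a single frame. The following is a minimal sketch: it assumes the MAT-file and video downloaded in the earlier steps, assumes the MAT-file provides the laneCoeffMeans and laneCoeffsStds variables (as the code generation step below also does), and uses VideoReader in place of the vision.VideoFileReader System object from the entry-point function.

% Load the network and normalization variables from the MAT-file.
load(lanenetFile);
% Read and scale one frame the same way detectLanesInVideo does.
vr = VideoReader(videoFile);
frame = im2single(readFrame(vr));
scaledFrame = 255.*imresize(frame,[227 227]);
% coder.loadDeepLearningNetwork inside laneNetPredict also runs in MATLAB.
[laneFound,ltPts,rtPts] = laneNetPredict(lanenetFile,scaledFrame, ...
    laneCoeffMeans,laneCoeffsStds);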

Generate CUDA Executable

To generate a standalone CUDA executable for the detectLanesInVideo entry-point function, create a GPU code configuration object for an 'exe' target and set the target language to C++. Use the coder.DeepLearningConfig function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object.

cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
cfg.GenerateExampleMain = "GenerateCodeAndCompile";
cfg.TargetLang = 'C++';
inputs = {coder.Constant(videoFile),coder.Constant(lanenetFile), ...
    coder.Constant(laneCoeffMeans),coder.Constant(laneCoeffsStds)};

Run the codegen command.

codegen -args inputs -config cfg detectLanesInVideo
Code generation successful: View report

Generated Code Description

The series network is generated as a C++ class containing an array of 18 layer classes (after layer fusion optimization). The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 18 layers in the network.

class lanenet0_0 {
public:
  lanenet0_0();
  void setSize();
  void resetState();
  void setup();
  void predict();
  void cleanup();
  float *getLayerOutput(int layerIndex, int portIndex);
  int getLayerOutputSize(int layerIndex, int portIndex);
  float *getInputDataPointer(int b_index);
  float *getInputDataPointer();
  float *getOutputDataPointer(int b_index);
  float *getOutputDataPointer();
  int getBatchSize();
  ~lanenet0_0();
private:
  void allocate();
  void postsetup();
  void deallocate();
public:
  boolean_T isInitialized;
  boolean_T matlabCodegenIsDeleted;
private:
  int numLayers;
  MWTensorBase *inputTensors[1];
  MWTensorBase *outputTensors[1];
  MWCNNLayer *layers[18];
  MWCudnnTarget::MWTargetNetworkImpl *targetImpl;
};

The cnn_lanenet*_conv*_w and cnn_lanenet*_conv*_b files are the binary weight and bias files for the convolution layers in the network. The cnn_lanenet*_fc*_w and cnn_lanenet*_fc*_b files are the binary weight and bias files for the fully connected layers in the network.

codegendir = fullfile('codegen', 'exe', 'detectLanesInVideo');
dir([codegendir,filesep,'*.bin'])
cnn_lanenet0_0_conv1_b.bin        cnn_lanenet0_0_conv3_b.bin        cnn_lanenet0_0_conv5_b.bin        cnn_lanenet0_0_fc6_b.bin          cnn_lanenet0_0_fclane2_b.bin      
cnn_lanenet0_0_conv1_w.bin        cnn_lanenet0_0_conv3_w.bin        cnn_lanenet0_0_conv5_w.bin        cnn_lanenet0_0_fc6_w.bin          cnn_lanenet0_0_fclane2_w.bin      
cnn_lanenet0_0_conv2_b.bin        cnn_lanenet0_0_conv4_b.bin        cnn_lanenet0_0_data_offset.bin    cnn_lanenet0_0_fclane1_b.bin      networkparamsinfo_lanenet0_0.bin  
cnn_lanenet0_0_conv2_w.bin        cnn_lanenet0_0_conv4_w.bin        cnn_lanenet0_0_data_scale.bin     cnn_lanenet0_0_fclane1_w.bin      

Run the Executable

To run the executable, use the following lines of code.

if ispc
    [status,cmdout] = system("detectLanesInVideo.exe");
else
    [status,cmdout] = system("./detectLanesInVideo");
end

Figure: Lane detection output displayed by the deployable video player (gpuLaneDetectionOutput.png).
