Automatic Target Recognition (ATR) in SAR Images
This example shows how to train a region-based convolutional neural network (R-CNN) for target recognition in large-scene synthetic aperture radar (SAR) images using Deep Learning Toolbox™ and Parallel Computing Toolbox™.
Deep Learning Toolbox provides a framework for designing and implementing deep neural networks with algorithms, pretrained models, and apps.
Parallel Computing Toolbox lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. It enables you to use GPUs directly from MATLAB® and accelerate the computation required by deep learning algorithms.
Neural network based algorithms have shown remarkable results in diverse areas ranging from natural scene detection to medical imaging, with large improvements over standard detection algorithms. Inspired by these advances, researchers have applied deep learning based solutions to SAR imaging. In this example, such a solution is applied to the problem of target detection and recognition. The R-CNN network employed here not only integrates detection and recognition, but also provides an effective and efficient solution that scales to large-scene SAR images.
This example demonstrates how to:
Download the dataset and the pretrained model
Load and analyze the image data
Define the network architecture
Specify training options
Train the network
Evaluate the network
To illustrate this workflow, the example uses the Moving and Stationary Target Acquisition and Recognition (MSTAR) clutter dataset published by the Air Force Research Laboratory. The full dataset is available for download. Alternatively, the example also includes a subset of the data used to showcase the workflow. The goal is to develop a model that can detect and recognize the targets.
Download the Dataset
This example uses a subset of the MSTAR clutter dataset that contains 300 training and 50 testing clutter images with five different targets. The data was collected using an X-band sensor in spotlight mode with a one-foot resolution, and contains rural and urban types of clutter. The types of targets used are BTR-60 (armoured car), BRDM-2 (fighting vehicle), ZSU-23/4 (tank), T62 (tank), and SLICY (multiple simple geometric shaped static target). The images were captured at a depression angle of 15 degrees. The clutter data is stored in the PNG image format, and the corresponding ground truth data is stored in the groundtruthmstarclutterdataset.mat file. The file contains 2-D bounding box information for five classes (SLICY, BTR-60, BRDM-2, ZSU-23/4, and T62) for the training and testing data. The size of the dataset is 1.6 GB.
Download the dataset using the helperDownloadMSTARClutterData helper function, defined at the end of this example.
outputFolder = pwd;
dataURL = 'https://ssd.mathworks.com/supportfiles/radar/data/mstar_clutterdataset.tar.gz';
helperDownloadMSTARClutterData(outputFolder,dataURL);
Depending on your internet connection, the download process can take some time. The code suspends MATLAB® execution until the download process is complete. Alternatively, download the dataset to a local disk using your web browser and extract the file. When using this approach, change the outputFolder variable in the example to the location of the downloaded and extracted file.
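For example, if you have already downloaded and extracted the archive manually, you might point outputFolder at that location instead of the current directory (the path below is hypothetical and not part of the original example):

% Hypothetical local folder containing the manually extracted dataset
outputFolder = 'C:\Data\mstar_clutterdataset';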
Download the Pretrained Network
Download the pretrained network using the helperDownloadPretrainedSARDetectorNet helper function, defined at the end of this example. The pretrained model allows you to run the entire example without having to wait for the training to complete. To train the network, set the doTrain variable to true.
pretrainedNetURL = 'https://ssd.mathworks.com/supportfiles/radar/data/trainedsardetectornet.tar.gz';
doTrain = false;
if ~doTrain
    helperDownloadPretrainedSARDetectorNet(outputFolder,pretrainedNetURL);
end
Load the Dataset
Load the ground truth data (training set and test set). These images were generated by placing target chips at random locations on a background clutter image constructed from the downloaded raw data. The generated targets serve as the ground truth for training and testing the network.
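For intuition only, the sketch below shows the kind of scene composition described above. The file names clutterScene.png and targetChip.png are placeholders, not files shipped with the example, and the sketch assumes grayscale images of matching data type:

% Illustrative only: place a hypothetical target chip at a random location
% on a clutter image and record the corresponding bounding box
clutter = imread('clutterScene.png');           % hypothetical background clutter image
chip = imread('targetChip.png');                % hypothetical target chip
chipRows = size(chip,1);
chipCols = size(chip,2);
r = randi(size(clutter,1) - chipRows);          % random top-left row
c = randi(size(clutter,2) - chipCols);          % random top-left column
scene = clutter;
scene(r:r+chipRows-1, c:c+chipCols-1) = chip;   % insert the chip into the scene
bbox = [c r chipCols chipRows];                 % [x y width height] ground truth box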
load('groundtruthmstarclutterdataset.mat', "trainingData", "testData");
The ground truth data is stored in a six-column table, where the first column contains the image file paths and the second to sixth columns contain the bounding boxes for the different targets.
% Display the first few rows of the data set
trainingData(1:4,:)

ans=4×6 table
          imageFilename                  Slicy                 BTR_60                 BRDM_2                ZSU_23_4                 T62
    ______________________________    ___________________    ___________________    ___________________    ___________________    ___________________

    "./trainingimages/img0001.png"    {[ 285  468 28 28]}    {[ 135  331 65 65]}    {[ 597  739 65 65]}    {[ 810 1107 80 80]}    {[1228 1089 87 87]}
    "./trainingimages/img0002.png"    {[ 595 1585 28 28]}    {[ 880  162 65 65]}    {[ 308 1683 65 65]}    {[1275 1098 80 80]}    {[1274 1099 87 87]}
    "./trainingimages/img0003.png"    {[ 200 1140 28 28]}    {[ 961 1055 65 65]}    {[ 306 1256 65 65]}    {[ 661 1412 80 80]}    {[ 699  886 87 87]}
    "./trainingimages/img0004.png"    {[ 623  186 28 28]}    {[ 536  946 65 65]}    {[ 131  245 65 65]}    {[1030 1266 80 80]}    {[ 151  924 87 87]}
Display one of the training images and box labels to visualize the data.
img = imread(trainingData.imageFilename(1));
bbox = reshape(cell2mat(trainingData{1,2:end}),[4,5])';
labels = {'Slicy','BTR_60','BRDM_2','ZSU_23_4','T62'};
annotatedImage = insertObjectAnnotation(img,'rectangle',bbox,labels,...
    'TextBoxOpacity',0.9,'FontSize',50);
figure
imshow(annotatedImage);
title('Sample Training Image With Bounding Boxes and Labels')
Define the Network Architecture
Create an R-CNN object detector for five targets: Slicy, BTR_60, BRDM_2, ZSU_23_4, and T62.
objectClasses = {'Slicy','BTR_60','BRDM_2','ZSU_23_4','T62'};
The network must be able to classify the five targets and a background class in order to be trained using the trainRCNNObjectDetector function available in Deep Learning Toolbox™. In the code below, 1 is added to the number of target classes to include the background class.
numClassesPlusBackground = numel(objectClasses) + 1;
The final fully connected layer of the network defines the number of classes that the network can classify. Set the output size of the final fully connected layer equal to numClassesPlusBackground.

% Define input size
inputSize = [128,128,1];

% Define network
layers = createNetwork(inputSize,numClassesPlusBackground);
Now, these network layers can be used to train an R-CNN based five-class object detector.
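If you want to verify the layer dimensions before training, one option (not part of the original workflow) is to open the layer array in the Deep Learning Toolbox network analyzer:

% Optional sanity check: visualize the layer array and inspect activation sizes
analyzeNetwork(layers)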
Train the R-CNN Object Detector
Use trainingOptions (Deep Learning Toolbox) to specify the network training options. trainingOptions by default uses a GPU if one is available (requires Parallel Computing Toolbox™ and a CUDA® enabled GPU with compute capability 3.0 or higher). Otherwise, it uses a CPU. You can also specify the execution environment by using the ExecutionEnvironment name-value argument of trainingOptions. To detect automatically if you have a GPU available, set ExecutionEnvironment to auto. If you do not have a GPU, or do not want to use one for training, set ExecutionEnvironment to cpu. To ensure the use of a GPU for training, set ExecutionEnvironment to gpu.
% Set training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 128, ...
    'InitialLearnRate', 1e-3, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 100, ...
    'MaxEpochs', 10, ...
    'Verbose', true, ...
    'CheckpointPath', tempdir, ...
    'ExecutionEnvironment', 'auto');
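As an optional check (not part of the original example), you can confirm whether a supported GPU is visible to MATLAB before starting a long training run; with auto, training falls back to the CPU when no GPU is found:

% Requires Parallel Computing Toolbox; lists the current GPU if one is available
if gpuDeviceCount > 0
    disp(gpuDevice)
else
    disp('No supported GPU detected; training will run on the CPU.')
end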
Use trainRCNNObjectDetector to train the R-CNN object detector if doTrain is true. Otherwise, load the pretrained network. If training, adjust NegativeOverlapRange and PositiveOverlapRange to ensure that training samples tightly overlap with the ground truth.
if doTrain
    % Train an R-CNN object detector. This will take several minutes
    detector = trainRCNNObjectDetector(trainingData, layers, options, ...
        'PositiveOverlapRange', [0.5 1], 'NegativeOverlapRange', [0.1 0.5]);
else
    % Load a previously trained detector
    pretrainedMATFile = fullfile(outputFolder,'trainedsardetectornet.mat');
    load(pretrainedMATFile);
end
Evaluate the Detector on a Test Image
To get a qualitative idea of how well the detector works, pick a random image from the test set and run it through the detector. The detector is expected to return a collection of bounding boxes where it thinks the detected targets are, along with scores indicating confidence in each detection.
% Read test image
imgIdx = randi(height(testData));
testImage = imread(testData.imageFilename(imgIdx));

% Detect SAR targets in the test image
[bboxes,score,label] = detect(detector,testImage,'MiniBatchSize',16);
To understand the results, overlay the detections on the test image. A key parameter is the detection threshold, the minimum score at which a detection is kept. A higher threshold results in fewer false positives, but it also results in more false negatives.
scoreThreshold = 0.8;

% Display the detection results
outputImage = testImage;
for idx = 1:length(score)
    bbox = bboxes(idx,:);
    thisScore = score(idx);
    if thisScore > scoreThreshold
        annotation = sprintf('%s: (Confidence = %0.2f)', label(idx), ...
            round(thisScore,2));
        outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox, ...
            annotation, 'TextBoxOpacity', 0.9, 'FontSize', 45, 'LineWidth', 2);
    end
end
f = figure;
f.Position(3:4) = [860,740];
imshow(outputImage)
title('Predicted Boxes and Labels on Test Image')
Evaluate the Model
Looking at test images one at a time gives a sense of the detector performance. To perform a more rigorous analysis using the entire test set, run the test set through the detector.
% Create a table to hold the bounding boxes, scores, and labels output by the detector
numImages = height(testData);
results = table('Size',[numImages 3], ...
    'VariableTypes',{'cell','cell','cell'}, ...
    'VariableNames',{'Boxes','Scores','Labels'});

% Run the detector on each image in the test set and collect the results
for i = 1:numImages
    imgFilename = testData.imageFilename{i};

    % Read the image
    I = imread(imgFilename);

    % Run the detector
    [bboxes, scores, labels] = detect(detector, I, 'MiniBatchSize', 16);

    % Collect the results
    results.Boxes{i} = bboxes;
    results.Scores{i} = scores;
    results.Labels{i} = labels;
end
The possible detections and their bounding boxes for all images in the test set can be used to calculate the detector's average precision (AP) for each class. The AP is the average of the detector's precision at different levels of recall, where precision and recall are defined as:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

where

TP - number of true positives (the detector predicts a target when it is present)

FP - number of false positives (the detector predicts a target when it is not present)

FN - number of false negatives (the detector fails to detect a target when it is present)

A detector with a precision of 1 makes no false detections (no false positives), while a detector with a recall of 1 detects every target that is present (no false negatives). Precision and recall typically trade off against each other.
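As a quick worked example with made-up counts (not results from this example), 8 true positives, 2 false positives, and 4 missed targets give a precision of 0.8 and a recall of about 0.67:

% Hypothetical counts, for illustration only
TP = 8; FP = 2; FN = 4;
precisionExample = TP/(TP + FP)   % 0.8000
recallExample = TP/(TP + FN)      % 0.6667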
Plot the relationship between precision and recall for each class. The average value of each curve is the AP. The curves are computed with an overlap threshold of 0.5 between predicted and ground truth boxes.
For more details, see evaluateDetectionPrecision (Computer Vision Toolbox).
% Extract expected bounding box locations from the test data
expectedResults = testData(:, 2:end);
threshold = 0.5;

% Evaluate the object detector using the average precision metric
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults, threshold);

% Plot precision-recall curves
f = figure;
ax = gca;
f.Position(3:4) = [860,740];
xlabel('Recall')
ylabel('Precision')
grid on;
hold on;
legend('Location','southeast');
title('Precision vs Recall Curve for Threshold Value 0.5 for Different Classes');
for i = 1:length(ap)
    % Plot the precision-recall curve for each class
    plot(ax, recall{i}, precision{i}, 'DisplayName', ...
        ['Average Precision for class ' trainingData.Properties.VariableNames{i+1} ...
        ' is ' num2str(round(ap(i),3))])
end
The AP for most of the classes is greater than 0.9. Of these, the trained model appears to struggle most with the SLICY targets, although it still achieves an AP of about 0.7 for that class.
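If you want a single summary number across classes, one option (not computed in the original example) is to average the per-class AP values:

% Mean of the per-class average precision values
meanAP = mean(ap)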
Helper Functions

The function createNetwork takes as input the image size inputSize and the number of classes numClassesPlusBackground, and returns a CNN.
function layers = createNetwork(inputSize,numClassesPlusBackground)
    layers = [
        imageInputLayer(inputSize)                      % Input layer

        convolution2dLayer(3,32,'Padding','same')       % Convolution layer
        reluLayer                                       % ReLU layer
        convolution2dLayer(3,32,'Padding','same')
        batchNormalizationLayer                         % Batch normalization layer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)                 % Max pooling layer

        convolution2dLayer(3,64,'Padding','same')
        reluLayer
        convolution2dLayer(3,64,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(3,128,'Padding','same')
        reluLayer
        convolution2dLayer(3,128,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(3,256,'Padding','same')
        reluLayer
        convolution2dLayer(3,256,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(6,512)
        reluLayer

        dropoutLayer(0.5)                               % Dropout layer
        fullyConnectedLayer(512)                        % Fully connected layer
        reluLayer
        fullyConnectedLayer(numClassesPlusBackground)
        softmaxLayer                                    % Softmax layer
        classificationLayer                             % Classification layer
        ];
end

function helperDownloadMSTARClutterData(outputFolder,dataURL)
% Download the data set from the given URL to the output folder

    radarDataTarFile = fullfile(outputFolder,'mstar_clutterdataset.tar.gz');
    if ~exist(radarDataTarFile,'file')
        disp('Downloading MSTAR clutter data (1.6 GB)...');
        websave(radarDataTarFile,dataURL);
        untar(radarDataTarFile,outputFolder);
    end
end

function helperDownloadPretrainedSARDetectorNet(outputFolder,pretrainedNetURL)
% Download the pretrained network

    pretrainedMATFile = fullfile(outputFolder,'trainedsardetectornet.mat');
    pretrainedZipFile = fullfile(outputFolder,'trainedsardetectornet.tar.gz');
    if ~exist(pretrainedMATFile,'file')
        if ~exist(pretrainedZipFile,'file')
            disp('Downloading pretrained detector (29.4 MB)...');
            websave(pretrainedZipFile,pretrainedNetURL);
        end
        untar(pretrainedZipFile,outputFolder);
    end
end
Summary

This example shows how to train an R-CNN for target recognition in SAR images. The pretrained network attains an average precision of more than 0.9 for most of the target classes.