Automatic Target Recognition (ATR) in SAR Images
This example shows how to train a region-based convolutional neural network (R-CNN) for target recognition in large-scene synthetic aperture radar (SAR) images using Deep Learning Toolbox™ and Parallel Computing Toolbox™.
Deep Learning Toolbox provides a framework for designing and implementing deep neural networks with algorithms, pretrained models, and apps.
Parallel Computing Toolbox lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. It enables you to use GPUs directly from MATLAB® and accelerate the computation required by deep learning algorithms.
Neural network based algorithms have shown remarkable results in diverse areas ranging from natural scene detection to medical imaging, with large improvements over standard detection algorithms. Inspired by these advances, researchers have applied deep learning based solutions to SAR imaging. In this example, such a solution is applied to the problem of target detection and recognition. The R-CNN network employed here not only integrates detection and recognition, but also provides an effective and efficient solution that scales to large-scene SAR images.
This example demonstrates how to:
Download the dataset and the pretrained model
Load and analyze the image data
Define the network architecture
Specify training options
Train the network
Evaluate the network
To illustrate this workflow, the example uses the Moving and Stationary Target Acquisition and Recognition (MSTAR) clutter dataset published by the Air Force Research Laboratory. The full dataset is available for download. Alternatively, the example also includes a subset of the data used to showcase the workflow. The goal is to develop a model that can detect and recognize the targets.
Download the Dataset
This example uses a subset of the MSTAR clutter dataset that contains 300 training and 50 testing clutter images with five different targets. The data was collected using an X-band sensor in spotlight mode with a one-foot resolution, and contains rural and urban types of clutter. The types of targets used are BTR-60 (armoured car), BRDM-2 (fighting vehicle), ZSU-23/4 (tank), T62 (tank), and SLICY (multiple simple geometric shaped static target). The images were captured at a depression angle of 15 degrees. The clutter data is stored in the PNG image format, and the corresponding ground truth data is stored in the groundtruthmstarclutterdataset.mat file. The file contains 2-D bounding box information for five classes (SLICY, BTR-60, BRDM-2, ZSU-23/4, and T62) for the training and testing data. The size of the dataset is 1.6 GB.
Download the dataset using the helperDownloadMSTARClutterData helper function, defined at the end of this example.
outputFolder = pwd;
dataURL = 'https://ssd.mathworks.com/supportfiles/radar/data/mstar_clutterdataset.tar.gz';
helperDownloadMSTARClutterData(outputFolder,dataURL);
Depending on your internet connection, the download process can take some time. The code suspends MATLAB® execution until the download process is complete. Alternatively, download the dataset to a local disk using your web browser and extract the file. When using this approach, change the outputFolder variable in the example to the location of the downloaded and extracted file.
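For example, if you have already downloaded and extracted the archive manually, you might point outputFolder at that location instead of the current directory (the path below is hypothetical and not part of the original example):

% Hypothetical local folder containing the manually extracted dataset
outputFolder = 'C:\Data\mstar_clutterdataset';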
Download the Pretrained Network
Download the pretrained network using the helperDownloadPretrainedSARDetectorNet helper function, defined at the end of this example. The pretrained model allows you to run the entire example without having to wait for the training to complete. To train the network, set the doTrain variable to true.
pretrainedNetURL = 'https://ssd.mathworks.com/supportfiles/radar/data/trainedsardetectornet.tar.gz';
doTrain = false;
if ~doTrain
    helperDownloadPretrainedSARDetectorNet(outputFolder,pretrainedNetURL);
end
Load the Dataset
Load the ground truth data (training set and test set). These images were generated by placing target chips at random locations on a background clutter image constructed from the downloaded raw data. The generated targets serve as the ground truth for training and testing the network.
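For intuition only, the sketch below shows the kind of scene composition described above. The file names clutterScene.png and targetChip.png are placeholders, not files shipped with the example, and the sketch assumes grayscale images of matching data type:

% Illustrative only: place a hypothetical target chip at a random location
% on a clutter image and record the corresponding bounding box
clutter = imread('clutterScene.png');           % hypothetical background clutter image
chip = imread('targetChip.png');                % hypothetical target chip
chipRows = size(chip,1);
chipCols = size(chip,2);
r = randi(size(clutter,1) - chipRows);          % random top-left row
c = randi(size(clutter,2) - chipCols);          % random top-left column
scene = clutter;
scene(r:r+chipRows-1, c:c+chipCols-1) = chip;   % insert the chip into the scene
bbox = [c r chipCols chipRows];                 % [x y width height] ground truth box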
load('groundtruthmstarclutterdataset.mat', "trainingData", "testData");
The ground truth data is stored in a six-column table, where the first column contains the image file paths and the second to sixth columns contain the bounding boxes for the different targets.
% Display the first few rows of the data set
trainingData(1:4,:)

ans=4×6 table
          imageFilename                  Slicy                 BTR_60                 BRDM_2                ZSU_23_4                 T62
    ______________________________    ___________________    ___________________    ___________________    ___________________    ___________________

    "./trainingimages/img0001.png"    {[ 285  468 28 28]}    {[ 135  331 65 65]}    {[ 597  739 65 65]}    {[ 810 1107 80 80]}    {[1228 1089 87 87]}
    "./trainingimages/img0002.png"    {[ 595 1585 28 28]}    {[ 880  162 65 65]}    {[ 308 1683 65 65]}    {[1275 1098 80 80]}    {[1274 1099 87 87]}
    "./trainingimages/img0003.png"    {[ 200 1140 28 28]}    {[ 961 1055 65 65]}    {[ 306 1256 65 65]}    {[ 661 1412 80 80]}    {[ 699  886 87 87]}
    "./trainingimages/img0004.png"    {[ 623  186 28 28]}    {[ 536  946 65 65]}    {[ 131  245 65 65]}    {[1030 1266 80 80]}    {[ 151  924 87 87]}
Display one of the training images and box labels to visualize the data.
img = imread(trainingData.imageFilename(1));
bbox = reshape(cell2mat(trainingData{1,2:end}),[4,5])';
labels = {'Slicy','BTR_60','BRDM_2','ZSU_23_4','T62'};
annotatedImage = insertObjectAnnotation(img,'rectangle',bbox,labels,...
    'TextBoxOpacity',0.9,'FontSize',50);
figure
imshow(annotatedImage);
title('Sample Training Image With Bounding Boxes and Labels')
Define the Network Architecture
Create an R-CNN object detector for five targets: Slicy, BTR_60, BRDM_2, ZSU_23_4, and T62.
objectClasses = {'Slicy','BTR_60','BRDM_2','ZSU_23_4','T62'};
The network must be able to classify the five targets and a background class in order to be trained using the trainRCNNObjectDetector function available in Deep Learning Toolbox™. In the code below, 1 is added to the number of target classes to include the background class.
numClassesPlusBackground = numel(objectClasses) + 1;
The final fully connected layer of the network defines the number of classes that the network can classify. Set the output size of the final fully connected layer equal to numClassesPlusBackground.

% Define input size
inputSize = [128,128,1];

% Define network
layers = createNetwork(inputSize,numClassesPlusBackground);
Now, these network layers can be used to train an R-CNN based five-class object detector.
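If you want to verify the layer dimensions before training, one option (not part of the original workflow) is to open the layer array in the Deep Learning Toolbox network analyzer:

% Optional sanity check: visualize the layer array and inspect activation sizes
analyzeNetwork(layers)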
Train the R-CNN Object Detector
Use trainingOptions (Deep Learning Toolbox) to specify the network training options. trainingOptions by default uses a GPU if one is available (requires Parallel Computing Toolbox™ and a CUDA® enabled GPU with compute capability 3.0 or higher). Otherwise, it uses a CPU. You can also specify the execution environment by using the ExecutionEnvironment name-value argument of trainingOptions. To detect automatically if you have a GPU available, set ExecutionEnvironment to auto. If you do not have a GPU, or do not want to use one for training, set ExecutionEnvironment to cpu. To ensure the use of a GPU for training, set ExecutionEnvironment to gpu.
% Set training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 128, ...
    'InitialLearnRate', 1e-3, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 100, ...
    'MaxEpochs', 10, ...
    'Verbose', true, ...
    'CheckpointPath', tempdir, ...
    'ExecutionEnvironment', 'auto');
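As an optional check (not part of the original example), you can confirm whether a supported GPU is visible to MATLAB before starting a long training run; with auto, training falls back to the CPU when no GPU is found:

% Requires Parallel Computing Toolbox; lists the current GPU if one is available
if gpuDeviceCount > 0
    disp(gpuDevice)
else
    disp('No supported GPU detected; training will run on the CPU.')
end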
Use trainRCNNObjectDetector to train the R-CNN object detector if doTrain is true. Otherwise, load the pretrained network. If training, adjust NegativeOverlapRange and PositiveOverlapRange to ensure that training samples tightly overlap with the ground truth.
if doTrain
    % Train an R-CNN object detector. This will take several minutes
    detector = trainRCNNObjectDetector(trainingData, layers, options, ...
        'PositiveOverlapRange', [0.5 1], 'NegativeOverlapRange', [0.1 0.5]);
else
    % Load a previously trained detector
    pretrainedMATFile = fullfile(outputFolder,'trainedsardetectornet.mat');
    load(pretrainedMATFile);
end
Evaluate the Detector on a Test Image
To get a qualitative idea of how well the detector works, pick a random image from the test set and run it through the detector. The detector is expected to return a collection of bounding boxes where it thinks the detected targets are, along with scores indicating confidence in each detection.
% Read test image
imgIdx = randi(height(testData));
testImage = imread(testData.imageFilename(imgIdx));

% Detect SAR targets in the test image
[bboxes,score,label] = detect(detector,testImage,'MiniBatchSize',16);
To understand the results, overlay the detections on the test image. A key parameter is the detection threshold, the minimum score at which a detection is kept. A higher threshold results in fewer false positives, but it also results in more false negatives.
scoreThreshold = 0.8;

% Display the detection results
outputImage = testImage;
for idx = 1:length(score)
    bbox = bboxes(idx,:);
    thisScore = score(idx);
    if thisScore > scoreThreshold
        annotation = sprintf('%s: (Confidence = %0.2f)', label(idx), ...
            round(thisScore,2));
        outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox, ...
            annotation, 'TextBoxOpacity', 0.9, 'FontSize', 45, 'LineWidth', 2);
    end
end
f = figure;
f.Position(3:4) = [860,740];
imshow(outputImage)
title('Predicted Boxes and Labels on Test Image')
Evaluate the Model
Looking at test images one at a time gives a sense of the detector performance. To perform a more rigorous analysis using the entire test set, run the test set through the detector.
% Create a table to hold the bounding boxes, scores, and labels output by the detector
numImages = height(testData);
results = table('Size',[numImages 3], ...
    'VariableTypes',{'cell','cell','cell'}, ...
    'VariableNames',{'Boxes','Scores','Labels'});

% Run the detector on each image in the test set and collect the results
for i = 1:numImages
    imgFilename = testData.imageFilename{i};

    % Read the image
    I = imread(imgFilename);

    % Run the detector
    [bboxes, scores, labels] = detect(detector, I, 'MiniBatchSize', 16);

    % Collect the results
    results.Boxes{i} = bboxes;
    results.Scores{i} = scores;
    results.Labels{i} = labels;
end
The possible detections and their bounding boxes for all images in the test set can be used to calculate the detector's average precision (AP) for each class. The AP is the average of the detector's precision at different levels of recall, where precision and recall are defined as:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

where

TP - number of true positives (the detector predicts a target when it is present)

FP - number of false positives (the detector predicts a target when it is not present)

FN - number of false negatives (the detector fails to detect a target when it is present)

A detector with a precision of 1 makes no false detections (no false positives), while a detector with a recall of 1 detects every target that is present (no false negatives). Precision and recall typically trade off against each other.
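As a quick worked example with made-up counts (not results from this example), 8 true positives, 2 false positives, and 4 missed targets give a precision of 0.8 and a recall of about 0.67:

% Hypothetical counts, for illustration only
TP = 8; FP = 2; FN = 4;
precisionExample = TP/(TP + FP)   % 0.8000
recallExample = TP/(TP + FN)      % 0.6667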
Plot the relationship between precision and recall for each class. The average value of each curve is the AP. The curves are computed with an overlap threshold of 0.5 between predicted and ground truth boxes.
For more details, see evaluateDetectionPrecision (Computer Vision Toolbox).
% Extract expected bounding box locations from the test data
expectedResults = testData(:, 2:end);
threshold = 0.5;

% Evaluate the object detector using the average precision metric
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults, threshold);

% Plot precision-recall curves
f = figure;
ax = gca;
f.Position(3:4) = [860,740];
xlabel('Recall')
ylabel('Precision')
grid on;
hold on;
legend('Location','southeast');
title('Precision vs Recall Curve for Threshold Value 0.5 for Different Classes');
for i = 1:length(ap)
    % Plot the precision-recall curve for each class
    plot(ax, recall{i}, precision{i}, 'DisplayName', ...
        ['Average Precision for class ' trainingData.Properties.VariableNames{i+1} ...
        ' is ' num2str(round(ap(i),3))])
end
The AP for most of the classes is greater than 0.9. Of these, the trained model appears to struggle most with the SLICY targets, although it still achieves an AP of about 0.7 for that class.
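If you want a single summary number across classes, one option (not computed in the original example) is to average the per-class AP values:

% Mean of the per-class average precision values
meanAP = mean(ap)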
Helper Functions

The function createNetwork takes as input the image size inputSize and the number of classes numClassesPlusBackground, and returns a CNN.
function layers = createNetwork(inputSize,numClassesPlusBackground)
    layers = [
        imageInputLayer(inputSize)                      % Input layer

        convolution2dLayer(3,32,'Padding','same')       % Convolution layer
        reluLayer                                       % ReLU layer
        convolution2dLayer(3,32,'Padding','same')
        batchNormalizationLayer                         % Batch normalization layer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)                 % Max pooling layer

        convolution2dLayer(3,64,'Padding','same')
        reluLayer
        convolution2dLayer(3,64,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(3,128,'Padding','same')
        reluLayer
        convolution2dLayer(3,128,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(3,256,'Padding','same')
        reluLayer
        convolution2dLayer(3,256,'Padding','same')
        batchNormalizationLayer
        reluLayer
        maxPooling2dLayer(2,'Stride',2)

        convolution2dLayer(6,512)
        reluLayer

        dropoutLayer(0.5)                               % Dropout layer
        fullyConnectedLayer(512)                        % Fully connected layer
        reluLayer
        fullyConnectedLayer(numClassesPlusBackground)
        softmaxLayer                                    % Softmax layer
        classificationLayer                             % Classification layer
        ];
end

function helperDownloadMSTARClutterData(outputFolder,dataURL)
% Download the data set from the given URL to the output folder

    radarDataTarFile = fullfile(outputFolder,'mstar_clutterdataset.tar.gz');
    if ~exist(radarDataTarFile,'file')
        disp('Downloading MSTAR clutter data (1.6 GB)...');
        websave(radarDataTarFile,dataURL);
        untar(radarDataTarFile,outputFolder);
    end
end

function helperDownloadPretrainedSARDetectorNet(outputFolder,pretrainedNetURL)
% Download the pretrained network

    pretrainedMATFile = fullfile(outputFolder,'trainedsardetectornet.mat');
    pretrainedZipFile = fullfile(outputFolder,'trainedsardetectornet.tar.gz');
    if ~exist(pretrainedMATFile,'file')
        if ~exist(pretrainedZipFile,'file')
            disp('Downloading pretrained detector (29.4 MB)...');
            websave(pretrainedZipFile,pretrainedNetURL);
        end
        untar(pretrainedZipFile,outputFolder);
    end
end
Summary

This example shows how to train an R-CNN for target recognition in SAR images. The pretrained network attains an average precision of more than 0.9 for most of the target classes.