
Compare Tobit LGD Model to Benchmark Model

This example shows how to compare a model for loss given default (LGD) against a benchmark model.

Load Data

Load the LGD data.

load LGDData.mat
disp(head(data))
      LTV        Age         Type           LGD   
    _______    _______    ___________    _________
    0.89101    0.39716    residential     0.032659
    0.70176     2.0939    residential      0.43564
    0.72078     2.7948    residential    0.0064766
    0.37013      1.237    residential     0.007947
    0.36492     2.5818    residential            0
      0.796     1.5957    residential      0.14572
    0.60203     1.1599    residential     0.025688
    0.92005    0.50253    investment      0.063182

Split the data into training and test sets.

numobs = height(data);
rng('default'); % for reproducibility
c = cvpartition(numobs,'holdout',0.4);
trainingind = training(c);
testind = test(c);
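The holdout idea behind cvpartition can be sketched with base MATLAB alone. This is a hypothetical stand-in (the names ending in Sketch and the toy size of 10 observations are illustrative assumptions), not cvpartition itself, and it is not guaranteed to reproduce cvpartition's split or rounding.

```matlab
% Hypothetical base-MATLAB sketch of a 60/40 holdout split (not cvpartition itself).
rng('default');                            % reproducible permutation
numObsSketch = 10;                         % toy size, for illustration only
numTestSketch = round(0.4*numObsSketch);   % size of the holdout (test) set
perm = randperm(numObsSketch);             % random order of observation indices
testIndSketch = false(numObsSketch,1);
testIndSketch(perm(1:numTestSketch)) = true;
trainingIndSketch = ~testIndSketch;        % every observation lands in exactly one set
fprintf('%d training, %d test observations\n', ...
    nnz(trainingIndSketch), nnz(testIndSketch));
```

Running it prints 6 training, 4 test observations; which observations land in each set depends on the random permutation.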

Fit Tobit Model

Fit an LGD model with the training data. By default, the last column of the data is used as the response variable and all other columns are used as predictor variables.

lgdmodel = fitLGDModel(data(trainingind,:),'Tobit');
disp(lgdmodel)
  Tobit with properties:
      CensoringSide: "both"
          LeftLimit: 0
         RightLimit: 1
            ModelID: "Tobit"
        Description: ""
    UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
      PredictorVars: ["LTV"    "Age"    "Type"]
        ResponseVar: "LGD"
disp(lgdmodel.UnderlyingModel)
Tobit regression model:
     LGD = max(0,min(Y*,1))
     Y* ~ 1 + LTV + Age + Type
Estimated coefficients:
                       Estimate        SE         tStat       pValue  
                       _________    _________    _______    __________
    (Intercept)         0.058257     0.027279     2.1356      0.032828
    LTV                  0.20126     0.031383     6.4129    1.7592e-10
    Age                -0.095407    0.0072435    -13.171             0
    Type_investment      0.10208     0.018054     5.6544    1.7785e-08
    (Sigma)              0.29288    0.0057071     51.318             0
Number of observations: 2093
Number of left-censored observations: 547
Number of uncensored observations: 1521
Number of right-censored observations: 25
Log-likelihood: -698.383
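The displayed model says the observed LGD is a latent variable Y* clipped to [0,1], where Y* is linear in LTV, Age, and the Type_investment dummy plus an error term. Assuming that error is normal with standard deviation (Sigma), which is the standard Tobit assumption, the expected LGD of one loan has a closed form for a doubly censored normal. The sketch below evaluates it by hand for the first loan shown by head(data), using the rounded coefficients above. The anonymous functions Phi, phi, and censoredMean are our own, and we have not checked that the result matches the output of predict exactly.

```matlab
% Standard normal CDF and PDF using base MATLAB only.
Phi = @(z) 0.5*erfc(-z/sqrt(2));
phi = @(z) exp(-z.^2/2)/sqrt(2*pi);

% E[max(a,min(Y,b))] for Y ~ N(mu,sigma^2): point mass at a, point mass at b,
% plus the mean contribution of the uncensored part between a and b.
censoredMean = @(mu,sigma,a,b) ...
    a.*Phi((a-mu)./sigma) + b.*(1 - Phi((b-mu)./sigma)) + ...
    mu.*(Phi((b-mu)./sigma) - Phi((a-mu)./sigma)) + ...
    sigma.*(phi((a-mu)./sigma) - phi((b-mu)./sigma));

% Rounded estimates from the table: (Intercept), LTV, Age, Type_investment.
beta  = [0.058257; 0.20126; -0.095407; 0.10208];
sigma = 0.29288;                          % (Sigma)

% First row of head(data): LTV = 0.89101, Age = 0.39716, Type = residential.
x = [1, 0.89101, 0.39716, 0];             % residential, so the dummy is 0
mu = x*beta;                              % latent linear predictor for Y*
expectedLGD = censoredMean(mu, sigma, 0, 1);
fprintf('latent mean %.4f, expected LGD %.4f\n', mu, expectedLGD);
```

The censored mean stays inside [0,1] by construction, whereas the latent linear predictor alone can fall outside that range for extreme LTV or age values.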

You can now use this model for prediction or validation. For example, use predict to predict LGD on the test data and visualize the predictions with a histogram.

lgdpredtobit = predict(lgdmodel,data(testind,:));
histogram(lgdpredtobit)
title('Predicted LGD, Tobit Model')
xlabel('Predicted LGD')
ylabel('Frequency')

[Figure: histogram of the predicted LGD values of the Tobit model (x-axis: predicted LGD, y-axis: frequency).]

Create Benchmark Model

In this example, the benchmark model is a lookup table model that segments the data into groups and assigns the mean LGD of each group to all members of that group. This is a common benchmarking approach in practice because it is easy to understand and use.

The groups in this example are defined using the three predictors. LTV is discretized into low and high levels. Age is discretized into young and old loans. Type already has two levels, namely residential and investment. The groups are all the combinations of these values (for example, low LTV, young loan, residential), which gives 2 × 2 × 2 = 8 groups. The number of levels and the specific cutoff points are for illustration purposes only. The benchmark model uses the same predictors as the Tobit model in this example, but you can use other variables to define the groups. In fact, the benchmark model could be a black-box model, as long as its predicted LGD values are available for the same customers as in this data set.

% Add the discretized variables as new columns in the table.
% Discretize the LTV: [0, 0.5) is 'low', [0.5, max] is 'high'.
ltvedges = [0 0.5 max(data.LTV)];
data.ltvdiscretized = discretize(data.LTV,ltvedges,'categorical',{'low','high'});
% Discretize the age: [0, 2) is 'young', [2, max] is 'old'.
ageedges = [0 2 max(data.Age)];
data.agediscretized = discretize(data.Age,ageedges,'categorical',{'young','old'});
% Type is already a categorical variable with two levels.

Finding the group means on the training data is, in effect, fitting the model. Note that the group counts are small for some groups. Adding more groups reduces the counts in some of them and makes their mean estimates less stable.

% Find the group means on the training data.
gs = groupsummary(data(trainingind,:),{'ltvdiscretized','agediscretized','Type'},'mean','LGD');
disp(gs)
    ltvdiscretized    agediscretized       Type        GroupCount    mean_LGD
    ______________    ______________    ___________    __________    ________
         low              young         residential        163        0.12166
         low              young         investment          26       0.087331
         low              old           residential        175       0.021776
         low              old           investment          23        0.16379
         high             young         residential       1134        0.16489
         high             young         investment         257        0.25977
         high             old           residential        265       0.066068
         high             old           investment          50        0.11779
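To see why small groups give less stable estimates, the lookup-table fit can be sketched on toy data with base MATLAB (findgroups and splitapply). The data below are hypothetical, not from LGDData.mat, and only LTV is used for grouping to keep it short. The standard error of each group mean (standard deviation divided by the square root of the count) is our addition to make the instability visible; the groupsummary call in the example reports only counts and means.

```matlab
% Hypothetical toy data (not from LGDData.mat).
ltvToy = [0.30; 0.80; 0.40; 0.90; 0.70; 0.20; 0.60];
lgdToy = [0.05; 0.30; 0.10; 0.20; 0.40; 0.00; 0.35];

% Same kind of cutoff as the example: LTV below 0.5 is 'low', otherwise 'high'.
ltvCat = discretize(ltvToy, [0 0.5 max(ltvToy)], 'categorical', {'low','high'});

% "Fitting" the benchmark: one mean per group, plus counts and standard errors.
[g, levels] = findgroups(ltvCat);
groupMeans  = splitapply(@mean, lgdToy, g);
groupCounts = splitapply(@numel, lgdToy, g);
groupSE     = splitapply(@(v) std(v)/sqrt(numel(v)), lgdToy, g);
disp(table(levels, groupCounts, groupMeans, groupSE))
```

For a fixed spread of LGD values within a group, the standard error shrinks with the square root of the group count, so a group with 23 loans (as in the table above) yields a noisier mean than a group with 1134 loans.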

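To predict the LGD for a new observation, find its group and then assign the group mean as the predicted LGD. Use the findgroups function, which takes the discretized variables as input. For a completely new data point, first discretize its LTV and age values by using the discretize function, and then use findgroups. Indexing the group means by these group numbers relies on findgroups numbering the groups in the same sorted order as the rows of gs, which holds when all eight groups are present in the data passed to findgroups, as they are in the test set here.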

lgdgroup = findgroups(data(testind,{'ltvdiscretized' 'agediscretized' 'Type'}));
lgdpredmeanstest = gs.mean_LGD(lgdgroup);
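The lookup for completely new loans can also be sketched by matching each loan's group label rather than a findgroups group number, so the result does not depend on which groups happen to be present among the new loans. The two-level lookup table, the new LTV values, and the Inf upper edge are hypothetical choices for illustration. Using Inf as the last edge, instead of the maximum training LTV, keeps a new loan whose LTV exceeds every training value from falling outside all bins (discretize returns <undefined> for such values).

```matlab
% Hypothetical fitted lookup table: one mean LGD per LTV level.
levels     = categorical({'low';'high'});
groupMeans = [0.05; 0.3125];

% Hypothetical new loans; the last LTV is larger than any value in the toy fit.
ltvNew = [0.45; 0.75; 1.20];

% Discretize with the same cutoff; Inf as the upper edge catches large LTVs.
ltvNewCat = discretize(ltvNew, [0 0.5 Inf], 'categorical', {'low','high'});

% Look up each new loan by its group label (compared as strings), not by number.
[found, loc] = ismember(string(ltvNewCat), string(levels));
assert(all(found), 'every new loan must fall into a known group');
lgdPredNew = groupMeans(loc);
disp(table(ltvNew, ltvNewCat, lgdPredNew))
```

With several grouping variables, the same idea applies to the combination of labels, for example by joining the new observations to the group-means table on the grouping columns.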

There are eight unique values in the predictions, as expected, one for each group.

disp(unique(lgdpredmeanstest))
    0.0218
    0.0661
    0.0873
    0.1178
    0.1217
    0.1638
    0.1649
    0.2598

The histogram of the predictions also shows the discrete nature of the model.

histogram(lgdpredmeanstest)
title('Predicted LGD, Group Means Model')
xlabel('Predicted LGD')
ylabel('Frequency')

[Figure: histogram of the predicted LGD values of the group means benchmark model on the test data (x-axis: predicted LGD, y-axis: frequency).]

To make the benchmark predictions available for both the training and test sets for comparison, add a column with the predicted LGD for the entire data set.

lgdgroup = findgroups(data(:,{'ltvdiscretized' 'agediscretized' 'Type'}));
data.lgdpredmeans = gs.mean_LGD(lgdgroup);

Compare Performance

Compare the performance of the Tobit model and the benchmark model by using the validation functions of the LGD model object, which accept the benchmark predictions as reference LGD values.

Start with the area under the receiver operating characteristic (ROC) curve, or AUROC metric, by using modelDiscrimination.

datasetchoice = "testing";
if datasetchoice=="training"
    ind = trainingind;
else
    ind = testind;
end
discmeasure = modelDiscrimination(lgdmodel,data(ind,:),'ShowDetails',true,'ReferenceLGD',data.lgdpredmeans(ind),'ReferenceID','group means')
discmeasure=2×3 table
                    AUROC      Segment      SegmentCount
                   _______    __________    ____________
    Tobit          0.67986    "all_data"        1394    
    group means    0.61251    "all_data"        1394    
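AUROC can be read as the probability that a randomly chosen high-LGD loan receives a higher predicted LGD than a randomly chosen low-LGD loan, with ties counted as one half. Because the observed LGD is continuous, it must first be split into high and low. The sketch assumes that the split is at the mean observed LGD, which we understand to be the default of modelDiscrimination but have not verified for every release. The observed and predicted values are hypothetical.

```matlab
% Hypothetical observed and predicted LGD for six loans.
obs  = [0.00; 0.05; 0.10; 0.40; 0.60; 0.90];
pred = [0.10; 0.20; 0.05; 0.15; 0.25; 0.50];

% ASSUMPTION: binarize the observed LGD at its mean ("high" vs. "low").
isHigh = obs > mean(obs);
sHigh = pred(isHigh);                     % scores of high-LGD loans (column)
sLow  = pred(~isHigh);                    % scores of low-LGD loans (column)

% Compare every high/low pair; implicit expansion needs R2016b or later.
pairs = (sHigh > sLow.') + 0.5*(sHigh == sLow.');
auroc = mean(pairs(:));
fprintf('AUROC = %.4f\n', auroc);         % prints AUROC = 0.8889
```

Eight of the nine high/low pairs are ordered correctly here (only the high-LGD loan scored 0.15 ranks below the low-LGD loan scored 0.20), hence 8/9.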

Use modelDiscriminationPlot to visualize the ROC curve.

modelDiscriminationPlot(lgdmodel,data(ind,:),'ReferenceLGD',data.lgdpredmeans(ind),'ReferenceID','group means')

[Figure: ROC curves; Tobit, AUROC = 0.67986; group means, AUROC = 0.61251 (x-axis: false positive rate, y-axis: true positive rate).]
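An ROC curve is traced by sweeping a threshold over the predicted LGD and recording, at each threshold, the true positive rate (share of high-LGD loans scored at or above it) and the false positive rate (share of low-LGD loans scored at or above it). This sketch reuses hypothetical data and the assumption that observed LGD is binarized at its mean; it illustrates the idea and is not the implementation of modelDiscriminationPlot.

```matlab
% Hypothetical observed and predicted LGD, binarized at the mean (assumption).
obs  = [0.00; 0.05; 0.10; 0.40; 0.60; 0.90];
pred = [0.10; 0.20; 0.05; 0.15; 0.25; 0.50];
isHigh = obs > mean(obs);

% Thresholds from strictest (nothing flagged) down to every distinct score.
thr = [Inf; sort(unique(pred), 'descend')];
tpr = arrayfun(@(t) mean(pred(isHigh)  >= t), thr);   % true positive rate
fpr = arrayfun(@(t) mean(pred(~isHigh) >= t), thr);   % false positive rate

% Area under the piecewise-linear ROC curve (trapezoidal rule).
auroc = trapz(fpr, tpr);
fprintf('AUROC = %.4f\n', auroc);         % prints AUROC = 0.8889
```

Calling plot(fpr, tpr) draws the curve. Using every distinct score as a threshold makes the trapezoidal area equal to the pairwise "probability of correct ranking" form of AUROC, ties included.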

Use modelCalibration to compute the calibration metrics.

calmeasure = modelCalibration(lgdmodel,data(ind,:),'ReferenceLGD',data.lgdpredmeans(ind),'ReferenceID','group means')
calmeasure=2×4 table
                   RSquared     RMSE      Correlation    SampleMeanError
                   ________    _______    ___________    _______________
    Tobit           0.08527    0.23712      0.29201         -0.034412   
    group means    0.041622     0.2406      0.20401        -0.0078124   
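The calibration metrics can also be sketched directly. As a consistency check on the table above, the R-squared of a simple linear regression of observed on predicted LGD equals the squared correlation, and indeed 0.29201^2 ≈ 0.08527 and 0.20401^2 ≈ 0.04162. The sketch assumes that RSquared comes from such a one-predictor regression (consistent with the fit lines in the calibration plot) and that SampleMeanError is the mean predicted LGD minus the mean observed LGD; that sign convention is our assumption, and the data are hypothetical.

```matlab
% Hypothetical observed and predicted LGD for six loans.
obs  = [0.00; 0.05; 0.10; 0.40; 0.60; 0.90];
pred = [0.10; 0.20; 0.05; 0.15; 0.25; 0.50];

R = corrcoef(pred, obs);                  % 2-by-2 correlation matrix
correlation = R(1,2);
rSquared = correlation^2;                 % R^2 of obs ~ a + b*pred
rmse = sqrt(mean((pred - obs).^2));       % root mean squared error
sampleMeanError = mean(pred - obs);       % ASSUMPTION: predicted minus observed

% Cross-check R^2 with an explicit least-squares fit of obs on pred.
X = [ones(size(pred)) pred];
b = X \ obs;
resid = obs - X*b;
rSquaredFit = 1 - sum(resid.^2) / sum((obs - mean(obs)).^2);

disp(table(rSquared, rmse, correlation, sampleMeanError))
```

A sample mean error near zero (as for the group means model) only says the predictions are right on average; the low R-squared values show that both models explain little of the loan-level variation in LGD.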

Use modelCalibrationPlot to visualize a scatter plot of the observed LGD values against the predicted LGD values.

modelCalibrationPlot(lgdmodel,data(ind,:),'ReferenceLGD',data.lgdpredmeans(ind),'ReferenceID','group means')

[Figure: scatter plot of observed LGD (y-axis) against predicted LGD (x-axis) with linear fit lines; Tobit, R-squared: 0.08527; group means, R-squared: 0.041622.]

You can also use modelCalibrationPlot to visualize a scatter plot of the predicted LGD values against the LTV values.

modelCalibrationPlot(lgdmodel,data(ind,:),'ReferenceLGD',data.lgdpredmeans(ind),'ReferenceID','group means','XData','LTV','YData','predicted')

[Figure: scatter plot of predicted LGD (y-axis) against LTV (x-axis) with linear fit lines; Tobit, R-squared: 0.33027; group means, R-squared: 0.16852.]
