compare results for regression and tobit ead models

this example shows how to use fiteadmodel to create a regression model and a tobit model for exposure at default (ead) and then compare the results.

load ead data

load the ead data.

load eaddata.mat
head(eaddata)
    utilizationrate    age     marriage        limit         drawn          ead    
    _______________    ___    ___________    __________    __________    __________
        0.24359        25     not married         44776         10907         44740
        0.96946        44     not married    2.1405e+05    2.0751e+05         40678
              0        40     married        1.6581e+05             0    1.6567e+05
        0.53242        38     not married    1.7375e+05         92506        1593.5
         0.2583        30     not married         26258        6782.5        54.175
        0.17039        54     married        1.7357e+05         29575        576.69
        0.18586        27     not married         19590          3641        998.49
        0.85372        42     not married    2.0712e+05    1.7682e+05    1.6454e+05
rng('default');
numobs = height(eaddata);
c = cvpartition(numobs,'holdout',0.4);
trainingind = training(c);
testind = test(c);

select model type

select a regression and a tobit model type.

modeltyper = "regression";
modeltypet = "tobit";

select conversion measure

select the conversion measure for the ead response values.

conversionmeasure = "lcf";
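
the "lcf" (limit conversion factor) measure expresses ead as a fraction of the credit limit. the following is a minimal sketch of that conversion, using three rows copied from the head(eaddata) preview above. the variable names are illustrative only; fiteadmodel performs the conversion internally.

```matlab
% three rows copied from the head(eaddata) preview above
limit = [44776; 26258; 19590];
ead   = [44740; 54.175; 998.49];

% limit conversion factor: ead expressed as a fraction of the limit
lcf = ead ./ limit;
disp(lcf)
```

the first two values, about 0.9992 and 0.0021, agree with the first two entries of the observed column in the calibration tables later in this example.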

create regression ead model

use fiteadmodel to create a regression model using eaddata.

eadmodelregression = fiteadmodel(eaddata,modeltyper,'predictorvars',{'utilizationrate','age','marriage'}, ...
    'conversionmeasure',conversionmeasure,'drawnvar','drawn','limitvar','limit','responsevar','ead');
disp(eadmodelregression);
  regression with properties:
    conversiontransform: "logit"
      boundarytolerance: 1.0000e-07
                modelid: "regression"
            description: ""
        underlyingmodel: [1x1 classreg.regr.compactlinearmodel]
          predictorvars: ["utilizationrate"    "age"    "marriage"]
            responsevar: "ead"
               limitvar: "limit"
               drawnvar: "drawn"
      conversionmeasure: "lcf"

display the underlying model. the underlying model's response variable is the logit transformation of the ead response data. use the 'boundarytolerance', 'limitvar', and 'drawnvar' name-value arguments to modify the transformation.

disp(eadmodelregression.underlyingmodel);
compact linear regression model:
    ead_lcf_logit ~ 1 + utilizationrate + age + marriage
estimated coefficients:
                            estimate        se         tstat       pvalue  
                            _________    _________    _______    __________
    (intercept)               -2.4745      0.29892    -8.2781    1.6448e-16
    utilizationrate            6.0045      0.19901     30.172    7.703e-182
    age                     -0.020095    0.0073019     -2.752     0.0059471
    marriage_not married     -0.03509      0.13935    -0.2518        0.8012
number of observations: 4378, error degrees of freedom: 4374
root mean squared error: 4.48
r-squared: 0.173,  adjusted r-squared: 0.173
f-statistic vs. constant model: 305, p-value = 5.7e-180
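
the response name ead_lcf_logit indicates that the linear model is fitted to the logit of the lcf values. the sketch below illustrates the idea with base matlab functions. it assumes that the boundary tolerance is applied by clipping values into the interval [tol, 1-tol] before the logit is taken; that clipping rule is an assumption for illustration, not a statement of the toolbox internals.

```matlab
tol = 1e-7;                      % boundarytolerance value shown in the model display
lcf = [0; 0.0020632; 0.5; 1];    % illustrative lcf values, including both boundaries

% assumed clipping step: move exact 0 and 1 inside (0,1) so the logit is finite
lcfclipped = min(max(lcf, tol), 1 - tol);

% logit transform, the response scale of the underlying linear model
lcflogit = log(lcfclipped ./ (1 - lcfclipped));

% inverse logit maps values on the model scale back to the lcf scale
lcfback = 1 ./ (1 + exp(-lcflogit));
disp([lcf lcflogit lcfback])
```

because the fit is on the logit scale, the root mean squared error reported in the model display is also on that scale and is not directly comparable with lcf values between 0 and 1.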

create tobit ead model

use fiteadmodel to create a tobit model using eaddata.

eadmodeltobit = fiteadmodel(eaddata,modeltypet,'predictorvars',{'utilizationrate','age','marriage'}, ...
    'conversionmeasure',conversionmeasure,'drawnvar','drawn','limitvar','limit','responsevar','ead','censoringside',"right",'leftlimit',0.4,'rightlimit',0.5);
disp(eadmodeltobit);
  tobit with properties:
        censoringside: "right"
            leftlimit: 0.4000
           rightlimit: 0.5000
              modelid: "tobit"
          description: ""
      underlyingmodel: [1x1 risk.internal.credit.tobitmodel]
        predictorvars: ["utilizationrate"    "age"    "marriage"]
          responsevar: "ead"
             limitvar: "limit"
             drawnvar: "drawn"
    conversionmeasure: "lcf"

display the underlying model. the underlying model's response variable is the complog transformation of the ead response data. use the 'limitvar', 'drawnvar', 'censoringside', 'rightlimit', 'leftlimit', and 'solveroptions' name-value arguments to modify the transformation.

disp(eadmodeltobit.underlyingmodel);
tobit regression model, right-censored:
     ead_lcf = min(y*,0.5)
     y* ~ 1 + utilizationrate + age + marriage
estimated coefficients:
                             estimate        se         tstat       pvalue  
                            __________    _________    ________    _________
    (intercept)                0.18088     0.021308       8.489            0
    utilizationrate            0.42381     0.013858      30.581            0
    age                     -0.0014564    0.0005232     -2.7836    0.0053982
    marriage_not married    -0.0040197    0.0096904    -0.41481       0.6783
    (sigma)                    0.27917    0.0043369      64.371            0
number of observations: 4378
number of left-censored observations: 0
number of uncensored observations: 2802
number of right-censored observations: 1576
log-likelihood: -1756.98

predict ead for regression model

ead prediction operates on the underlying compact statistical model and then transforms the predicted values back to the ead scale. you can call the predict function with different options for the 'modellevel' name-value argument.

predictedeadregression = predict(eadmodelregression,eaddata(testind,:),'modellevel','ead');
predictedconversionregression = predict(eadmodelregression,eaddata(testind,:),'modellevel','conversionmeasure');
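
to make the two model levels concrete, the sketch below reproduces a single prediction by hand. it uses the coefficient estimates from the underlying model display and the first row of the data preview, and it assumes that the back-transformation is a plain inverse logit followed by multiplication by the limit. all variable names are illustrative.

```matlab
% coefficient estimates copied from the underlying model display above
b0 = -2.4745; butil = 6.0045; bage = -0.020095; bnotmarried = -0.03509;

% first row of the data preview: utilization 0.24359, age 25, not married
util = 0.24359; age = 25; notmarried = 1; limit = 44776;

% linear predictor on the logit scale
eta = b0 + butil*util + bage*age + bnotmarried*notmarried;

% assumed back-transformation: inverse logit gives the predicted lcf
lcfhat = 1 / (1 + exp(-eta));

% predicted ead on the currency scale is the predicted lcf times the limit
eadhat = lcfhat * limit;
fprintf('predicted lcf = %.4f, predicted ead = %.0f\n', lcfhat, eadhat);
```

the predicted lcf of about 0.175 agrees with the first predicted_regression value in the calibration table later in this example, which corresponds to 'modellevel','conversionmeasure'; the product with the limit corresponds to 'modellevel','ead'.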

predict ead for tobit model

ead prediction operates on the underlying compact statistical model and then transforms the predicted values back to the ead scale. you can call the predict function with different options for the 'modellevel' name-value argument.

predictedeadtobit = predict(eadmodeltobit,eaddata(testind,:),'modellevel','ead');
predictedconversiontobit = predict(eadmodeltobit,eaddata(testind,:),'modellevel','conversionmeasure');
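
for a right-censored tobit model, the expected value of the censored response min(y*, rightlimit) has a closed form when y* is normal. the sketch below evaluates that standard formula with the estimates from the tobit model display and the first row of the data preview. treating this expected value as what predict returns is an assumption made for illustration; the variable names and the two helper function handles are hypothetical.

```matlab
% estimates copied from the tobit model display above
b0 = 0.18088; butil = 0.42381; bage = -0.0014564; bnotmarried = -0.0040197;
sigma = 0.27917; rightlimit = 0.5;

% first row of the data preview: utilization 0.24359, age 25, not married
util = 0.24359; age = 25; notmarried = 1;

% mean of the latent variable y*
mu = b0 + butil*util + bage*age + bnotmarried*notmarried;

% standard normal cdf and pdf written with base matlab functions
normcdfstd = @(z) 0.5 * erfc(-z ./ sqrt(2));
normpdfstd = @(z) exp(-0.5 * z.^2) ./ sqrt(2*pi);

% expected value of min(y*, rightlimit) for y* ~ N(mu, sigma^2)
a = (rightlimit - mu) / sigma;
lcfhat = mu*normcdfstd(a) - sigma*normpdfstd(a) + rightlimit*(1 - normcdfstd(a));
fprintf('latent mean = %.4f, expected censored lcf = %.4f\n', mu, lcfhat);
```

the result of about 0.217 agrees with the first predicted_tobit value in the calibration table later in this example. because the response is censored at 0.5, this expected value always stays below 0.5.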

validate ead regression model

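for validation of the regression model, use modeldiscrimination, modeldiscriminationplot, modelcalibration, and modelcalibrationplot.

use modeldiscrimination and then modeldiscriminationplot to plot the roc curve.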

modellevel = "conversionmeasure";
[discmeasureregression, discdataregression] = modeldiscrimination(eadmodelregression,eaddata(testind,:),'showdetails',true,'modellevel',modellevel)
discmeasureregression=1×3 table
                   auroc      segment      segmentcount
                  _______    __________    ____________
    regression    0.70898    "all_data"        1751    
discdataregression=1534×3 table
        x             y           t   
    __________    _________    _______
             0            0    0.95722
             0    0.0027778    0.95722
             0    0.0041667     0.9566
             0    0.0055556    0.95639
             0    0.0083333    0.95576
    0.00096993    0.0097222    0.95555
    0.00096993     0.016667     0.9549
     0.0019399     0.016667    0.95474
     0.0019399     0.018056    0.95468
     0.0038797     0.018056    0.95403
     0.0048497     0.019444    0.95381
     0.0058196     0.019444    0.95314
     0.0067895     0.020833    0.95291
     0.0067895     0.022222    0.95233
     0.0087294     0.026389    0.95224
     0.0087294     0.031944      0.952
      ⋮
modeldiscriminationplot(eadmodelregression,eaddata(testind, :),'modellevel',modellevel,'segmentby','marriage');

[figure: ead_lcf roc segmented by marriage. x-axis: false positive rate; y-axis: true positive rate. two lines: regression, married, auroc = 0.70813; regression, not married, auroc = 0.70921.]
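
because the observed lcf is continuous, it has to be converted to a binary label before an roc curve can be computed. the sketch below assumes a simple rule (label 1 when the observed value is at or above the sample mean) and computes the auroc as the share of positive-negative pairs that the predictions rank correctly. the discretization rule is an assumption for illustration, and the eight rows are copied from the regression calibration table in this example, so the result is not comparable with the auroc reported for the full test set.

```matlab
% first eight rows of the regression calibration table in this example
observed  = [0.99919; 0.0020632; 0.03741; 0.75518; 0.00076139; 0.9998; 0.0056134; 0.048451];
predicted = [0.17519; 0.17343; 0.7527; 0.89867; 0.042389; 0.95153; 0.1338; 0.043424];

% assumed discretization: label an observation 1 when it is at or above the mean
label = observed >= mean(observed);

% auroc as the probability that a random positive outranks a random negative,
% counting ties as one half (mann-whitney formulation)
pos = predicted(label);
neg = predicted(~label);
diffs = pos - neg.';                      % all positive-negative pairs
auroc = (sum(diffs(:) > 0) + 0.5*sum(diffs(:) == 0)) / numel(diffs);
fprintf('auroc on the sample = %.4f\n', auroc);
```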

use modelcalibration and then modelcalibrationplot to show a scatter plot of the predictions.

ydata = "observed";
[calmeasureregression,caldataregression] = modelcalibration(eadmodelregression,eaddata(testind,:),'modellevel',modellevel)
calmeasureregression=1×4 table
                  rsquared     rmse      correlation    samplemeanerror
                  ________    _______    ___________    _______________
    regression    0.16148     0.41023      0.40184         -0.025994   
caldataregression=1751×3 table
     observed     predicted_regression    residuals_regression
    __________    ____________________    ____________________
       0.99919           0.17519                   0.824      
     0.0020632           0.17343                -0.17137      
       0.03741            0.7527                -0.71529      
       0.75518           0.89867                -0.14349      
    0.00076139          0.042389               -0.041628      
        0.9998           0.95153                0.048274      
     0.0056134            0.1338                -0.12819      
      0.048451          0.043424               0.0050276      
       0.01448          0.059339               -0.044858      
       0.95329           0.67009                  0.2832      
       0.97847             0.939                 0.03947      
       0.71895           0.80122               -0.082271      
       0.79096            0.3791                 0.41186      
      0.042816           0.52542                 -0.4826      
       0.97169            0.2119                 0.75979      
       0.99182           0.62543                 0.36639      
      ⋮
modelcalibrationplot(eadmodelregression, eaddata(testind,:), 'modellevel', modellevel, 'ydata', ydata);

[figure: scatter regression, r-squared: 0.16148. x-axis: ead_lcf predicted; y-axis: ead_lcf observed. contains the data points and a fitted line.]
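
the calibration measures can be approximated directly from the observed and predicted columns. the sketch below uses the first eight rows of caldataregression shown above, so the numbers differ from the full-sample results. the residual sign convention (observed minus predicted) and the relation between rsquared and correlation are read off the displayed output; the definition of samplemeanerror as the mean residual is an assumption.

```matlab
% first eight rows of caldataregression shown above
observed  = [0.99919; 0.0020632; 0.03741; 0.75518; 0.00076139; 0.9998; 0.0056134; 0.048451];
predicted = [0.17519; 0.17343; 0.7527; 0.89867; 0.042389; 0.95153; 0.1338; 0.043424];

% residuals use the same sign convention as the residuals_regression column
residuals = observed - predicted;

% root mean squared error of the predictions
rmse = sqrt(mean(residuals.^2));

% linear correlation between observed and predicted values
cc = corrcoef(observed, predicted);
correlation = cc(1,2);

% the reported rsquared equals the squared correlation (0.40184^2 is about 0.16148)
rsquared = correlation^2;

% assumed definition: mean of observed minus predicted
samplemeanerror = mean(residuals);

fprintf('rsquared = %.4f, rmse = %.4f, correlation = %.4f, samplemeanerror = %.4f\n', ...
    rsquared, rmse, correlation, samplemeanerror);
```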

validate ead tobit model

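for validation of the tobit model, use modeldiscrimination, modeldiscriminationplot, modelcalibration, and modelcalibrationplot.

use modeldiscrimination and then modeldiscriminationplot to plot the roc curve.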

modellevel = "conversionmeasure";
[discmeasuretobit,discdatatobit] = modeldiscrimination(eadmodeltobit,eaddata(testind,:),'showdetails',true,'modellevel',modellevel)
discmeasuretobit=1×3 table
              auroc      segment      segmentcount
             _______    __________    ____________
    tobit    0.70909    "all_data"        1751    
discdatatobit=1534×3 table
        x             y           t   
    __________    _________    _______
             0            0    0.42178
             0    0.0027778    0.42178
             0    0.0041667     0.4212
             0    0.0055556    0.42076
    0.00096993    0.0069444    0.42062
    0.00096993    0.0097222    0.42018
    0.00096993     0.011111    0.42004
    0.00096993     0.018056     0.4196
     0.0019399     0.018056     0.4195
     0.0029098     0.019444    0.41945
     0.0048497     0.019444    0.41901
     0.0058196     0.020833    0.41887
     0.0058196     0.022222    0.41854
     0.0067895     0.022222    0.41842
     0.0067895     0.023611    0.41827
     0.0067895     0.029167    0.41827
      ⋮
modeldiscriminationplot(eadmodeltobit,eaddata(testind, :),'modellevel',modellevel,'segmentby','marriage');

[figure: ead_lcf roc segmented by marriage. x-axis: false positive rate; y-axis: true positive rate. two lines: tobit, married, auroc = 0.70814; tobit, not married, auroc = 0.70928.]

use modelcalibration and then modelcalibrationplot to show a scatter plot of the predictions.

ydata = "observed";
[calmeasuretobit,caldatatobit] = modelcalibration(eadmodeltobit,eaddata(testind,:),'modellevel',modellevel)
calmeasuretobit=1×4 table
             rsquared     rmse      correlation    samplemeanerror
             ________    _______    ___________    _______________
    tobit    0.15929     0.39572      0.39911          0.13366    
caldatatobit=1751×3 table
     observed     predicted_tobit    residuals_tobit
    __________    _______________    _______________
       0.99919        0.21657             0.78261   
     0.0020632        0.21571            -0.21365   
       0.03741        0.35115            -0.31374   
       0.75518        0.39272             0.36245   
    0.00076139        0.12184            -0.12107   
        0.9998        0.41744             0.58237   
     0.0056134        0.19913            -0.19351   
      0.048451        0.12215           -0.073701   
       0.01448        0.14323            -0.12875   
       0.95329        0.33415             0.61914   
       0.97847        0.41069             0.56778   
       0.71895         0.3627             0.35624   
       0.79096        0.27467             0.51629   
      0.042816        0.30579            -0.26297   
       0.97169        0.23025             0.74144   
       0.99182        0.32461             0.66721   
      ⋮
modelcalibrationplot(eadmodeltobit,eaddata(testind,:),'modellevel',modellevel,'ydata',ydata);

[figure: scatter tobit, r-squared: 0.15929. x-axis: ead_lcf predicted; y-axis: ead_lcf observed. contains the data points and a fitted line.]

plot histograms of observed with respect to predicted ead

plot a histogram of observed with respect to the predicted ead for the regression model.

figure;
histogram(caldataregression.observed);
hold on;
histogram(caldataregression.(('predicted_' + modeltyper)));
legend('observed','predicted');

[figure: two overlaid histograms for the regression model: observed, predicted.]

plot a histogram of observed with respect to the predicted ead for the tobit model.

figure;
histogram(caldatatobit.observed);
hold on;
histogram(caldatatobit.(('predicted_' + modeltypet)));
legend('observed','predicted');

[figure: two overlaid histograms for the tobit model: observed, predicted.]

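for both the tobit and regression models, the age and utilizationrate predictors are statistically significant (p-values below 0.01), while the marriage predictor is not (p-values of 0.6783 and 0.8012, respectively). on the test data, the two models discriminate almost identically (auroc of 0.70909 for tobit and 0.70898 for regression) and have slightly different r-squared values (0.15929 for tobit and 0.16148 for regression). the tobit model has the lower rmse (0.39572 versus 0.41023) but the larger sample mean error (0.13366 versus -0.025994).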
