Compare Results for Regression and Tobit EAD Models
This example shows how to use fitEADModel to create a Regression model and a Tobit model for exposure at default (EAD) and then compare the results.
Load EAD Data
Load the EAD data and partition it into training and test sets.
load EADData.mat
head(EADData)
    UtilizationRate    Age     Marriage        Limit         Drawn          EAD    
    _______________    ___    ___________    __________    __________    __________

        0.24359        25     not married         44776         10907         44740
        0.96946        44     not married    2.1405e+05    2.0751e+05         40678
              0        40     married        1.6581e+05             0    1.6567e+05
        0.53242        38     not married    1.7375e+05         92506        1593.5
         0.2583        30     not married         26258        6782.5        54.175
        0.17039        54     married        1.7357e+05         29575        576.69
        0.18586        27     not married         19590          3641        998.49
        0.85372        42     not married    2.0712e+05    1.7682e+05    1.6454e+05

rng('default');
numobs = height(EADData);
c = cvpartition(numobs,'HoldOut',0.4);
trainingind = training(c);
testind = test(c);
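As a rough illustration of what a 40% holdout partition produces, the following sketch splits row indices by hand using only base MATLAB. It is not a description of how cvpartition is implemented, and the individual rows selected will differ from those chosen by cvpartition. The names n, nTest, isTest, and isTrain are hypothetical; the row count 4378 is the number of observations reported by the model displays in this example.

```matlab
% Manual holdout split (illustrative only; cvpartition handles this for you).
rng('default');                 % reproducible random numbers
n = 4378;                       % number of observations reported in this example
nTest = floor(0.4*n);           % size of the holdout (test) set
perm = randperm(n);             % random ordering of the row indices
isTest = false(n,1);
isTest(perm(1:nTest)) = true;   % logical index of the test rows
isTrain = ~isTest;              % logical index of the training rows
fprintf('train: %d, test: %d\n', sum(isTrain), sum(isTest));
```

The test-set size of 1751 agrees with the row counts of the validation tables later in this example.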
Select Model Type
Select a Regression and a Tobit model type.
modeltyper = "Regression";
modeltypet = "Tobit";
Select Conversion Measure
Select the conversion measure for the EAD response values.
conversionmeasure = "lcf";
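To make the conversion measure concrete, the sketch below computes the limit conversion factor (LCF = EAD/Limit) for the first row displayed by head(EADData). The CCF and UACF formulas are not stated in this example; they are included as assumptions based on the commonly used definitions. The inputs are the rounded values shown in the table, so the results are approximate.

```matlab
% Values from the first displayed row of EADData (rounded as displayed).
limit = 44776;
drawn = 10907;
ead   = 44740;

lcf  = ead/limit;                       % limit conversion factor
ccf  = (ead - drawn)/(limit - drawn);   % credit conversion factor (assumed definition)
uacf = (ead - drawn)/limit;             % used amount conversion factor (assumed definition)
fprintf('LCF = %.5f, CCF = %.5f, UACF = %.5f\n', lcf, ccf, uacf);

% An LCF value maps back to the EAD scale by multiplying by the limit.
eadBack = lcf*limit;
```

Because LCF depends only on the EAD and the limit, it stays well defined even when the drawn amount equals the limit, which is one common reason to prefer it over CCF.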
Create Regression EAD Model
Use fitEADModel to create a Regression model using the EADData.
eadmodelregression = fitEADModel(EADData,modeltyper,'PredictorVars',{'UtilizationRate','Age','Marriage'}, ...
    'ConversionMeasure',conversionmeasure,'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD');
disp(eadmodelregression);
  Regression with properties:

    ConversionTransform: "logit"
      BoundaryTolerance: 1.0000e-07
                ModelID: "Regression"
            Description: ""
        UnderlyingModel: [1x1 classreg.regr.CompactLinearModel]
          PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
            ResponseVar: "EAD"
               LimitVar: "Limit"
               DrawnVar: "Drawn"
      ConversionMeasure: "lcf"
Display the underlying model. The underlying model's response variable is the logit transformation of the EAD response data. Use the 'BoundaryTolerance', 'LimitVar', and 'DrawnVar' name-value arguments to modify the transformation.
disp(eadmodelregression.UnderlyingModel);
Compact linear regression model:
    EAD_lcf_logit ~ 1 + UtilizationRate + Age + Marriage

Estimated Coefficients:
                            Estimate        SE         tStat       pValue  
                            _________    _________    _______    __________

    (Intercept)               -2.4745      0.29892    -8.2781    1.6448e-16
    UtilizationRate            6.0045      0.19901     30.172    7.703e-182
    Age                     -0.020095    0.0073019     -2.752     0.0059471
    Marriage_not married     -0.03509      0.13935    -0.2518        0.8012

Number of observations: 4378, Error degrees of freedom: 4374
Root Mean Squared Error: 4.48
R-squared: 0.173, Adjusted R-Squared: 0.173
F-statistic vs. constant model: 305, p-value = 5.7e-180
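The logit transformation is undefined at exactly 0 and 1, which is where a boundary tolerance comes in. The following sketch shows one plausible reading of that mechanism: values are moved into [tol, 1 - tol] before the logit is applied. The clipping rule is an assumption about the behavior, not a statement of the toolbox implementation, and the LCF values are hypothetical.

```matlab
tol = 1e-7;                                 % BoundaryTolerance shown in the model display
lcf = [0; 0.25; 0.5; 0.999; 1];             % hypothetical LCF values, including both boundaries
lcfClipped = min(max(lcf, tol), 1 - tol);   % move 0 and 1 inside (0,1) (assumed behavior)

logitFcn    = @(p) log(p./(1 - p));         % logit transform
invLogitFcn = @(z) 1./(1 + exp(-z));        % inverse logit

z = logitFcn(lcfClipped);                   % response on the logit scale
lcfBack = invLogitFcn(z);                   % back-transform to the LCF scale
disp([lcf lcfClipped z lcfBack]);
```

With a tolerance of 1e-7, boundary observations map to about -16.1 and +16.1 on the logit scale, so a data set with many values near 0 or 1 can show a large root mean squared error on the transformed scale.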
Create Tobit EAD Model
Use fitEADModel to create a Tobit model using the EADData.
eadmodeltobit = fitEADModel(EADData,modeltypet,'PredictorVars',{'UtilizationRate','Age','Marriage'}, ...
    'ConversionMeasure',conversionmeasure,'DrawnVar','Drawn','LimitVar','Limit','ResponseVar','EAD', ...
    'CensoringSide',"right",'LeftLimit',0.4,'RightLimit',0.5);
disp(eadmodeltobit);
  Tobit with properties:

        CensoringSide: "right"
            LeftLimit: 0.4000
           RightLimit: 0.5000
              ModelID: "Tobit"
          Description: ""
      UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
        PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
          ResponseVar: "EAD"
             LimitVar: "Limit"
             DrawnVar: "Drawn"
    ConversionMeasure: "lcf"
Display the underlying model. The underlying model's response variable is the LCF conversion of the EAD response data, right-censored at the right limit of 0.5 (EAD_lcf = min(Y*,0.5) in the model display). Use the 'LimitVar', 'DrawnVar', 'CensoringSide', 'RightLimit', 'LeftLimit', and 'SolverOptions' name-value arguments to modify the conversion and the fit.
disp(eadmodeltobit.UnderlyingModel);
Tobit regression model, right-censored:
     EAD_lcf = min(Y*,0.5)
     Y* ~ 1 + UtilizationRate + Age + Marriage

Estimated coefficients:
                             Estimate         SE          tStat      pValue  
                            __________    _________    ________    _________

    (Intercept)                0.18088     0.021308       8.489            0
    UtilizationRate            0.42381     0.013858      30.581            0
    Age                     -0.0014564    0.0005232     -2.7836    0.0053982
    Marriage_not married    -0.0040197    0.0096904    -0.41481       0.6783
    (Sigma)                    0.27917    0.0043369      64.371            0

Number of observations: 4378
Number of left-censored observations: 0
Number of uncensored observations: 2802
Number of right-censored observations: 1576
Log-likelihood: -1756.98
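The log-likelihood in the display combines two kinds of terms: a normal density for uncensored observations and a tail probability for right-censored ones. The sketch below evaluates the standard right-censored Tobit log-likelihood on a tiny hypothetical data set using base MATLAB only. It assumes that observations at or above the right limit count as censored, and it is not taken from the toolbox source.

```matlab
normpdfFcn = @(z) exp(-0.5*z.^2)/sqrt(2*pi);   % standard normal density
normcdfFcn = @(z) 0.5*erfc(-z/sqrt(2));        % standard normal CDF

R     = 0.5;                 % right limit used in this example
y     = [0.2; 0.5; 0.35];    % hypothetical observed LCF values, capped at R
mu    = [0.2; 0.5; 0.30];    % hypothetical linear predictor values
sigma = 1;                   % hypothetical sigma

isCens = y >= R;             % assumed rule: values at the limit are right-censored

% Uncensored rows contribute the log density of y around mu.
llUncens = log(normpdfFcn((y(~isCens) - mu(~isCens))/sigma)/sigma);
% Censored rows contribute log P(Y* >= R).
llCens = log(1 - normcdfFcn((R - mu(isCens))/sigma));

logLik = sum(llUncens) + sum(llCens);
fprintf('censored: %d, log-likelihood: %.6f\n', sum(isCens), logLik);
```

Maximizing this quantity over the coefficients and sigma is the usual way a Tobit model is fitted.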
Predict EAD for Regression Model
EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can specify the predict function with different options for the 'ModelLevel' name-value argument.
predictedeadregression = predict(eadmodelregression,EADData(testind,:),'ModelLevel','ead');
predictedconversionregression = predict(eadmodelregression,EADData(testind,:),'ModelLevel','conversionMeasure');
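The chain from the underlying model to the EAD scale can be traced by hand. This sketch applies the displayed (rounded) Regression coefficients to the first row shown by head(EADData), inverts the logit to get an LCF prediction, and multiplies by the limit. It assumes that predict applies a plain inverse logit with no further correction; the variable names are hypothetical.

```matlab
% Coefficients as displayed for the underlying linear model (rounded).
b0 = -2.4745;
bUtil = 6.0045;
bAge = -0.020095;
bNotMarried = -0.03509;

% First displayed row: UtilizationRate 0.24359, Age 25, not married, Limit 44776.
util = 0.24359;
age = 25;
notMarried = 1;
limit = 44776;

z = b0 + bUtil*util + bAge*age + bNotMarried*notMarried;   % logit-scale prediction
lcfPred = 1/(1 + exp(-z));                                 % conversion measure level
eadPred = lcfPred*limit;                                   % EAD level for the LCF measure
fprintf('logit = %.4f, LCF = %.4f\n', z, lcfPred);
```

The resulting LCF of roughly 0.175 is close to the first predicted value in the Regression calibration table of this example, which suggests the first test observation is this row.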
Predict EAD for Tobit Model
EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can specify the predict function with different options for the 'ModelLevel' name-value argument.
predictedeadtobit = predict(eadmodeltobit,EADData(testind,:),'ModelLevel','ead');
predictedconversiontobit = predict(eadmodeltobit,EADData(testind,:),'ModelLevel','conversionMeasure');
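For a right-censored Tobit model, a natural prediction is the expected value of the censored response, E[min(Y*,R)] = mu*Phi(a) - sigma*phi(a) + R*(1 - Phi(a)) with a = (R - mu)/sigma. The sketch below evaluates that formula with the displayed (rounded) Tobit coefficients for the first row shown by head(EADData). That predict uses this unconditional expected value is an assumption; the variable names are hypothetical.

```matlab
normpdfFcn = @(z) exp(-0.5*z.^2)/sqrt(2*pi);   % standard normal density
normcdfFcn = @(z) 0.5*erfc(-z/sqrt(2));        % standard normal CDF

% Coefficients as displayed for the underlying Tobit model (rounded).
b0 = 0.18088;
bUtil = 0.42381;
bAge = -0.0014564;
bNotMarried = -0.0040197;
sigma = 0.27917;
R = 0.5;                                       % right limit

% First displayed row: UtilizationRate 0.24359, Age 25, not married, Limit 44776.
util = 0.24359;
age = 25;
notMarried = 1;
limit = 44776;

mu = b0 + bUtil*util + bAge*age + bNotMarried*notMarried;   % mean of the latent Y*
a = (R - mu)/sigma;
lcfPred = mu*normcdfFcn(a) - sigma*normpdfFcn(a) + R*(1 - normcdfFcn(a));
eadPred = lcfPred*limit;                                    % EAD level for the LCF measure
fprintf('latent mean = %.4f, expected LCF = %.4f\n', mu, lcfPred);
```

The expected LCF of roughly 0.217 is close to the first predicted value in the Tobit calibration table of this example. Under this formula a prediction can never exceed the right limit of 0.5, which is consistent with the Tobit predictions in that table all lying below 0.5.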
Validate EAD Regression Model
For model validation of the Regression model, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.
Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.
modellevel = "conversionMeasure";
[discmeasureregression, discdataregression] = modelDiscrimination(eadmodelregression,EADData(testind,:),'ShowDetails',true,'ModelLevel',modellevel)
discmeasureregression=1×3 table
AUROC Segment SegmentCount
_______ __________ ____________
Regression 0.70898 "all_data" 1751
discdataregression=1534×3 table
X Y T
__________ _________ _______
0 0 0.95722
0 0.0027778 0.95722
0 0.0041667 0.9566
0 0.0055556 0.95639
0 0.0083333 0.95576
0.00096993 0.0097222 0.95555
0.00096993 0.016667 0.9549
0.0019399 0.016667 0.95474
0.0019399 0.018056 0.95468
0.0038797 0.018056 0.95403
0.0048497 0.019444 0.95381
0.0058196 0.019444 0.95314
0.0067895 0.020833 0.95291
0.0067895 0.022222 0.95233
0.0087294 0.026389 0.95224
0.0087294 0.031944 0.952
⋮
modelDiscriminationPlot(eadmodelregression,EADData(testind,:),'ModelLevel',modellevel,'SegmentBy','Marriage');
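An AUROC needs a binary target, so a continuous response such as LCF has to be discretized first. The sketch below shows one way this can work on a small hypothetical data set: observations at or above the mean observed value are labeled as high, and the AUROC is the fraction of high/low pairs in which the high observation receives the larger prediction. Discretizing at the mean is an assumption about the default behavior, not something this example states.

```matlab
observed  = [0.9; 0.1; 0.8; 0.2; 0.7; 0.05];   % hypothetical observed LCF values
predicted = [0.8; 0.3; 0.6; 0.4; 0.35; 0.1];   % hypothetical predicted LCF values

isHigh = observed >= mean(observed);           % binary target (assumed discretization)
posScores = predicted(isHigh);                 % predictions for high observations
negScores = predicted(~isHigh);                % predictions for low observations

% Compare every high/low pair; ties count as half (implicit expansion, R2016b+).
diffs = posScores - negScores.';
auroc = (sum(diffs(:) > 0) + 0.5*sum(diffs(:) == 0))/numel(diffs);
fprintf('AUROC = %.4f\n', auroc);
```

Because the AUROC depends only on how the predictions rank the observations, two models whose predictions are ordered similarly get similar values even when the predicted levels differ.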
Use modelCalibration and then modelCalibrationPlot to show a scatter plot of the predictions.
ydata = "Observed";
[calmeasureregression,caldataregression] = modelCalibration(eadmodelregression,EADData(testind,:),'ModelLevel',modellevel)
calmeasureregression=1×4 table
RSquared RMSE Correlation SampleMeanError
________ _______ ___________ _______________
Regression 0.16148 0.41023 0.40184 -0.025994
caldataregression=1751×3 table
Observed Predicted_Regression Residuals_Regression
__________ ____________________ ____________________
0.99919 0.17519 0.824
0.0020632 0.17343 -0.17137
0.03741 0.7527 -0.71529
0.75518 0.89867 -0.14349
0.00076139 0.042389 -0.041628
0.9998 0.95153 0.048274
0.0056134 0.1338 -0.12819
0.048451 0.043424 0.0050276
0.01448 0.059339 -0.044858
0.95329 0.67009 0.2832
0.97847 0.939 0.03947
0.71895 0.80122 -0.082271
0.79096 0.3791 0.41186
0.042816 0.52542 -0.4826
0.97169 0.2119 0.75979
0.99182 0.62543 0.36639
⋮
modelCalibrationPlot(eadmodelregression,EADData(testind,:),'ModelLevel',modellevel,'YData',ydata);
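The calibration metrics can be reproduced from the observed and predicted columns. The sketch below computes them in base MATLAB for the first four rows of the Regression calibration table. The definitions used here (residual as observed minus predicted, R-squared as the squared correlation, sample mean error as mean observed minus mean predicted) are assumptions that appear consistent with the reported numbers; for instance, 0.40184 squared is approximately 0.1615.

```matlab
% First four rows of the Regression calibration table (as displayed).
observed  = [0.99919; 0.0020632; 0.03741; 0.75518];
predicted = [0.17519; 0.17343; 0.7527; 0.89867];

residuals = observed - predicted;              % matches the residuals column
rmse = sqrt(mean(residuals.^2));               % root mean squared error
c = corrcoef(observed, predicted);             % 2-by-2 correlation matrix
r = c(1,2);                                    % correlation of observed and predicted
rsq = r^2;                                     % R-squared (assumed definition)
sme = mean(observed) - mean(predicted);        % sample mean error (assumed definition)
fprintf('RMSE = %.4f, correlation = %.4f, mean error = %.4f\n', rmse, r, sme);
```

Four rows are far too few to match the table-level metrics; the point is only to show how each column of the metrics table relates to the data table.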
Validate EAD Tobit Model
For model validation of the Tobit model, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.
Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.
modellevel = "conversionMeasure";
[discmeasuretobit,discdatatobit] = modelDiscrimination(eadmodeltobit,EADData(testind,:),'ShowDetails',true,'ModelLevel',modellevel)
discmeasuretobit=1×3 table
AUROC Segment SegmentCount
_______ __________ ____________
Tobit 0.70909 "all_data" 1751
discdatatobit=1534×3 table
X Y T
__________ _________ _______
0 0 0.42178
0 0.0027778 0.42178
0 0.0041667 0.4212
0 0.0055556 0.42076
0.00096993 0.0069444 0.42062
0.00096993 0.0097222 0.42018
0.00096993 0.011111 0.42004
0.00096993 0.018056 0.4196
0.0019399 0.018056 0.4195
0.0029098 0.019444 0.41945
0.0048497 0.019444 0.41901
0.0058196 0.020833 0.41887
0.0058196 0.022222 0.41854
0.0067895 0.022222 0.41842
0.0067895 0.023611 0.41827
0.0067895 0.029167 0.41827
⋮
modelDiscriminationPlot(eadmodeltobit,EADData(testind,:),'ModelLevel',modellevel,'SegmentBy','Marriage');
Use modelCalibration and then modelCalibrationPlot to show a scatter plot of the predictions.
ydata = "Observed";
[calmeasuretobit,caldatatobit] = modelCalibration(eadmodeltobit,EADData(testind,:),'ModelLevel',modellevel)
calmeasuretobit=1×4 table
RSquared RMSE Correlation SampleMeanError
________ _______ ___________ _______________
Tobit 0.15929 0.39572 0.39911 0.13366
caldatatobit=1751×3 table
Observed Predicted_Tobit Residuals_Tobit
__________ _______________ _______________
0.99919 0.21657 0.78261
0.0020632 0.21571 -0.21365
0.03741 0.35115 -0.31374
0.75518 0.39272 0.36245
0.00076139 0.12184 -0.12107
0.9998 0.41744 0.58237
0.0056134 0.19913 -0.19351
0.048451 0.12215 -0.073701
0.01448 0.14323 -0.12875
0.95329 0.33415 0.61914
0.97847 0.41069 0.56778
0.71895 0.3627 0.35624
0.79096 0.27467 0.51629
0.042816 0.30579 -0.26297
0.97169 0.23025 0.74144
0.99182 0.32461 0.66721
⋮
modelCalibrationPlot(eadmodeltobit,EADData(testind,:),'ModelLevel',modellevel,'YData',ydata);
Plot Histograms of Observed with Respect to Predicted EAD
Plot a histogram of observed with respect to the predicted EAD for the Regression model.
figure;
histogram(caldataregression.Observed);
hold on;
histogram(caldataregression.(strcat('Predicted_',modeltyper)));
legend('Observed','Predicted');
Plot a histogram of observed with respect to the predicted EAD for the Tobit model.
figure;
histogram(caldatatobit.Observed);
hold on;
histogram(caldatatobit.(strcat('Predicted_',modeltypet)));
legend('Observed','Predicted');
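For both the Tobit and Regression models, the Age and UtilizationRate predictors are statistically significant (p-values below 0.01), while the Marriage predictor is not statistically significant (p-values of 0.6783 and 0.8012, respectively). On the test data at the conversion measure level, the two models discriminate almost identically (AUROC of 0.70909 for Tobit and 0.70898 for Regression) and have slightly different R-squared values (0.15929 and 0.16148). The sample mean error differs more: 0.13366 for Tobit compared with -0.025994 for Regression.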