fitcauto — Automatically select classification model with optimized hyperparameters
Since R2020a
Syntax

Mdl = fitcauto(Tbl,ResponseVarName)
Mdl = fitcauto(___,Name,Value)
[Mdl,OptimizationResults] = fitcauto(___)

Description
Given predictor and response data, fitcauto automatically tries a selection of classification model types with different hyperparameter values. By default, the function uses Bayesian optimization to select models and their hyperparameter values, and computes the cross-validation classification error for each model. After the optimization is complete, fitcauto returns the model, trained on the entire data set, that is expected to best classify new data. You can use the predict and loss object functions of the returned model to classify new data and compute the test set classification error, respectively.
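For instance, a minimal sketch (here Mdl stands for a model returned by fitcauto, and newData for a table with the same predictor variables plus the response variable "Y"; both names are placeholders):

% Classify new observations and estimate the test set error.
labels = predict(Mdl,newData); % Predicted class labels
testError = loss(Mdl,newData,"Y"); % Test set classification error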
Use fitcauto when you are uncertain which classifier types best suit your data. For information on alternative methods for tuning hyperparameters of classification models, see Alternative Functionality.
If your data contains over 10,000 observations, consider using an asynchronous successive halving algorithm (ASHA) instead of Bayesian optimization when you run fitcauto. ASHA optimization often finds good solutions faster than Bayesian optimization for data sets with many observations.
Mdl = fitcauto(Tbl,ResponseVarName) returns a classification model Mdl with tuned hyperparameters. The table Tbl contains the predictor variables and the response variable, where ResponseVarName is the name of the response variable.
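For example, a minimal call, using the cars table and "Origin" response variable from the first example below:

% Minimal usage sketch: select and tune a classifier for the data in cars.
Mdl = fitcauto(cars,"Origin");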
Mdl = fitcauto(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, use the HyperparameterOptimizationOptions name-value argument to specify whether to use Bayesian optimization (default) or an asynchronous successive halving algorithm (ASHA). To use ASHA optimization, specify "HyperparameterOptimizationOptions",struct("Optimizer","asha"). You can include additional fields in the structure to control other aspects of the optimization.
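For instance, a sketch of such a structure with a few of the additional fields (the Tbl and "Response" inputs are placeholders, and the field values are illustrative):

% ASHA optimization with extra controls on the search budget.
options = struct("Optimizer","asha", ...
    "MaxObjectiveEvaluations",300, ... % Cap the number of iterations
    "MaxTime",3600, ... % Cap the total optimization time in seconds
    "UseParallel",true); % Requires Parallel Computing Toolbox
Mdl = fitcauto(Tbl,"Response","HyperparameterOptimizationOptions",options);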
[Mdl,OptimizationResults] = fitcauto(___) also returns OptimizationResults, which contains the results of the model selection and hyperparameter tuning process. This output is a BayesianOptimization object when you use Bayesian optimization, and a table when you use ASHA optimization.
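For example, a brief sketch that mirrors the second example below, where ASHA optimization makes OptimizationResults a table:

% Capture the tuning results along with the final model.
[Mdl,OptimizationResults] = fitcauto(XTrain,YTrain, ...
    "HyperparameterOptimizationOptions",struct("Optimizer","asha"));
head(OptimizationResults) % Preview the first rows of the results table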
Examples

Automatically Select Classifier Using Table Data

Use fitcauto to automatically select a classification model with optimized hyperparameters, given predictor and response data stored in a table.
Load Data

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Categorize the cars based on whether they were made in the USA.

Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable Origin.

cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
Partition Data

Partition the data into training and test sets. Use approximately 80% of the observations for the model selection and hyperparameter tuning process, and 20% of the observations to test the performance of the final model returned by fitcauto. Use cvpartition to partition the data.

rng("default") % For reproducibility of the data partition
c = cvpartition(Origin,"Holdout",0.2);
trainingIdx = training(c); % Training set indices
carsTrain = cars(trainingIdx,:);
testIdx = test(c); % Test set indices
carsTest = cars(testIdx,:);
Run fitcauto

Pass the training data to fitcauto. By default, fitcauto determines appropriate model types to try, uses Bayesian optimization to find good hyperparameter values, and returns a trained model Mdl with the best expected performance. Additionally, fitcauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.

Expect this process to take some time. To speed up the optimization process, consider specifying to run the optimization in parallel, if you have a Parallel Computing Toolbox™ license. To do so, pass "HyperparameterOptimizationOptions",struct("UseParallel",true) to fitcauto as a name-value argument.
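For reference, a sketch of that parallel call (otherwise identical to the serial call used below):

% Sketch: the same fit run in parallel.
% Requires a Parallel Computing Toolbox license.
Mdl = fitcauto(carsTrain,"Origin", ...
    "HyperparameterOptimizationOptions",struct("UseParallel",true));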
Mdl = fitcauto(carsTrain,"Origin");
Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Learner types to explore: ensemble, knn, nb, net, svm, tree
Total iterations (MaxObjectiveEvaluations): 180
Total time (MaxTime): Inf

|===========================================================================================================================================|
| Iter | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter:                  Value  |
|      | result | loss       | & validation (sec) | validation loss | validation loss |          |                                         |
|===========================================================================================================================================|
|    1 | Best   |    0.37179 |            0.62903 |         0.37179 |         0.37179 | svm      | BoxConstraint:                  0.11704 |
|      |        |            |                    |                 |                 |          | KernelScale:                   0.004903 |
|    2 | Best   |    0.22769 |            0.42586 |         0.22769 |         0.22769 | nb       | DistributionNames:               normal |
|      |        |            |                    |                 |                 |          | Width:                              NaN |
|    3 | Best   |    0.19231 |            0.42729 |         0.19231 |         0.19231 | knn      | NumNeighbors:                         3 |
|    4 | Accept |    0.22769 |             0.1005 |         0.19231 |         0.19231 | nb       | DistributionNames:               normal |
|      |        |            |                    |                 |                 |          | Width:                              NaN |
|    5 | Best   |     0.1891 |            0.13361 |          0.1891 |         0.19096 | knn      | NumNeighbors:                        12 |
|    6 | Best   |    0.10154 |            0.28324 |         0.10154 |         0.10154 | tree     | MinLeafSize:                          5 |
|  ... |        |            |                    |                 |                 |          |                                         |
|   37 | Best   |   0.098462 |             5.7806 |        0.098462 |         0.11727 | ensemble | Method:                      LogitBoost |
|      |        |            |                    |                 |                 |          | NumLearningCycles:                  218 |
|      |        |            |                    |                 |                 |          | MinLeafSize:                         48 |
|  ... |        |            |                    |                 |                 |          |                                         |
|   97 | Best   |   0.089231 |             5.7135 |        0.089231 |         0.10371 | ensemble | Method:                      LogitBoost |
|      |        |            |                    |                 |                 |          | NumLearningCycles:                  242 |
|      |        |            |                    |                 |                 |          | MinLeafSize:                         13 |
|  ... |        |            |                    |                 |                 |          |                                         |
|  103 | Best   |   0.086154 |             4.9401 |        0.086154 |        0.095252 | ensemble | Method:                      LogitBoost |
|      |        |            |                    |                 |                 |          | NumLearningCycles:                  208 |
|      |        |            |                    |                 |                 |          | MinLeafSize:                         15 |
|  ... |        |            |                    |                 |                 |          |                                         |
__________________________________________________________
Optimization completed.
Total iterations: 180
Total elapsed time: 699.614 seconds
Total time for training and validation: 493.3351 seconds

Best observed learner is an ensemble model with:
    Learner:           ensemble
    Method:            LogitBoost
    NumLearningCycles: 208
    MinLeafSize:       15
Observed validation loss: 0.086154
Time for training and validation: 4.9401 seconds

Best estimated learner (returned model) is an ensemble model with:
    Learner:           ensemble
    Method:            LogitBoost
    NumLearningCycles: 209
    MinLeafSize:       15
Estimated validation loss: 0.089192
Estimated time for training and validation: 5.0288 seconds

Documentation for fitcauto display
The final model returned by fitcauto corresponds to the best estimated learner. Before returning the model, the function retrains it using the entire training data (carsTrain), the listed Learner (or model) type, and the displayed hyperparameter values.
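You can confirm the type of the returned model; for this run, the best estimated learner is a LogitBoost ensemble, so Mdl is an ensemble classification object:

% Inspect the class of the returned model.
class(Mdl)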
Evaluate Test Set Performance

Evaluate the performance of the model on the test set.

testAccuracy = 1 - loss(Mdl,carsTest,"Origin")
testAccuracy = 0.9263
confusionchart(carsTest.Origin,predict(Mdl,carsTest))
Automatically Select Classifier Using Matrix Data
Use fitcauto to automatically select a classification model with optimized hyperparameters, given predictor and response data stored in separate variables.
Load Data

Load the humanactivity data set. This data set contains 24,075 observations of five physical human activities: Sitting (1), Standing (2), Walking (3), Running (4), and Dancing (5). Each observation has 60 features extracted from acceleration data measured by smartphone accelerometer sensors. The variable feat contains the predictor data matrix of the 60 features for the 24,075 observations, and the response variable actid contains the activity IDs for the observations as integers.
load humanactivity
Partition Data

Partition the data into training and test sets. Use 90% of the observations to select a model, and 10% of the observations to validate the final model returned by fitcauto. Use cvpartition to reserve 10% of the observations for testing.

rng("default") % For reproducibility of the partition
c = cvpartition(actid,"Holdout",0.10);
trainingIndices = training(c); % Indices for the training set
XTrain = feat(trainingIndices,:);
YTrain = actid(trainingIndices);
testIndices = test(c); % Indices for the test set
XTest = feat(testIndices,:);
YTest = actid(testIndices);
Run fitcauto

Pass the training data to fitcauto. Because the training data XTrain has more than 10,000 observations, use ASHA optimization rather than Bayesian optimization. The fitcauto function randomly selects appropriate model (or learner) types with different hyperparameter values, trains the models on a small subset of the training data, promotes the models that perform well, and retrains the promoted models on progressively larger sets of training data. The function returns the model with the best cross-validation performance, retrained on all the training data, and a table that contains the details of the optimization. Specify to run the optimization in parallel (requires Parallel Computing Toolbox™).

By default, fitcauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.
options = struct("Optimizer","asha","UseParallel",true);
[Mdl,OptimizationResults] = fitcauto(XTrain,YTrain,"HyperparameterOptimizationOptions",options);
Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 8).
Copying objective function to workers...
Done copying objective function to workers.
Learner types to explore: ensemble, knn, nb, net, svm, tree
Total iterations (MaxObjectiveEvaluations): 595
Total time (MaxTime): Inf

|====================================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Training set | Learner  | Hyperparameter:                  Value    |
|      | workers | result | loss       | & validation (sec) | validation loss | size         |          |                                           |
|====================================================================================================================================================|
|    1 |       8 | Best   |    0.74165 |             2.2322 |         0.74165 |          271 | tree     | MinLeafSize:                        945   |
|    2 |       7 | Accept |    0.74165 |             9.0692 |        0.049289 |          271 | knn      | NumNeighbors:                      1726   |
|    3 |       7 | Best   |   0.049289 |             3.3616 |        0.049289 |          271 | nb       | DistributionNames:               normal   |
|      |         |        |            |                    |                 |              |          | Width:                              NaN   |
|    4 |       7 | Accept |    0.74165 |             9.2877 |        0.049289 |          271 | knn      | NumNeighbors:                      3072   |
|    5 |       8 | Best   |   0.046566 |            0.81486 |        0.046566 |         1084 | nb       | DistributionNames:               normal   |
|      |         |        |            |                    |                 |              |          | Width:                              NaN   |
|  ... |         |        |            |                    |                 |              |          |                                           |
|   33 |       8 | Best   |   0.019522 |             19.771 |        0.019522 |         4334 | knn      | NumNeighbors:                         2   |
|  ... |         |        |            |                    |                 |              |          |                                           |
|   57 |       8 | Best   |   0.010338 |             22.501 |        0.010338 |         4334 | net      | Activations:                       tanh   |
|      |         |        |            |                    |                 |              |          | Standardize:                       true   |
|      |         |        |            |                    |                 |              |          | Lambda:                      2.7559e-07   |
|      |         |        |            |                    |                 |              |          | LayerSizes:                   [ 32 93 ]   |
|  ... |         |        |            |                    |                 |              |          |                                           |
|   77 |       8 | Best   |   0.010061 |             42.264 |        0.010061 |         4334 | ensemble | Method:                      AdaBoostM2   |
|      |         |        |            |                    |                 |              |          | NumLearningCycles:                  223   |
|      |         |        |            |                    |                 |              |          | MinLeafSize:                          1   |
|      |         |        |            |                    |                 |              |          | MaxNumSplits:                        75   |
|  ... |         |        |            |                    |                 |              |          |                                           |
|  146 |       8 | Best   |  0.0038767 |             74.175 |       0.0038767 |        17335 | ensemble | Method:                      AdaBoostM2   |
|      |         |        |            |                    |                 |              |          | NumLearningCycles:                  223   |
|      |         |        |            |                    |                 |              |          | MinLeafSize:                          1   |
|      |         |        |            |                    |                 |              |          | MaxNumSplits:                        75   |
|  ... |         |        |            |                    |                 |              |          |                                           |
__________________________________________________________ optimization completed. total iterations: 595 total elapsed time: 1276.4375 seconds total time for training and validation: 9777.0453 seconds best observed learner is an ensemble model with: learner: ensemble method: adaboostm2 numlearningcycles: 223 minleafsize: 1 maxnumsplits: 75 observed validation loss: 0.0038767 time for training and validation: 74.175 seconds documentation for fitcauto display
the final model returned by fitcauto corresponds to the best observed learner. before returning the model, the function retrains it using all the training data (xtrain and ytrain), the listed learner (or model) type, and the displayed hyperparameter values.
evaluate test set performance
evaluate the final model performance on the test data set.
testaccuracy = 1 - loss(mdl,xtest,ytest)
testaccuracy = 0.9958
the final model correctly classifies over 99% of the observations.
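to see where the remaining misclassifications occur, you can compare predicted and true labels on the test set. a minimal sketch, assuming the mdl, xtest, and ytest variables from this example:

predictedy = predict(mdl,xtest);    % labels predicted by the returned model
confusionchart(ytest,predictedy)    % rows are true classes, columns are predicted classes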
combine feature selection and automated classifier selection
use fitcauto to automatically select a classification model with optimized hyperparameters, given predictor and response data stored in a table. before passing data to fitcauto, perform feature selection to remove unimportant predictors from the data set.
load and partition data
read the sample file creditrating_historical.dat into a table. the predictor data consists of financial ratios and industry sector information for a list of corporate customers. the response variable consists of credit ratings assigned by a rating agency. preview the first few rows of the data set.
creditrating = readtable("creditrating_historical.dat");
head(creditrating)
ans=8×8 table
id wc_ta re_ta ebit_ta mve_bvtd s_ta industry rating
_____ ______ ______ _______ ________ _____ ________ _______
62394 0.013 0.104 0.036 0.447 0.142 3 {'bb' }
48608 0.232 0.335 0.062 1.969 0.281 8 {'a' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'a' }
48631 0.194 0.263 0.062 1.017 0.228 4 {'bbb'}
43768 0.121 0.413 0.057 3.647 0.466 12 {'aaa'}
39255 -0.117 -0.799 0.01 0.179 0.082 4 {'ccc'}
62236 0.087 0.158 0.049 0.816 0.324 2 {'bbb'}
39354 0.005 0.181 0.034 2.597 0.388 7 {'aa' }
because each value in the id variable is a unique customer id, that is, length(unique(creditrating.id)) is equal to the number of observations in creditrating, the id variable is a poor predictor. remove the id variable from the table, and convert the industry variable to a categorical variable.
creditrating = removevars(creditrating,"id");
creditrating.industry = categorical(creditrating.industry);
partition the data into training and test sets. use approximately 85% of the observations for the model selection and hyperparameter tuning process, and 15% of the observations to test the performance of the final model returned by fitcauto on new data. use cvpartition to partition the data.
rng("default") % for reproducibility of the partition c = cvpartition(creditrating.rating,"holdout",0.15); trainingindices = training(c); % indices for the training set testindices = test(c); % indices for the test set credittrain = creditrating(trainingindices,:); credittest = creditrating(testindices,:);
perform feature selection
before passing the training data to fitcauto, find the important predictors by using the fscchi2 function. visualize the predictor scores by using the bar function. because some scores can be inf, and bar discards inf values, plot the finite scores first and then plot a finite representation of the inf scores in a different color.
[idx,scores] = fscchi2(credittrain,"rating");
bar(scores(idx)) % represents finite scores
hold on
veryimportant = isinf(scores);
finitemax = max(scores(~veryimportant));
bar(finitemax*veryimportant(idx)) % represents inf scores
hold off
xticklabels(strrep(credittrain.properties.variablenames(idx),"_","\_"))
xtickangle(45)
legend(["finite scores","inf scores"])
notice that the industry predictor has a low score corresponding to a p-value that is greater than 0.05, which indicates that industry might not be an important feature. remove the industry feature from the training and test data sets.
credittrain = removevars(credittrain,'industry');
credittest = removevars(credittest,'industry');
run fitcauto
pass the training data to fitcauto. the function uses bayesian optimization to select models and their hyperparameter values, and returns a trained model mdl with the best expected performance. specify to try all available learner types and run the optimization in parallel (requires parallel computing toolbox™). return a second output results that contains the details of the bayesian optimization.

expect this process to take some time. by default, fitcauto provides a plot of the optimization and an iterative display of the optimization results. for more information on how to interpret these results, see verbose display.
options = struct("useparallel",true); [mdl,results] = fitcauto(credittrain,"rating", ... "learners","all","hyperparameteroptimizationoptions",options);
warning: it is recommended that you first standardize all numeric predictors when optimizing the naive bayes 'width' parameter. ignore this warning if you have done that.
copying objective function to workers...
warning: files that have already been attached are being ignored. to see which files are attached see the 'attachedfiles' property of the parallel pool.
done copying objective function to workers.
learner types to explore: discr, ensemble, kernel, knn, linear, nb, net, svm, tree
total iterations (maxobjectiveevaluations): 270
total time (maxtime): inf

(iterative display continues through iteration 271; output truncated)

__________________________________________________________
optimization completed.
total iterations: 271
total elapsed time: 907.7117 seconds
total time for training and validation: 5400.9893 seconds

best observed learner is a net model with:
    learner: net
    activations: relu
    standardize: false
    lambda: 0.0004658
    layersizes: [1 31 23]
observed validation loss: 0.23811
time for training and validation: 21.2958 seconds

best estimated learner (returned model) is a net model with:
    learner: net
    activations: none
    standardize: false
    lambda: 0.00036647
    layersizes: [1 6 10]
estimated validation loss: 0.24112
estimated time for training and validation: 5.894 seconds
the final model returned by fitcauto corresponds to the best estimated learner. before returning the model, the function retrains it using the entire training data (credittrain), the listed learner (or model) type, and the displayed hyperparameter values.
evaluate test set performance
the model mdl corresponds to the best point in the bayesian optimization according to the "min-visited-mean" criterion. to gauge how the model will perform on new data, look at the observed cross-validation accuracy of the model (cvaccuracy) and its general estimated performance based on the bayesian optimization (estimatedaccuracy).
[x,~,iteration] = bestpoint(results,"criterion","min-visited-mean");
cverror = results.objectivetrace(iteration);
cvaccuracy = 1 - cverror
cvaccuracy = 0.7595
estimatederror = predictobjective(results,x);
estimatedaccuracy = 1 - estimatederror
estimatedaccuracy = 0.7589
evaluate the performance of the model on the test set. create a confusion matrix from the results, and specify the order of the classes in the confusion matrix.
testaccuracy = 1 - loss(mdl,credittest,"rating")
testaccuracy = 0.7437
cm = confusionchart(credittest.rating,predict(mdl,credittest));
sortclasses(cm,["aaa","aa","a","bbb","bb","b","ccc"])
input arguments
tbl
— sample data
table
sample data, specified as a table. each row of tbl corresponds to one observation, and each column corresponds to one predictor. optionally, tbl can contain one additional column for the response variable. multicolumn variables and cell arrays other than cell arrays of character vectors are not accepted.

if tbl contains the response variable, and you want to use all remaining variables in tbl as predictors, specify the response variable using responsevarname.

if tbl contains the response variable, and you want to use only a subset of the remaining variables in tbl as predictors, specify a formula using formula.

if tbl does not contain the response variable, specify a response variable using y. the length of the response variable and the number of rows in tbl must be equal.
data types: table
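a sketch of the three table-based calling forms described above, assuming a hypothetical table t whose response variable is named y (stored in the last column) and whose predictors include x1 and x2:

mdl = fitcauto(t,"y");               % responsevarname: all other variables are predictors
mdl = fitcauto(t,"y ~ x1 + x2");     % formula: only x1 and x2 are predictors
mdl = fitcauto(t(:,1:end-1),t.y);    % y supplied separately; the table holds only predictors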
responsevarname
— response variable name
name of variable in tbl
response variable name, specified as the name of a variable in tbl.

you must specify responsevarname as a character vector or string scalar. for example, if the response variable y is stored as tbl.y, then specify it as "y". otherwise, the software treats all columns of tbl, including y, as predictors when training the model.

the response variable must be a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. if y is a character array, then each element of the response variable must correspond to one row of the array.

a good practice is to specify the order of the classes by using the classnames name-value argument.

data types: char | string
formula
— explanatory model of response variable and subset of predictor variables
character vector | string scalar
explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form "y~x1+x2+x3". in this form, y represents the response variable, and x1, x2, and x3 represent the predictor variables.

to specify a subset of variables in tbl as predictors for training the model, use a formula. if you specify a formula, then the software does not use any variables in tbl that do not appear in formula.

the variable names in the formula must be both variable names in tbl (tbl.properties.variablenames) and valid matlab® identifiers. you can verify the variable names in tbl by using the isvarname function. if the variable names are not valid, then you can convert them by using the matlab.lang.makevalidname function.

data types: char | string
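for instance, using the creditrating table from the earlier example, a formula restricts training to three of the financial ratios. this is a sketch, not part of the original example:

% train using only wc_ta, re_ta, and ebit_ta as predictors
mdl = fitcauto(creditrating,"rating ~ wc_ta + re_ta + ebit_ta");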
y
— class labels
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors
class labels, specified as a numeric, categorical, or logical vector, a character or string array, or a cell array of character vectors.

if y is a character array, then each element of the class labels must correspond to one row of the array.

the length of y must be equal to the number of rows in tbl or x.

a good practice is to specify the class order by using the classnames name-value argument.

data types: single | double | categorical | logical | char | string | cell
x
— predictor data
numeric matrix
predictor data, specified as a numeric matrix. each row of x corresponds to one observation, and each column corresponds to one predictor. the length of y and the number of rows in x must be equal.

to specify the names of the predictors in the order of their appearance in x, use the predictornames name-value argument.

data types: single | double
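a sketch of the matrix calling form using the fisher iris data that ships with statistics and machine learning toolbox; the predictornames values are illustrative:

load fisheriris                      % meas is a 150-by-4 numeric matrix, species holds the labels
mdl = fitcauto(meas,species, ...
    "predictornames",["sepallength","sepalwidth","petallength","petalwidth"]);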
note
the software treats nan, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data. the software removes rows of data corresponding to missing values in the response variable. however, the treatment of missing values in the predictor data x or tbl varies among models (or learners).
name-value arguments
specify optional pairs of arguments as name1=value1,...,namen=valuen, where name is the argument name and value is the corresponding value. name-value arguments must appear after other arguments, but the order of the pairs does not matter.

before r2021a, use commas to separate each name and value, and enclose name in quotes.

example: "hyperparameteroptimizationoptions",struct("maxobjectiveevaluations",200,"verbose",2) specifies to run 200 iterations of the optimization process (that is, try 200 model hyperparameter combinations), and to display information in the command window about the next model hyperparameter combination to be evaluated.
learners
— types of classification models
"auto"
(default) | "all"
| "all-linear"
| "all-nonlinear"
| one or more learner names
types of classification models to try during the optimization, specified as a value in the first table below or one or more learner names in the second table. specify multiple learner names as a string or cell array.
value | description |
---|---|
"auto" | fitcauto analyzes the predictor and response data to choose a suitable subset of learners (see automatic selection of learners). note: to provide the best hyperparameter optimization experience, the automatic selection of learners behavior is subject to frequent changes. for a more consistent selection of learners across software releases, explicitly specify the models you want to include. |
"all" | fitcauto selects all possible learners. |
"all-linear" | fitcauto selects linear learners: "discr" (with a linear discriminant type) and "linear". |
"all-nonlinear" | fitcauto selects all nonlinear learners: "discr" (with a quadratic discriminant type), "ensemble", "kernel", "knn", "nb", "net", "svm" (with a gaussian or polynomial kernel), and "tree". |
note

for greater efficiency, fitcauto does not select the following combinations of models when you specify one of the previous values.

"kernel" and "svm" (with a gaussian kernel) — fitcauto chooses the first when the predictor data has more than 11,000 observations, and the second otherwise.

"linear" and "svm" (with a linear kernel) — fitcauto chooses the first.
learner name | description |
---|---|
"discr" | discriminant analysis classifier |
"ensemble" | ensemble classification model |
"kernel" | kernel classification model |
"knn" | k-nearest neighbor model |
"linear" | linear classification model |
"nb" | naive bayes classifier |
"net" | neural network classifier |
"svm" | support vector machine classifier |
"tree" | binary decision classification tree |
example: "learners","all"
example: "learners","ensemble"
example: "learners",["svm","tree"]
data types: char | string | cell
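for example, to restrict the search to support vector machines and decision trees on the credit rating data from the earlier example (a sketch):

% try only svm and tree learners during the optimization
mdl = fitcauto(credittrain,"rating","learners",["svm","tree"]);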
optimizehyperparameters
— hyperparameters to optimize
"auto"
(default) | "all"
hyperparameters to optimize, specified as "auto" or "all". the optimizable hyperparameters depend on the model (or learner), as described in this table.
learner name | hyperparameters for "auto" | additional hyperparameters for "all" | notes |
---|---|---|---|
"discr" | delta, gamma | discrimtype | for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"ensemble" | method, numlearningcycles, learnrate, minleafsize | maxnumsplits, numvariablestosample, splitcriterion | when the ensemble method is "bag", the learnrate hyperparameter is not used. for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"kernel" | kernelscale, lambda, coding (for three or more classes only) | learner, numexpansiondimensions | for more information, including hyperparameter search ranges, see optimizehyperparameters and optimizehyperparameters (for three or more classes only). note that you cannot change hyperparameter search ranges when you use fitcauto. |
"knn" | distance, numneighbors | distanceweight, exponent, standardize | for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"linear" | lambda, learner, coding (for three or more classes only) | regularization | for more information, including hyperparameter search ranges, see optimizehyperparameters and optimizehyperparameters (for three or more classes only). note that you cannot change hyperparameter search ranges when you use fitcauto. |
"nb" | distributionnames, width | kernel | for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"net" | activations, lambda, layersizes, standardize | layerbiasesinitializer, layerweightsinitializer | for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"svm" | boxconstraint, kernelscale, coding (for three or more classes only) | kernelfunction, polynomialorder, standardize | the polynomialorder hyperparameter applies only when the kernelfunction value is "polynomial". for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
"tree" | minleafsize | maxnumsplits, splitcriterion | for more information, including hyperparameter search ranges, see optimizehyperparameters. note that you cannot change hyperparameter search ranges when you use fitcauto. |
note

when learners is set to a value other than "auto", the default values for the model hyperparameters not being optimized match the default fit function values, unless otherwise indicated in the table notes. when learners is set to "auto", the optimized hyperparameter search ranges and nonoptimized hyperparameter values can vary, depending on the characteristics of the training data. for more information, see automatic selection of learners.
example: "optimizehyperparameters","all"
hyperparameteroptimizationoptions
— options for optimization
structure
options for the optimization, specified as a structure. all fields in the structure are optional.
field name | values | default |
---|---|---|
optimizer | "bayesopt" or "asha" | "bayesopt" |
maxobjectiveevaluations | maximum number of iterations (objective function evaluations), specified as a positive integer | 30 for each specified learner type when the optimizer field is "bayesopt" |
maxtime | time limit, specified as a positive real number. the time limit is in seconds, as measured by tic and toc. | inf |
showplots | logical value indicating whether to show a plot of the optimization progress. if true, this field plots the observed minimum validation loss against the iteration number. when you use bayesian optimization, the plot also shows the estimated minimum validation loss. | true |
saveintermediateresults | logical value indicating whether to save results. if true, this field overwrites a workspace variable at each iteration. the variable is a bayesianoptimization object named bayesoptresults if you use bayesian optimization, and a table named asharesults if you use asha optimization. | false |
verbose | display at the command line: 0 (no iterative display), 1 (iterative display), or 2 (iterative display with additional information) | 1 |
useparallel | logical value indicating whether to run the optimization in parallel, which requires parallel computing toolbox™. due to the nonreproducibility of parallel timing, parallel optimization does not necessarily yield reproducible results. | false |
repartition | logical value indicating whether to repartition the cross-validation at every iteration. if false, the optimizer uses a single partition for the optimization. | false |
maxtrainingsetsize | maximum number of observations in each training set, specified as a positive integer. this value matches the largest training set size. note: if you want to specify this value, the optimizer field must be "asha". | largest available training partition size |
mintrainingsetsize | minimum number of observations in each training set, specified as a positive integer. this value is a lower bound for the smallest training set size. note: if you want to specify this value, the optimizer field must be "asha". | 100 |
specify only one of the following three options. | | |
cvpartition | cvpartition object, created by cvpartition | "kfold",5 if you do not specify any cross-validation field |
holdout | scalar in the range (0,1) representing the holdout fraction | |
kfold | integer greater than 1 | |
example: "hyperparameteroptimizationoptions",struct("useparallel",true)
data types: struct
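a sketch that combines several of the optional fields, assuming a hypothetical training table bigtrain with response variable y; it selects the asha optimizer, 5-fold cross-validation, and a one-hour time budget:

opts = struct("optimizer","asha", ...   % asha instead of bayesian optimization
    "kfold",5, ...                      % 5-fold cross-validation
    "maxtime",3600, ...                 % stop after roughly one hour
    "useparallel",true);                % requires parallel computing toolbox
[mdl,results] = fitcauto(bigtrain,"y","hyperparameteroptimizationoptions",opts);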
categoricalpredictors
— categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | "all"
categorical predictors list, specified as one of the values in this table.
value | description |
---|---|
vector of positive integers | each entry in the vector is an index value indicating that the corresponding predictor is categorical. the index values are between 1 and p, where p is the number of predictors used to train the model. if fitcauto uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. |
logical vector | a true entry means that the corresponding predictor is categorical. the length of the vector is p. |
character matrix | each row of the matrix is the name of a predictor variable. the names must match the entries in predictornames. pad the names with extra blanks so each row of the character matrix has the same length. |
string array or cell array of character vectors | each element in the array is the name of a predictor variable. the names must match the entries in predictornames. |
"all" | all predictors are categorical. |
by default, if the predictor data is in a table (tbl), fitcauto assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. however, learners that use decision trees assume that mathematically ordered categorical vectors are continuous variables. if the predictor data is a matrix (x), fitcauto assumes that all predictors are continuous. to identify any other predictors as categorical predictors, specify them by using the categoricalpredictors name-value argument.
for more information on how fitting functions treat categorical predictors, see automatic creation of dummy variables.
note

fitcauto does not support categorical predictors for discriminant analysis classifiers. that is, if you want learners to include "discr" models, you cannot specify the categoricalpredictors name-value argument or use a table of sample data (tbl) containing categorical predictors.

fitcauto does not support a mix of numeric and categorical predictors for k-nearest neighbor models. that is, if you want learners to include "knn" models, you must specify the categoricalpredictors value as "all" or [].
example: "categoricalpredictors","all"
data types: single | double | logical | char | string | cell
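a sketch of two specification styles, assuming a hypothetical numeric matrix x whose third (and last) column encodes a categorical predictor, and labels y:

mdl = fitcauto(x,y,"categoricalpredictors",3);                   % by index
mdl = fitcauto(x,y,"categoricalpredictors",[false false true]);  % by logical vector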
classnames
— names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
names of classes to use for training, specified as a categorical, character, or string array; a logical or numeric vector; or a cell array of character vectors. classnames must have the same data type as the response variable in tbl or y.

if classnames is a character array, then each element must correspond to one row of the array.

use classnames to:

specify the order of the classes during training.

specify the order of any input or output argument dimension that corresponds to the class order. for example, use classnames to specify the order of the dimensions of cost or the column order of classification scores returned by predict.

select a subset of classes for training. for example, suppose that the set of all distinct class names in y is ["a","b","c"]. to train the model using observations from classes "a" and "c" only, specify "classnames",["a","c"].

the default value for classnames is the set of all distinct class names in the response variable in tbl or y.
example: "classnames",["b","g"]
data types: categorical | char | string | logical | single | double | cell
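a sketch of the subset-selection use described above, assuming hypothetical predictors x and labels y drawn from classes "a", "b", and "c":

% train using observations from classes "a" and "c" only, in that order
mdl = fitcauto(x,y,"classnames",["a","c"]);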
cost
— misclassification cost
square matrix | structure array
misclassification cost, specified as a square matrix or structure array.
if you specify a square matrix cost and the true class of an observation is i, then cost(i,j) is the cost of classifying a point into class j. that is, rows correspond to the true classes and columns correspond to the predicted classes. to specify the class order for the corresponding rows and columns of cost, also specify the classnames name-value argument.

if you specify a structure s, then it must have two fields: s.classnames, which contains the class names as a variable of the same data type as y, and s.classificationcosts, which contains the cost matrix with rows and columns ordered as in s.classnames.

misclassification costs are used differently by the various models in learners. however, fitcauto computes the same mean misclassification cost to compare the models during the optimization process. for more information, see mean misclassification cost.

the default value for cost is ones(k) – eye(k), where k is the number of distinct classes.
example: "cost",[0 1; 2 0]
data types: single | double | struct
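a sketch contrasting the default cost with a custom matrix, for hypothetical three-class data x and y with classes 1, 2, and 3:

k = 3;
defaultcost = ones(k) - eye(k);      % zero cost on the diagonal (correct classifications)
customcost = [0 1 1; 2 0 1; 1 1 0];  % misclassifying a true class 2 as class 1 costs twice as much
mdl = fitcauto(x,y,"cost",customcost,"classnames",[1 2 3]);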
predictornames
— predictor variable names
string array of unique names | cell array of unique character vectors
predictor variable names, specified as a string array of unique names or cell array of unique character vectors. the functionality of predictornames depends on the way you supply the training data.

if you supply x and y, then you can use predictornames to assign names to the predictor variables in x. the order of the names in predictornames must correspond to the column order of x. that is, predictornames{1} is the name of x(:,1), predictornames{2} is the name of x(:,2), and so on. also, size(x,2) and numel(predictornames) must be equal. by default, predictornames is {'x1','x2',...}.

if you supply tbl, then you can use predictornames to choose which predictor variables to use in training. that is, fitcauto uses only the predictor variables in predictornames and the response variable during training. predictornames must be a subset of tbl.properties.variablenames and cannot include the name of the response variable. by default, predictornames contains the names of all predictor variables.

a good practice is to specify the predictors for training using either predictornames or formula, but not both.

example: "predictornames",["sepallength","sepalwidth","petallength","petalwidth"]

data types: string | cell
prior
— prior probabilities
"empirical"
(default) | "uniform"
| numeric vector | structure array
prior probabilities for each class, specified as a value in this table.
value | description |
---|---|
"empirical" | the class prior probabilities are the class relative frequencies in y. |
"uniform" | all class prior probabilities are equal to 1/k, where k is the number of classes. |
numeric vector | each element is a class prior probability. order the elements according to mdl.classnames or specify the order using the classnames name-value argument. the software normalizes the elements to sum to 1. |
structure | a structure s with two fields: s.classnames, which contains the class names as a variable of the same data type as the response variable, and s.classprobs, which contains a vector of corresponding prior probabilities. the software normalizes the probabilities to sum to 1. |
example: "prior",struct("classnames",["b","g"],"classprobs",1:2)
data types: single | double | char | string | struct
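a sketch of the structure form, assuming hypothetical predictors x and labels y from classes "b" and "g"; the software normalizes [1 2] to [1/3 2/3]:

priorstruct = struct("classnames",["b","g"],"classprobs",[1 2]);
mdl = fitcauto(x,y,"prior",priorstruct);   % class "g" is twice as likely a priori as class "b"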
responsename
— response variable name
"y"
(default) | character vector | string scalar
response variable name, specified as a character vector or string scalar.
if you supply y, then you can use responsename to specify a name for the response variable.

if you supply responsevarname or formula, then you cannot use responsename.

example: "responsename","response"

data types: char | string
scoretransform
— score transformation
"none"
(default) | "doublelogit"
| "invlogit"
| "ismax"
| "logit"
| function handle | ...
score transformation, specified as a character vector, string scalar, or function handle.
this table summarizes the available character vectors and string scalars.
value | description |
---|---|
"doublelogit" | 1/(1 e–2x) |
"invlogit" | log(x / (1 – x)) |
"ismax" | sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
"logit" | 1/(1 e–x) |
"none" or "identity" | x (no transformation) |
"sign" | –1 for x < 0 0 for x = 0 1 for x > 0 |
"symmetric" | 2x – 1 |
"symmetricismax" | sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
"symmetriclogit" | 2/(1 e–x) – 1 |
for a matlab function or a function you define, use its function handle for the score transform. the function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
example: "scoretransform","logit"
data types: char | string | function_handle
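a sketch of the function-handle form for hypothetical data x and y; the handle must accept and return a matrix of scores:

mysigmoid = @(s) 1./(1 + exp(-s));               % equivalent to the built-in "logit" transform
mdl = fitcauto(x,y,"scoretransform",mysigmoid);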
weights
— observation weights
positive numeric vector | name of variable in tbl
observation weights, specified as a positive numeric vector or the name of a variable in tbl. the software weights each observation in x or tbl with the corresponding value in weights. the length of weights must equal the number of rows in x or tbl.

if you specify the input data as a table tbl, then weights can be the name of a variable in tbl that contains a numeric vector. in this case, you must specify weights as a character vector or string scalar. for example, if the weights vector w is stored as tbl.w, then specify it as "w". otherwise, the software treats all columns of tbl, including w, as predictors or the response variable when training the model.

by default, weights is ones(n,1), where n is the number of observations in x or tbl.

the software normalizes weights to sum to the value of the prior probability in the respective class.

data types: single | double | char | string
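a sketch of the table form, assuming a hypothetical table tbl with response variable y and a positive weights column named w:

mdl = fitcauto(tbl,"y","weights","w");   % "w" is treated as weights, not as a predictor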
output arguments
mdl
— trained classification model
classification model object
trained classification model, returned as one of the classification model objects in this table.
learner name | returned model object |
---|---|
"discr" | compactclassificationdiscriminant |
"ensemble" | compactclassificationensemble |
"kernel" | classificationkernel (two classes) or compactclassificationecoc (three or more classes) |
"knn" | classificationknn |
"linear" | classificationlinear (two classes) or compactclassificationecoc (three or more classes) |
"nb" | compactclassificationnaivebayes |
"net" | compactclassificationneuralnetwork |
"svm" | compactclassificationsvm (two classes) or compactclassificationecoc (three or more classes) |
"tree" | compactclassificationtree |
optimizationresults
— optimization results
bayesianoptimization object | table
optimization results, returned as a bayesianoptimization object if you use bayesian optimization or a table if you use asha optimization. for more information, see bayesian optimization and asha optimization.
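a sketch of how the second output might be inspected, depending on the optimizer; bestpoint is the object function used in the earlier example:

if istable(results)                    % asha optimization returns a table of iterations
    disp(results)
else                                   % bayesian optimization returns a bayesianoptimization object
    [x,~,iter] = bestpoint(results,"criterion","min-visited-mean");
end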
more about
verbose display
when you set the verbose field of the hyperparameteroptimizationoptions name-value argument to 1 or 2, the fitcauto function provides an iterative display of the optimization results.

the following table describes the columns in the display and their entries.
column name | description |
---|---|
iter | iteration number — you can set a limit to the number of iterations by using the maxobjectiveevaluations field of the hyperparameteroptimizationoptions name-value argument. |
active workers | number of active parallel workers — this column appears only when you run the optimization in parallel by setting the useparallel field of the hyperparameteroptimizationoptions name-value argument to true. |
eval result | one of the following evaluation results: best (the validation loss at this iteration is the smallest computed so far), accept (the validation loss at this iteration is not the smallest computed so far), or error (the model at this iteration fails to train). |
validation loss | validation loss computed for the learner and hyperparameter values at this iteration — in particular, you can change the validation scheme by using the cvpartition, holdout, or kfold field of the hyperparameteroptimizationoptions name-value argument. |
time for training & validation (sec) | time taken to train and compute the validation loss for the model with the learner and hyperparameter values at this iteration (in seconds) — when you use bayesian optimization, this value excludes the time required to update the objective function model maintained by the bayesian optimization process. for more details, see bayesian optimization. |
observed min validation loss | observed minimum validation loss computed so far — this value corresponds to the smallest validation loss among the iterations completed so far. by default, fitcauto plots this value against the iteration number (see the showplots field of the hyperparameteroptimizationoptions name-value argument). |
estimated min validation loss | estimated minimum validation loss — when you use bayesian optimization, fitcauto uses its internal model of the objective function to estimate the minimum validation loss. note: this column appears only when you use bayesian optimization, that is, when the optimizer field of the hyperparameteroptimizationoptions name-value argument is set to "bayesopt". |
training set size | number of observations used in each training set at this iteration — use the maxtrainingsetsize and mintrainingsetsize fields of the hyperparameteroptimizationoptions name-value argument to bound this value. note: this column appears only when you use asha optimization, that is, when the optimizer field is set to "asha". |
learner | model type evaluated at this iteration — specify the learners used in the optimization by using the learners name-value argument. |
hyperparameter: value | hyperparameter values at this iteration — specify the hyperparameters used in the optimization by using the optimizehyperparameters name-value argument. |
the display also includes these model descriptions:

best observed learner — this model, with the listed learner type and hyperparameter values, yields the final observed minimum validation loss. when you use asha optimization, fitcauto retrains the model on the entire training data set and returns it as the mdl output.

best estimated learner — this model, with the listed learner type and hyperparameter values, yields the final estimated minimum validation loss when you use bayesian optimization. in this case, fitcauto retrains the model on the entire training data set and returns it as the mdl output.

note

the best estimated learner model appears only when you use bayesian optimization, that is, when the optimizer field of the hyperparameteroptimizationoptions name-value argument is set to "bayesopt".
tips
depending on the size of your data set, the number of learners you specify, and the optimization method you choose, fitcauto can take some time to run.
if you have a parallel computing toolbox license, you can speed up computations by running the optimization in parallel. to do so, specify "hyperparameteroptimizationoptions",struct("useparallel",true). you can include additional fields in the structure to control other aspects of the optimization. see hyperparameteroptimizationoptions.
if fitcauto with bayesian optimization takes a long time to run because of the number of observations in your training set (for example, over 10,000), consider using fitcauto with asha optimization instead. asha optimization often finds good solutions faster than bayesian optimization for data sets with many observations. to use asha optimization, specify "hyperparameteroptimizationoptions",struct("optimizer","asha"). you can include additional fields in the structure to control other aspects of the optimization. in particular, if you have a time constraint, specify the maxtime field of the hyperparameteroptimizationoptions structure to limit the number of seconds fitcauto runs. a combined sketch appears after these tips.
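the following minimal sketch (assuming a parallel computing toolbox license and the carstrain table from the example above) combines these options to run asha optimization in parallel under a 10-minute time budget:
% run asha optimization in parallel, limited to roughly 600 seconds
opts = struct("optimizer","asha","useparallel",true,"maxtime",600);
mdl = fitcauto(carstrain,"origin", ...
    "hyperparameteroptimizationoptions",opts);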
algorithms
automatic selection of learners
when you specify "learners","auto"
, the fitcauto
function analyzes the predictor and response data in order to choose appropriate learners.
the function considers whether the data set has any of these characteristics:
categorical predictors
missing values for more than 5% of the data
imbalanced data, where the ratio of the number of observations in the largest class to the number of observations in the smallest class is greater than 5
more than 100 observations in the smallest class
wide data, where the number of predictors is greater than or equal to the number of observations
high-dimensional data, where the number of predictors is greater than 100
large data, where the number of observations is greater than 50,000
binary response variable
ordinal response variable
the selected learners are always a subset of those listed in the
learners
table. however, the associated models tried during the
optimization process can have different default values for hyperparameters not being
optimized, as well as different search ranges for hyperparameters being optimized.
bayesian optimization
the goal of bayesian optimization, and optimization in general, is to find a point that
minimizes an objective function. in the context of fitcauto
, a point is
a learner type together with a set of hyperparameter values for the learner (see learners
and
optimizehyperparameters
), and the objective function is the cross-validation
classification error, by default. the bayesian optimization implemented in
fitcauto
internally maintains a multi-treebagger
model of the objective function. that is, the objective function
model splits along the learner type and, for a given learner, the model is a
treebagger
ensemble for regression. (this underlying model differs from
the gaussian process model employed by other statistics and machine learning toolbox™ functions that use bayesian optimization.) bayesian optimization trains the
underlying model by using objective function evaluations, and determines the next point to
evaluate by using an acquisition function ("expected-improvement"
). for
more information, see expected improvement. the acquisition function balances between sampling at
points with low modeled objective function values and exploring areas that are not well
modeled yet. at the end of the optimization, fitcauto
chooses the point
with the minimum objective function model value, among the points evaluated during the
optimization. for more information, see the
"criterion","min-visited-mean"
name-value argument of bestpoint
.
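as a sketch of retrieving that point yourself (assuming you captured the bayesianoptimization object as the second output of fitcauto during a bayesian optimization run):
% results is the second output of fitcauto when the optimizer is "bayesopt";
% query the point that corresponds to the minimum objective function
% model value among the visited points
[x,criterionvalue] = bestpoint(results,"criterion","min-visited-mean");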
asha optimization
the asynchronous successive halving algorithm (asha) in fitcauto
randomly chooses several models with different hyperparameter values (see learners
and
optimizehyperparameters
) and trains them on a small subset of the training
data. if the performance of a particular model is promising, the model is promoted and
trained on a larger amount of the training data. this process repeats, and successful models
are trained on progressively larger amounts of data. by default, at the end of the
optimization, fitcauto
chooses the model that has the lowest
cross-validation classification error.
at each iteration, asha either chooses a previously trained model and promotes it (that is, retrains the model using more training data), or selects a new model (learner type and hyperparameter values) using random search. asha promotes models as follows (see the sketch after this list):
the algorithm searches for the group of models with the largest training set size for which this condition does not hold: floor(g/4) of the models have been promoted, where g is the number of models in the group.
among that group of models, asha chooses the model with the lowest cross-validation classification error and retrains that model with 4*(training set size) observations.
if no such group of models exists, then asha selects a new model instead of promoting an old one, and trains the new model using the smallest training set size.
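the following sketch illustrates the promotion rule only; it is not the internal implementation, and the models struct array and its field names (trainsize, loss, promoted) are hypothetical:
% return the index of the model to promote, or [] if asha should instead
% train a new randomly chosen model at the smallest training set size
function idx = choosepromotion(models)
idx = [];
sizes = sort(unique([models.trainsize]),"descend");  % largest groups first
for s = sizes
    group = find([models.trainsize] == s);
    g = numel(group);
    % the group qualifies while fewer than floor(g/4) of its models
    % have been promoted
    if nnz([models(group).promoted]) < floor(g/4)
        candidates = group(~[models(group).promoted]);
        [~,k] = min([models(candidates).loss]);  % lowest validation error
        idx = candidates(k);  % this model is retrained with 4*s observations
        return
    end
end
end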
when a model is trained on a subset of the training data, asha computes the cross-validation classification error as follows:
for each training fold, the algorithm selects a random sample of the observations (of size training set size) using stratified sampling, and then trains a model on that subset of data.
the algorithm then tests the fitted model on the test fold (that is, the observations not in the training fold) and computes the classification error.
finally, the algorithm averages the results across all folds.
for more information on asha, see [1].
number of asha iterations
when you use asha optimization, the default number of iterations depends on the number
of observations in the data, the number of learner types, the use of parallel processing,
and the type of cross-validation. the algorithm selects the number of iterations such that,
for l learner types (see learners
),
fitcauto
trains l models on the largest training
set size.
this table describes the default number of iterations based on the given specifications when you use 5-fold cross-validation. note that n represents the number of observations and l represents the number of learner types.
number of observations n | default number of iterations (run in serial) | default number of iterations (run in parallel) |
---|---|---|
n < 500 | 30*l — n is too small to implement asha optimization, and fitcauto implements random search to find and assess models instead. | 30*l — n is too small to implement asha optimization, and fitcauto implements random search to find and assess models instead. |
500 ≤ n < 2000 | 5*l | 5*(l+1) |
2000 ≤ n < 8000 | 21*l | 21*(l+1) |
8000 ≤ n < 32,000 | 85*l | 85*(l+1) |
32,000 ≤ n | 341*l | 341*(l+1) |
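for example, with n = 10,000 observations and l = 5 learner types, a serial run uses 85*5 = 425 iterations by default, and a parallel run uses 85*(5+1) = 510 iterations.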
mean misclassification cost
if you specify the cost
name-value argument, then
fitcauto
minimizes the mean misclassification cost rather than the
misclassification error as part of the optimization process. the mean misclassification cost
is defined as

$$\frac{1}{n}\sum_{j=1}^{n} C\left(k_j,\hat{k}_j\right)\, I\left\{y_j \ne \hat{y}_j\right\},$$

where:
$C$ is the misclassification cost matrix as specified by the cost name-value argument, and $I$ is the indicator function.
$y_j$ is the true class label for observation $j$, and $y_j$ belongs to class $k_j$.
$\hat{y}_j$ is the class label with the maximal predicted score for observation $j$, and $\hat{y}_j$ belongs to class $\hat{k}_j$.
$n$ is the number of observations in the validation set.
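as a toy illustration of this formula (with hypothetical class labels and a hypothetical cost matrix, not values produced by fitcauto):
% cost matrix: rows are true classes, columns are predicted classes;
% misclassifying class 2 costs 5, misclassifying class 1 costs 1
c = [0 1; 5 0];
ytrue = [1 2 2 1];   % true class indices k_j
ypred = [1 2 1 2];   % predicted class indices khat_j
costs = arrayfun(@(j) c(ytrue(j),ypred(j)), 1:numel(ytrue));
meancost = mean(costs)   % (0 + 0 + 5 + 1)/4 = 1.5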
alternative functionality
if you are unsure which models work best for your data set, you can alternatively use the classification learner app. using the app, you can perform hyperparameter tuning for different models, and choose the optimized model that performs best. although you must select a specific model before you can tune the model hyperparameters, classification learner provides greater flexibility for selecting optimizable hyperparameters and setting hyperparameter values. however, in the app you cannot optimize in parallel, optimize "linear" or "kernel" learners, specify observation weights, specify prior probabilities, or use asha optimization. for more information, see hyperparameter optimization in classification learner app.
if you know which models might suit your data, you can alternatively use the corresponding model fit functions and specify the optimizehyperparameters name-value argument to tune hyperparameters. you can compare the results across the models to select the best classifier. for an example of this process, see moving towards automating model selection using bayesian optimization.
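for instance, a minimal sketch (using the carstrain table from the example above, and treating the choice of an svm as an assumption) of tuning a single model type directly:
% tune the hyperparameters of one model type with its own fit function
mdlsvm = fitcsvm(carstrain,"origin","optimizehyperparameters","auto");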
references
[1] li, liam, kevin jamieson, afshin rostamizadeh, ekaterina gonina, moritz hardt, benjamin recht, and ameet talwalkar. “a system for massively parallel hyperparameter tuning.” arxiv:1810.05934v5 [cs], march 16, 2020.
extended capabilities
automatic parallel support
accelerate code by automatically running computation in parallel using parallel computing toolbox™.
to perform parallel hyperparameter optimization, use the
"hyperparameteroptimizationoptions",struct("useparallel",true)
name-value argument in the call to this function.
for more general information about parallel computing, see the parallel computing toolbox documentation.
version history
introduced in r2020a
r2023a: neural network classifiers support misclassification costs and prior probabilities
starting in r2023a, fitcauto
supports misclassification costs and
prior probabilities for neural network classifiers. that is, you can specify the
cost
and prior
name-value arguments when the
learners
name-value argument includes "net"
models.
in previous releases, when you specified nondefault misclassification costs or prior
probabilities, fitcauto
omitted neural network models from the model
selection process.
r2022a: learners include neural network models
starting in r2022a, the list of available learners includes neural network models. when you
specify "all"
or "all-nonlinear"
for the
learners
name-value argument, fitcauto
includes neural network models as part of the model selection and hyperparameter tuning
process. the function also considers neural network models when you specify
learners
as "auto"
, depending on the
characteristics of your data set.
to omit neural network models from the model selection process, you can explicitly specify the
models you want to include. for example, to use tree and ensemble models only, specify
"learners",["tree","ensemble"]
.
r2022a: automatic selection of learners includes linear models when data is wide after categorical expansion
starting in r2022a, if you specify learners
as
"auto"
and the data has more predictors than observations after the
expansion of the categorical predictors (see automatic creation of dummy variables), then
fitcauto
includes linear learners ("linear"
)
along with other models during the hyperparameter optimization. in previous releases, linear
learners were not considered.
r2022a: regularization method determines the linear learner solver used during the optimization process for multiclass classification
starting in r2022a, when you specify to try a linear learner
("linear"
) for multiclass classification,
fitcauto
uses either a limited-memory bfgs (lbfgs) solver or a
sparse reconstruction by separable approximation (sparsa) solver, depending on the
regularization type selected during that iteration of the optimization process.
when regularization is 'ridge', the function sets the solver value to 'lbfgs' by default.
when regularization is 'lasso', the function sets the solver value to 'sparsa' by default.
in previous releases, the default solver selection during the optimization process
depended on various factors, including the regularization type, learner type, and number of
predictors. for more information, see solver
.
r2021a: regularization method determines the linear learner solver used during the optimization process for binary classification
starting in r2021a, when you specify to try a linear learner
("linear"
) for binary classification, fitcauto
uses either a limited-memory bfgs (lbfgs) solver or a sparse reconstruction by separable
approximation (sparsa) solver, depending on the regularization type selected during that
iteration of the optimization process.
when regularization is 'ridge', the function sets the solver value to 'lbfgs' by default.
when regularization is 'lasso', the function sets the solver value to 'sparsa' by default.
in previous releases, the default solver selection during the optimization process
depended on various factors, including the regularization type, learner type, and number of
predictors. for more information, see solver
.