Regression ensemble grown by resampling

Package: classreg.learning.regr
Superclasses:


Description

RegressionBaggedEnsemble combines a set of trained weak learner models and the data on which these learners were trained. It can predict the ensemble response for new data by aggregating predictions from its weak learners.

Construction

To create a bagged regression ensemble object, call fitrensemble and set its name-value argument 'Method' to 'Bag'. This setting uses bootstrap aggregation (bagging), which includes, for example, random forests.
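The following is a minimal sketch of plain bagging and of a random-forest-style variant, assuming the carsmall sample data set used in the examples on this page. The predictor choice, the 50 learning cycles, and the `NumVariablesToSample` value of 2 are illustrative assumptions, not recommendations. Passing a `templateTree` whose `NumVariablesToSample` is smaller than the number of predictors makes each split consider only a random subset of predictors, which is what distinguishes a random forest from plain bagging.

```matlab
% Load sample data: predict fuel economy from three car attributes.
load carsmall
X = [Weight Cylinders Horsepower];
Y = MPG;

rng(1) % make the bootstrap samples reproducible

% Plain bagging: each tree is grown on a bootstrap sample of the data.
bagMdl = fitrensemble(X,Y,'Method','Bag','NumLearningCycles',50);

% Random-forest-style bagging: each split considers 2 randomly chosen
% predictors out of 3 (illustrative value).
t = templateTree('NumVariablesToSample',2);
rfMdl = fitrensemble(X,Y,'Method','Bag','NumLearningCycles',50,'Learners',t);
```

Both calls return RegressionBaggedEnsemble objects; only the tree template differs.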

Properties

`BinEdges`	Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the `'NumBins'` name-value argument as a positive integer scalar when training a model with tree learners. The `BinEdges` property is empty if the `'NumBins'` value is empty (default). You can reproduce the binned predictor data `Xbinned` by using the `BinEdges` property of the trained model `mdl`:
    X = mdl.X; % Predictor data
    Xbinned = zeros(size(X));
    edges = mdl.BinEdges;
    % Find indices of binned predictors.
    idxNumeric = find(~cellfun(@isempty,edges));
    if iscolumn(idxNumeric)
        idxNumeric = idxNumeric';
    end
    for j = idxNumeric
        x = X(:,j);
        % Convert x to array if x is a table.
        if istable(x)
            x = table2array(x);
        end
        % Group x into bins by using the discretize function.
        xbinned = discretize(x,[-inf; edges{j}; inf]);
        Xbinned(:,j) = xbinned;
    end
`Xbinned` contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. `Xbinned` values are 0 for categorical predictors. If `X` contains NaNs, then the corresponding `Xbinned` values are NaNs.
`CategoricalPredictors`	Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).
`CombineWeights`	A character vector describing how the ensemble combines learner predictions.
`ExpandedPredictorNames`	Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.
`FitInfo`	A numeric array of fit information. The `FitInfoDescription` property describes the content of this array.
`FitInfoDescription`	Character vector describing the meaning of the `FitInfo` array.
`FResample`	A numeric scalar between `0` and `1`. `FResample` is the fraction of training data resampled at random for every weak learner when constructing the ensemble.
`HyperparameterOptimizationResults`	Description of the cross-validation optimization of hyperparameters, stored as a `BayesianOptimization` object or a table of hyperparameters and associated values. Nonempty when the `OptimizeHyperparameters` name-value argument is nonempty at creation. The value depends on the `Optimizer` setting of the `HyperparameterOptimizationOptions` name-value argument at creation: `'bayesopt'` (default) gives an object of class `BayesianOptimization`; `'gridsearch'` or `'randomsearch'` gives a table of the hyperparameters used, the observed objective function values (cross-validation loss), and the rank of the observations from lowest (best) to highest (worst).
`LearnerNames`	Cell array of character vectors with names of the weak learners in the ensemble. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, `LearnerNames` is `{'Tree'}`.
`Method`	A character vector with the name of the algorithm used for training the ensemble.
`ModelParameters`	Parameters used in training the ensemble.
`NumObservations`	Numeric scalar containing the number of observations in the training data.
`NumTrained`	Number of trained learners in the ensemble, a positive scalar.
`PredictorNames`	A cell array of names for the predictor variables, in the order in which they appear in `X`.
`ReasonForTermination`	A character vector describing the reason that training stopped adding weak learners to the ensemble.
`Regularization`	A structure containing the result of the method. Use `Regularization` with to lower resubstitution error and shrink the ensemble.
`Replace`	Logical flag indicating whether training data for weak learners in this ensemble were sampled with replacement. `Replace` is `true` for sampling with replacement, `false` otherwise.
`ResponseName`	A character vector with the name of the response variable `Y`.
`ResponseTransform`	Function handle for transforming predicted responses, or character vector representing a built-in transformation function. `'none'` means no transformation; equivalently, `'none'` means `@(x)x`. Add or change a `ResponseTransform` function using dot notation: `ens.ResponseTransform = @function`
`Trained`	The trained learners, a cell array of compact regression models.
`TrainedWeights`	A numeric vector of weights the ensemble assigns to its learners. The ensemble computes the predicted response by aggregating weighted predictions from its learners.
`UseObsForLearner`	A logical matrix of size `n`-by-`NumTrained`, where `n` is the number of rows (observations) in the training data `X`, and `NumTrained` is the number of trained weak learners. `UseObsForLearner(i,j)` is `true` if observation `i` was used for training learner `j`, and is `false` otherwise.
`W`	The scaled observation weights, a vector with length `n`, the number of rows in `X`. The sum of the elements of `W` is `1`.
`X`	The matrix or table of predictor values that trained the ensemble. Each column of `X` represents one variable, and each row represents one observation.
`Y`	The numeric column vector with the same number of rows as `X` that trained the ensemble. Each entry in `Y` is the response to the data in the corresponding row of `X`.
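The following sketch reads a few of these properties from a small bagged ensemble trained on the carsmall sample data. The 10 bins and 20 learners are illustrative assumptions, and the factor in the `ResponseTransform` line (miles per gallon to kilometers per liter) is a hypothetical example transform. With `FResample` equal to 1 and `Replace` equal to `true`, each bootstrap sample leaves out roughly 37% of the observations, which is what `UseObsForLearner` records.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(1) % reproducible bootstrap samples

% Specify NumBins so that the BinEdges property is populated.
mdl = fitrensemble(X,Y,'Method','Bag','NumLearningCycles',20,'NumBins',10);
edges = mdl.BinEdges;              % one cell per predictor

% UseObsForLearner(i,j) is true if observation i was in the bootstrap
% sample used to train learner j.
inBag = mdl.UseObsForLearner;
fracOutOfBag = mean(~inBag(:));    % share of (observation, learner) pairs left out

% ResponseTransform is applied to the ensemble predictions.
yhat = predict(mdl,X(1:5,:));
mdl.ResponseTransform = @(y) 0.425144*y;   % hypothetical: MPG to km/L
yhatKmPerL = predict(mdl,X(1:5,:));
```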

Object Functions

	Create compact regression ensemble
	Cross-validate ensemble
	Cross-validate shrinking (pruning) ensemble
	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
	Regression error
	Out-of-bag regression error
`oobPermutedPredictorImportance`	Predictor importance estimates by permutation of out-of-bag predictor observations for random forest of regression trees
	Predict out-of-bag response of ensemble
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
	Predict responses using ensemble of regression models
	Estimates of predictor importance for regression ensemble
	Find weights to minimize resubstitution error plus penalty term
	Remove members of compact regression ensemble
	Regression error by resubstitution
	Predict response of ensemble by resubstitution
	Resume training ensemble
`shapley`	Shapley values
	Prune ensemble
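As a sketch of how several object functions fit together, the following code (assuming the carsmall sample data) compares the out-of-bag and resubstitution estimates of the mean squared error, predicts responses for new observations, estimates predictor importance, and compacts the model. It uses `oobLoss`, `resubLoss`, `predict`, `predictorImportance`, and `compact`; the two cars in `Xnew` are hypothetical inputs.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(0) % reproducible bootstrap samples
mdl = fitrensemble(X,Y,'Method','Bag');

% Out-of-bag MSE: each observation is predicted only by trees that did
% not use it for training, so no separate test set is needed.
oobMSE = oobLoss(mdl);

% Resubstitution MSE: usually optimistic, because the trees may have
% been trained on the observations they predict.
resubMSE = resubLoss(mdl);

% OOB MSE as a function of ensemble size (one value per learner count).
oobCurve = oobLoss(mdl,'Mode','cumulative');

% Predict fuel economy for two hypothetical cars: [Weight Cylinders].
Xnew = [3000 4; 4000 8];
yNew = predict(mdl,Xnew);

% Importance of each predictor (one value per column of X).
imp = predictorImportance(mdl);

% Compact version without training data, for a smaller memory footprint.
cmdl = compact(mdl);
```

Comparing `oobMSE` with `resubMSE` is a quick way to judge how much the in-sample error understates the error on new data.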

Copy Semantics

Value. To learn how value classes affect copy operations, see .
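Because the object has value semantics, assigning it to a new variable creates an independent copy. The sketch below, assuming the carsmall data and an arbitrary illustrative transform, changes `ResponseTransform` on the copy only; the original model keeps its default.

```matlab
load carsmall
mdl = fitrensemble([Weight Cylinders],MPG,'Method','Bag','NumLearningCycles',10);

mdl2 = mdl;                          % copy by value
mdl2.ResponseTransform = @(y) 2*y;   % modify the copy only

% The original model is unchanged.
disp(mdl.ResponseTransform)          % displays: none
```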

Examples

Train Bagged Ensemble of Regression Trees

Load the carsmall data set. Consider a model that explains a car's fuel economy (MPG) using its weight (Weight) and number of cylinders (Cylinders).

load carsmall
X = [Weight Cylinders];
Y = MPG;

Train a bagged ensemble of 100 regression trees (the default number of learning cycles) using all measurements.

mdl = fitrensemble(X,Y,'Method','Bag')

mdl = 
  RegressionBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 94
               NumTrained: 100
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
           Regularization: []
                FResample: 1
                  Replace: 1
         UseObsForLearner: [94x100 logical]
  Properties, Methods

mdl is a RegressionBaggedEnsemble model object.

mdl.Trained is the property that stores a 100-by-1 cell vector of the trained, compact regression trees (CompactRegressionTree model objects) that compose the ensemble.
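To see how the ensemble aggregates its weak learners, the sketch below predicts with each tree in `mdl.Trained` separately and forms the weighted average using `TrainedWeights`. It assumes that the ensemble combines learners by a weighted average (as its `CombineWeights` property reports for bagging), in which case `maxDiff` should be zero up to rounding.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(0)
mdl = fitrensemble(X,Y,'Method','Bag');

% Predict with every trained tree separately.
treePreds = zeros(size(X,1),mdl.NumTrained);
for t = 1:mdl.NumTrained
    treePreds(:,t) = predict(mdl.Trained{t},X);
end

% Weighted average of the tree predictions, using the learner weights.
w = mdl.TrainedWeights(:);
manualPred = treePreds*w/sum(w);

% Compare with the ensemble's own prediction.
ensPred = predict(mdl,X);
maxDiff = max(abs(manualPred - ensPred));
```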

Plot a graph of the first trained regression tree.

view(mdl.Trained{1},'Mode','graph')

Figure: Regression Tree Viewer showing the first trained tree in the ensemble.

By default, fitrensemble grows deep trees for bags of trees.
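If shallower trees are wanted, you can pass a tree template that limits the number of splits. A sketch, assuming a `MaxNumSplits` value of 10 chosen only for illustration; a tree with at most 10 splits has at most 21 nodes.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(0)

% Default learners for bagging: deep trees.
deepMdl = fitrensemble(X,Y,'Method','Bag');

% Shallower learners: at most 10 splits per tree (illustrative value).
t = templateTree('MaxNumSplits',10);
shallowMdl = fitrensemble(X,Y,'Method','Bag','Learners',t);

% Compare the size of the first tree in each ensemble.
deepNodes = deepMdl.Trained{1}.NumNodes;
shallowNodes = shallowMdl.Trained{1}.NumNodes;
```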

Estimate the in-sample mean squared error (MSE).

L = resubLoss(mdl)

L = 12.4048
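The value returned by `resubLoss` is the mean squared error weighted by the observation weights `W`. The sketch below recomputes it from `resubPredict`, assuming the default loss function and that the stored training data `mdl.X` and `mdl.Y` exclude the rows with missing responses. Because the random seed differs from the example above, the value of `L` can differ from 12.4048.

```matlab
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(0)
mdl = fitrensemble(X,Y,'Method','Bag');

L = resubLoss(mdl);                 % in-sample MSE

% Recompute the weighted MSE from the resubstitution predictions.
yfit = resubPredict(mdl);           % predictions for the stored training rows
w = mdl.W(:);
mseManual = sum(w.*(mdl.Y - yfit).^2)/sum(w);
```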

Tips

For a bagged ensemble of regression trees, the Trained property of ens stores a cell vector of ens.NumTrained CompactRegressionTree model objects. For a textual display of tree t in the cell vector, enter the command below; for a graphical display, also pass the name-value arguments 'Mode','graph'.

view(ens.Trained{t})
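The same cell vector can also be inspected programmatically. A small sketch, assuming an ensemble named `ens` trained on the carsmall data, that collects the number of nodes in each tree (the `NumNodes` property of `CompactRegressionTree`) and displays the largest tree as text:

```matlab
load carsmall
rng(0)
ens = fitrensemble([Weight Cylinders],MPG,'Method','Bag');

% Number of nodes in each of the ens.NumTrained trees.
numNodes = cellfun(@(tree) tree.NumNodes,ens.Trained);

% Textual display of the largest tree.
[~,t] = max(numNodes);
view(ens.Trained{t})
```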

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The function supports code generation.
To integrate the prediction of an ensemble into Simulink®, you can use the block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train an ensemble by using , the following restrictions apply.
- The value of the name-value argument cannot be an anonymous function.
- Code generation limitations for regression trees also apply to ensembles of regression trees. You cannot use surrogate splits; that is, the value of the name-value argument must be 'off'.
For fixed-point code generation, the following additional restrictions apply.
- When you train an ensemble by using , the value of the name-value argument must be 'none' (default).
- Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using before fitting the model.

For more information, see Introduction to Code Generation.
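A typical workflow can be sketched as follows. The MAT-file name `BagMdl` and the entry-point function name `predictMPG` are hypothetical, and running `codegen` requires MATLAB Coder. `saveLearnerForCoder` stores the model in a MAT-file, and `loadLearnerForCoder` reconstructs it inside the entry-point function, whose output can first be checked against `predict` in MATLAB.

```matlab
% Train the model and save it in a form that code generation can load.
load carsmall
X = [Weight Cylinders];
Y = MPG;
rng(0)
mdl = fitrensemble(X,Y,'Method','Bag');
saveLearnerForCoder(mdl,'BagMdl');   % writes BagMdl.mat

% Check in MATLAB that the entry-point function reproduces predict.
Xnew = [3000 4; 4000 8];
yEntry = predictMPG(Xnew);
yDirect = predict(mdl,Xnew);

% With MATLAB Coder installed, save predictMPG in its own file and run:
%   codegen predictMPG -args {Xnew}

function yfit = predictMPG(X) %#codegen
% Hypothetical entry-point function: load the saved model and predict.
mdl = loadLearnerForCoder('BagMdl');
yfit = predict(mdl,X);
end
```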

Version History

Introduced in R2011a