Compact naive Bayes classifier for multiclass classification

Description

CompactClassificationNaiveBayes is a compact version of the naive Bayes classifier. The compact classifier does not include the data used for training the naive Bayes classifier. Therefore, you cannot perform some tasks, such as cross-validation, using the compact classifier. Use a compact naive Bayes classifier for tasks such as predicting the labels of new data.

Creation

Create a CompactClassificationNaiveBayes model from a full, trained ClassificationNaiveBayes classifier by using compact.

Properties

Predictor Properties

`PredictorNames` — Predictor names
cell array of character vectors

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X.

`ExpandedPredictorNames` — Expanded predictor names
cell array of character vectors

This property is read-only.

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

`CategoricalPredictors` — Categorical predictor indices
vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`CategoricalLevels` — Multivariate multinomial levels
cell array

This property is read-only.

Multivariate multinomial levels, specified as a cell array. The length of CategoricalLevels is equal to the number of predictors (size(X,2)).

The cells of CategoricalLevels correspond to predictors that you specify as 'mvmn' during training, that is, predictors that have a multivariate multinomial distribution. Cells that do not correspond to a multivariate multinomial distribution are empty ([]).

If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample. NaNs are removed from unique(X(:,j)).
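The level list described above can be reproduced by hand for a single predictor. This is a minimal sketch with made-up data (the vector `xj` is hypothetical), not the internal implementation of fitcnb.

```matlab
% Hypothetical values of one 'mvmn' predictor, including a missing value.
xj = [3; 1; NaN; 2; 1; 3];

% unique returns the sorted distinct values. NaNs are dropped first so
% that a missing value does not become a level of its own.
levels = unique(xj(~isnan(xj)));

disp(levels')   % the levels are 1, 2, and 3
```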

Predictor Distribution Properties

`DistributionNames` — Predictor distributions
`'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of character vectors

This property is read-only.

Predictor distributions, specified as a character vector or cell array of character vectors. fitcnb uses the predictor distributions to model the predictors. This table lists the available distributions.

Value	Description
`'kernel'`	Kernel smoothing density estimate
`'mn'`	Multinomial distribution. If you specify `'mn'`, then all features are components of a multinomial distribution. Therefore, you cannot include `'mn'` as an element of a string array or a cell array of character vectors. For details, see Estimated Probability for Multinomial Distribution.
`'mvmn'`	Multivariate multinomial distribution. For details, see Estimated Probability for Multivariate Multinomial Distribution.
`'normal'`	Normal (Gaussian) distribution

If DistributionNames is a 1-by-P cell array of character vectors, then fitcnb models feature j using the distribution in element j of the cell array.

Example: 'mn'

Example: {'kernel','normal','kernel'}

Data Types: char | string | cell

`DistributionParameters` — Distribution parameter estimates
cell array

This property is read-only.

Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X.

If class k has no observations for predictor j, then DistributionParameters{k,j} is empty ([]).

The elements of DistributionParameters depend on the distributions of the predictors. This table describes the values in DistributionParameters{k,j}.

Distribution of Predictor j	Value of Cell Array for Predictor `j` and Class `k`
`kernel`	a model. display properties using cell indexing and dot notation. for example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use `mdl.distributionparameters{3,2}.bandwidth`.
`mn`	A scalar representing the probability that token j appears in class k. For details, see Estimated Probability for Multinomial Distribution.
`mvmn`	A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property `CategoricalLevels`). For more details, see Estimated Probability for Multivariate Multinomial Distribution.
`normal`	A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation. For more details, see Normal Distribution Estimators.

`Kernel` — Kernel smoother type
`'normal'` (default) | `'box'` | cell array | ...

This property is read-only.

Kernel smoother type, specified as the name of a kernel or a cell array of kernel names. The length of Kernel is equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j and contains a character vector describing the type of kernel smoother. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.

This table describes the supported kernel smoother types. I{u} denotes the indicator function.

Value	Kernel	Formula
`'box'`	Box (uniform)	$f(x) = 0.5\, I\{|x| \leq 1\}$
`'epanechnikov'`	Epanechnikov	$f(x) = 0.75 (1 - x^{2})\, I\{|x| \leq 1\}$
`'normal'`	Gaussian	$f(x) = \frac{1}{\sqrt{2\pi}} \exp(-0.5 x^{2})$
`'triangle'`	Triangular	$f(x) = (1 - |x|)\, I\{|x| \leq 1\}$

Example: 'box'

Example: {'epanechnikov','normal'}

Data Types: char | string | cell
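The four kernel formulas in the table can be written as anonymous functions and checked numerically. This is an illustrative sketch only: the handle names (`boxK`, `epanK`, `gaussK`, `triK`) are hypothetical, and the code evaluates the unscaled kernels without applying any window width.

```matlab
% Indicator I{|x| <= 1}, written as a logical mask converted to double.
ind = @(x) double(abs(x) <= 1);

boxK   = @(x) 0.5 .* ind(x);
epanK  = @(x) 0.75 .* (1 - x.^2) .* ind(x);
gaussK = @(x) exp(-0.5 .* x.^2) ./ sqrt(2*pi);
triK   = @(x) (1 - abs(x)) .* ind(x);

% Each kernel is a density, so its integral should be 1. The waypoints
% tell integral where the compactly supported kernels jump or kink.
areaBox   = integral(boxK,  -2, 2, 'Waypoints', [-1 1]);
areaEpan  = integral(epanK, -2, 2, 'Waypoints', [-1 1]);
areaTri   = integral(triK,  -2, 2, 'Waypoints', [-1 0 1]);
areaGauss = integral(gaussK, -Inf, Inf);
```

All four areas come out equal to 1 up to quadrature error, which is a quick sanity check on the reconstructed formulas.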

`Support` — Kernel smoother density support
cell array

This property is read-only.

Kernel smoother density support, specified as a cell array. The length of Support is equal to the number of predictors (size(X,2)). The cells represent the regions to which fitcnb applies the kernel density. If a cell is empty ([]), then fitcnb did not fit a kernel distribution to the corresponding predictor.

This table describes the supported options.

Value	Description
1-by-2 numeric row vector	The density support applies to the specified bounds, for example `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively.
`'positive'`	The density support applies to all positive real values.
`'unbounded'`	The density support applies to all real values.

`Width` — Kernel smoother window width
numeric matrix

This property is read-only.

Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)).

Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that fitcnb did not fit predictor j using a kernel density.

Response Properties

`ClassNames` — Unique class names
categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Unique class names used in the training model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

ClassNames has the same data type as Y, and has K elements (or K rows, for character arrays). (The software treats string arrays as cell arrays of character vectors.)

`ResponseName` — Response variable name
character vector

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char | string

Training Properties

`Prior` — Prior probabilities
numeric vector

Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the elements of Mdl.ClassNames.

fitcnb normalizes the prior probabilities you set using the 'Prior' name-value pair argument, so that sum(Prior) = 1.

The value of Prior does not affect the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation.

Example: Mdl.Prior = [0.2 0.8]

Data Types: double | single
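The normalization mentioned above is plain rescaling. The following sketch shows only the arithmetic on a hypothetical unnormalized vector `rawPrior`; it does not call fitcnb and makes no claim about how fitcnb implements the step internally.

```matlab
rawPrior = [1 4];                    % hypothetical unnormalized class priors
prior = rawPrior ./ sum(rawPrior);   % rescale so that the elements sum to 1

disp(prior)                          % the elements are 0.2 and 0.8
```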

Classifier Properties

`Cost` — Misclassification cost
square matrix

Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

The misclassification cost matrix must have zeros on the diagonal.

The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation.

Example: Mdl.Cost = [0 0.5 ; 1 0]

Data Types: double | single
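To see how the row and column convention of Cost plays out, the sketch below computes the expected misclassification cost of each possible prediction for one observation. The posterior vector is hypothetical, and choosing the class with the smallest expected cost is an assumption made for illustration; this page does not spell out the decision rule.

```matlab
Cost = [0 0.5; 1 0];        % Cost(i,j): true class i, predicted class j
posterior = [0.6 0.4];      % hypothetical posterior probabilities of classes 1 and 2

% Element j is the expected cost of predicting class j:
% the sum over true classes i of posterior(i) * Cost(i,j).
expectedCost = posterior * Cost;

[~, predicted] = min(expectedCost);
```

Here the expected costs are 0.4 and 0.3, so the cheaper prediction is class 2 even though class 1 has the larger posterior. Asymmetric costs can move the decision away from the most probable class.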

`ScoreTransform` — Classification score transformation
`'none'` (default) | `'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | function handle | ...

Classification score transformation, specified as a character vector or function handle. This table summarizes the available character vectors.

Value	Description
`"doublelogit"`	1/(1 + e^(-2x))
`"invlogit"`	log(x / (1 - x))
`"ismax"`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
`"logit"`	1/(1 + e^(-x))
`"none"` or `"identity"`	x (no transformation)
`"sign"`	-1 for x < 0, 0 for x = 0, 1 for x > 0
`"symmetric"`	2x - 1
`"symmetricismax"`	Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1
`"symmetriclogit"`	2/(1 + e^(-x)) - 1

For a MATLAB® function or a function you define, use its function handle for the score transformation. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: Mdl.ScoreTransform = 'logit'

Data Types: char | string | function handle
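Several of the closed-form transformations in the table can be written as function handles that satisfy the matrix-in, matrix-out requirement. This is a sketch: the handle names mirror the table entries but are ordinary user-defined handles, and the `scores` matrix is made up.

```matlab
logit          = @(x) 1 ./ (1 + exp(-x));
doublelogit    = @(x) 1 ./ (1 + exp(-2 .* x));
symmetric      = @(x) 2 .* x - 1;
symmetriclogit = @(x) 2 ./ (1 + exp(-x)) - 1;

scores = [-1 0 2; 0.5 0 -3];     % hypothetical 2-by-3 matrix of scores
transformed = logit(scores);     % elementwise, so the size is preserved
```

Because each handle works elementwise, the output always has the same size as the input, which is the property the text above requires of a custom score transformation.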

Object Functions

`compareHoldout`	Compare accuracies of two classification models using new data
`edge`	Classification edge for naive Bayes classifier
`lime`	Local interpretable model-agnostic explanations (LIME)
`logp`	Log unconditional probability density for naive Bayes classifier
`loss`	Classification loss for naive Bayes classifier
`margin`	Classification margins for naive Bayes classifier
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Classify observations using naive Bayes classifier
`shapley`	Shapley values

Examples

Reduce Size of Naive Bayes Classifier

Reduce the size of a full naive Bayes classifier by removing the training data. Full naive Bayes classifiers hold the training data. You can use a compact naive Bayes classifier to improve memory efficiency.

Load the ionosphere data set. Remove the first two predictors for stability.

load ionosphere
X = X(:,3:end);

Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. By default, fitcnb assumes that each predictor is normally distributed conditional on the class.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'})

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
           NumObservations: 351
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier.

Reduce the size of the naive Bayes classifier.

CMdl = compact(Mdl)

CMdl = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods

CMdl is a trained CompactClassificationNaiveBayes classifier.

Display the amount of memory used by each classifier.

whos('Mdl','CMdl')

  Name      Size             Bytes  Class                                                        Attributes
  CMdl      1x1              15060  classreg.learning.classif.CompactClassificationNaiveBayes              
  Mdl       1x1             111190  ClassificationNaiveBayes

The full naive Bayes classifier (Mdl) is more than seven times larger than the compact naive Bayes classifier (CMdl).

To label new observations efficiently, you can remove Mdl from the MATLAB® Workspace, and then pass CMdl and new predictor values to predict.
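A minimal sketch of that last step follows. It repeats the training from this example so that it runs on its own, and it uses the first five rows of the training predictors as a stand-in for new data (the name `XNew` and that choice of rows are assumptions for illustration). It requires Statistics and Machine Learning Toolbox.

```matlab
load ionosphere                      % provides X (351-by-34) and Y
X = X(:,3:end);                      % drop the first two predictors
Mdl  = fitcnb(X,Y,'ClassNames',{'b','g'});
CMdl = compact(Mdl);
clear Mdl                            % the full model is no longer needed

XNew = X(1:5,:);                     % stand-in for new observations (assumption)
[labels,posterior] = predict(CMdl,XNew);
```

`labels` contains one predicted class per row of `XNew`, and each row of `posterior` holds the class posterior probabilities in the order given by CMdl.ClassNames.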

Train and Cross-Validate Naive Bayes Classifier

Train and cross-validate a naive Bayes classifier. When you set 'CrossVal' to 'on', fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.

Load the ionosphere data set. Remove the first two predictors for stability.

load ionosphere
X = X(:,3:end);
rng('default')  % for reproducibility

Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. By default, fitcnb assumes that each predictor is normally distributed conditional on the class.

CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')

CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  'x12'  'x13'  'x14'  'x15'  'x16'  'x17'  'x18'  'x19'  'x20'  'x21'  'x22'  'x23'  'x24'  'x25'  'x26'  'x27'  'x28'  'x29'  'x30'  'x31'  'x32'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
  Properties, Methods

CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.

Display the first training fold of CVMdl using dot notation.

CVMdl.Trained{1}

ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods

Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.

The fold models stored in CVMdl are not intended for predicting on new data. Instead, use them to estimate the generalization error by passing CVMdl to kfoldLoss.

genError = kfoldLoss(CVMdl)

genError = 0.1852

On average, the generalization error is approximately 19%.

You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.

More About

Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (number of predictors).
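As a small illustration of this representation, the sketch below turns one tokenized observation into a row of token counts. The vocabulary and the document are made up, and the variable names are hypothetical.

```matlab
vocabulary = {'spam','ham','eggs'};    % hypothetical tokens, one per predictor
document   = {'spam','eggs','spam'};   % hypothetical tokenized observation

% Predictor j is the number of times token j occurs in the observation.
counts = cellfun(@(token) sum(strcmp(document,token)), vocabulary);
```

For this document, `counts` is `[2 0 1]`: 'spam' occurs twice, 'ham' never, and 'eggs' once. A matrix of such rows is the kind of input that the 'mn' distribution expects.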

Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data.

The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps:

Estimate the densities of the predictors within each class.
Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

$\hat{P}(Y = k | X_{1}, ..., X_{P}) = \frac{\pi(Y = k) \prod_{j = 1}^{P} P(X_{j} | Y = k)}{\sum_{k = 1}^{K} \pi(Y = k) \prod_{j = 1}^{P} P(X_{j} | Y = k)},$
where:
- Y is the random variable corresponding to the class index of an observation.
- X_1,...,X_P are the random predictors of an observation.
- $\pi(Y = k)$ is the prior probability that a class index is k.
Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability $\hat{P}(Y = k | X_{1}, ..., X_{P}) \propto \pi(Y = k) P_{mn}(X_{1}, ..., X_{P} | Y = k),$ where $P_{mn}(X_{1}, ..., X_{P} | Y = k)$ is the probability mass function of a multinomial distribution.
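The three steps can be traced by hand for normally distributed predictors. The sketch below assumes the class densities are already estimated: the means `mu`, standard deviations `sigma`, priors, and the observation `x` are all made-up values, and `gauss` is a hypothetical helper that evaluates the normal density without any toolbox function.

```matlab
mu    = [0 0; 2 2];      % hypothetical class means, K-by-P (K = 2 classes, P = 2 predictors)
sigma = ones(2,2);       % hypothetical class standard deviations, K-by-P
prior = [0.5 0.5];       % hypothetical prior probabilities
x     = [0.5 0.5];       % observation to classify

% Normal density, evaluated elementwise.
gauss = @(x,m,s) exp(-0.5 .* ((x - m) ./ s).^2) ./ (s .* sqrt(2*pi));

K = size(mu,1);
joint = zeros(1,K);
for k = 1:K
    % Prior times the product of the per-predictor densities
    % (the conditional independence assumption).
    joint(k) = prior(k) * prod(gauss(x, mu(k,:), sigma(k,:)));
end

posterior = joint ./ sum(joint);   % Bayes rule: normalize over the classes
[~, label] = max(posterior);       % maximum a posteriori decision
```

With these numbers the observation is closer to the first class mean, so `label` is 1 and `posterior(1)` equals 1/(1 + exp(-2)), roughly 0.88.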

Algorithms

Normal Distribution Estimators

If predictor variable j has a conditional normal distribution (see the DistributionNames property), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k:

The weighted mean of predictor j is

$\bar{x}_{j|k} = \frac{\sum_{\{i : y_{i} = k\}} w_{i} x_{ij}}{\sum_{\{i : y_{i} = k\}} w_{i}},$
where w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
The unbiased estimator of the weighted standard deviation of predictor j is

$s_{j|k} = \left[\frac{\sum_{\{i : y_{i} = k\}} w_{i} (x_{ij} - \bar{x}_{j|k})^{2}}{z_{1|k} - \frac{z_{2|k}}{z_{1|k}}}\right]^{1/2},$
where z_{1|k} is the sum of the weights within class k and z_{2|k} is the sum of the squared weights within class k.
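A short numeric check of the two estimators, using made-up data for one predictor in one class. The values in `x` and the class prior of 0.5 are assumptions for illustration. With equal weights the denominator reduces to a multiple of n - 1, so the result should match the ordinary sample standard deviation.

```matlab
x = [1; 2; 3; 4];            % hypothetical values of predictor j in class k
w = 0.5 * ones(4,1) / 4;     % equal weights summing to an assumed class prior of 0.5

xbar = sum(w .* x) / sum(w);                          % weighted mean
z1 = sum(w);                                          % sum of the weights in class k
z2 = sum(w.^2);                                       % sum of the squared weights in class k
s  = sqrt(sum(w .* (x - xbar).^2) / (z1 - z2/z1));    % unbiased weighted standard deviation
```

Here `xbar` is 2.5 and `s` agrees with `std(x)`, which confirms that the weighted formula is the usual unbiased estimator when all observations carry the same weight.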

Estimated Probability for Multinomial Distribution

If all predictor variables compose a conditional multinomial distribution (see the DistributionNames property), the software fits the distribution using the bag-of-tokens model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is

$P(\text{token } j | \text{class } k) = \frac{1 + c_{j|k}}{P + c_{k}},$

where:

- $c_{j|k} = n_{k} \frac{\sum_{\{i : y_{i} = k\}} x_{ij} w_{i}}{\sum_{\{i : y_{i} = k\}} w_{i}},$ which is the weighted number of occurrences of token j in class k.
- n_k is the number of observations in class k.
- $w_{i}$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
- $c_{k} = \sum_{j = 1}^{P} c_{j|k},$ which is the total weighted number of occurrences of all tokens in class k.
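The smoothed estimate can be worked through on a tiny example. The count matrix `Xk` and the weights are made up, and the code follows the definitions listed above with one added to each weighted count and P, the number of tokens, added to the class total.

```matlab
Xk = [2 0 1; 1 1 0];     % hypothetical token counts in class k: n_k = 2 observations, P = 3 tokens
w  = [0.25; 0.25];       % equal weights summing to an assumed class prior of 0.5

nk = size(Xk,1);                   % number of observations in class k
P  = size(Xk,2);                   % number of tokens (predictors)
c  = nk * (w' * Xk) / sum(w);      % weighted occurrences of each token in class k
ck = sum(c);                       % total weighted occurrences of all tokens in class k
p  = (1 + c) / (P + ck);           % additive-smoothing estimate for each token
```

The weighted counts are `[3 1 1]`, so the estimates are `[0.5 0.25 0.25]`. They sum to 1, and a token that never occurred in the class would still receive a nonzero probability.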

Estimated Probability for Multivariate Multinomial Distribution

If predictor variable j has a conditional multivariate multinomial distribution (see the DistributionNames property), the software follows this procedure:

The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is

$P(\text{predictor } j = L | \text{class } k) = \frac{1 + m_{j|k}(L)}{m_{j} + m_{k}},$
where:
- $m_{j|k}(L) = n_{k} \frac{\sum_{\{i : y_{i} = k\}} I\{x_{ij} = L\} w_{i}}{\sum_{\{i : y_{i} = k\}} w_{i}},$ which is the weighted number of observations for which predictor j equals L in class k.
- n_k is the number of observations in class k.
- $I\{x_{ij} = L\} = 1$ if x_ij = L, and 0 otherwise.
- $w_{i}$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
- m_j is the number of distinct levels in predictor j.
- m_k is the weighted number of observations in class k.
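The same procedure can be traced for a single predictor and class. The data below are made up, and taking m_k as the sum of the weighted level counts is an assumption that holds when the predictor has no missing values in the class.

```matlab
xk     = [1; 1; 2; 3; 1];     % hypothetical values of predictor j in class k
levels = [1; 2; 3];           % sorted distinct levels, as in CategoricalLevels{j}
w      = 0.1 * ones(5,1);     % equal weights summing to an assumed class prior of 0.5

nk  = numel(xk);                        % number of observations in class k
ind = double(xk == levels');            % nk-by-m_j indicator of x_ij = L
mjk = nk * (w' * ind) / sum(w);         % weighted count of each level in class k
mj  = numel(levels);                    % number of distinct levels
mk  = sum(mjk);                         % equals n_k here (no missing values assumed)
p   = (1 + mjk) / (mj + mk);            % additive-smoothing estimate for each level
```

The weighted counts are `[3 1 1]` and the estimates are `[0.5 0.25 0.25]`, ordered by the sorted levels, which is the ordering described for DistributionParameters{k,j}.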

References

[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7.

[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. NY: Cambridge University Press, 2008.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict function supports code generation.
When you train a naive Bayes model by using fitcnb, the following restrictions apply.
- The value of the 'DistributionNames' name-value pair argument cannot contain 'mn'.
- The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.

For more information, see Introduction to Code Generation.

Version History

Introduced in R2014b
