What Is a Machine Learning Model?
A machine learning model is a program that is used to make predictions for a given data set. A machine learning model is built by a supervised machine learning algorithm, which uses computational methods to “learn” information directly from data without relying on a predetermined equation. More specifically, the algorithm takes a known set of input data and the known responses to that data (the output) and trains the machine learning model to generate reasonable predictions for the response to new data.
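As a minimal sketch of this train-then-predict workflow in MATLAB, the example below uses the built-in fisheriris sample data set and a decision tree as the training algorithm; any of the fitting functions discussed later follows the same pattern:

```matlab
% Minimal supervised learning workflow: train a model on known inputs
% and known responses, then predict the response for new data.
load fisheriris                 % sample data: meas (inputs), species (known responses)
mdl = fitctree(meas, species);  % train a classification model (decision tree here)
newObs = [5.9 3.0 5.1 1.8];     % a new observation the model has not seen
predictedClass = predict(mdl, newObs)
```

Training produces a model object, and predict applies that model to unseen inputs; this two-step pattern is shared by every model type described below.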
Types of Machine Learning Models
There are two main types of machine learning models: machine learning classification (where the response belongs to a set of classes) and machine learning regression (where the response is continuous).
Choosing the right machine learning model can seem overwhelming: there are dozens of classification and regression models, and each takes a different approach to learning. Selecting one requires weighing tradeoffs, such as model speed, accuracy, and complexity, and can involve trial and error, as in the cross-validation sketch below.
The following overview of classification and regression models can help you get started.
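To make that trial and error concrete, here is one possible way to compare two candidate models in MATLAB by estimating their generalization error with k-fold cross-validation; the sketch assumes the built-in fisheriris data, and the model with the lower cross-validated loss would be the one to keep:

```matlab
% Compare two candidate classifiers by 5-fold cross-validated loss.
load fisheriris
rng(1)  % seed the random number generator for reproducible folds
cvTree = fitctree(meas, species, 'CrossVal', 'on', 'KFold', 5);
cvKnn  = fitcknn(meas, species, 'CrossVal', 'on', 'KFold', 5);
fprintf('Decision tree 5-fold loss: %.3f\n', kfoldLoss(cvTree));
fprintf('kNN           5-fold loss: %.3f\n', kfoldLoss(cvKnn));
```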
Popular Machine Learning Models for Regression
| Model | How It Works | MATLAB Function | Examples and How To |
|---|---|---|---|
| Linear regression | Linear regression is a statistical modeling technique used to describe a continuous response variable as a linear function of one or more predictor variables. Because linear regression models are simple to interpret and easy to train, they are often the first models fitted to a new data set. | fitlm | Documentation; Code example |
| Nonlinear regression | Nonlinear regression is a statistical modeling technique that helps describe nonlinear relationships in experimental data. Nonlinear regression models are generally assumed to be parametric, where the model is described as a nonlinear equation. “Nonlinear” refers to a fit function that is a nonlinear function of the parameters. For example, if the fitting parameters are b0, b1, and b2, the equation y = b0 + b1x + b2x^2 is a linear function of the fitting parameters, whereas y = (b0x^b1)/(x + b2) is a nonlinear function of the fitting parameters. | fitnlm | Documentation; Code example |
| Gaussian process regression (GPR) | GPR models are nonparametric machine learning models used to predict the value of a continuous response variable. The response variable is modeled as a Gaussian process, using covariances with the input variables. These models are widely used in spatial analysis for interpolation in the presence of uncertainty. GPR is also referred to as kriging. | fitrgp | Documentation: Fitting a Gaussian Process Machine Learning Model; Code example |
| Support vector machine (SVM) regression | SVM regression algorithms work like SVM classification algorithms but are modified to predict a continuous response. Instead of finding a hyperplane that separates data, SVM regression algorithms find a model that deviates from the measured data by a value no greater than a small amount, with parameter values that are as small as possible (to minimize sensitivity to error). | fitrsvm | Documentation: Fitting an SVM Machine Learning Model; Code example |
| Generalized linear model (GLM) | A generalized linear model is a special case of nonlinear models that uses linear methods. It involves fitting a linear combination of the inputs to a nonlinear function (the link function) of the outputs. The logistic regression model is an example of a GLM. | fitglm | Documentation; Code example |
| Regression tree | Decision trees for regression are similar to decision trees for classification, but they are modified to predict continuous responses. | fitrtree | Documentation: Fitting a Regression Tree Machine Learning Model; Code example |
| Generalized additive model (GAM) | GAM models explain a response variable using a sum of univariate and bivariate shape functions of predictors. They use a boosted tree as a shape function for each predictor and, optionally, for each pair of predictors; the shape functions can therefore capture a nonlinear relationship between a predictor and the response variable. | fitrgam | Documentation; Code example |
| Neural network (shallow) | Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs. The network is trained by iteratively modifying the strengths of the connections so that training inputs map to the training responses. | fitrnet | Documentation; Code example |
| Neural network (deep) | Deep neural networks have more hidden layers than shallow neural networks, with some having hundreds of hidden layers. Deep neural networks can be configured to solve regression problems by placing a regression output layer at the end of the network. | Deep Learning in MATLAB | Documentation; Code example |
| Regression tree ensembles | In ensemble methods, several “weaker” regression trees are combined into a “stronger” ensemble. The final model combines the predictions of the individual regression trees to calculate the final prediction. | fitrensemble | Documentation; Code example |
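As a brief illustration of the regression workflow these models share, the sketch below fits the simplest of them, a linear regression, with fitlm; the data and the true relationship y = 2x + 1 are invented for the example, and any of the fitr* functions above could be substituted for fitlm:

```matlab
% Fit a linear regression to noisy synthetic data, then predict
% the continuous response at new predictor values.
rng(0)
x = linspace(0, 10, 50)';      % predictor values
y = 2*x + 1 + randn(50, 1);    % responses: y = 2x + 1 plus Gaussian noise
mdl = fitlm(x, y);             % train the linear regression model
yNew = predict(mdl, [2.5; 7.5])  % predict responses for two new inputs
```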
Popular Machine Learning Models for Classification
| Model | How It Works | MATLAB Function | Further Reading |
|---|---|---|---|
| Decision tree | A decision tree lets you predict responses to data by following the decisions in the tree from the root (beginning) down to a leaf node. A tree consists of branching conditions where the value of a predictor is compared to a trained weight. The number of branches and the values of the weights are determined in the training process. Additional modification, or pruning, may be used to simplify the model. | fitctree | Documentation: Fitting a Decision Tree Machine Learning Model; Code example |
| k-nearest neighbor (kNN) | kNN is a type of machine learning model that categorizes objects based on the classes of their nearest neighbors in the data set. kNN predictions assume that objects near each other are similar. Distance metrics, such as Euclidean, city block, cosine, and Chebyshev, are used to find the nearest neighbor. | fitcknn | Documentation: Fitting a k-Nearest Neighbor Machine Learning Model; Code example |
| Support vector machine (SVM) | An SVM classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of the other class. The best hyperplane for an SVM is the one with the largest margin between the two classes, when the data is linearly separable. If the data is not linearly separable, a loss function is used to penalize points on the wrong side of the hyperplane. SVMs sometimes use a kernel transform to map nonlinearly separable data into higher dimensions where a linear decision boundary can be found. | fitcsvm | Documentation: Fitting an SVM Machine Learning Model; Code example |
| Generalized additive model (GAM) | GAM models explain class scores using a sum of univariate and bivariate shape functions of predictors. They use a boosted tree as a shape function for each predictor and, optionally, for each pair of predictors; the shape functions can therefore capture a nonlinear relationship between a predictor and the response variable. | fitcgam | Documentation: Train Generalized Additive Model for Binary Classification; Code example |
| Neural network (shallow) | Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs. The model is trained by iteratively modifying the strengths of the connections so that given inputs map to the correct response. The neurons between the input and output layers of a neural network are said to be in “hidden layers”; shallow neural networks typically have one or two hidden layers. | fitcnet | Neural Network Architectures; Documentation; Code example |
| Neural network (deep) | Deep neural networks have more hidden layers than shallow neural networks, with some having hundreds of hidden layers. Deep neural networks can be configured to solve classification problems by placing a classification output layer at the end of the network. Many pretrained deep learning models for classification are publicly available for tasks such as image recognition. | Deep Learning in MATLAB | Documentation; Code example |
| Bagged and boosted decision trees | In these ensemble methods, several “weaker” decision trees are combined into a “stronger” ensemble. A bagged decision tree consists of trees that are trained independently on data bootstrapped from the input data. Boosting involves creating a strong learner by iteratively adding “weak” learners and adjusting the weight of each “weak” learner to focus on misclassified examples. | fitcensemble | Documentation: Fitting a Boosted Decision Tree Ensemble; Code example |
| Naive Bayes | A naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. It classifies new data based on the highest probability of its belonging to a particular class. | fitcnb | Documentation: Fitting a Naive Bayes Machine Learning Model; Code example |
| Discriminant analysis | Discriminant analysis classifies data by finding linear combinations of features. It assumes that different classes generate data based on Gaussian distributions. Training a discriminant analysis model involves finding the parameters of a Gaussian distribution for each class. The distribution parameters are used to calculate boundaries, which can be linear or quadratic functions, and these boundaries determine the class of new data. | fitcdiscr | Documentation: Fitting a Discriminant Analysis Machine Learning Model; Code example |
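As with regression, every classifier above follows the same fitc*-then-predict pattern. The sketch below shows it for an SVM on two classes of the built-in fisheriris data (an SVM in its basic form is a binary classifier, so one class is dropped); the new observation is made up for the example:

```matlab
% Train a binary SVM classifier and classify a new observation.
load fisheriris
keep = ~strcmp(species, 'setosa');   % keep versicolor and virginica only
X = meas(keep, :);
Y = species(keep);
mdl = fitcsvm(X, Y, 'KernelFunction', 'linear');  % linear decision boundary
label = predict(mdl, [6.2 2.9 4.3 1.3])           % classify a new flower
```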
See also: What Is Linear Regression?, Nonlinear Regression, Support Vector Machine (SVM), Convolutional Neural Network, Long Short-Term Memory (LSTM) Networks, Supervised Learning