Compute Shapley values for query point
Since R2021a
Description
newExplainer = fit(explainer,queryPoint) computes the Shapley values for the specified query point (queryPoint) and stores the computed Shapley values in the ShapleyValues property of newExplainer. The shapley object explainer contains a machine learning model and the options for computing Shapley values.
fit uses the Shapley value computation options that you specify when you create explainer. You can change the options by using the name-value arguments of the fit function. The function returns a shapley object newExplainer that contains the newly computed Shapley values.
newExplainer = fit(explainer,queryPoint,Name,Value) specifies additional options using one or more name-value arguments. For example, specify 'UseParallel',true to compute Shapley values in parallel.
Examples
Create shapley object and compute Shapley values using fit
Train a regression model and create a shapley object. When you create a shapley object without specifying a query point, the software does not compute Shapley values. Use the object function fit to compute the Shapley values for the specified query point. Then create a bar graph of the Shapley values by using the object function plot.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG);
Removing missing values in a training set can help reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.
tbl = rmmissing(tbl);
Train a blackbox model of MPG by using the fitrkernel function.
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
Create a shapley object. Specify the data set tbl, because mdl does not contain training data.
explainer = shapley(mdl,tbl)
explainer = 
  shapley with properties:

            BlackboxModel: [1x1 RegressionKernel]
               QueryPoint: []
           BlackboxFitted: []
            ShapleyValues: []
               NumSubsets: 64
                        X: [392x7 table]
    CategoricalPredictors: [2 5]
                   Method: 'interventional-kernel'
                Intercept: 22.6202
explainer stores the training data tbl in the X property.
Compute the Shapley values of all predictor variables for the first observation in tbl.
queryPoint = tbl(1,:)
queryPoint=1×7 table
    Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Weight    MPG
    ____________    _________    ____________    __________    __________    ______    ___
         12             8            307            130            70         3504      18
explainer = fit(explainer,queryPoint);
For a regression model, shapley computes Shapley values using the predicted response and stores them in the ShapleyValues property. Display the values in the ShapleyValues property.
explainer.ShapleyValues
ans=6×2 table
      Predictor       ShapleyValue
    ______________    ____________
    "Acceleration"       -0.1561
    "Cylinders"         -0.18306
    "Displacement"      -0.34203
    "Horsepower"        -0.27291
    "Model_Year"         -0.2926
    "Weight"            -0.32402
Plot the Shapley values for the query point by using the plot function.
plot(explainer)
The horizontal bar graph shows the Shapley values for all variables, sorted by their absolute values. Each Shapley value explains how much the corresponding variable contributes to the deviation of the prediction for the query point from the average prediction.
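The additivity described above can be checked numerically. The following minimal sketch rebuilds the regression example so that it runs on its own, then compares the Intercept property (the average prediction) plus the sum of the ShapleyValue column with the BlackboxFitted property (the model's prediction for the query point). It assumes the property and column names shown in the displays above; the variable name reconstructed is illustrative.

```matlab
% Rebuild the regression example so that this block runs on its own
load carbig
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG);
tbl = rmmissing(tbl);
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
explainer = fit(shapley(mdl,tbl),tbl(1,:));

% Average prediction plus the Shapley values of all predictors
reconstructed = explainer.Intercept + sum(explainer.ShapleyValues.ShapleyValue);

% Compare with the blackbox prediction stored for the query point
fprintf('Intercept + sum of Shapley values: %.4f\n',reconstructed)
fprintf('Prediction for the query point:    %.4f\n',explainer.BlackboxFitted)
```

The two printed values are expected to agree up to numerical precision, which is what lets each bar be read as a share of the gap between the average prediction and this prediction.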
Compute Shapley values for multiple query points
Train a classification model and create a shapley object. Then compute the Shapley values for multiple query points.
Load the CreditRating_Historical data set. The data set contains customer IDs and their financial ratios, industry labels, and credit ratings.
tbl = readtable('CreditRating_Historical.dat');
Train a blackbox model of credit ratings by using the fitcecoc function. Use the variables from the second through seventh columns in tbl as the predictor variables.
blackbox = fitcecoc(tbl,'Rating', ...
    'PredictorNames',tbl.Properties.VariableNames(2:7), ...
    'CategoricalPredictors','Industry');
Create a shapley object with the blackbox model. For faster computation, subsample 25% of the observations from tbl with stratification and use the samples to compute the Shapley values. Specify 'Method','conditional' to use the extension to the kernel SHAP algorithm.
rng('default') % For reproducibility
c = cvpartition(tbl.Rating,'Holdout',0.25);
tbl_s = tbl(test(c),:);
explainer = shapley(blackbox,tbl_s,'Method','conditional');
Find two query points whose true rating values are AAA and B, respectively.
queryPoint(1,:) = tbl_s(find(strcmp(tbl_s.Rating,'AAA'),1),:);
queryPoint(2,:) = tbl_s(find(strcmp(tbl_s.Rating,'B'),1),:)
queryPoint=2×8 table
     ID      WC_TA     RE_TA     EBIT_TA    MVE_BVTD    S_TA     Industry     Rating
    _____    ______    ______    _______    ________    _____    ________    _______
    58258     0.511     0.869     0.106       8.538     0.732       2        {'AAA'}
    82367    -0.078    -0.042     0.011       0.262     0.167       7        {'B'  }
Compute and plot the Shapley values for the first query point.
explainer1 = fit(explainer,queryPoint(1,:));
plot(explainer1)
Compute and plot the Shapley values for the second query point.
explainer2 = fit(explainer,queryPoint(2,:));
plot(explainer2)
The true rating for the second query point is B, but the predicted rating is BB. The plot shows the Shapley values for the predicted rating.
explainer1 and explainer2 include the Shapley values for the first query point and second query point, respectively.
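For more than two observations, the same pattern extends to a loop. The sketch below assumes the CreditRating_Historical data set and the options used in this example; it picks the first observation of each rating class in tbl_s and stores one explainer per query point in a cell array. The names ratings and explainers are illustrative. Because fit returns a new object and leaves explainer unchanged, every iteration starts from the same options.

```matlab
% Rebuild the classification example so that this block runs on its own
tbl = readtable('CreditRating_Historical.dat');
blackbox = fitcecoc(tbl,'Rating', ...
    'PredictorNames',tbl.Properties.VariableNames(2:7), ...
    'CategoricalPredictors','Industry');
rng('default') % For reproducibility
c = cvpartition(tbl.Rating,'Holdout',0.25);
tbl_s = tbl(test(c),:);
explainer = shapley(blackbox,tbl_s,'Method','conditional');

% One query point per true rating class: the first matching row of tbl_s
ratings = unique(tbl_s.Rating);
explainers = cell(numel(ratings),1);
for k = 1:numel(ratings)
    idx = find(strcmp(tbl_s.Rating,ratings{k}),1);
    explainers{k} = fit(explainer,tbl_s(idx,:));
end

% Inspect the Shapley values for the first rating class in the sorted list
explainers{1}.ShapleyValues
```

With the conditional method each call can take noticeably longer than an interventional kernel SHAP call, so the 'UseParallel' option may be worth combining with this loop.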
Input Arguments
explainer — Object explaining blackbox model
shapley object
Object explaining the blackbox model, specified as a shapley object.
queryPoint — Query point
row vector of numeric values | single-row table
Query point at which fit explains a prediction, specified as a row vector of numeric values or a single-row table.
For a row vector of numeric values:
For a single-row table:
- If the predictor data explainer.X is a table, then all predictor variables in queryPoint must have the same variable names and data types as those in explainer.X. However, the column order of queryPoint does not need to correspond to the column order of explainer.X.
- If the predictor data explainer.X is a numeric matrix, then the predictor names in explainer.BlackboxModel.PredictorNames and the corresponding predictor variable names in queryPoint must be the same. To specify predictor names during training, use the 'PredictorNames' name-value argument. All predictor variables in queryPoint must be numeric vectors.
- queryPoint can contain additional variables (response variables, observation weights, and so on), but fit ignores them.
- fit does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
If queryPoint contains NaNs for continuous predictors and 'Method' is 'conditional', then the Shapley values (ShapleyValues) in the returned object are NaNs. Otherwise, fit handles NaN values in the same way as explainer.BlackboxModel (the predict object function of explainer.BlackboxModel or a function handle specified by blackbox).
Example: explainer.X(1,:) specifies the query point as the first observation of the predictor data X in explainer.
Data Types: single | double | table
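A sketch of the numeric-matrix case in the list above: it trains a regression tree on a numeric matrix with 'PredictorNames' set, then passes the same observation to fit once as a numeric row vector and once as a single-row table whose variable names match the predictor names. The predictor subset, the choice of fitrtree, and the names e1 and e2 are illustrative assumptions; the two calls are expected to return the same Shapley values.

```matlab
% Numeric predictor matrix built from the carbig data set
load carbig
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;
keep = ~any(isnan([X Y]),2);   % Drop rows with missing values
X = X(keep,:);
Y = Y(keep);

% Name the predictors during training so that a table query point can match them
names = {'Acceleration','Displacement','Horsepower','Weight'};
mdl = fitrtree(X,Y,'PredictorNames',names);
explainer = shapley(mdl,X);

% Query point as a numeric row vector (columns in the same order as X)
e1 = fit(explainer,X(1,:));

% Same query point as a single-row table with matching variable names
e2 = fit(explainer,array2table(X(1,:),'VariableNames',names));

% Side-by-side Shapley values from the two query point formats
[e1.ShapleyValues.ShapleyValue e2.ShapleyValues.ShapleyValue]
```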
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fit(explainer,q,'Method','conditional','UseParallel',true) computes the Shapley values for the query point q using the extension to the kernel SHAP algorithm, and executes the computation in parallel.
MaxNumSubsets — Maximum number of predictor subsets
explainer.NumSubsets (default) | positive integer
Maximum number of predictor subsets to use for Shapley value computation, specified as a positive integer.
For details on how fit chooses the subsets to use, see Computational Cost.
This argument is valid when the fit function uses the kernel SHAP algorithm or the extension to the kernel SHAP algorithm. If you set the MaxNumSubsets argument when Method is 'interventional', the software uses the kernel SHAP algorithm. For more information, see Algorithms.
Example: 'MaxNumSubsets',100
Data Types: single | double
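The number of possible predictor subsets grows as 2^M for M predictors, which is why capping it can matter. A minimal sketch, assuming the carbig regression model from the first example: it computes 2^M for the six predictors (64, the NumSubsets value in that example's explainer display) and then limits fit to 32 subsets. The cap of 32 and the variable names are arbitrary choices for illustration, and the capped result is an approximation that can differ from the full computation.

```matlab
% Rebuild the carbig regression model so that this block runs on its own
load carbig
tbl = rmmissing(table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG));
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
explainer = shapley(mdl,tbl);

% Total number of predictor subsets for M predictors
M = numel(mdl.PredictorNames);   % 6 predictors in this model
totalSubsets = 2^M               % 64

% All subsets versus a computation capped at 32 subsets
eFull   = fit(explainer,tbl(1,:));
eCapped = fit(explainer,tbl(1,:),'MaxNumSubsets',32);
comparison = table(eFull.ShapleyValues.Predictor, ...
    eFull.ShapleyValues.ShapleyValue,eCapped.ShapleyValues.ShapleyValue, ...
    'VariableNames',{'Predictor','AllSubsets','AtMost32Subsets'})
```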
Method — Shapley value computation algorithm
'interventional' | 'conditional'
Since R2023a
Shapley value computation algorithm, specified as 'interventional' or 'conditional'.
- 'interventional' — fit computes the Shapley values with an interventional value function. fit offers three interventional algorithms: kernel SHAP [1], linear SHAP [1], and tree SHAP [2]. The software selects an algorithm based on the machine learning model explainer.BlackboxModel and other specified options. For details, see Interventional Algorithms.
- 'conditional' — fit uses the extension to the kernel SHAP algorithm [3] with a conditional value function.
The Method property of newExplainer stores the name of the selected algorithm. For more information, see Algorithms.
By default, the fit function uses the algorithm specified in the Method property of explainer.
Before R2023a: You can specify this argument as 'interventional-kernel' or 'conditional-kernel'. fit supports the kernel SHAP algorithm and the extension of the kernel SHAP algorithm.
Example: 'Method','conditional'
Data Types: char | string
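The two value functions can give different attributions when predictors are correlated, as many of the carbig predictors are. A sketch, assuming R2023a or later (for the 'interventional' and 'conditional' values) and the carbig regression model from the first example, computes both for the same query point and prints the algorithm name that fit records in the Method property. The variable names are illustrative.

```matlab
% Rebuild the carbig regression model so that this block runs on its own
load carbig
tbl = rmmissing(table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG));
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
explainer = shapley(mdl,tbl);
q = tbl(1,:);

% Same query point, two value functions
eInt = fit(explainer,q,'Method','interventional');
eCon = fit(explainer,q,'Method','conditional');

% Method stores the algorithm that fit actually selected
fprintf('Interventional run used: %s\n',eInt.Method)
fprintf('Conditional run used:    %s\n',eCon.Method)

% Side-by-side comparison of the two sets of Shapley values
comparison = table(eInt.ShapleyValues.Predictor, ...
    eInt.ShapleyValues.ShapleyValue,eCon.ShapleyValues.ShapleyValue, ...
    'VariableNames',{'Predictor','Interventional','Conditional'})
```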
UseParallel — Flag to run in parallel
false (default) | true
Flag to run in parallel, specified as true or false. If you specify 'UseParallel',true, the fit function runs its for-loop iterations as a parallel loop, which executes in parallel when you have Parallel Computing Toolbox™.
This argument is valid when the fit function uses the tree SHAP algorithm for an ensemble of trees, the kernel SHAP algorithm, or the extension to the kernel SHAP algorithm.
Example: 'UseParallel',true
Data Types: logical
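A sketch for checking that the parallel option changes where the work runs, not the result. It assumes the carbig regression model from the first example; without Parallel Computing Toolbox both calls run serially. The first parallel call can also include the time needed to start a parallel pool, so the timings are indicative only. The variable names are illustrative.

```matlab
% Rebuild the carbig regression model so that this block runs on its own
load carbig
tbl = rmmissing(table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG));
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
explainer = shapley(mdl,tbl);

% Serial computation (default)
tic
eSerial = fit(explainer,tbl(1,:));
tSerial = toc;

% Parallel computation (serial if Parallel Computing Toolbox is unavailable)
tic
eParallel = fit(explainer,tbl(1,:),'UseParallel',true);
tParallel = toc;

fprintf('Serial: %.2f s, parallel: %.2f s\n',tSerial,tParallel)

% The Shapley values should agree up to floating-point rounding
maxDiff = max(abs(eSerial.ShapleyValues.ShapleyValue - eParallel.ShapleyValues.ShapleyValue))
```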
Output Arguments
newExplainer — Object explaining blackbox model
shapley object
Object explaining the blackbox model, returned as a shapley object. The ShapleyValues property of the object contains the computed Shapley values.
To overwrite the input argument explainer, assign the output of fit to explainer:
explainer = fit(explainer,queryPoint);
More About
Shapley Values
In game theory, the Shapley value of a player is the average marginal contribution of the player in a cooperative game. In the context of machine learning prediction, the Shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.
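The "average marginal contribution" can be written out in the usual game-theoretic notation. The symbols here are introduced for illustration only: calligraphic M is the set of all M features, S is a subset of features, and v(S) is the value function of S.

```latex
\varphi_i \;=\; \sum_{S \subseteq \mathcal{M}\setminus\{i\}}
  \frac{|S|!\,\bigl(M-|S|-1\bigr)!}{M!}\,
  \bigl[\, v\bigl(S\cup\{i\}\bigr) - v(S) \,\bigr]
```

Because these weights sum to 1 over the subsets S that exclude feature i, the Shapley value of feature i is a weighted average of its marginal contributions. The value function v is where the 'interventional' and 'conditional' options of the Method argument differ.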
The Shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. For a query point, the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. That is, the sum of the average prediction and the Shapley values for all features corresponds to the prediction for the query point.
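To make the definition and the additivity property concrete, here is a small base-MATLAB sketch that does not use shapley at all. It enumerates every predictor subset for a hypothetical three-predictor linear model f and uses an interventional value function: predictors in the subset are fixed at the query point, and the others are averaged over a small background data set X. The model, data, and variable names are assumptions for illustration. For a linear model, each Shapley value reduces to w(i)*(q(i) - mean(X(:,i))), which gives an independent check.

```matlab
% Hypothetical linear model of three predictors
w = [2 -1 0.5];
f = @(Z) 1 + Z*w.';            % Z is n-by-3; returns an n-by-1 vector

% Hypothetical background data and query point
X = [1 2 3; 4 0 -1; -2 1 5; 0 -3 2];
q = [1.5 -0.3 2.0];
[n,M] = size(X);

% Interventional value function for every subset S, encoded as a bitmask:
% fix the predictors in S at the query point, average f over the background rows
v = zeros(1,2^M);
for mask = 0:2^M-1
    S = logical(bitget(mask,1:M));
    Z = X;
    Z(:,S) = repmat(q(S),n,1);
    v(mask+1) = mean(f(Z));
end

% Shapley value of predictor i: weighted marginal contributions v(S+i) - v(S)
phi = zeros(1,M);
for i = 1:M
    for mask = 0:2^M-1
        if bitget(mask,i)
            continue            % S must exclude predictor i
        end
        s = sum(bitget(mask,1:M));
        weight = factorial(s)*factorial(M-s-1)/factorial(M);
        phi(i) = phi(i) + weight*(v(bitset(mask,i)+1) - v(mask+1));
    end
end

avgPrediction = v(1);          % Empty subset: average prediction over X
disp(phi)
fprintf('Average prediction + sum of Shapley values: %.4f\n',avgPrediction + sum(phi))
fprintf('Prediction for the query point:             %.4f\n',f(q))
```

For this data, phi is [1.5 0.3 -0.125] up to rounding, the average prediction is 3.625, and both printed lines show 5.3000. Exact enumeration costs 2^M model evaluations per background row, which is why fit offers MaxNumSubsets and specialized algorithms for larger models.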
For more details, see Shapley Values for Machine Learning Model.
references
[1] lundberg, scott m., and s. lee. "a unified approach to interpreting model predictions." advances in neural information processing systems 30 (2017): 4765–774.
[2] lundberg, scott m., g. erion, h. chen, et al. "from local explanations to global understanding with explainable ai for trees." nature machine intelligence 2 (january 2020): 56–67.
[3] aas, kjersti, martin jullum, and anders løland. "explaining individual predictions when features are dependent: more accurate approximations to shapley values." artificial intelligence 298 (september 2021).
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the UseParallel name-value argument to true in the call to this function.
For more general information about parallel computing, see the Parallel Computing Toolbox documentation.
Version History
Introduced in R2021a
R2023a: fit supports the linear SHAP and tree SHAP algorithms
fit supports the linear SHAP [1] algorithm for linear models and the tree SHAP [2] algorithm for tree models and ensemble models of tree learners.
If you specify the Method name-value argument as 'interventional', the fit function selects an algorithm based on the machine learning model type of explainer. The Method property of newExplainer stores the name of the selected algorithm.
R2023a: Values of the Method name-value argument have changed
The supported values of the Method name-value argument have changed from 'interventional-kernel' and 'conditional-kernel' to 'interventional' and 'conditional', respectively.