
sbioparameterci: compute confidence intervals for estimated parameters (requires statistics and machine learning toolbox)

description

ci = sbioparameterci(fitresults) computes 95% confidence intervals for the estimated parameters from fitresults, a results object returned by the sbiofit function. ci is a parameterconfidenceinterval object that contains the computed confidence intervals.

ci = sbioparameterci(fitresults,name,value) uses additional options specified by one or more name,value pair arguments.

examples

load data

load the sample data to fit. the data is stored as a table with variables ID, Time, CentralConc, and PeripheralConc. this synthetic data represents the time course of plasma concentrations measured at eight different time points for both central and peripheral compartments after an infusion dose for three individuals.

load data10_32R.mat
gData = groupedData(data);
gData.Properties.VariableUnits = {'','hour','milligram/liter','milligram/liter'};
sbiotrellis(gData,'ID','Time',{'CentralConc','PeripheralConc'},'Marker','+',...
            'LineStyle','none');

create model

create a two-compartment model.

pkmd                 = PKModelDesign;
pkc1                 = addCompartment(pkmd,'Central');
pkc1.DosingType      = 'Infusion';
pkc1.EliminationType = 'linear-clearance';
pkc1.HasResponseVariable = true;
pkc2                 = addCompartment(pkmd,'Peripheral');
model                = construct(pkmd);
configset            = getconfigset(model);
configset.CompileOptions.UnitConversion = true;

define dosing

define the infusion dose.

dose             = sbiodose('dose','TargetName','Drug_Central');
dose.StartTime   = 0;
dose.Amount      = 100;
dose.Rate        = 50;
dose.AmountUnits = 'milligram';
dose.TimeUnits   = 'hour';
dose.RateUnits   = 'milligram/hour';

define parameters

define the parameters to estimate. set the parameter bounds for each parameter. in addition to these explicit bounds, the parameter transformations (such as log, logit, or probit) impose implicit bounds.

responseMap = {'Drug_Central = CentralConc','Drug_Peripheral = PeripheralConc'};
paramsToEstimate   = {'log(Central)','log(Peripheral)','Q12','Cl_Central'};
estimatedParam     = estimatedInfo(paramsToEstimate,...
                                   'InitialValue',[1 1 1 1],...
                                   'Bounds',[0.1 3;0.1 10;0 10;0.1 2]);

fit model

perform an unpooled fit, that is, one set of estimated parameters for each patient.

unpooledFit = sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',false);

perform a pooled fit, that is, one set of estimated parameters for all patients.

pooledFit = sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',true);

compute confidence intervals for estimated parameters

compute 95% confidence intervals for each estimated parameter in the unpooled fit.

ciParamUnpooled = sbioparameterci(unpooledFit);

display results

display the confidence intervals in a table format. for details about the meaning of each estimation status, see the setting estimation status descriptions under more about.

ci2table(ciParamUnpooled)
ans =
  12x7 table
    Group         Name          Estimate    ConfidenceInterval      Type      Alpha      Status   
    _____    ______________    ________    __________________    ________    _____    ___________
      1      {'Central'   }      1.422      1.1533     1.6906    gaussian    0.05     estimable  
      1      {'Peripheral'}     1.5629     0.83143     2.3551    gaussian    0.05     constrained
      1      {'Q12'       }    0.47159     0.20093    0.80247    gaussian    0.05     constrained
      1      {'Cl_Central'}    0.52898     0.44842    0.60955    gaussian    0.05     estimable  
      2      {'Central'   }     1.8322      1.7893     1.8751    gaussian    0.05     success    
      2      {'Peripheral'}     5.3368      3.9133     6.7602    gaussian    0.05     success    
      2      {'Q12'       }    0.27641      0.2093    0.34351    gaussian    0.05     success    
      2      {'Cl_Central'}    0.86034     0.80313    0.91755    gaussian    0.05     success    
      3      {'Central'   }     1.6657      1.5818     1.7497    gaussian    0.05     success    
      3      {'Peripheral'}     5.5632      4.7557     6.3708    gaussian    0.05     success    
      3      {'Q12'       }    0.78361     0.65581    0.91142    gaussian    0.05     success    
      3      {'Cl_Central'}     1.0233     0.96375     1.0828    gaussian    0.05     success    

plot the confidence intervals. if the estimation status of a confidence interval is success, it is plotted in blue (the first default color). otherwise, it is plotted in red (the second default color), which indicates that further investigation into the fitted parameters may be required. if the confidence interval is not estimable, then the function plots a red line with a centered cross. if there are any transformed parameters with estimated values 0 (for the log transform) and 1 or 0 (for the probit or logit transform), then no confidence intervals are plotted for those parameter estimates. to see the color order, type get(groot,'defaultaxescolororder').

groups are displayed from left to right in the same order that they appear in the groupnames property of the parameterconfidenceinterval object, which is used to label the x-axis. the y-labels are the transformed parameter names.

plot(ciParamUnpooled)

compute the confidence intervals for the pooled fit.

ciParamPooled = sbioparameterci(pooledFit);

display the confidence intervals.

ci2table(ciParamPooled)
ans =
  4x7 table
    Group          Name          Estimate    ConfidenceInterval      Type      Alpha      Status   
    ______    ______________    ________    __________________    ________    _____    ___________
    pooled    {'Central'   }     1.6626      1.3287     1.9965    gaussian    0.05     estimable  
    pooled    {'Peripheral'}      2.687     0.89848     4.8323    gaussian    0.05     constrained
    pooled    {'Q12'       }    0.44956     0.11445    0.85152    gaussian    0.05     constrained
    pooled    {'Cl_Central'}    0.78493     0.59222    0.97764    gaussian    0.05     estimable  

plot the confidence intervals. the group name is labeled as "pooled" to indicate a pooled fit.

plot(ciParamPooled)

plot all the confidence interval results together. by default, the confidence interval for each parameter estimate is plotted on a separate axes. vertical lines group confidence intervals of parameter estimates that were computed in a common fit.

ciAll = [ciParamUnpooled;ciParamPooled];
plot(ciAll)

you can also plot all confidence intervals in one axes grouped by parameter estimates using the 'grouped' layout.

plot(ciAll,'Layout','grouped')

in this layout, you can point to the center marker of each confidence interval to see the group name. each estimated parameter is separated by a vertical black line. vertical dotted lines group confidence intervals of parameter estimates that were computed in a common fit. parameter bounds defined in the original fit are marked by square brackets. note the different scales on the y-axis due to parameter transformations. for instance, the y-axis of q12 is in the linear scale, but that of central is in the log scale due to its log transform.

compute confidence intervals for model predictions

calculate 95% confidence intervals for the model predictions, that is, simulation results using the estimated parameters.

% for the pooled fit
ciPredPooled = sbiopredictionci(pooledFit);
% for the unpooled fit
ciPredUnpooled = sbiopredictionci(unpooledFit);

plot confidence intervals for model predictions

the confidence interval for each group is plotted in a separate column, and each response is plotted in a separate row. confidence intervals limited by the bounds are plotted in red. confidence intervals not limited by the bounds are plotted in blue.

plot(ciPredPooled)

plot(ciPredUnpooled)

input arguments

parameter estimation results from sbiofit, specified as a results object, or a vector of results objects for unpooled fits that were returned from the same sbiofit call.

name-value arguments

specify optional pairs of arguments as name1=value1,...,namen=valuen, where name is the argument name and value is the corresponding value. name-value arguments must appear after other arguments, but the order of the pairs does not matter.

before r2021a, use commas to separate each name and value, and enclose name in quotes.

example: 'alpha',0.01,'type','profilelikelihood' specifies to compute a 99% confidence interval using the profile likelihood approach.

depending on the type of confidence interval, the compatible name-value arguments differ. the list below shows each name-value argument and the confidence interval types (gaussian, optimization-based profile likelihood, integration-based profile likelihood, or bootstrap) that it applies to.

  • alpha: all types

  • type: all types

  • display: all types

  • useparallel: all types

  • numsamples: bootstrap only

  • tolerance: optimization-based profile likelihood, integration-based profile likelihood, and bootstrap

  • parameters: optimization-based and integration-based profile likelihood

  • maxstepsize: optimization-based and integration-based profile likelihood

  • useintegration: optimization-based and integration-based profile likelihood

  • integrationoptions: integration-based profile likelihood only

confidence level, (1-alpha) * 100%, specified as the comma-separated pair consisting of 'alpha' and a positive scalar between 0 and 1. the default value is 0.05, meaning a 95% confidence interval is computed.

example: 'alpha',0.01

confidence interval type, specified as the comma-separated pair consisting of 'type' and a character vector. the valid choices are:

  • 'gaussian' — use the gaussian approximation of the distribution of parameter estimates.

  • 'profilelikelihood' — compute the profile likelihood intervals. the function has two methods to compute profile likelihood curves. by default, the function uses the optimization-based method. to use the integration-based method, you must also set 'useintegration' to true.

    the optimization-based method fixes one parameter value at a time and reruns an optimization to compute the maximum likelihood. this optimization is done for every parameter and every point on the curve of the profile likelihood. the integration-based method is based on integrating the differential equations derived from the lagrange equations of the optimization-based method. for details about these two methods, see profile likelihood confidence interval calculation.

    note

    this type is not supported for parameter estimates from hierarchical models, that is, results estimated for categories (such as age or sex). in other words, if you set the categoryvariablename property of the estimatedinfo object in your original fit, then the fit results are hierarchical and you cannot compute the profilelikelihood confidence intervals on the results.

  • 'bootstrap' — compute confidence intervals using the bootstrap method.

example: 'type','bootstrap'
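
as a hedged sketch (reusing the pooledfit result from the earlier example), the interval types described above can be requested as follows. the profile likelihood calls assume the fit is not hierarchical, per the note above.

% gaussian intervals (the default)
ciGauss = sbioparameterci(pooledFit);
% profile likelihood intervals, optimization-based by default
ciPL = sbioparameterci(pooledFit,'Type','profilelikelihood');
% profile likelihood intervals using the integration-based method
ciPLInt = sbioparameterci(pooledFit,'Type','profilelikelihood','UseIntegration',true);
% bootstrap intervals
ciBoot = sbioparameterci(pooledFit,'Type','bootstrap');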

level of display returned to the command line, specified as the comma-separated pair consisting of 'display' and a character vector. 'off' (default) or 'none' displays no output. 'final' displays a message when a computation finishes. 'iter' displays output at each iteration.

example: 'display','final'

logical flag to compute confidence intervals in parallel, specified as the comma-separated pair consisting of 'useparallel' and true or false. by default, the parallel options in the original fit are used. if this argument is set to true and parallel computing toolbox™ is available, the parallel options in the original fit are ignored, and confidence intervals are computed in parallel.

for the gaussian confidence intervals:

  • if the input fitresults is a vector of results objects, then the computation of confidence intervals for each object is performed in parallel. the gaussian confidence intervals are quick to compute. so, it might be more beneficial to parallelize the original fit (sbiofit) and not set useparallel to true for sbioparameterci.

for the profile likelihood confidence intervals:

  • if the number of results objects in the input fitresults vector is greater than the number of estimated parameters, then the computation of confidence intervals for each object is performed in parallel.

  • otherwise, the confidence intervals for all estimated parameters within one results object are computed in parallel before the function moves on to the next results object.

for the bootstrap confidence intervals:

  • the function forwards the useparallel flag to bootci. there is no parallelization over the input vector of results objects.

note

if you have a global stream for random number generation with several substreams to compute in parallel in a reproducible fashion, sbioparameterci first checks whether the number of workers is the same as the number of substreams. if so, sbioparameterci sets usesubstreams to true in the statset option and passes it to bootci (statistics and machine learning toolbox). otherwise, the substreams are ignored by default.

example: 'useparallel',true
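
a short hedged sketch, assuming parallel computing toolbox is installed and reusing the unpooledfit result from the earlier example:

% compute bootstrap confidence intervals, forwarding the UseParallel flag to bootci
ciBootPar = sbioparameterci(unpooledFit,'Type','bootstrap','UseParallel',true);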

number of samples for bootstrapping, specified as the comma-separated pair consisting of 'numsamples' and a positive integer. this number defines the number of fits that are performed during the confidence interval computation to generate bootstrap samples. the smaller the number is, the faster the computation of the confidence intervals becomes, at the cost of decreased accuracy.

example: 'numsamples',500

tolerance for the profile likelihood and bootstrap confidence interval computations, specified as the comma-separated pair consisting of 'tolerance' and a positive scalar.

the profile likelihood method uses this value as a termination tolerance. for details, see profile likelihood confidence interval calculation.

the bootstrap method uses this value to determine whether a confidence interval is constrained by bounds specified in the original fit. for details, see bootstrap confidence interval calculation.

example: 'tolerance',1e-6

names of parameters for which the profile likelihood curves are calculated, specified as a character vector, string, string vector, or cell array of character vectors. by default, the function computes the confidence intervals for all parameters estimated in the original fit. you can also specify a subset of those parameters if needed.

note

this name-value argument is applicable only when you specify type as 'profilelikelihood'.

example: 'parameters',{'ka'}
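
for instance, a hedged sketch that restricts the profile likelihood computation to two of the parameters estimated in the earlier example (Q12 and Cl_Central):

% profile likelihood intervals for a subset of the estimated parameters
ciSubset = sbioparameterci(pooledFit,'Type','profilelikelihood', ...
                           'Parameters',{'Q12','Cl_Central'});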

maximum step size used for computing profile likelihood curves, specified as the comma-separated pair consisting of 'maxstepsize' and a positive scalar, [], or cell array.

  • for the optimization-based method, the default value is 0.1. if you set 'maxstepsize' to [], then the maximum step size is set to 10% of the width of the gaussian approximation of the confidence interval, if it exists. you can specify a maximum step size (or []) for each estimated parameter using a cell array.

  • for the integration-based method, the default value is inf. internally, the function uses the ode15s solver.

example: 'maxstepsize',0.5

flag to use the integration-based profile likelihood confidence interval method, specified as true or false. the integration-based method integrates differential equations derived from the lagrange equations. by default, the function uses the optimization-based method. for details about these two methods, see profile likelihood confidence interval calculation.

example: 'useintegration',true

options for the integration-based profile likelihood confidence interval method, specified as a structure. specify options as fields of the structure as follows.

  • hessian: 'finitedifference' (default) uses the finite difference approximation of the hessian matrix. 'identity' uses the identity matrix as the hessian matrix approximation; you must also specify a positive correctionfactor value.

  • correctionfactor: nonnegative scalar. the default value is 0.

  • absolutetolerance: positive scalar for the step size control in ode15s. the default value is 1e-2.

  • relativetolerance: positive scalar less than 1 for the step size control in ode15s. the default value is 1e-2.

  • initialstepsize: positive scalar used as the initial step size for solving the differential equations. if a parameter is bounded, the function uses a different default initial step size; if not, it uses 1e-4.
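
as a hedged sketch, the options structure might be assembled as follows; the field names come from the list above, and the tolerance values shown are arbitrary illustrations.

% integration-based profile likelihood with custom integration options
intOpts = struct('Hessian','identity', ...       % identity approximation of the hessian
                 'CorrectionFactor',1, ...       % must be positive when Hessian is 'identity'
                 'AbsoluteTolerance',1e-3, ...
                 'RelativeTolerance',1e-3);
ciInt = sbioparameterci(pooledFit,'Type','profilelikelihood', ...
                        'UseIntegration',true,'IntegrationOptions',intOpts);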

output arguments

confidence interval results, returned as a parameterconfidenceinterval object. for an unpooled fit, ci can be a vector of parameterconfidenceinterval objects.

more about

gaussian confidence interval calculation

the function uses the wald test statistic [1] to compute the confidence intervals. assuming that there are enough data, the parameter estimates, pest, are approximately student's t-distributed with the covariance matrix s (the covariancematrix property of the results object) returned by sbiofit.

the confidence interval for the ith parameter estimate $p_{est,i}$ is computed as

$p_{est,i} \pm \sqrt{S_{i,i}} \; t_{inv}\!\left(1-\tfrac{\alpha}{2}\right)$,

where $t_{inv}$ is the student's t inverse cumulative distribution function (tinv, statistics and machine learning toolbox) evaluated at the probability $1-\alpha/2$, and $S_{i,i}$ is the ith diagonal element (variance) of the covariance matrix $S$.

in cases where the confidence interval is constrained by the parameter bounds defined in the original fit, the confidence interval bounds are adjusted according to the approach described by wu, h. and neale, m. [2].
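
as a rough illustration of this formula (a minimal sketch, not the toolbox implementation), the half-widths can be computed from the covariance matrix returned by sbiofit. the covariancematrix and parameterestimates properties are documented for the results object; the degrees of freedom value below is an arbitrary assumption, because the function determines it internally from the fit.

% hedged sketch: gaussian interval half-widths from the covariance matrix
S     = fitResults.CovarianceMatrix;             % covariance of the parameter estimates
pEst  = fitResults.ParameterEstimates.Estimate;  % estimated (transformed) parameter values
alpha = 0.05;
dof   = 20;                                      % assumed degrees of freedom, for illustration only
halfWidth = sqrt(diag(S)) .* tinv(1 - alpha/2, dof);   % tinv requires statistics and machine learning toolbox
ciGauss   = [pEst - halfWidth, pEst + halfWidth];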

setting estimation status
  • for each parameter estimate, the function first decides whether the confidence interval of the parameter estimate is unbounded. if so, the function sets the estimation status of the corresponding parameter estimate to not estimable.

  • otherwise, if the confidence interval for a parameter estimate is constrained by a parameter bound defined in the original fit, the function sets the estimation status to constrained. parameter transformations (such as log, probit, or logit) impose implicit bounds on the estimated parameters, for example, positivity constraints. such bounds can lead to the overestimation of confidence, that is, the confidence interval can be smaller than expected.

  • if no confidence interval has the estimation status not estimable or constrained, then the function sets the estimation statuses of all parameter estimates to success. otherwise, the estimation statuses of remaining parameter estimates are set to estimable.

profile likelihood confidence interval calculation

define $L$ to be the likelihood, $lh$, of the parameter estimates (stored in the parameterestimates property of the results object) returned by sbiofit: $L = lh(p_{est})$, where $p_{est}$ is the vector of parameter estimates $p_{est,1}, p_{est,2}, \ldots, p_{est,n}$.

the profile likelihood function $pl$ for a parameter $p_i$ is defined as $pl(p_i) = \max_{p_j,\, j \neq i} lh(p_1, \ldots, p_i, \ldots, p_n)$, where $n$ is the total number of parameters.

per wilks's theorem [3], the likelihood ratio test statistic, $-2\log\!\left(\frac{pl(p_i)}{L}\right)$, is chi-square distributed with 1 degree of freedom.

therefore, find all $p_i$ such that $\log(L) - \log(pl(p_i)) \le \frac{\chi^2_{1,\,1-\alpha}}{2}$, where $\chi^2_{1,\,1-\alpha}$ is the $(1-\alpha)$ quantile of the chi-square distribution with 1 degree of freedom.

equivalently, $\log(pl(p_i)) \ge \log(L) - \frac{\chi^2_{1,\,1-\alpha}}{2}$, where $\log(L) - \frac{\chi^2_{1,\,1-\alpha}}{2}$ is the target value used in computing the log profile likelihood curve. the function provides two methods to compute this curve.
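
for example, with alpha = 0.05 the chi-square quantile is about 3.84, so the target value lies roughly 1.92 log-likelihood units below the maximized log-likelihood. a minimal sketch, where logL stands for an assumed maximized log-likelihood value:

% target value for the log profile likelihood curve (alpha = 0.05)
alpha  = 0.05;
logL   = -42.7;                          % assumed maximized log-likelihood, for illustration only
target = logL - chi2inv(1 - alpha, 1)/2; % chi2inv(0.95,1) is approximately 3.8415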

optimization-based method to compute log profile likelihood curve
  1. start at pest,i and evaluate the likelihood l.

  2. compute the log profile likelihood at $p_{est,i} + k \cdot maxstepsize$ for each side (or direction) of the confidence interval, that is, for k = 1, 2, 3,… and k = -1, -2, -3,….

  3. stop if one of these stopping criteria is met on each side.

    • the log profile likelihood falls below the target value. in this case, start bisecting between pbelow and pabove, where pbelow is the parameter value with the largest log profile likelihood value below the target value, and pabove the parameter value with the smallest log profile likelihood value greater than the target value. stop the bisection if one of the following is true:

      • either neighboring log profile likelihood values are less than tolerance apart. set the status for the corresponding side of the confidence interval to success.

      • the bisection interval becomes smaller than max(tolerance,2*eps('double')) and the profile likelihood curve computed so far is above the target value. set the status of the corresponding side to not estimable.

      • the linear gradient approximation of the profile likelihood curve (finite difference between two neighboring parameter values) is larger than -tolerance (the negative value of the tolerance). set the status of the corresponding side to not estimable.

    • the step is limited by a bound defined in the original fit. evaluate at the bound and set the status of the corresponding side to constrained.
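
the following conceptual sketch shows the stepping phase of this procedure for the right side of one interval; it is not the toolbox implementation, and the names logLik, pEst, lb, ub, maxStep, target, and the helper fullVector are assumptions for illustration (fmincon requires optimization toolbox).

i      = 1;                                  % index of the parameter being profiled
n      = numel(pEst);                        % pEst is assumed to be a column vector
others = setdiff(1:n, i);
% rebuild the full parameter vector from the free parameters q and the fixed value pik
fullVector = @(q, pik) [q(1:i-1); pik; q(i:end)];
for k = 1:1000                               % arbitrary cap for the sketch
    pik = min(pEst(i) + k*maxStep, ub(i));   % candidate value, limited by the upper bound
    % re-optimize the remaining parameters with p(i) fixed at pik
    obj  = @(q) -logLik(fullVector(q, pik));
    qOpt = fmincon(obj, pEst(others), [], [], [], [], lb(others), ub(others));
    plk  = -obj(qOpt);                       % log profile likelihood at pik
    if plk < target || pik >= ub(i)
        break                                % bisection refinement and status handling omitted
    end
end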

integration-based method to compute log profile likelihood curve

this method [4] solves the constrained optimization problem $pl(p_i) = \max_{p_j,\, j \neq i} lh(p_1, \ldots, p_i, \ldots, p_n)$ by integrating the differential equations derived from the lagrange equations

$\nabla_p L(p(c)) - \lambda(c)\, e_i = 0, \qquad p(c)_i = c.$

here, $e_i$ is the ith canonical unit vector, $\lambda(c)$ is the lagrange multiplier, and $c = p_i$.

in other words, instead of optimizing point by point, this method solves differential equations that define the profile likelihood curve as follows.

$\begin{pmatrix} \nabla_p^2 L(p(c)) & -e_i \\ \pm e_i^T & 0 \end{pmatrix} \begin{pmatrix} \dot{p}(c) \\ \dot{\lambda}(c) \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

here, $\dot{p}(c) = \frac{\partial p(c)}{\partial c}$, $\dot{\lambda}(c) = \frac{\partial \lambda(c)}{\partial c}$, and $\nabla_p^2 L(p(c))$ is the hessian of the log likelihood function.

using the finite-difference approximation of the hessian matrix is recommended. however, the numerical computation of the hessian matrix using finite differencing can be computationally expensive. to reduce the computational costs, chen and jennrich [4] proposed an approximate version based on the assumption that the second-order sufficient karush-kuhn-tucker conditions must hold with strict inequality at every point in the domain of the profile likelihood curve as outlined in assumption 2 in the appendix of [4]. in other words, at every point on the profile likelihood curve, the remaining parameters must be estimable.

if this assumption holds, then the hessian can be replaced with the identity matrix i as follows:

$\begin{pmatrix} I & -e_i \\ \pm e_i^T & 0 \end{pmatrix} \begin{pmatrix} \dot{p}(c) \\ \dot{\lambda}(c) \end{pmatrix} = \begin{pmatrix} \gamma\, \nabla_p L(p(c)) \\ 1 \end{pmatrix}$

here, $\nabla_p L(p(c))$ is the gradient of the log likelihood and $\gamma$ is a correction factor to ensure the solution of the differential equation stays on the path of the profile likelihood curve.

if $\gamma$ is too small, the approximation of the profile likelihood curve may become inaccurate, resulting in an underestimation of the profile likelihood confidence intervals. setting $\gamma$ to a large value ensures accurate results, but might require the solver to take smaller steps, which increases the computational cost.

tip

you can specify the hessian approximation and correction factor using the integrationoptions name-value argument.

the stopping criterion of the algorithm is when one of the following conditions becomes true:

  • the gradient approximation of the profile likelihood curve is larger than -tolerance.

  • the profile likelihood falls below the target value.

  • a parameter bound is reached.

setting estimation status
  • if both sides of the confidence interval are unsuccessful, that is, have the status not estimable, the function sets the estimation status to not estimable.

  • if no side has the status not estimable and one side has the status constrained, the function sets the estimation status to constrained.

  • if the computation for all parameters on both sides of the confidence intervals is successful, the function sets the estimation status to success.

  • otherwise, the function sets the estimation statuses of the remaining parameter estimates to estimable.

bootstrap confidence interval calculation

the bootci function from statistics and machine learning toolbox™ is used to compute the bootstrap confidence intervals. its first input nboot is the number of samples (numsamples), and its second input bootfun is a function that performs these actions (a short sketch of a bootstrap call follows the list below):

  • resample the data (independently within each group, if multiple groups are available).

  • run a parameter fit with the resampled data.

  • return the estimated parameters.
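
as a hedged sketch, a bootstrap computation with a reduced sample count (reusing the pooledfit result from the example) could look like this; 200 samples is an arbitrary illustration of the speed/accuracy trade-off described for numsamples.

% bootstrap confidence intervals with fewer resampled fits for speed
ciBoot200 = sbioparameterci(pooledFit,'Type','bootstrap','NumSamples',200);
ci2table(ciBoot200)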

setting estimation status

if a confidence interval is closer than tolerance to a parameter bound, as defined in the original fit, the function sets the estimation status to constrained. if all confidence intervals are further away from the parameter bounds than tolerance, the function sets the status to success. otherwise, it is set to estimable.

references

[1] wald, a. "tests of statistical hypotheses concerning several parameters when the number of observations is large." transactions of the american mathematical society. 54 (3), 1943, pp. 426-482.

[2] wu, h., and m.c. neale. "adjusted confidence intervals for a bounded parameter." behavior genetics. 42 (6), 2012, pp. 886-898.

[3] wilks, s.s. "the large-sample distribution of the likelihood ratio for testing composite hypotheses." the annals of mathematical statistics. 9 (1), 1938, pp. 60–62.

[4] chen, jian-shen, and robert i. jennrich. "simple accurate approximation of likelihood profiles." journal of computational and graphical statistics. 11 (3), 2002, pp. 714-732.

version history

introduced in r2017b

see also

sbiofit | sbiopredictionci
