Emerging methods for multi-model inference in ecology: inference based on parameter inclusion probabilities

A model-selection procedure has recently been appearing in the ecological literature, seemingly following the increasing use of Bayesian hierarchical models. I was first introduced to the procedure a few years back by the book "Hierarchical Modeling and Inference in Ecology" (Royle and Dorazio 2008), and have since seen it used for multi-species occupancy models (e.g. Burton et al. 2012) and N-mixture models (e.g. Graves et al. 2012) in the peer-reviewed literature, and have used it myself in Coggins et al. (2014). The procedure shifts the perspective of model selection from determining the "best" model with a criterion such as AIC or BIC to determining the support for each parameter in the full model by modeling the probability of parameter inclusion directly with a mixture-modeling approach. In a Bayesian context, this is easily achieved by adding parameters to the model that flag the inclusion or exclusion of each effect parameter. For example, consider the simple four-parameter linear regression model:

y = β1 + β2·x1 + β3·x2 + β4·x3 + ε,    ε ~ Normal(0, σ²)

where y is the dependent variable, x1-3 are the independent variables, β1 is the intercept, and β2-4 are effect parameters for the independent variables. The information-theoretic approach (Burnham and Anderson 2002) to determining whether the effects of x1-3 on y are supported by the data is to compare competing models with AIC (or another criterion), both including and excluding the parameters β2-4. The conclusion is then drawn that the parameters included in the most supported model are supported by the data. Alternatively, the Bayesian mixture-model approach is to include three additional parameters in the linear model as:

y = β1 + w1·β2·x1 + w2·β3·x2 + w3·β4·x3 + ε,    ε ~ Normal(0, σ²)

where the w parameters are Bernoulli outcomes with prior inclusion probabilities of 0.5 (i.e. no prior information on the inclusion or exclusion of parameters). For each Markov chain Monte Carlo (MCMC) iteration, each w parameter takes the value one or zero. When the value of w is one, its associated β parameter is included in the model; when the value of w is zero, its associated β parameter is excluded from the model. Thus, the mean of the posterior samples of each w parameter is the probability of including the associated parameter in the model and a measure of support for that parameter. Furthermore, you can easily calculate model probabilities directly from the posterior samples of w. For example, the proportion of MCMC iterations in which w1-3 equals the vector c(1,1,1) is the probability of the full model, and the proportion of iterations in which w1-3 equals c(0,1,0) is the probability of the model that includes only the intercept (β1) and the β3 parameter. Inference can therefore be drawn from the parameter inclusion probabilities, the model probabilities, or both.
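For contrast, here is a minimal sketch of the information-theoretic route in R. It assumes the response and the three covariates sit in a data frame called dat with columns y, x1, x2, and x3 (the data frame and column names are hypothetical, just for illustration); it fits all eight candidate models with lm() and ranks them by AIC:

#fit every combination of the three effects and rank candidate models by AIC
#assumes a data frame 'dat' with columns y, x1, x2, x3 (hypothetical)
covs = c("x1", "x2", "x3")
formulas = "y ~ 1" #intercept-only model
for (k in 1:3) {
  for (v in combn(covs, k, simplify = FALSE)) {
    formulas = c(formulas, paste("y ~", paste(v, collapse = " + ")))
  }
}
fits = lapply(formulas, function(f) lm(as.formula(f), data = dat))
aic_table = data.frame(model = formulas, AIC = sapply(fits, AIC))
aic_table[order(aic_table$AIC), ] #most supported model (lowest AIC) first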

The advantage of basing inference on the parameter inclusion probabilities is most apparent for highly parameterized models with high model uncertainty. Consider a linear model with 20 parameters in addition to the intercept. You have determined, a priori, that all possible combinations of these parameters are plausible models. Thus, you aim to compare all possible models with AIC. Firstly, this could be a difficult task to accomplish because 20 parameters equates to 1,048,576 (i.e. 2^20) possible models! Unless you are a pretty clever coder, fitting and calculating AIC for each model may not be feasible. Secondly, if there is uncertainty in the inclusion of many of the parameters in the best model, your AIC table of the top 10 or even 100 models may have model probabilities that are nearly indistinguishable. It becomes difficult to draw clean inferences in these situations. Alternatively, evaluating a 20-parameter model with the Bayesian mixture-modeling approach is quite simple because the focus shifts from 2^20 model probabilities to merely 20 parameter inclusion probabilities. This is a huge advantage on its own, but the real advantage is the clarity of inference even in the face of high model uncertainty. Inference is drawn simply by looking at the parameters with high inclusion probability.
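A quick check of that count in R (nothing here is specific to any particular data set):

#the number of candidate models doubles with every additional effect parameter
k = 1:20
data.frame(n_effects = k, n_candidate_models = 2^k)
2^20 #1,048,576 candidate models, versus only 20 inclusion probabilities to report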

For example, to the right are two tables from a recent preliminary analysis aimed at determining the effectiveness of fisheries stock enhancement in German lakes. The full model contained 17 possible covariates, which equates to 131,072 plausible models. Table 2 (top table) shows the model probabilities of the top 15 models. Notice how little contrast there is in the probability of the top 15 models; they all have very low and similar probabilities. This is due to high model uncertainty associated with uncertainty in the inclusion of many of the parameters. Drawing clear inference from this table is difficult and may be impossible.

Alternatively, Table 3 (bottom table) reports the inclusion probability of each parameter. It is simple to draw clear inference from this table by looking at the parameters with the highest probabilities; these are the parameters most supported by the data. Furthermore, it is easy to identify which parameters are driving the high model uncertainty: the most uncertain parameters are those with values closest to 0.5, while parameter inclusion probabilities of zero or one indicate certainty that the model should exclude or include the parameter, respectively.
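As a toy illustration of this kind of reading (the names and values below are invented, not taken from the stocking analysis), the same screening works on any named vector of posterior inclusion probabilities, such as the colMeans(w) computed in the code further down:

incl = c(b1 = 0.96, b2 = 0.83, b3 = 0.52, b4 = 0.48, b5 = 0.07) #hypothetical values
sort(incl, decreasing = TRUE) #best-supported effects first
names(incl)[abs(incl - 0.5) < 0.1] #effects driving most of the model uncertainty
names(incl)[incl > 0.9 | incl < 0.1] #effects with near-certain inclusion or exclusion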

The following is an example of the model coded in JAGS (Plummer 2003) and run from program R (R Development Core Team 2010) using the package rjags (http://mcmc-jags.sourceforge.net). The code first simulates data and then estimates the parameter inclusion probabilities using the mixture-modeling approach. To run it, you will need to install R, JAGS, and the R package rjags.

Hope you enjoy!

#begin code
library(rjags)
library(car)
################################################################################
# this part simulates data
################################################################################
n=40 #sample size
std = 1 #residual error
int = 1 #intercept
beta1 = .4 #four effect parameters
beta2 = 1
beta3 = .3
beta4 = 0

x1 =rnorm(n,0,1) #simulates four covariates
x2 =rnorm(n,0,1)
x3 =rnorm(n,0,1)
x4 =rnorm(n,0,1)

yobs = int + beta1*x1 + beta2*x2 + beta3*x3 + beta4*x4 + rnorm(n,0,std)
###############################################################################
# Bayesian approach
###############################################################################
modelFilename = "modselect"
 cat("
model {
 #Priors for inclusion parameters
 for(i in 1:4){w[i]~dbern(.5)}

 #Priors of other parameters
 tau <- 1/(sig*sig)
 sig~dgamma(.01,.01)
 int ~ dnorm(0,.001)
 for(i in 1:4){bet[i] ~ dnorm(0,.001)}

 #The model
 for(i in 1:n){
 ypred[i] <- int + w[1]*bet[1]*x1[i] + w[2]*bet[2]*x2[i] + w[3]*bet[3]*x3[i] + 
 w[4]*bet[4]*x4[i]
 yobs[i] ~ dnorm(ypred[i],tau)
 }
 }
", fill=TRUE, file=modelFilename)
#file.show(modelFilename)

jags.data = list(yobs = yobs, x1=x1, x2=x2, x3=x3, x4=x4, n=length(yobs))
jags.parms = c('bet','w')
jmod = jags.model(file=modelFilename, data=jags.data, n.chains=3,
                  n.adapt=1000) #inits = jags.inits(),
update(jmod, n.iter=1000, by=10, progress.bar='text') #burn-in
post = coda.samples(jmod, jags.parms, n.iter=10000, thin=10)

mypost = as.matrix(post, chain=F)
head(mypost,1)
#columns 1-4 are bet[1:4] and columns 5-8 are w[1:4]; the head() call above confirms the ordering
bet = mypost[,1:4]
w = mypost[,5:8]
#------------------------------------------------------------------------------#
# model probs
#------------------------------------------------------------------------------#
models = as.matrix(table(paste(w[,1],w[,2],w[,3],w[,4], sep="")))/dim(w)[[1]]
#This identifies the top models and their probabilities
modprobs = as.matrix(as.matrix(models[order(models[,1],decreasing = T),]
 )[1:min(5,dim(models)[[1]]),])
#------------------------------------------------------------------------------#
# Summary statistics with parameter inclusion probabilities
# Parameters summaries are model averaged
#------------------------------------------------------------------------------#
params = bet*w #model-averaged effects: bet[i] contributes only in iterations where w[i] = 1
t(rbind(mean=colMeans(params),
        sd=apply(params,2,sd),
        apply(params,2,quantile,c(0.025,0.975),na.rm=T),
        inclusion_probs=colMeans(w)))

modprobs
#end code