A procedure for model selection has recently been appearing in the ecological literature, seemingly alongside the increasing use of Bayesian hierarchical models. I was first introduced to the procedure a few years back by the book "Hierarchical Modeling and Inference in Ecology" (Royle and Dorazio 2008), have since seen it used for multi-species occupancy models (e.g. Burton et al. 2012) and N-mixture models (e.g. Graves et al. 2012) in the peer-reviewed literature, and have used it myself in Coggins et al. (2014). The procedure shifts the perspective of model selection from one that aims to determine the "best" model using a criterion such as AIC or BIC to one that aims to determine the support for each parameter in the full model by modeling the probability of parameter inclusion directly with a mixture-modeling approach. In a Bayesian context, this is easily achieved by adding parameters to the model that flag the inclusion or exclusion of each effect parameter. For example, consider the simple four-parameter linear regression model:

y = β1 + β2*x1 + β3*x2 + β4*x3 + ε
where y is the dependent variable, x1-x3 are the independent variables, β1 is the intercept, β2-β4 are the effect parameters for the independent variables, and ε is the residual error. The information-theoretic approach (Burnham and Anderson 2002) to determining whether the effects of x1-x3 on y are supported by the data is to compare competing models with AIC (or another criterion), both including and excluding the parameters β2-β4. The conclusion is then drawn that the parameters included in the most supported model are supported by the data. Alternatively, the Bayesian mixture-model approach is to include three additional parameters in the linear model as:

y = β1 + w1*β2*x1 + w2*β3*x2 + w3*β4*x3 + ε
where the w parameters are Bernoulli outcomes with prior probabilities of 0.5 (i.e. no prior information on the inclusion or exclusion of parameters). For each Markov chain Monte Carlo (MCMC) iteration, each w parameter takes a value of either one or zero. When the value of w is one, its associated β parameter is included in the model; when the value of w is zero, its associated β parameter is excluded from the model. Thus, the mean of the posterior samples of a w parameter is the probability of that parameter's inclusion in the model and a measure of support for that parameter. Furthermore, you can easily calculate the model probabilities directly from the posterior samples of w. For example, the proportion of MCMC iterations in which w1-w3 equal the vector c(1,1,1) is the probability of the full model, and the proportion of iterations in which w1-w3 equal c(0,1,0) is the probability of the model that includes only the intercept (β1) and the β3 parameter. Thus, inference can be drawn from the parameter inclusion probabilities, the model probabilities, or both.
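To make the bookkeeping concrete, here is a minimal sketch (not from the original analysis) of how those two summaries are computed once you have posterior samples of the w parameters. The matrix w.draws below is hypothetical and is simulated at random purely so the snippet runs; in practice it would hold the MCMC samples of w1-w3, one row per iteration.

#sketch: inclusion and model probabilities from posterior samples of w
set.seed(1)
w.draws = matrix(rbinom(3000*3, 1, prob=c(0.9,0.2,0.6)), ncol=3, byrow=TRUE) #hypothetical draws of w1-w3

#parameter inclusion probabilities: the posterior mean of each w
colMeans(w.draws)

#model probabilities: the proportion of iterations producing each unique w vector
#(e.g. "111" is the full model, "010" keeps only the intercept and beta3)
model.id = apply(w.draws, 1, paste, collapse="")
sort(table(model.id)/nrow(w.draws), decreasing=TRUE)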
The advantage of basing inference on the parameter inclusion probabilities is most apparent for highly parameterized models with high model uncertainty. Consider a linear model with 20 parameters in addition to the intercept. You have determined, a priori, that all possible combinations of these parameters are plausible models, so you aim to compare all possible models with AIC. First, this could be a difficult task to accomplish because 20 parameters equates to 1,048,576 (i.e. 2^20) possible models! Unless you are a pretty clever coder, calculating AIC for each model may not be feasible. Second, if there is uncertainty in the inclusion of many of the parameters in the best model, your AIC table of the top 10 or even top 100 models may have model probabilities that are nearly indistinguishable, and it becomes difficult to draw clean inferences in these situations. Alternatively, evaluating a 20-parameter model with the Bayesian mixture-modeling approach is quite simple because the focus shifts from 2^20 model probabilities to merely 20 parameter inclusion probabilities. That alone is a huge advantage, but the real advantage is the clarity of inference even in the face of high model uncertainty: inference is drawn simply by looking at the parameters with high inclusion probabilities.
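As a quick, hypothetical illustration of that counting argument (not something from the original post), the number of candidate models doubles with every added parameter, while the number of inclusion probabilities grows by only one:

#all-subsets AIC: 2^k candidate models for k candidate effect parameters
k = 20
2^k      #1,048,576 models to fit and rank
k        #versus 20 inclusion probabilities from a single mixture model

#explicit enumeration of subsets is easy for small k but unwieldy at k = 20
k.small = 4
nrow(expand.grid(rep(list(0:1), k.small)))   #2^4 = 16 candidate models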
For example, to the right are two tables from a recent preliminary analysis aimed at determining the effectiveness of fisheries stock enhancement in German lakes. The full model contained 17 possible covariates, which equates to 131,072 (i.e. 2^17) plausible models. Table 2 (top table) shows the model probabilities of the top 15 models. Notice how little contrast there is among the top 15 models; on the contrary, all 15 have very low and similar probabilities. This is due to high model uncertainty associated with uncertainty in the inclusion of many of the parameters. Drawing clear inference from this table is difficult, if not impossible.
Alternatively, Table 3 (bottom table) reports the inclusion probability of each parameter. It is simple to draw clear inference from this table by looking at the parameters with the highest probabilities; these are the parameters most supported by the data. Furthermore, it is easy to identify which parameters are driving the high model uncertainty: the most uncertain parameters are those with values closest to 0.5, while parameter inclusion probabilities of zero or one indicate complete certainty that the model should exclude or include the parameter, respectively.
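As a toy illustration (the numbers below are invented, not taken from the German lakes analysis), reading such a table programmatically amounts to sorting the inclusion probabilities and flagging the ones nearest 0.5:

#hypothetical inclusion probabilities for five effect parameters
incl = c(b2=0.97, b3=0.88, b4=0.52, b5=0.49, b6=0.12)

#parameters most supported by the data
sort(incl, decreasing=TRUE)

#parameters driving model uncertainty (inclusion probability nearest 0.5)
sort(abs(incl - 0.5))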
The following is an example of the model coded in JAGS (Plummer 2003) and run from program R (R Development Core Team 2010) using the package rjags (http://mcmc-jags.sourceforge.net). The code first simulates data and then estimates the parameter inclusion probabilities using the mixture-modeling approach. To run it you will need to install R, JAGS, and the R package rjags.
Hope you enjoy!
#begin code
library(rjags)
library(car)

################################################################################
# this part simulates data
################################################################################
n = 40        #sample size
std = 1       #residual error
int = 1       #intercept
beta1 = .4    #four effect parameters
beta2 = 1
beta3 = .3
beta4 = 0
x1 = rnorm(n,0,1)   #simulates four covariates
x2 = rnorm(n,0,1)
x3 = rnorm(n,0,1)
x4 = rnorm(n,0,1)
yobs = int + beta1*x1 + beta2*x2 + beta3*x3 + beta4*x4 + rnorm(n,0,std)

###############################################################################
# Bayesian approach
###############################################################################
modelFilename = "modselect"
cat("
model {
  #Priors for inclusion parameters
  for(i in 1:4){w[i] ~ dbern(.5)}

  #Priors for other parameters
  tau <- 1/(sig*sig)
  sig ~ dgamma(.01,.01)
  int ~ dnorm(0,.001)
  for(i in 1:4){bet[i] ~ dnorm(0,.001)}

  #The model
  for(i in 1:n){
    ypred[i] <- int + w[1]*bet[1]*x1[i] + w[2]*bet[2]*x2[i] +
                w[3]*bet[3]*x3[i] + w[4]*bet[4]*x4[i]
    yobs[i] ~ dnorm(ypred[i],tau)
  }
}
", fill=TRUE, file=modelFilename)
#file.show(modelFilename)

jags.data = list(yobs=yobs, x1=x1, x2=x2, x3=x3, x4=x4, n=length(yobs))
jags.parms = c('bet','w')

jmod = jags.model(file=modelFilename, data=jags.data, n.chains=3, n.adapt=1000)
#inits = jags.inits(),
update(jmod, n.iter=1000, by=10, progress.bar='text')
post = coda.samples(jmod, jags.parms, n.iter=10000, thin=10)

mypost = as.matrix(post, chain=F)
head(mypost,1)
bet = mypost[,1:4]
w   = mypost[,5:8]

#------------------------------------------------------------------------------#
# model probs
#------------------------------------------------------------------------------#
models = as.matrix(table(paste(w[,1],w[,2],w[,3],w[,4], sep="")))/dim(w)[[1]]

#This identifies the top models and their probabilities
modprobs = as.matrix(as.matrix(models[order(models[,1],decreasing = T),] )[1:min(5,dim(models)[[1]]),])

#------------------------------------------------------------------------------#
# Summary statistics with parameter inclusion probabilities
# Parameter summaries are model averaged
#------------------------------------------------------------------------------#
parmas = bet*w
t(rbind(mean=colMeans(parmas),
        sd=apply(parmas,2,sd),
        apply(parmas,2,quantile,c(0.025,0.975),na.rm=T),
        inclusion_probs=colMeans(w)))
modprobs
#end code
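As an optional follow-up (a sketch, not part of the original script), you can sanity-check the mixture-model results against an ordinary least-squares fit and an AIC-based stepwise search on the same simulated data. This assumes the objects yobs and x1-x4 created above are still in the workspace.

#rough comparison with the information-theoretic approach on the simulated data
ols = lm(yobs ~ x1 + x2 + x3 + x4)
summary(ols)
step(ols, direction="both")   #AIC-based selection over the same covariates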