Chapter 18 of DBDA2E includes sections on Bayesian variable selection in multiple linear regression. The idea is that each predictor (a.k.a. "variable") has, in addition to its regression coefficient \(\beta_j\), an inclusion indicator \(\delta_j\) that can be 0 or 1, so the term entering the regression is effectively \(\delta_j \beta_j x_j\). Each combination of included predictors is a different "model" of the predicted variable. The plots in DBDA2E show the posterior probabilities of the combinations of included variables, but they do not show the posterior distribution of \(R^2\) for each combination. This blog post shows plots with \(R^2\) and, in turn, emphasizes that the model with the highest \(R^2\) is not necessarily the model with the highest posterior probability. That preference for a parsimonious model, rather than merely the best-fitting one, is a signature of the automatic penalty for complexity provided by Bayesian model comparison.
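To make the mechanism concrete, here is a minimal JAGS sketch of the variable-selection idea. It is a simplification, not the code from the book: the posted DBDA2E script uses a robust t likelihood and standardized data, and the variable names here are illustrative.

```r
library(rjags)

# Minimal sketch, assuming y (vector) and x (matrix) hold standardized data.
# The product delta[j]*beta[j] zeroes predictor j out of the regression
# whenever delta[j] == 0, so each pattern of delta values is a "model".
modelString <- "
model {
  for (i in 1:Ntotal) {
    y[i] ~ dnorm(mu[i], 1/sigma^2)
    mu[i] <- beta0 + sum(delta[1:Nx] * beta[1:Nx] * x[i,1:Nx])
  }
  beta0 ~ dnorm(0, 1/2^2)
  for (j in 1:Nx) {
    delta[j] ~ dbern(0.5)      # prior inclusion probability of 1/2
    beta[j]  ~ dnorm(0, 1/2^2) # prior on standardized coefficient
  }
  sigma ~ dunif(1.0E-3, 1.0E+3)
}
"
jagsModel <- jags.model(textConnection(modelString),
                        data = list(y = y, x = x,
                                    Ntotal = length(y), Nx = ncol(x)))
codaSamples <- coda.samples(jagsModel,
                            variable.names = c("beta0", "beta", "delta", "sigma"),
                            n.iter = 10000)
```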
In this example, the mean Scholastic Aptitude Test (SAT) score in each state of the U.S. is the predicted variable, and the four candidate predictors are the state's spending per student, the percentage of students taking the exam, the student-teacher ratio, and teacher salary. Full details are provided in Chapter 18 of DBDA2E.
Below are plots of the posterior distributions for the three most probable models and for the model with all four predictors included. The four rows correspond to four different combinations of included variables. The posterior probability of each combination is shown at the top left of its panel as "Model Prob", and each panel now also shows the posterior distribution of \(R^2\) for that combination.
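For readers who want to reproduce the \(R^2\) values: \(R^2\) is not a parameter in the model, so one way to get its posterior distribution is to compute it from the coefficients at each MCMC step and then bin the steps by which predictors are included. Here is a hedged post-processing sketch; object names such as mcmcMat are assumptions for illustration, not the names in the posted code.

```r
# Compute R^2 at every MCMC step, then group steps by inclusion pattern.
# Assumes codaSamples, y, and x exist as in the sketch above.
mcmcMat   <- as.matrix(codaSamples)
nStep     <- nrow(mcmcMat)
Nx        <- ncol(x)
deltaCols <- paste0("delta[", 1:Nx, "]")
betaCols  <- paste0("beta[",  1:Nx, "]")

Rsq <- numeric(nStep)
for (s in 1:nStep) {
  deltaS <- mcmcMat[s, deltaCols]
  if (all(deltaS == 0)) { Rsq[s] <- 0; next }  # intercept-only model: R^2 is 0
  yHat   <- mcmcMat[s, "beta0"] + x %*% (deltaS * mcmcMat[s, betaCols])
  Rsq[s] <- cor(y, yHat)^2  # proportion of variance accounted for at this step
}

# Label each step by its inclusion pattern, e.g. "0100" = only predictor 2:
modelIdx  <- apply(mcmcMat[, deltaCols], 1, paste, collapse = "")
modelProb <- sort(table(modelIdx) / nStep, decreasing = TRUE)

# Posterior distribution of R^2 within the most probable model:
hist(Rsq[modelIdx == names(modelProb)[1]], xlab = "R^2",
     main = paste("Model", names(modelProb)[1]))
```

Binning the chain by the pattern of \(\delta_j\) values is also how the "Model Prob" values arise: each model's posterior probability is simply the relative frequency with which the chain visits that pattern of included predictors.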
Thanks to Dr. Renny Maduro for suggesting the inclusion of \(R^2\) in the plots. Dr. Maduro attended my recent workshop at ICPSR in Ann Arbor, Michigan.
Modified code for the graphs in this example is posted at the book's web site.