```
library(rstanarm)
<- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2019/02/elemapi2v2.csv")
d <- lm(api00 ~ enroll + meals + full, data = d)
fit <- stan_glm(api00 ~ enroll + meals + full, data = d, seed = 222) fitBayes
```

For many applied social science problems (for example relatively simple versions of linear regression, descriptive statistics where you still use a model, etc.) it doesn’t seem to matter much.

Frequentists use Maximum Likelihood Estimation (MLE) to obtain point estimates for parameters. Bayesians, on the other hand, combine the likelihood with a prior distribution using Bayes’ theorem to obtain a posterior distribution. It’s like taking a weighted average of the MLE estimate and the prior.

In social science, Bayesians often employ vague or not-very-informative priors. Consequently, the posterior distribution of their estimates tends to align closely with MLE estimates.

There are packages that do very easily the Bayesian versions of things like `lm()`

, `glm()`

, `lmer()`

and `glmer()`

(mostly `rstanarm`

and `brms`

).

The results are often similar to what you would get with `lm()`

, `glm()`

… Importantly, if the results are not similar, you should try to understand why. If one of the bootstrap or frequentist confidence interval, or the Bayesian credibility interval, is really wide or close to zero, then you simply don’t have a lot of good evidence.

In practice, I’ll often use frequentist methods because they are much faster on large datasets. Most people are trained with frequentist statistics so that’s also what they know.

I’ll sometimes use `rstanarm`

and `brms`

if I need the full posterior distribution of parameters because I want to get the downstream uncertainty; like poststratified predictions. We can also get these with bootstrap or the delta method. But there’s something nice with getting the full posterior.

My high level view is something like section 5.5.5 of Kevin Murphy’s book, especially the quote by Donald Rubin at the top of p. 203:

The applied statistician should be Bayesian in principle and calibrated to the real world in practice. [They] should attempt to use specifications that lead to approximately calibrated procedures under reasonable deviations from [their assumptions]. [They] should avoid models that are contradicted by observed data in relevant ways — frequency calculations for hypothetical replications can model a model’s adequacy and help to suggest more appropriate models.

Both for Frequentists and Bayesians, the parameter to estimate “may have been fixed from the start or may have been generated from a physically random mechanism”. The difference is that Bayesians will estimate a probability model for parameters given the model and their priors on these parameters, while Frequentists will estimate the most likely point estimate given the model and will use the standard error of the parameter to provides a measure of the uncertainty or variability associated with the estimate (e.g., if we replicated that experiment 100 times, the parameter true would be contained in the confidence interval 95 times). Seems to me to be a pragmatic choice.

Here is a toy example but for many applied problems it won’t matter much either:

`coef(summary(fit))[,1:2]`

```
Estimate Std. Error
(Intercept) 801.8298337 26.42660282
enroll -0.0514556 0.01383741
meals -3.6597345 0.10879550
full 1.0810944 0.23945269
```

`1:2] fitBayes[`

```
$coefficients
(Intercept) enroll meals full
801.96170047 -0.05139102 -3.65802369 1.08278020
$ses
(Intercept) enroll meals full
25.7709027 0.0138499 0.1080426 0.2365597
```

For example if we look at the coefficient on the intercept, it’s 801.83 for MLE and 801.96 for the Bayesian fit. The prior chosen automatically by `rstanarm`

for that coefficient is \(N(648,356)\). That’s what is meant by some kind of weighted average of the prior and the likelihood. As is the case with all applied regression models like this one, the likelihood strongly dominates the prior.

`prior_summary(fitBayes)`

```
Priors for model 'fitBayes'
------
Intercept (after predictors centered)
Specified prior:
~ normal(location = 648, scale = 2.5)
Adjusted prior:
~ normal(location = 648, scale = 356)
Coefficients
Specified prior:
~ normal(location = [0,0,0], scale = [2.5,2.5,2.5])
Adjusted prior:
~ normal(location = [0,0,0], scale = [ 1.57,11.14,23.79])
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 0.007)
------
See help('prior_summary.stanreg') for more details
```

Of course you can also set an implausibly very precise prior. It’s not clear why you would do that…

```
<- stan_glm(
fitBayes ~ enroll + meals + full,
api00 data = d,
prior_intercept = normal(0, 1),
)
```

`1:2] fitBayes[`

```
$coefficients
(Intercept) enroll meals full
140.85916753 -0.04714781 -3.59955054 1.20609481
$ses
(Intercept) enroll meals full
306.7225878 0.1554933 1.2277531 2.7602487
```

One case where it’s sometimes said that Bayesian statistics is superior is noisy data: when studies are small and data are indirect or highly variable. Partial pooling adjusts individual group estimates based on the overall data. If a particular group has less data, its estimate will be pulled (or “shrunk”) towards the grand mean (or the overall average across groups). This is beneficial because, in cases where we have limited data, the overall average provides some information that can be more reliable than the limited data on its own. We will set structured priors and the model will fit. For example, in Canada, you could still get estimates for the territories in addition to the provinces (but with a much more uncertainty around your parameters) using a hierarchical model with partial pooling. Frequentists would often simply not say anything about such small areas like this where we don’t have a lot of information. It comes down to a difference in philosophy: do we want to model something and have a lot of uncertainty by pooling towards the grand mean, or do we simply want to refrain from saying anything.

There’s a good summary of all this on pages 14-16 of Regression and Other Stories:

So, in that sense, we have a choice: classical inference, leading to pure summaries of data which can have limited value as predictions; or Bayesian inference, which in theory can yield valid predictions even with weak data, but relies on additional assumptions. There is no universally correct answer here; we should just be aware of our options. There is also a practical advantage of the Bayesian approach, which is that all its inferences are probabilistic and thus can be represented by random simulations. For this reason, whenever we want to summarize uncertainty in estimation beyond simple confidence intervals, and whenever we want to use regression models for predictions, we go Bayesian. As we discuss in Chapter 9, we can perform Bayesian inference using noninformative or weakly informative priors and obtain results similar to classical estimates, along with simulation draws that can be used to express predictive uncertainty, or we can use informative priors if so desired. To the extent that we have relevant information that is not in our model (for example, awareness of bias, selection on unmeasured characteristics, prior information on effect sizes, etc), then we have a duty to account for this as well as we can when interpreting our data summaries.

Here is an example using the *2021 Canadian Election Study v1.0.dta* available here.

We look at the estimate of the mean of cps21_pos_carbon by province/territory.

cps21_pos_carbon To help reduce greenhouse gas emissions, the federal government should continue the carbon tax.

Strongly disagree (1)

Somewhat disagree (2)

Neither agree nor disagree (3)

Somewhat agree (4)

Strongly agree (5)

Don’t know/ Prefer not to answer (6)

Let’s remove (6). Let’s treat it as a continuous variable.

```
library(haven)
library(tidyverse)
library(srvyr)
<- read_stata("~/Downloads/dataverse_files_CES2021/2021 Canadian Election Study v1.0.dta")
df <- df |>
to_model select(cps21_ResponseId,cps21_pos_carbon,provcode,weight=cps21_weight_general_restricted) |>
# sketchy but weights for territories are all NA ...
mutate(weight=ifelse(is.na(weight),1,weight)) |>
drop_na() |>
filter(cps21_pos_carbon!=6)
$provcode <- as_factor(to_model$provcode)
to_model<- to_model %>%
results_survey_province as_survey_design(ids = 1, weight = weight) %>%
group_by(provcode) %>%
summarise(survey_mean(cps21_pos_carbon),n=n(),
survey_mean(cps21_pos_carbon,vartype = "ci"))
```

```
$provcode <- as.character(to_model$provcode)
to_model$cps21_pos_carbon <- as.numeric(to_model$cps21_pos_carbon)
to_model<- stan_glmer(cps21_pos_carbon~(1|provcode),
bayesian_fit family='gaussian',weights = weight,
to_model,cores=4,chains=4)
<- to_model |> select(provcode) |> unique()
to_predict <- apply(posterior_epred(bayesian_fit,newdata = to_predict),2,
pred function(x){quantile(x,c(0.025,0.5,0.975))})
$min <- pred[1,]
to_predict$estimate <- pred[2,]
to_predict$max <- pred[3,]
to_predict<- bind_rows(results_survey_province|>
to_plot select(provcode,
estimate=coef,
min=`_low`,
max=`_upp`)|>mutate(type='classical survey\nestimate'),
|>
to_predictselect(provcode,
|>mutate(type='Bayesian with\npartial pooling'))
estimate,min,max)|>
to_plot ggplot(aes(x=provcode,y=estimate,ymin=min,ymax=max,color=type,group=type)) +
geom_errorbar(position=position_dodge(width=.85)) +
geom_point(position=position_dodge(width=.85)) +
coord_flip()
```