Linear Models in Stan

In this tutorial, we will learn how to estimate linear models using Stan and R. Along the way, we will review the steps in a sound Bayesian workflow. This workflow consists of:

Considering the social process that generates your data. The goal of your statistical model should be to model the data generating process, so think hard about this. Exploratory analysis goes a long way towards helping you to understand this process.
Program your statistical model and sample from it.
Evaluate your model’s reliability. Check for Markov chain convergence to make sure that your model has produced reliable estimates.
Evaluate your model’s performance. How well does your model approximate the data generating process? This involves using posterior predictive checks.
Summarize your model’s results in tabular and graphical form.

My goal is to explain the fundamentals of linear models in Stan with examples so that we aren’t learning Stan programming in such an abstract environment. Let’s get started!

Import Data

We’ll use the Motor Trend Car Road Tests mtcars data that is provided in R as our practice data set.

cars.data <- mtcars

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

glimpse(cars.data)

## Rows: 32
## Columns: 11
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

Let’s walk though what our variables are:

mpg: Miles pre gallon
cyl: Number of cylinders
disp Displacement (cu. in.)
hp: Gross horsepower
drat: Rear axle ratio
wt: Weight (1000 lbs)
qsec: 1/4 mile time
vs: Engine (0 = V-shaped, 1 = straight)
am: Transmission (0 = automatic, 1 = manual)
gear: Number of forward gears
carb: Number of carburetors

With this information, let’s do some quick data cleaning.

cars.data <- 
  cars.data %>%
  rename(cylinders = cyl,
         displacement = disp,
         rear_axle_ratio = drat,
         weight = wt,
         engine_type = vs,
         trans_type = am,
         gears = gear) %>%
  mutate(engine_type = factor(engine_type, levels = c(0, 1), 
                              labels = c("V-shaped", "Straight")),
         trans_type = factor(trans_type, levels = c(0, 1),
                             labels = c("Automatic", "Manual")))

glimpse(cars.data)

## Rows: 32
## Columns: 11
## $ mpg             <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, …
## $ cylinders       <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, …
## $ displacement    <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7…
## $ hp              <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 18…
## $ rear_axle_ratio <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, …
## $ weight          <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190…
## $ qsec            <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00…
## $ engine_type     <fct> V-shaped, V-shaped, Straight, Straight, V-shaped, Stra…
## $ trans_type      <fct> Manual, Manual, Manual, Automatic, Automatic, Automati…
## $ gears           <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, …
## $ carb            <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, …

For our research question, we will be investigating how different characteristics of a car affect it’s MPG. To start with, we will test how vehicle weight affects MPG. Let’s do some preliminary analysis of this question with visualizations.

ggplot(data = cars.data, aes(x = weight, y = mpg)) +
  geom_point()

As expected, there seems to be a negative relationship between these variables. Let’s add in a fitted line:

ggplot(data = cars.data, aes(x = weight, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

Build a Model

Models with a Single Predictor

Now we will build a model in Stan to formally estimate this relationship. Rather than creating an r code block, we want to create a stan code block. The only caveat is that we also need to add a name for our Stan model provided in the output.var argument.

Here is the model that we want to estimate in Stan:

$\begin{aligned} mpg & \sim Normal (μ, σ) \\ μ & = α + β weight \end{aligned}$

To start, we need to load cmdstanr which will allow us to interface with Stan through R.

library(cmdstanr)

## This is cmdstanr version 0.4.0

## - Online documentation and vignettes at mc-stan.org/cmdstanr

## - CmdStan path set to: /Users/nickjenkins/.cmdstanr/cmdstan-2.28.1

## - Use set_cmdstan_path() to change the path

## 
## A newer version of CmdStan is available. See ?install_cmdstan() to install it.
## To disable this check set option or environment variable CMDSTANR_NO_VER_CHECK=TRUE.

register_knitr_engine(override = FALSE) # This registers cmdstanr with knitr so that we can use
# it with R Markdown.

Now, program the model:

data {
  int n; //number of observations in the data
  vector[n] mpg; //vector of length n for the car's MPG
  vector[n] weight; //vector of length n for the car's weight
}

parameters {
  real alpha; //the intercept parameter
  real beta_w; //slope parameter for weight
  real sigma; //model variance parameter
}

model {
  //linear predictor mu
  vector[n] mu;
  
  //write the linear equation
  mu = alpha + beta_w * weight;
  
  //likelihood function
  mpg ~ normal(mu, sigma);
}

Once we finish writing the model, we need to run the code block to compile it into C++ code. This will also us to sample from the model and obtain the parameter estimates. Let’s do that now.

The next step is to prepare the data for Stan. Stan can’t use the same types of data that R can. For example, Stan requires lists, not data frames, and it cannot accept factors. We’ll use the tidybayes package to make it easier to prepare the data.

library(tidybayes)

model.data <- 
  cars.data %>%
  select(mpg, weight) %>%
  compose_data(.)

# sample from our model
linear.fit.1 <- linear.model$sample(data = model.data, refresh = 0)

## Running MCMC with 4 sequential chains...
## 
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.0 seconds.
## Chain 4 finished in 0.1 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.7 seconds.

# summarize our model
print(linear.fit.1)

##  variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
##    lp__   -52.26 -51.88 1.39 1.05 -54.91 -50.80 1.00     1145     1622
##    alpha   37.33  37.30 1.99 1.99  34.11  40.63 1.00     1079     1104
##    beta_w  -5.36  -5.35 0.59 0.59  -6.33  -4.41 1.00     1126     1149
##    sigma    3.20   3.14 0.46 0.42   2.57   4.00 1.00     1374     1236

Let’s run through the interpretation of this model:

alpha For a car with a weight of zero, the expected MPG is 37.21. Obviously, a weight of zero is impossible, so we’ll want to address this in our next model.
beta_w Comparing two cars who differ by 1000 pounds, the model predicts a difference of 5.33 miles per gallon.
sigma The model predicts MPG within 3.18 points.
lp__ Logarithm of the (unnormalized) posterior density. This log density can be used in various ways for model evaluation and comparison.

Ok, now that we’ve written our model, let’s make a few improvements. First, let’s center our weight variable so that we can get a more meaningful interpretation of the intercept parameter alpha. We can accomplish this by subtracting the mean from each observation. This will change the interpretation of the intercept to be the average MPG when weight is held constant at it’s average value.

cars.data <- 
  cars.data %>%
  mutate(weight_c = weight - mean(weight))

Now that we changed the name of the variable name, we also need to change our model code to incorporate this change. While we are adjusting the code, we’ll also restrict the scale parameter to be positive. This will help our model be a bit more efficient.

data {
  int n; //number of observations in the data
  vector[n] mpg; //vector of length n for the car's MPG
  vector[n] weight_c; //vector of length n for the car's weight
}

parameters {
  real alpha; //the intercept parameter
  real beta_w; //slope parameter for weight
  real<lower = 0> sigma; //variance parameter and restrict it to positive values
}

model {
  //linear predictor mu
  vector[n] mu;
  
  //write the linear equation
  mu = alpha + beta_w * weight_c;
  
  //likelihood function
  mpg ~ normal(mu, sigma);
}

Now prepare the data and re-estimate the model.

model.data <- 
  cars.data %>%
  select(mpg, weight_c) %>%
  compose_data(cars.data)

linear.fit.2 <- linear.model$sample(data = model.data, refresh = 0)

## Running MCMC with 4 sequential chains...
## 
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.0 seconds.
## Chain 4 finished in 0.0 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.6 seconds.

print(linear.fit.2)

##  variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
##    lp__   -51.06 -50.75 1.24 1.02 -53.53 -49.70 1.00     1681     2326
##    alpha   20.09  20.10 0.57 0.56  19.15  21.01 1.00     3359     2424
##    beta_w  -5.35  -5.36 0.57 0.56  -6.27  -4.40 1.00     3278     2625
##    sigma    3.18   3.13 0.43 0.40   2.56   3.95 1.00     3232     2957

Let’s interpret this model:

alpha When a vehicle’s weight is held at its average value, the expected MPG is 20.09.
beta_w This estimate has the same interpretation as before.
sigma This estimate has the same interpretation as before.
lp__ This estimate has a slightly lower value (in absolute value) than it did in the previous model indicating that this model performs slightly better.

Models with Multiple Predictors

To add more predictors, we just need to adjust out model code. Let’s add in the vehicle’s cylinders and horsepower. We’ll also center these variables.

data {
  int n; //number of observations in the data
  vector[n] mpg; //vector of length n for the car's MPG
  vector[n] weight_c; //vector of length n for the car's weight
  vector[n] cylinders_c; ////vector of length n for the car's cylinders
  vector[n] hp_c; //vector of length n for the car's horsepower
}

parameters {
  real alpha; //the intercept parameter
  real beta_w; //slope parameter for weight
  real beta_cyl; //slope parameter for cylinder
  real beta_hp; //slope parameter for horsepower
  real<lower = 0> sigma; //variance parameter and restrict it to positive values
}

model {
  //linear predictor mu
  vector[n] mu;
  
  //write the linear equation
  mu = alpha + beta_w * weight_c + beta_cyl * cylinders_c + beta_hp * hp_c;
  
  //likelihood function
  mpg ~ normal(mu, sigma);
}

Prepare the data and sample from the model.

model.data <- 
  cars.data %>%
  mutate(cylinders_c = cylinders - mean(cylinders),
         hp_c = hp - mean(hp)) %>%
  select(mpg, weight_c, cylinders_c, hp_c) %>%
  compose_data(.)

linear.fit.3 <- linear.model$sample(data = model.data, refresh = 0)

## Running MCMC with 4 sequential chains...
## 
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.9 seconds.

print(linear.fit.3)

##  variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
##  lp__     -45.28 -44.95 1.73 1.62 -48.59 -43.12 1.00     1428     2426
##  alpha     20.09  20.09 0.47 0.45  19.33  20.84 1.00     3536     2507
##  beta_w    -3.16  -3.16 0.80 0.81  -4.49  -1.85 1.00     3482     2566
##  beta_cyl  -0.96  -0.96 0.61 0.57  -1.98   0.04 1.00     2700     2218
##  beta_hp   -0.02  -0.02 0.01 0.01  -0.04   0.00 1.00     2892     2425
##  sigma      2.62   2.58 0.38 0.36   2.07   3.33 1.00     2546     2622

After adjusting for a car’s cylinders and horsepower two cars that differ by 1000 pounds, the model predicts a difference of 3.18 miles per gallon. Notice that lp__ is now even lower, suggesting that this latest model is a better fit.

Assesing Our Model

Model Convergence

Now that we’ve built a decent model, we need to see how well it actual preforms. First, we’ll want to check that our chains have converged and are producing reliable point estimates. We can do this with a traceplot.

library(bayesplot)

## This is bayesplot version 1.8.1

## - Online documentation and vignettes at mc-stan.org/bayesplot

## - bayesplot theme set to bayesplot::theme_default()

##    * Does _not_ affect other ggplot2 plots

##    * See ?bayesplot_theme_set for details on theme setting

fit.draws <- linear.fit.3$draws() # extract the posterior draws
mcmc_trace(fit.draws)

The fuzzy caterpillar appearance indicates that the chains are mixing well and have converged to a common distribution. We can also assess the Rhat values for each parameter. As a rule of thumb, Rhat values less than 1.05 indicate good convergence. The bayesplot package makes these calculations easy.

rhats <- rhat(linear.fit.3)
mcmc_rhat(rhats)

Effective Sample Size

The effective sample size estimates the number of independent draws from the posterior distribution of a given estimate. This metric is important because Markov chains can have autocorrelation which will lead to biased parameter estimates. With the bayesplot package we can visualize the ratio of the effective sample size to the total number of samples - the larger the ratio the better. The rule of thumb here is to worry about ratios less than 0.1.

eff.ratio <- neff_ratio(linear.fit.3)
eff.ratio

##     alpha    beta_w  beta_cyl   beta_hp     sigma 
## 0.8839743 0.8695231 0.6686701 0.7003035 0.6309044

mcmc_neff(eff.ratio)

We can also check the autocorrelation in the chains with bayesplot. To use the mcmc_acf function, we’ll need to extract the posterior draws from the model.

mcmc_acf(fit.draws)

Here, we are looking to see how quickly the autocorrelation drops to zero.

Posterior Predictive Checks

One of the most powerful tools of Bayesian inference is to conduct posterior predictive checks. This check is designed to see how well our model can generate data that matches observed data. If we built a good model, it should be able to generate new observations that very closely resemble the observed data.

In order to perform posterior predictive checks, we will need to add in some code to our model. Specifically we need to calculate replications of our outcome variable. We can do this using the generated quantities section.

data {
  int n; //number of observations in the data
  vector[n] mpg; //vector of length n for the car's MPG
  vector[n] weight_c; //vector of length n for the car's weight
  vector[n] cylinders_c; ////vector of length n for the car's cylinders
  vector[n] hp_c; //vector of length n for the car's horsepower
}

parameters {
  real alpha; //the intercept parameter
  real beta_w; //slope parameter for weight
  real beta_cyl; //slope parameter for cylinder
  real beta_hp; //slope parameter for horsepower
  real<lower = 0> sigma; //variance parameter and restrict it to positive values
}

model {
  //linear predictor mu
  vector[n] mu;
  
  //write the linear equation
  mu = alpha + beta_w * weight_c + beta_cyl * cylinders_c + beta_hp * hp_c;
  
  //likelihood function
  mpg ~ normal(mu, sigma);
}

generated quantities {
  //replications for the posterior predictive distribution
  real y_rep[n] = normal_rng(alpha + beta_w * weight_c + beta_cyl * 
    cylinders_c + beta_hp * hp_c, sigma);
}

In the code block above, normal_rng is the Stan function to generate observations from a normal distribution. So, y_rep generates new data points from a normal distribution using the linear model we built mu and a variance sigma. Now let’s re-estimate the model:

linear.fit.3 <- linear.model$sample(data = model.data, refresh = 0)

## Running MCMC with 4 sequential chains...
## 
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
## Chain 4 finished in 0.1 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.9 seconds.

print(linear.fit.3)

##  variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
##  lp__     -45.27 -44.93 1.77 1.62 -48.58 -43.13 1.01     1340     1990
##  alpha     20.08  20.07 0.49 0.48  19.25  20.87 1.00     3018     2417
##  beta_w    -3.15  -3.16 0.77 0.76  -4.41  -1.88 1.00     2949     2618
##  beta_cyl  -0.95  -0.96 0.58 0.57  -1.89   0.00 1.00     2458     1928
##  beta_hp   -0.02  -0.02 0.01 0.01  -0.04   0.00 1.00     2763     2393
##  sigma      2.64   2.60 0.38 0.36   2.10   3.33 1.00     2681     2335
##  y_rep[1]  22.85  22.85 2.73 2.65  18.31  27.40 1.00     4020     4054
##  y_rep[2]  21.97  22.02 2.74 2.72  17.50  26.46 1.00     3903     3851
##  y_rep[3]  25.88  25.86 2.76 2.68  21.44  30.42 1.00     3879     3712
##  y_rep[4]  20.89  20.92 2.74 2.65  16.31  25.28 1.00     3798     3738
## 
##  # showing 10 of 38 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

In our model output, we now have a replicated y value for every row of data. We can use these values to plot the replicated data against the observed data.

y <- cars.data$mpg

# convert the cmdstanr fit to an rstan object
library(rstan)

## Loading required package: StanHeaders

## rstan (Version 2.21.3, GitRev: 2e1f913d3ca3)

## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)

## 
## Attaching package: 'rstan'

## The following object is masked from 'package:tidyr':
## 
##     extract

stanfit <- read_stan_csv(linear.fit.3$output_files())

# extract the fitted values
y.rep <- extract(stanfit)[["y_rep"]]

ppc_dens_overlay(y = cars.data$mpg, yrep = y.rep[1:100, ])

The closer the replicated values (yrep) get to the observed values (y) the more accurate the model. Here it looks like we could probably do a bit better, though the lose fit is likely due to the small sample size (which adds more uncertainty).

Improving the Model with Better Priors

To improve this model, let’s use more informative priors. Priors allow us to incorporate our background knowledge on the question into the model to produce more realistic estimates. For our question here, we probably don’t expect the weight of a vehicle to change its MPG more that a dozen or so miles per gallon. Unfortunately, because we didn’t specify priors in the previous models, it defaulted to using flat priors which essentially place an equal probably on all possible coefficient values - not very realistic. Let’s fix that.

To get a better sense of what priors to use, it’s a good idea to use prior predictive checks, which are a lot like posterior predictive checks only they don’t include any data. The goal is to select priors that put some probably over all plausible vales.

# expectations for the effect of weight on MPG
sample.weight <- rnorm(1000, mean = 0, sd = 100)
plot(density(sample.weight))

# expectations for the average mpg
sample.intercept <- rnorm(1000, mean = 0, sd = 100)
plot(density(sample.intercept))

# expectations for model variance
sample.sigma <- runif(1000, min = 0, max = 100)
plot(density(sample.weight))

# prior predictive simulation for mpg given the priors
prior_mpg <- rnorm(1000, sample.weight + sample.intercept, sample.sigma)
plot(density(prior_mpg))

These priors suggest that the effect of weight on MPG could be anywhere from -400 to 400 points. Definitely not realistic - and these are already more informative priors than any frequentist analysis! Similarly, the expected MPG of a vehicle given these priors is anywhere from -400 to 400.

Let’s bring these in a bit.

# expectations for the effect of weight on MPG
sample.weight <- rnorm(1000, mean = -10, sd = 5)
plot(density(sample.weight))

# expectations for the average mpg
sample.intercept <- rnorm(1000, mean = 20, sd = 5)
plot(density(sample.intercept))

# expectations for model variance
sample.sigma <- runif(1000, min = 0, max = 10)
plot(density(sample.sigma))

# prior predictive simulation for mpg given the priors
prior_mpg <- rnorm(1000, sample.weight + sample.intercept, sample.sigma)
plot(density(prior_mpg))

We could probably do better, but these look a lot better. Now the expected effect of weight on MPG is negative and and majority of the mass is concentrated between -15 and -5. Similarly, the expected MPG given these priors is between -10 and 20.

Now let’s build a new model with these priors and sample from it.

data {
  int n; //number of observations in the data
  vector[n] mpg; //vector of length n for the car's MPG
  vector[n] weight_c; //vector of length n for the car's weight
  vector[n] cylinders_c; ////vector of length n for the car's cylinders
  vector[n] hp_c; //vector of length n for the car's horsepower
}

parameters {
  real alpha; //the intercept parameter
  real beta_w; //slope parameter for weight
  real beta_cyl; //slope parameter for cylinder
  real beta_hp; //slope parameter for horsepower
  real<lower = 0> sigma; //variance parameter and restrict it to positive values
}

model {
  //linear predictor mu
  vector[n] mu;
  
  //write the linear equation
  mu = alpha + beta_w * weight_c + beta_cyl * cylinders_c + beta_hp * hp_c;
  
  //prior expectations
  alpha ~ normal(20, 5);
  beta_w ~ normal(-10, 5);
  beta_cyl ~ normal(0, 5); //we'll include my uncertain priors here
  beta_hp ~ normal(0, 5); //we'll include my uncertain priors here
  sigma ~ uniform(0, 10);
  
  //likelihood function
  mpg ~ normal(mu, sigma);
}

generated quantities {
  //replications for the posterior predictive distribution
  real y_rep[n] = normal_rng(alpha + beta_w * weight_c + beta_cyl * 
  cylinders_c + beta_hp * hp_c, sigma);
}

Now sample:

linear.fit.4 <- linear.model$sample(data = model.data)

## Running MCMC with 4 sequential chains...
## 
## Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
## Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
## Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
## Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
## Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
## Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
## Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
## Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
## Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
## Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
## Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
## Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
## Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
## Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
## Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
## Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
## Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
## Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
## Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
## Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
## Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
## Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
## Chain 1 finished in 0.1 seconds.
## Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
## Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
## Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
## Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
## Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
## Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
## Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
## Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
## Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
## Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
## Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
## Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
## Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
## Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
## Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
## Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
## Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
## Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
## Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
## Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
## Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
## Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
## Chain 2 finished in 0.2 seconds.
## Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
## Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
## Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
## Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
## Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
## Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
## Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
## Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
## Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
## Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
## Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
## Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
## Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
## Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
## Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
## Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
## Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
## Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
## Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
## Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
## Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
## Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
## Chain 3 finished in 0.2 seconds.
## Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
## Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
## Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
## Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
## Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
## Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
## Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
## Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
## Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
## Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
## Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
## Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
## Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
## Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
## Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
## Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
## Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
## Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
## Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
## Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
## Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
## Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
## Chain 4 finished in 0.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.2 seconds.
## Total execution time: 1.0 seconds.

print(linear.fit.4)

##  variable   mean median    sd  mad     q5    q95 rhat ess_bulk ess_tail
##  lp__     -48.56 -45.78 13.93 1.58 -51.03 -44.03 1.02      167       27
##  alpha     19.58  20.09  2.90 0.47  19.08  20.86 1.03      154       27
##  beta_w    -3.28  -3.33  0.92 0.74  -4.59  -2.00 1.01      476      116
##  beta_cyl  -0.89  -0.88  0.67 0.59  -1.90   0.11 1.01      619      137
##  beta_hp   -0.02  -0.02  0.01 0.01  -0.04   0.00 1.01     2127     1639
##  sigma      2.86   2.59  1.33 0.35   2.11   3.52 1.02      134       29
##  y_rep[1]  22.38  22.80  4.47 2.89  17.37  27.32 1.02      194       42
##  y_rep[2]  21.52  21.98  4.40 2.71  16.69  26.51 1.02      160       36
##  y_rep[3]  25.43  25.82  4.21 2.86  20.33  30.44 1.02      227       42
##  y_rep[4]  20.42  20.92  4.38 2.81  15.20  25.37 1.02      236       40
## 
##  # showing 10 of 38 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

After estimating the model with more informative priors, the lp__ is now a little bit lower. We can compare the prior distribution to the posterior distribution to see how “powerful” our priors are.

linear.fit.4 <- read_stan_csv(linear.fit.4$output_files())
posterior <- as.data.frame(linear.fit.4)

library(tidyverse)
ggplot() +
  geom_density(aes(x = sample.weight)) +
  geom_density(aes(x = posterior$beta_w), color = "blue")

Here we can see that even with more “informative priors” they are still very weak compared to the data.

Summarize the Model

With our final model in hand, we can visualizations of our model’s results.

Coeficient Plot

stan_plot(linear.fit.4, 
          pars = c("alpha", "beta_w", "beta_cyl", "beta_hp", "sigma"))

## ci_level: 0.8 (80% intervals)

## outer_level: 0.95 (95% intervals)

Fitted Regression Line

Here we can plot the fitted regression line

# Fitted Line
ggplot(data = cars.data, aes(x = weight_c, y = mpg)) +
  geom_point() +
  stat_function(fun = function(x) mean(posterior$alpha) + mean(posterior$beta_w) * x)

# Fitted Line with Uncertainty ------------------------------------------------
fit.plot <- 
  ggplot(data = cars.data, aes(x = weight_c, y = mpg)) +
  geom_point()

# select a random sample of 100 draws from the posterior distribution
sims <-
  posterior %>%
  mutate(n = row_number()) %>%
  sample_n(size = 100)

# add these draws to the plot
lines <- 
  purrr::map(1:100, function(i) stat_function(fun = function(x) sims[i, 1] + sims[i, 2] * x, 
             size = 0.08, color = "gray"))
fit.plot <- fit.plot + lines

# add the mean line to the plot
fit.plot <- 
  fit.plot +
  stat_function(fun = function(x) mean(posterior$alpha) + mean(posterior$beta_w) * x)

fit.plot

Last updated on January 1, 0001