Bayesian Inference with Stan
So, you want to learn how to use Stan? Maybe you have some experience using BUGS or JAGS and are looking to make the switch to Stan. Maybe you’ve been using the nicely packaged RStanARM or BRMS for Stan models and you want to learn how to code in raw Stan. Maybe you’ve never used Bayesian analysis and are looking to dive in head first! In any case, I wrote this tutorial because I wanted to get better at programming in Stan.
I started out my Bayesian career using JAGS and quickly switched to using RStanARM and BRMS because of their power and convenience. But something just didn’t seem right about not being proficient in using raw Stan, so I set out to write this tutorial series.
I searched for a long time for a good tutorial series on using Stan, but found that most are either too advanced or lack the explanations needed to truly understand what is going on. My goal is to rectify these issues. My primary focus, however, is on teaching you how to program in Stan, not on teaching Bayesian methods. I think the tutorial will teach you something about Bayesian methods along the way, but that’s just a bonus.
In this tutorial, I will use RStan and CmdStanR to interface with Stan through R and will also make frequent use of the tidyverse. I won’t be teaching any R code, so I will assume that you have some basic understanding of how to use R and the tidyverse packages.
What is Stan?
Stan is a probabilistic programming language for Bayesian inference that uses Hamiltonian Monte Carlo (HMC) sampling. HMC relies on gradient evaluations of the log posterior to propose new draws, which typically makes it much more efficient than samplers such as Metropolis-Hastings and Gibbs sampling, especially in models with many or highly correlated parameters. As a result, HMC usually reaches convergence in far fewer iterations than these alternative samplers.
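To make the role of the gradient concrete, here is a minimal sketch of a basic HMC sampler written in plain Python, purely for illustration. It is not Stan's implementation: Stan uses an adaptive variant of HMC (the No-U-Turn Sampler) and computes gradients by automatic differentiation. The target (a standard normal), the function names, and the tuning values `step_size` and `n_leapfrog` are all assumptions chosen for this example.

```python
import math
import random


def log_density(x):
    """Unnormalized log density of the target (assumed here: standard normal)."""
    return -0.5 * x * x


def grad_log_density(x):
    """Gradient of log_density with respect to x (hand-coded for this sketch)."""
    return -x


def hmc_step(x, step_size, n_leapfrog, rng):
    """One HMC transition: simulate Hamiltonian dynamics, then accept or reject."""
    p = rng.gauss(0.0, 1.0)  # draw a fresh momentum
    x_new, p_new = x, p

    # Leapfrog integrator: half momentum step, alternating full steps,
    # then a final half momentum step. Every momentum update uses the gradient.
    p_new += 0.5 * step_size * grad_log_density(x_new)
    for i in range(n_leapfrog):
        x_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new += step_size * grad_log_density(x_new)
    p_new += 0.5 * step_size * grad_log_density(x_new)

    # Metropolis correction for the integrator's discretization error.
    h_old = -log_density(x) + 0.5 * p * p
    h_new = -log_density(x_new) + 0.5 * p_new * p_new
    accept_prob = math.exp(min(0.0, h_old - h_new))
    return x_new if rng.random() < accept_prob else x


def sample(n_draws, step_size=0.2, n_leapfrog=10, seed=1):
    """Run the sampler from x = 0 and return the list of draws."""
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n_draws):
        x = hmc_step(x, step_size, n_leapfrog, rng)
        draws.append(x)
    return draws


draws = sample(5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

The printed mean and variance should land close to 0 and 1, the moments of the target. The point of the sketch is that the gradient steers each proposal along the posterior surface, so the sampler can make long moves that are still accepted with high probability, rather than relying on small random-walk steps.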
Modules in this tutorial
- Introduction to Stan Syntax
Bayesian Workflow. In general, the Bayesian workflow consists of several steps. The first is to consider the social process that generates your data. The goal of your statistical model should be to model the data-generating process, so think hard about this.
- Building Linear Models
In this tutorial, we will learn how to estimate linear models using Stan and R. Along the way, we will review the steps in a sound Bayesian workflow. This workflow consists of:
- Bernoulli and Binomial Models
In the last tutorial, we learned how to program and estimate linear models in Stan. In this tutorial, we’ll learn how to estimate binary-outcome models, commonly referred to as logit and probit models.
- Pooling, No Pooling, and Partial Pooling
There are a few different ways to model data that contain repeated observations for units over time, or that are nested within groups. First, we could simply pool all the data together and ignore the nested structure (pooling).