Let me tell you about the most frustrating part of Bayesian modeling. The first models you build often make bad assumptions or contain bugs. Both can make the already expensive step of drawing posterior samples with Markov chain Monte Carlo (MCMC) unbearably slow, yet those samples are often the best way to check whether our model makes sense. So we draw samples, encode better assumptions, draw more samples, fix some bugs, draw more samples, check our error model against the laboratory equipment, rinse and repeat. Gradually we move toward higher-quality, more useful models, which often can also be sampled much faster. This link between model quality and computational difficulty is known as the folk theorem of statistical computing.
Failing faster (and succeeding too)
But in the earliest stages of model building, we waste so much time waiting for a few poor-quality samples whose only use is to hint at how our model is wrong. This would all be so much faster if we had more efficient ways than MCMC to diagnose problems with our models, and if we could speed up MCMC itself.
Enter Pathfinder. Pathfinder is an approximate inference method introduced by Lu Zhang and colleagues. They compared it with automatic differentiation variational inference (ADVI) and with the first tuning phase of Hamiltonian Monte Carlo (HMC) as used in Stan and many other probabilistic programming languages (PPLs), and they found that it performed comparably to or better than these alternatives with many fewer evaluations of the log-density function and its gradient. The authors speculated that Pathfinder could replace more of the tuning phase, accelerating HMC.
We have a great, highly customizable Pathfinder implementation, so, together with Lu, we set out to test this. Specifically, we wondered whether we could use Pathfinder to 1) diagnose common problems with models and 2) accelerate HMC sampling. We presented the results in a poster at BayesComp 2023 in beautiful Levi, Finland. Here is a brief summary of our findings.
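For orientation, here's a minimal sketch of what running our implementation, Pathfinder.jl, can look like on a toy target. The plain-function entry point with a `dim` keyword is an assumption based on older docs; recent releases may expect a LogDensityProblems.jl-compatible object instead.

```julia
using Pathfinder

# Toy target: an unnormalized 5-dimensional standard normal log-density.
logp(x) = -sum(abs2, x) / 2

# Run single-path Pathfinder. The `logp`-plus-`dim` call signature is an
# assumption; check the Pathfinder.jl docs for the current entry point.
result = pathfinder(logp; dim=5)

result.draws             # dim × ndraws matrix of approximate posterior draws
result.fit_distribution  # the multivariate normal approximation Pathfinder chose
```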
The good, the bad, and the ugly
We first picked 3 representative models. We’ll call them the good (arma-arma11, easily sampled), the bad (diamonds-diamonds, hard to sample due to correlations), and the ugly (eight_schools-eight_schools_centered, hard to sample due to a non-concave funnel geometry). We found that if we replaced the first 2 phases of HMC tuning as implemented in most PPLs with variants of Pathfinder using different default parameters, we got better-quality draws faster. Most of the time in HMC tuning is spent in these 2 phases, so in some cases this translated to significant speed-ups. For the bad model, it reduced wait time from ~10 minutes to 1 minute!
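As a hedged illustration of what "replacing the early tuning phases" can look like in code, Pathfinder's output supplies both initial positions and a posterior covariance estimate that can seed HMC's mass matrix. The field names below are our assumptions about Pathfinder.jl's result object; your PPL's sampler options will differ.

```julia
# Reusing `result = pathfinder(logp; dim=5)` from the sketch above.

# 1) Start the HMC chain at one of Pathfinder's approximate posterior
#    draws rather than at a random point in an arbitrary hypercube.
θ₀ = result.draws[:, 1]

# 2) Seed HMC's mass matrix with the covariance of Pathfinder's normal
#    approximation, skipping the usual metric-adaptation windows.
#    (We assume `fit_distribution` is a Distributions.MvNormal whose
#    covariance is stored in the field `Σ`.)
Σ_est = result.fit_distribution.Σ

# Both then feed into the sampler's configuration, e.g. as the initial
# position and a dense Euclidean metric in AdvancedHMC.jl.
```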
A useful paramedic
We also checked whether we could just run Pathfinder and diagnose issues with these models without ever running MCMC. We found that Pathfinder quickly gave us answers to the following common modeling questions (a sketch of these checks follows the list):
- Do the posterior variances of the parameters have very different scales? Then rescale the parameters.
- Is a dense metric better than a diagonal one? Then change the HMC configuration.
- Should the model be reparameterized, or are there other odd features? Then rethink the model's assumptions.
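Here's a hedged sketch of how these checks can be read off Pathfinder's fit, reusing the `result` object from the earlier sketches (again, the field access is our assumption about Pathfinder.jl's result type):

```julia
using LinearAlgebra

# Estimated posterior covariance from Pathfinder's normal approximation.
Σ = Matrix(result.fit_distribution.Σ)
σ = sqrt.(diag(Σ))                  # marginal posterior scales

# 1) Wildly different marginal scales suggest rescaling the parameters.
scale_ratio = maximum(σ) / minimum(σ)

# 2) Strong off-diagonal correlations suggest a dense metric over a
#    diagonal one.
R = Σ ./ (σ * σ')                   # correlation matrix
max_corr = maximum(abs, R - I)

# 3) Funnels, multimodality, and other odd features show up when the
#    draws disagree with the fitted normal; pair plots of `result.draws`
#    are a quick way to spot them.
```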
We’re eager to see this repeated for more models. Now that Stan includes Pathfinder, we expect others will investigate these questions as well, and we’re excited to see more scientists building better models faster.
Check out “Pathfinder.jl: early diagnostics for probabilistic models and faster MCMC warmup” for some exciting ways we have seen others use Pathfinder.jl.