
Faster Bayesian inference with Pathfinder

Attachment
pathfinder_benchmarks_poster_bayescomp_2023.pdf
Author
Seth Axen
Date
March 6, 2024
Last edited time
Jun 5, 2024 10:32 AM
Let me tell you about the most frustrating part of Bayesian modeling. The first models you build often either make bad assumptions or contain bugs. Both can make the already expensive step of drawing posterior samples with Markov chain Monte Carlo (MCMC) unbearably slow, yet those samples are often the best way to check whether our model makes sense. So we draw samples, encode better assumptions, draw more samples, fix some bugs, draw more samples, check our error model against the laboratory equipment, rinse and repeat. Gradually we move toward higher-quality, more useful models, which can often also be sampled much faster. This is known as the folk theorem of statistical computing.


https://twitter.com/ShenRaphael/status/1628264932486615042, adapted from https://xkcd.com/303/

Failing faster (and succeeding too)

The poster we presented at BayesComp 2023. To download a high-resolution PDF, click the link at the top of the page.

But in the earliest stages of model building, we waste so much time waiting for a few poor-quality samples whose only use is to hint at how our model is bad. This would all be much faster if we had more efficient approaches than MCMC for diagnosing problems with our models, and if we could speed up MCMC itself.

Enter Pathfinder. Pathfinder is an approximate inference method introduced by Lu Zhang and colleagues. They compared it with automatic differentiation variational inference (ADVI) and the first tuning phase of Hamiltonian Monte Carlo (HMC) used in Stan and many other probabilistic programming languages (PPLs) and found that it performed comparably to or better than these alternatives, with far fewer evaluations of the log-density function and its gradient. The authors speculated that Pathfinder could replace more of HMC's tuning phase, accelerating sampling.
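To build some intuition for the idea, here is a toy 1-D sketch, not the actual algorithm: real Pathfinder traces an L-BFGS optimization path and builds its normal approximations from the inverse-Hessian factors, whereas this toy uses plain gradient ascent and a fixed standard deviation. The shared skeleton is: follow an optimization path toward the mode, attach a Gaussian approximation to each iterate, score each candidate with a Monte Carlo ELBO estimate, and draw from the winner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D target: unnormalized Gaussian log density with mean 3, sd 0.5.
def logp(x):
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def grad_logp(x):
    return -(x - 3.0) / 0.25

# Trace an optimization path toward the mode (plain gradient ascent here;
# real Pathfinder uses L-BFGS).
path = [-5.0]
for _ in range(15):
    path.append(path[-1] + 0.1 * grad_logp(path[-1]))

# Attach a candidate normal approximation q to each iterate (fixed sd here;
# Pathfinder derives a covariance from L-BFGS's inverse-Hessian estimate)
# and score it with a Monte Carlo estimate of the ELBO.
def elbo(mean, sd=0.5, ndraws=1000):
    z = rng.normal(mean, sd, size=ndraws)
    entropy = 0.5 * np.log(2 * np.pi * np.e * sd**2)
    return logp(z).mean() + entropy

best = max(path, key=elbo)  # iterate whose approximation has the highest ELBO
draws = rng.normal(best, 0.5, size=100)  # cheap approximate posterior draws
```

No MCMC transitions are ever run: the cost is a handful of optimizer steps plus some cheap Monte Carlo ELBO evaluations, which is why Pathfinder needs so many fewer log-density calls than HMC warmup.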

We have a great, highly customizable Pathfinder implementation, Pathfinder.jl, so, together with Lu, we set out to test this. Specifically, we wondered whether Pathfinder could 1) diagnose common problems with models and 2) accelerate HMC sampling. We presented the results in a poster at BayesComp 2023 in beautiful Levi, Finland. Here is a brief summary of our findings.

The good, the bad, and the ugly

We first picked three representative models. We’ll call them the good (arma-arma11, easily sampled), the bad (diamonds-diamonds, hard to sample due to correlations), and the ugly (eight_schools-eight_schools_centered, hard to sample due to a non-concave funnel geometry). We found that if we replaced the first two phases of HMC tuning as implemented in most PPLs with variants of Pathfinder using different default parameters, we got better-quality draws faster. Most of the time in HMC tuning is spent in these two phases, so in some cases this translated to significant speed-ups. For the bad model, it cut the wait from ~10 minutes to 1 minute!

💡
Reality check. A number of choices must be made when implementing L-BFGS, the optimizer used by Pathfinder, such as the choice of line search, line search initialization, convergence checks, initialization of the inverse-Hessian approximation, etc. These decisions can have a major impact on performance. As far as we can tell, the Pathfinder paper used a wrapper of L-BFGS-B v2.1. For our benchmark, we used Optim.jl’s implementation of L-BFGS, because it’s more customizable. The two implementations are not numerically equivalent, which raises the question of whether we would observe the same results with L-BFGS-B. Since presenting the poster, we have tested a wrapped version of L-BFGS-B v3.0 and found that Pathfinder performs much better with it, though some of the variants we benchmarked still performed better on these models.
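To see how much such implementation knobs can matter, here is a small illustration (not our benchmark code, which is in Julia) using SciPy's wrapper of L-BFGS-B: changing just the history size `maxcor` of the inverse-Hessian approximation changes the optimization path and the evaluation counts, even though both runs reach the same minimum.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])  # classic Rosenbrock starting point

# Same problem, same algorithm family, different memory size for the
# inverse-Hessian approximation.
res_small = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                     options={"maxcor": 2})
res_large = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                     options={"maxcor": 20})

# Both converge to the minimum at (1, 1), but typically after different
# numbers of objective and gradient evaluations.
print(res_small.nfev, res_large.nfev)
```

Since Pathfinder's approximations are built from the optimizer's iterates and inverse-Hessian factors, differences like these propagate directly into the quality of the resulting posterior approximation, not just the runtime.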

A useful paramedic

We also checked if we could just run Pathfinder and diagnose issues with these models without ever running MCMC. We found that Pathfinder gave us answers to the following common modeling questions very quickly:

  • Do the posterior variances of the parameters have very different scales? Then rescale the parameters.
  • Is a dense metric better than diagonal? Then change the HMC configuration.
  • Should the model be reparameterized, or does the posterior have odd features? Then rethink the model assumptions.
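To illustrate the kind of cheap check this enables (a hypothetical sketch, not Pathfinder.jl's API, with illustrative thresholds), given a matrix of approximate posterior draws one can flag badly mismatched parameter scales and strong correlations directly, without running a single MCMC transition:

```python
import numpy as np

def diagnose(draws, scale_ratio_threshold=10.0, corr_threshold=0.5):
    """Cheap checks on approximate posterior draws (n_draws x n_params).

    Thresholds are illustrative, not calibrated recommendations.
    """
    stds = draws.std(axis=0)
    corr = np.corrcoef(draws, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return {
        "scale_ratio": stds.max() / stds.min(),  # rescale params if large
        "max_abs_corr": np.abs(off_diag).max(),  # dense metric if large
        "rescale_suggested": stds.max() / stds.min() > scale_ratio_threshold,
        "dense_metric_suggested": np.abs(off_diag).max() > corr_threshold,
    }

# Example: correlated draws with wildly different scales.
rng = np.random.default_rng(0)
cov = np.array([[1e4, 150.0], [150.0, 4.0]])  # corr = 0.75, sd ratio = 50
draws = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
report = diagnose(draws)
```

Here both flags fire: the ~50x spread in posterior scales suggests rescaling the parameters, and the strong off-diagonal correlation suggests a dense rather than diagonal metric for HMC.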

We’re eager to see this repeated for more models. Now that Stan includes Pathfinder, we expect others will be investigating these questions as well and are excited for more scientists to be building better models faster.

📖
All code necessary to reproduce our results can be found at https://github.com/mlcolab/PathfinderBenchmarks.jl. Based on the results, we recently updated https://github.com/mlcolab/Pathfinder.jl to use Hager-Zhang’s line search and line search initialization.

Check out “Pathfinder.jl: early diagnostics for probabilistic models and faster MCMC warmup” for some exciting ways we have seen others use Pathfinder.jl.

MLColab ⊂ Cluster ML in Science ⊂ University of Tübingen

🄯 ml colab team, licensed cc-by-sa except where noted.
