About this document
This text corresponds to the slides of Module 1 of the Simulation-based inference workshop held in 2021. Head over to the workshop page for the rest of the content.
🄯 Álvaro Tejero-Cantero for all the text, licensed under a CC-BY-SA license.
ℹ️ Practical parts are indicated in red background. Speaker's notes are under ▸ §.
Simulators for science
Models in science
- making models is part of the scientific method
- models capture only some aspects of reality
- when formalized, they enable quantitative, testable hypotheses
- model functionalities
- prediction — to support decisions
- understanding — to select interventions
- the structure that doesn't change is the model
- the malleable parts are the parameters
- parameters are 'tuned' based on observations
- multiple input parameter sets can lead to the same output prediction
- equifinality, degeneracy are key to resilience, homeostasis of complex systems
Simple pendulum
$\frac{{\rm d}^2 \theta}{{\rm d} t^2} + \frac{g}{\ell}\, \sin\theta = 0$
- Can predict angles $\theta(t)$ given $g/\ell$ and $\theta(0)$.
- Can infer $g/\ell$ from measured $\theta(t)$.
- for small amplitudes $\sin\theta\simeq\theta$ and timing one oscillation $T=2\pi \sqrt{\ell/g}$ approximately suffices to infer $g/\ell$ → $T$ summarises $\theta(t)$ for inference.
- And extract understanding with respect to interventions and counterfactuals.
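The pendulum bullets above can be sketched in code: a forward simulation produces $\theta(t)$, the period $T$ is extracted as a summary, and $g/\ell$ is recovered by inverting the small-angle formula. This is a minimal sketch; the integrator (semi-implicit Euler), the step size, the start-at-rest initial condition and the value $g/\ell = 9.81\,{\rm s}^{-2}$ are assumptions chosen for illustration.

```python
import numpy as np

def simulate_pendulum(g_over_l, theta0, t_max=20.0, dt=1e-3):
    """Integrate theta'' = -(g/l) sin(theta), starting at rest, with semi-implicit Euler."""
    n = int(round(t_max / dt))
    t = np.arange(n) * dt
    theta = np.empty(n)
    angle, velocity = theta0, 0.0
    for i in range(n):
        theta[i] = angle
        velocity -= g_over_l * np.sin(angle) * dt  # update the velocity first ...
        angle += velocity * dt                     # ... then the angle
    return t, theta

def period_from_trace(t, theta):
    """Summary statistic: mean time between successive downward zero crossings."""
    i = np.where((theta[:-1] > 0) & (theta[1:] <= 0))[0]
    # Linear interpolation of each crossing time between samples i and i + 1.
    t_cross = t[i] + (t[i + 1] - t[i]) * theta[i] / (theta[i] - theta[i + 1])
    return float(np.mean(np.diff(t_cross)))

g_over_l_true = 9.81  # assumed value, in 1/s^2
t, theta = simulate_pendulum(g_over_l_true, theta0=0.05)  # small amplitude (rad)
T = period_from_trace(t, theta)
g_over_l_hat = (2 * np.pi / T) ** 2  # invert T = 2*pi*sqrt(l/g)
print(f"T = {T:.4f} s, inferred g/l = {g_over_l_hat:.3f} 1/s^2")
```

For larger amplitudes the small-angle inversion becomes biased, which is one way to see that $T$ is only an approximate summary.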
Simulators, everywhere
- 20th-century: explosion of digital, expressive simulators → "complex science"
- "simulator as numerical solver" for an explicit model (e.g. PDEs) — based on discretization
- "simulator as defined by code", an implicit model built from individual interaction rules
- and anything in-between, e.g. pulse-coupled NNs: discrete + continuous dynamics
Simulators galore
But what are simulators?
- simulate - /ˈsɪm·jəˌleɪt/ (verb). (Cambridge English dictionary)
- to create conditions or processes similar to something that exists.
- to produce a situation or event that seems real but is not real, especially in order to help people learn how to deal with such situations or events
- simulation might be a key ingredient of cognitive processing? cf. predictive coding. (imagination, speculation, platonic shadow)
- typology: continuous vs. discrete (regular vs. event-based), dynamic vs. steady-state, deterministic vs. stochastic...
- Let's look at a couple of examples
Agent-based models
Microscale models "defined by code". Agents represented directly, not by density or concentration; possess internal state, interaction rules, and learning processes that determine state updates; they live in a topology embedded in an environment.
Other examples: culture modelling, tumor growth, epidemics, ecology, traffic, percolation (oil through soil), voting...
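As an illustration of a model "defined by code", here is a minimal sketch of a stochastic epidemic on a ring of agents. Each agent carries an internal state (S, I or R), the interaction rule is infection from direct neighbours, and the topology is the ring. The function name `run_abm`, the rates `beta` and `gamma`, and the ring topology are all hypothetical choices; learning processes are left out for brevity.

```python
import numpy as np

S, I, R = 0, 1, 2  # internal state of each agent

def run_abm(n_agents=200, beta=0.3, gamma=0.1, n_steps=100, seed=0):
    """Return an array of shape (n_steps + 1, 3) with the S, I, R counts per step."""
    rng = np.random.default_rng(seed)
    state = np.full(n_agents, S)
    state[rng.integers(n_agents)] = I  # a single initial infection
    counts = np.empty((n_steps + 1, 3), dtype=int)
    counts[0] = np.bincount(state, minlength=3)
    for step in range(1, n_steps + 1):
        infected = state == I
        # Topology: a ring, every agent has a left and a right neighbour.
        n_infected_neighbours = np.roll(infected, 1).astype(int) + np.roll(infected, -1)
        # Interaction rule: each infected neighbour transmits independently with prob. beta.
        p_infection = 1.0 - (1.0 - beta) ** n_infected_neighbours
        new_infections = (state == S) & (rng.random(n_agents) < p_infection)
        # State update: infected agents recover with prob. gamma per step.
        recoveries = infected & (rng.random(n_agents) < gamma)
        state[new_infections] = I
        state[recoveries] = R
        counts[step] = np.bincount(state, minlength=3)
    return counts

counts = run_abm()
print(counts[-1])  # final S, I, R counts
```

Note that nothing here writes down a density or a differential equation: the model is the update rule itself.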
Differential-equation-based models
- continuous models for dynamically changing phenomena
- relate system response to infinitesimal changes of state variable(s)
- classes of differential equation (DE):
- ODE, ordinary DE: one independent variable $x$, unknown $y(x)$, $F\left (x,y,y',\ldots, y^{(n-1)} \right )=y^{(n)}$
- PDE, partial DE: multiple independent variables, e.g. time and space
- SDE, stochastic DE: DE with stochastic fluctuations (→ stochastic process)
- Solution via integration is rarely possible in closed form → numerical approximations: discretization.
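Discretization can be illustrated with the simplest scheme, forward Euler, applied to an ODE whose closed-form solution is known. The test equation $y' = -y$, $y(0)=1$ and the helper name `euler` are assumptions made for this sketch; practical simulators use higher-order or adaptive schemes.

```python
import numpy as np

def euler(f, y0, x0, x1, n_steps):
    """Forward Euler: repeatedly step y <- y + h * f(x, y) from x0 to x1."""
    h = (x1 - x0) / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        y = y + h * f(x, y)
        x = x + h
    return y

def decay(x, y):
    """Right-hand side of the test ODE y' = -y."""
    return -y

exact = np.exp(-1.0)  # closed-form solution y(1) = exp(-1)
errors = {n: abs(euler(decay, 1.0, 0.0, 1.0, n) - exact) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"{n:5d} steps: |error| = {err:.2e}")
```

The error shrinks roughly in proportion to the step size, at the price of more evaluations of the right-hand side.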
Simulator example: core-collapse supernova models
Goal. Test understanding of physics. What are the mechanisms that make SN explode?
Model. Euler eqs. + eq. of state + conservation of mass (1), momentum (2), energy (3)
Observables. lightcurves, neutrinos, EM spectrum (🤯 SN remnants visible for centuries, but relevant dynamics on $\lesssim 1\,{\rm s}$)
Solution.
- PDEs → Finite Elements Model (FEM)
- "parameter-free"
- months on high-performance computers
Supernova simulation 400ms post-bounce, courtesy Thomas Janka →
Epistemic ambition of simulators
🖍️ Theory building — focus on process
- Formalizing heuristics for understanding, running simulator to generate hypotheses.
- Use of reduced models to understand dynamical degrees of freedom and interactions.
🖥️ Hypothesis testing — focus on result
Expected to quantitatively reproduce empirical measurements. Our workshop.
Problem: model misspecification
"Explanation requires reduction", all models are misspecified.
Parsimony (modelling "from the null up", Premo, 2007) vs. omitted variable bias.
Interlude: introduce your simulator
Share your simulator-based research with the group. You can re-use slides you have. Paste them here in this Google presentation.
- Key features of your use case (use up to 2 minutes):
- why: phenomenon and scientific meaning of parameters and observations. Do you expect degeneracies / equifinality? what would they mean?
- what: type, dimensionality and structure of parameters and observations, sources of noise
- how: type of simulator (ODE, ABM...), programming language
- when: what is the approximate runtime?
- questions from the audience (about 1m).
Simulators vs. statistical models
Models, and models
Simulations are models (often stochastic), but what about 'statistical models'? What are the differences?
Models can either be mechanistic or convenient (Ripley 1987, Stochastic Simulation p3).
Statistical models
"Accommodate and fit data"
Statistical relationship, correlation
Interpretable post-hoc, if at all
Poor performance outside of training set
Less opinionated — needs much data
Built for inference ('fit')
Mechanistic models (many simulators)
"Formalize principles and hypotheses".
I→O mechanistic link, causal if time involved
Interpretable by design
Extrapolate well (if validated)
More opinionated — works with little data
Inference ❓
Example: modelling counts vs. modelling processes
RNN model of infection counts
Designed to fit data and predict
- RNN flexible approximator,
- specific architecture to fit time correlations
- many, $\mathcal{O}(10^4)$, opaque parameters
- model hard to debug and interpret
- data-hungry,
- poor extrapolation
Epidemic compartmental model
Designed for insight
- ODE, limited modelling repertoire
- use rate constants for population kinetics
- few, $\mathcal{O}(1)$ interpretable parameters
- expresses interaction rules, conservation laws, smoothness constraints.
- misspecification risk, but know how to formulate hypotheses as compartments
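A compartmental model of this kind can be sketched as a three-compartment SIR system with two rate constants. The specific compartments, the parameter values and the fixed-step Runge-Kutta integrator are assumptions for illustration; the point is that a handful of interpretable parameters and a conservation law ($S+I+R$ constant) are built into the structure.

```python
import numpy as np

def sir_rhs(y, beta, gamma):
    """Population kinetics: infection at rate beta*S*I, recovery at rate gamma*I."""
    s, i, r = y
    infection = beta * s * i
    recovery = gamma * i
    return np.array([-infection, infection - recovery, recovery])

def simulate_sir(beta, gamma, y0=(0.99, 0.01, 0.0), t_max=100.0, dt=0.1):
    """Integrate the SIR equations with the classical 4th-order Runge-Kutta scheme."""
    n = int(round(t_max / dt))
    traj = np.empty((n + 1, 3))
    traj[0] = y0
    for k in range(n):
        y = traj[k]
        k1 = sir_rhs(y, beta, gamma)
        k2 = sir_rhs(y + 0.5 * dt * k1, beta, gamma)
        k3 = sir_rhs(y + 0.5 * dt * k2, beta, gamma)
        k4 = sir_rhs(y + dt * k3, beta, gamma)
        traj[k + 1] = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

traj = simulate_sir(beta=0.3, gamma=0.1)  # hypothetical rate constants
total = traj.sum(axis=1)                  # conservation law: S + I + R stays constant
print("peak infected fraction:", traj[:, 1].max())
print("max deviation of S+I+R from 1:", np.abs(total - 1.0).max())
```

A new hypothesis (e.g. an exposed or a vaccinated group) would be expressed by adding a compartment and its rate constants.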
Induction vs. deduction, an age-old dichotomy
Data reduction vs. theory making. Philosophy of science as a guide.
- Induction $\sim$ statistical models: from experimental data, infer the simplest (usually statistical) laws that account best for the data.
- Deduction $\sim$ mechanistic models: from first principles, obtain laws that produce predictions in specific settings. Experimental design seeks measurements that falsify the laws.
Inference on simulators
An inverse problem
- from parameters to measurements: modelization, simulation: forward problem
- from measurements to parameters: parameter estimation: inverse problem
- point estimates via optimization
- grid search (not scalable)
- derivative-based optimization (gradients generally not available)
- population methods: evolutionary strategies
- shared limitation: no principled ranking of point estimates
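The point-estimate approach and its limitation can be sketched with a grid search on a toy deterministic simulator. The simulator $\operatorname{sim}(\theta)=\theta^2$, the observation $x_{\rm o}=4$ and the grid are hypothetical; they are chosen so that two parameter values reproduce the observation equally well.

```python
import numpy as np

def sim(theta):
    """Toy deterministic simulator (hypothetical): non-injective in theta."""
    return theta ** 2

x_o = 4.0
grid = np.linspace(-3.0, 3.0, 601)         # exhaustive 1-D grid, step 0.01
loss = (sim(grid) - x_o) ** 2              # squared distance to the observation
theta_hat = grid[np.argmin(loss)]          # the point estimate returned by the search
near_optimal = grid[loss < 1e-6]           # every grid point that reproduces x_o

print("point estimate:", theta_hat)
print("equally good candidates:", near_optimal)
```

The search returns a single value, while the loss offers no way to prefer one of the two equally good candidates over the other. In $D$ dimensions the grid also grows as (points per axis)$^D$, which is why this does not scale.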
Multiple and uncertain parameter sets
- functions destroy information when they are non-injective (underdetermination, equifinality)
- $\hat{\theta}$ replicates observation $x_{\rm o}$, i.e. $x_{\rm o} = \operatorname{sim}(\hat{\theta})$. But what about $\operatorname{sim}(\hat{\theta} + \epsilon)$?
- this could be a serious problem
- sensitivity analysis is the discipline dealing with the $\operatorname{sim} (\hat{\theta}+\epsilon)$ problem
- solution? estimate $\hat{\theta}(x)$, calculate confidence sets
- uncertainty estimate ✔️
- find the multiple $\hat{\theta}$ which could have generated $x_{\rm o}$ (and rank them) ❌
Bayesian parameter inference
uncertainty: getting $\theta$ might be only one step in an inference or decision pipeline.
degeneracy: ignoring alternate $\theta$ leading to the same $x_{\rm o}$ discards alternative explanations upfront.
A full posterior parameter distribution conditioned on the observation, $p(\theta|x_{\rm o})$ provides both.
- But how to get there?
- And what to do with it?
simulator data + observation + inference 🪄 ($\sim$ day 2 of the workshop) → posterior
posterior + validation / interpretation 🪄 ($\sim$ day 3 of the workshop) → hypotheses, results
Introducing the posterior
For $\theta \in \mathbb{R}^{D>2}$ we need strategies for representation. Ground truth $\orange{\bullet}\equiv\{\orange{\theta_1, \theta_2}\}$
- marginal posteriors in 2D 🔨 integrate out $D-2$ parameter dimensions
$p(\theta_1, \theta_2|x_{\rm o})=\green{\int} p(\theta_1,\theta_2,\green{\theta_{3:31}}|x_{\rm o})\ \green{{\rm d}\theta_{3:31}}$
typically more like a blob!
Posterior plots courtesy Michael Deistler →
- conditioned posteriors in two dimensions
🏖️ fix $D - 2$ parameter dimensions
$p(\theta_1,\theta_2|x_{\rm o}, \color{green}{\theta_{3:31}\leftarrow\theta^\ast})$
typically sharper!
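Why conditionals tend to be sharper than marginals can be checked on a Gaussian stand-in for the posterior, where both operations have closed forms. The 3-D covariance below is a made-up example, with $\theta_3$ playing the role of the $D-2$ remaining dimensions.

```python
import numpy as np

# Hypothetical Gaussian "posterior" over (theta_1, theta_2, theta_3), zero mean.
cov = np.array([[1.0, 0.6, 0.7],
                [0.6, 1.0, 0.5],
                [0.7, 0.5, 1.0]])

# Marginal over (theta_1, theta_2): integrating out theta_3 just drops its row/column.
marginal_cov = cov[:2, :2]

# Conditional on theta_3 = theta_star: Schur complement of the theta_3 block.
cross = cov[:2, 2:]
conditional_cov = cov[:2, :2] - cross @ np.linalg.inv(cov[2:, 2:]) @ cross.T

# Sampling check of the marginal: draw from the joint and ignore theta_3.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(3), cov, size=100_000)
empirical_marginal_cov = np.cov(samples[:, :2], rowvar=False)

print("marginal variances:   ", np.diag(marginal_cov))
print("conditional variances:", np.diag(conditional_cov))
```

With samples from a posterior, the marginal is obtained the same way (keep two columns), whereas the conditional requires evaluating the density with the other parameters fixed.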
Interlude: Bayesian inference
The Bayesian workflow
Assume we want to know $\theta$ and have measured data on $x$.
- Modeling (day 1): formulate joint pdf $p(x,\theta)$
- product rule: $p(x,\theta) = p(x|\theta) p(\theta)=p(\theta|x)p(x)$.
- prior $p(\theta)$ summarizes our knowledge of the parameters (e.g. 'must be positive')
- likelihood $p(x|\theta)$ embodies our knowledge of how $x$ is generated from $\theta$ (→ model)
- rinse and repeat (update the prior: the posterior is the new prior)
- Inference (day 2): Use the data $x_{1:N}$ to learn about the target variable $\theta$ by conditioning
- Typically, use $x_{1:N}$ to fix $\phi$ in $p_\phi(\theta|x)$
- Multiple computational approaches, depending on the problem
- Validation and interpretation (day 3)
- check internal consistency of the posterior (SBC, posterior-predictive checks)
- check consistency with domain knowledge
See: Bayesian Workflow, Gelman et al. 2020 (https://arxiv.org/abs/2011.01808).
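The "rinse and repeat" step of the workflow (the posterior is the new prior) can be made concrete with a conjugate model, where conditioning is a simple parameter update. The Beta prior, the Bernoulli data and the helper `update_beta` are assumptions picked for this sketch.

```python
def update_beta(a, b, data):
    """Conjugate update of a Beta(a, b) prior with Bernoulli observations (1 = success)."""
    successes = sum(data)
    return a + successes, b + len(data) - successes

prior = (1.0, 1.0)           # flat prior on the success probability theta
batch_1 = [1, 0, 1, 1]       # made-up observations
batch_2 = [0, 1, 1, 1, 0]

posterior_1 = update_beta(*prior, batch_1)         # condition on the first batch
posterior_2 = update_beta(*posterior_1, batch_2)   # the posterior acts as the new prior
posterior_all = update_beta(*prior, batch_1 + batch_2)

print(posterior_2, posterior_all)
```

Updating sequentially or with all data at once gives the same Beta parameters, as the product rule implies for independent observations.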
Bayesian inference
$\color{0B6E99}{p(\theta|x)} =\frac{ \color{E03E3E}{p (x|\theta)}\color{6940A5}{p(\theta)}}{\color{9B9A97}{\int_\theta p(x|\theta)\ p(\theta)}}$
The likelihood (model) times the prior divided by the evidence yields the posterior.
- For very few models this equation yields a posterior in closed form.
- a closed-form posterior can be sampled from and evaluated. Ideal case!
- Usually the evidence integral is hard to compute
- Markov-Chain Monte-Carlo (stochastic, asymptotically exact), works with unnormalized joint. Implicit $\color{0B6E99}{p(\theta|x)}$: low bias / high-variance (slow). Only samples!
- Variational inference (approximate, deterministic). Assume explicit parametric posterior, fast.
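To illustrate how MCMC works with the unnormalized joint only, here is a random-walk Metropolis sketch on a model that happens to have a closed-form posterior, so the samples can be checked. The model (standard normal prior, unit-variance Gaussian likelihood), the synthetic data and the tuning constants are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=10)  # synthetic data; the generating theta = 1 is assumed

def log_joint(theta):
    """log p(x, theta) up to a constant: N(0, 1) prior, N(theta, 1) likelihood."""
    log_prior = -0.5 * theta ** 2
    log_lik = -0.5 * np.sum((x - theta) ** 2)
    return log_prior + log_lik

n_steps, step_size = 20_000, 0.8
samples = np.empty(n_steps)
theta, lp = 0.0, log_joint(0.0)
for k in range(n_steps):
    proposal = theta + step_size * rng.normal()
    lp_prop = log_joint(proposal)
    # Accept with probability min(1, p(proposal) / p(current)); the evidence cancels.
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = proposal, lp_prop
    samples[k] = theta
samples = samples[2_000:]  # discard burn-in

# Closed-form posterior for this conjugate model, for comparison.
post_mean = x.sum() / (len(x) + 1)
post_var = 1.0 / (len(x) + 1)
print(f"MCMC mean {samples.mean():.3f} vs exact {post_mean:.3f}")
print(f"MCMC var  {samples.var():.3f} vs exact {post_var:.3f}")
```

Note that the sampler needs to evaluate $p(x|\theta)$ at every step, which is exactly what a simulator usually cannot provide.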
Simulators as statistical models
Stochasticity and simulators
- Typically stochasticity models all the processes where we lack specific mechanistic hypotheses
- unless the process is itself stochastic at the physical level, e.g. $\beta$ decay in a nucleus.
- What are sources of stochasticity?
- unobserved latent variables
- stochastic program paths
- instrument noise (aleatoric)
- numerical approximations (→ probabilistic numerics)
→ Outputs must then be stochastic themselves - random variables → probabilities!
Since simulators have a stochastic component, could we treat simulators as statistical models?
Simulators as generative models
$\color{0B6E99}{p(\theta|x)} =\frac{ \color{E03E3E}{p (x|\theta)}\color{6940A5}{p(\theta)}}{\color{9B9A97}{\int_\theta p(x|\theta)\ p(\theta)}}$
- $\color{E03E3E}{p(x|\theta)}$ usually cannot be evaluated for simulators, as they are not built for inference.
- generative models build a probability density over samples — often an implicit one! (e.g. GANs)
- simulators are generative models with implicit likelihood. Most general scenario for inference - inference just from samples - likelihood-free inference (LFI).
- let's look at simulators from a probabilistic perspective
Anatomy of a simulator: notation
Probabilistic programs are programs with stochasticity that are interpreted as statistical models. In this sense, simulators are probabilistic programs, computer programs that
- take parameter vector as input, $\theta$ — we treat $\theta$ as a random variable, and assign it prior $p(\theta)$
- compute internal states — latent variables $z_\ell \sim p_\ell(z_\ell|\theta,z_{<\ell})$
- produce result vectors $x$ comparable with experimental observations $x_{\rm o}\sim p(x|\theta_{\rm true})$
Since simulators often have a stochastic component and often no explicit functional form, we can borrow 'sampling' notation, $x \sim p(x|\theta,z)$.
Some notes about the parameters $\theta$ → latents $z_{1:L}$ → outputs $x$
- $\theta$ fixed dimensionality, typically no structure
- $x$ can be structured (e.g. images, graphs...), and high-dimensional
- $z$ correspond to meaningful states, but are typically unobservable.
- continuous or discrete, changing dimensionality (even during simulation)
- updated deterministically, or stochastically
- we will NOT infer $z$ here
- but the existence of this unobserved state is problematic...
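The notation above maps onto code as follows. This is a minimal sketch of a hypothetical simulator (a noisy drifting random walk); the parameter meanings, the number of latents and the two-dimensional output are invented for illustration.

```python
import numpy as np

def simulator(theta, n_latents=50, rng=None):
    """Toy probabilistic program: parameters theta -> latents z_1..z_L -> output x."""
    rng = np.random.default_rng() if rng is None else rng
    drift, noise_scale = theta  # theta: fixed dimensionality, no structure
    z = np.empty(n_latents)
    previous = 0.0
    for l in range(n_latents):
        # z_l ~ p_l(z_l | theta, z_{<l}); here it depends only on z_{l-1}.
        z[l] = previous + drift + noise_scale * rng.normal()
        previous = z[l]
    # x ~ p(x | theta, z): a noisy readout of two features of the latent path.
    x = np.array([z[-1], z.max()]) + 0.1 * rng.normal(size=2)
    return x, z

x, z = simulator(theta=(0.1, 0.5), rng=np.random.default_rng(1))
x_again, _ = simulator(theta=(0.1, 0.5), rng=np.random.default_rng(2))
print(x, x_again)  # same theta, different latent paths, different outputs
```

A real simulator would typically return only `x`; the latent path `z` is exposed here just to make the structure visible.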
Latent-path intractability
Let's talk about $\color{E03E3E}{p(x|\theta)}$. How can a simulator become intractable, beyond lack of normalization?
$\begin{align*}p(x|\theta) &= \int \color{orange}{p(x,z|\theta)}\, \mathrm{d}z \quad \color{grey}{\textsf{(sum over execution traces)}}\\ \color{orange}{p(x,z|\theta)} &= p(x|\theta,z)\prod_{\ell=1}^{L} p_\ell(z_\ell|\theta, z_{<\ell})\quad \color{grey}{\textsf{(if sequential gen.)}} \end{align*}$
We seek likelihood-free techniques to free us from latent-path intractability; ideally they'd also solve the normalization problem.
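The integral over latents can be approximated by brute force when there is a single low-dimensional latent, which also shows why it becomes hopeless for long execution traces. The sketch assumes a toy model, $z\sim\mathcal{N}(\theta,1)$ and $x\sim\mathcal{N}(z,1)$, chosen because its marginal likelihood $\mathcal{N}(x;\theta,2)$ is known.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
theta, x = 0.5, 1.2  # hypothetical parameter and observation

# p(x|theta) = E_{z ~ p(z|theta)}[ p(x|theta, z) ], estimated by Monte Carlo.
z = rng.normal(theta, 1.0, size=200_000)
lik_mc = normal_pdf(x, z, 1.0).mean()

# Closed form for this toy model: variances of the two Gaussian stages add up.
lik_exact = normal_pdf(x, theta, 2.0)
print(f"Monte Carlo {lik_mc:.4f} vs exact {lik_exact:.4f}")
```

With $L$ sequential latents the same estimator needs samples of entire paths, and only a vanishing fraction of them is compatible with a given $x$.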
Partial observability in the field
Another name is partial observability. Here's a case from epidemiology (Kulkarni et al. 2021, discrete compartmental model)
Ways to make likelihoods tractable
- model reduction / coarsening. different model / limits expressivity, model-specific ❌
- likelihood data augmentation. model-specific, iterative ❌
- martingales. elegant ✔️ few specific models, hard ❌
- don't? likelihood-free inference: generally applicable, can target posterior✔️ performance❓, cutting edge❓
On likelihood data augmentation: (...) introduce additional parameters $\psi$, which represent missing data, in such a way that the likelihood $p(x,\psi|\theta)$ is tractable. Inference then proceeds by estimating both $\theta$ and $\psi$, typically via EM or MCMC. (after O'Neill 2010)
Levels of simulator access
Think: how much do we know about a simulator? i.e. mathematical model, code accessible, run-time accessible, gradients and other quantities...
Your simulator access → inference strategy
- Analytic likelihood (with gradients, maybe) → variational inference, MCMC (use gradients)
- Analytic conditional likelihood $\color{orange}{p(x|z,\theta)}$ → data augmentation, martingales, augmented LFI
- import sim (source code) → source-level AD (LLVM, Julia) $+$ augmented LFI; probabilistic programming (inference compilation)
- x = GET(θ) (just samples) → ▶️ bare likelihood-free inference ◀️ (our workshop)
See Cranmer et al. 2020 about LFI augmentation strategies for improved sampling efficiency.
See PyProb for inference compilation, Madminer for augmented LFI in HEP.
Simulation-based inference
LFI, ABC, SBI, FBI, CIA, WHO?
LFI: Likelihood-free inference. Any technique not requiring likelihoods.
ABC: Approximate Bayesian Computation. Originated in population genetics (Tavaré et al. 1997); use of sampling, implicit posteriors.
SBI: Simulation-based inference. 💡 non-ABC LFI. Modern LFI, often using NNs.
Let's have a hands-on look in a simplified case.
Interlude: from ABC to SBI
Open notebook for a first practical contact with ABC/SBI.
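Independently of the notebook, the basic ABC idea (rejection sampling on simulated data) fits in a few lines. This is a minimal sketch with a hypothetical simulator, a uniform prior and a hand-picked tolerance `eps`; it is not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, rng):
    """Hypothetical simulator: mean of 10 noisy measurements, one per row of theta."""
    return rng.normal(theta[:, None], 1.0, size=(len(theta), 10)).mean(axis=1)

x_o = 1.5          # the "observation" (made up)
eps = 0.1          # acceptance tolerance
n_sims = 50_000

theta_prior = rng.uniform(-5.0, 5.0, size=n_sims)    # 1. sample parameters from the prior
x_sim = simulator(theta_prior, rng)                  # 2. simulate one dataset per parameter
accepted = theta_prior[np.abs(x_sim - x_o) < eps]    # 3. keep those that land close to x_o

print(f"accepted {len(accepted)} of {n_sims} simulations")
print(f"approximate posterior mean {accepted.mean():.3f}, std {accepted.std():.3f}")
```

The accepted parameters are approximate posterior samples. Most simulations are discarded, and shrinking `eps` for accuracy makes this worse, which motivates the density-estimation approaches that follow.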
Conditional density estimation
Amortized SBI is density estimation
Amortization is sharing parameters across models for different predictions.
Conditional density estimation essentially provides us with amortized SBI.
With flexible neural networks we can estimate
- the likelihood (emulation). Prior-independent, but MCMC or VI is still needed to obtain the posterior.
- the posterior directly.
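Estimating the posterior directly can be sketched with the simplest possible conditional density estimator: a Gaussian whose mean is linear in $x$, fitted by maximum likelihood on simulated pairs $(\theta, x)$. A neural network would replace the linear map in practice; the toy simulator ($x=\theta+\text{noise}$) and the prior are assumptions chosen so that the true posterior, $\mathcal{N}(x/2, 1/2)$, is known.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
theta = rng.normal(0.0, 1.0, size=n)        # theta ~ p(theta)
x = theta + rng.normal(0.0, 1.0, size=n)    # x ~ p(x|theta), toy simulator

# Fit q(theta|x) = N(a*x + b, s2). For a Gaussian with fixed variance structure,
# maximum likelihood for (a, b) is least squares, and s2 is the residual variance.
design = np.column_stack([x, np.ones(n)])
(a, b), *_ = np.linalg.lstsq(design, theta, rcond=None)
s2 = np.mean((theta - (a * x + b)) ** 2)

def posterior(x_o):
    """Amortized estimate: mean and variance of q(theta|x_o) for any observation."""
    return a * x_o + b, s2

print("fitted a, b, s2:", a, b, s2)
print("posterior for x_o = 1.5:", posterior(1.5))
print("posterior for x_o = -0.3:", posterior(-0.3))
```

The fit is done once on simulations; evaluating the posterior for a new observation requires no further simulation or training, which is what amortization buys.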
Advantages derived from using networks
- Feature learning, can incorporate inductive biases
- Scales well with more data
- Interpolation properties
- Differentiable
Further materials
- Slides for the next sessions are available on GitHub
Resources
- Review. Cranmer et al. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48), 30055–30062.
- Software. Tejero-Cantero et al. (2020). sbi: A toolkit for simulation-based inference. Journal of Open Source Software, 5(52), 2505. 🧑🏽💻 https://github.com/mackelab/sbi
- Application (neuroscience). Gonçalves et al. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261.