# Overview

Researchers such as archaeologists and anthropologists sometimes need estimates of climate at certain places and times in the distant past. But there’s no consensus model of past climate they can check.

Instead, there are a variety of *gridded simulations* from various climate models at specific time slices, e.g. 21 thousand years ago (the Last Glacial Maximum, LGM) or 6 thousand years ago (the mid-Holocene, MH).

There are also *proxy data*: records tied to specific points in space and time (e.g. fossil pollen grains from a slice of a lake sediment core). By comparing proxy records with modern observations, paleoclimate can be estimated at those specific points in space and time.
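One common way to compare fossil pollen with modern observations is the modern analog technique: find the modern sites whose pollen spectra look most like the fossil sample, and take their observed climate as the estimate. The sketch below assumes that technique, a squared-chord distance, and an entirely hypothetical training set of 4 modern sites and 3 pollen taxa; it illustrates the idea only and is not necessarily how the reconstructions used in this project were produced.

```python
import numpy as np

def squared_chord_distance(fossil, modern):
    """Squared-chord distance between one fossil pollen spectrum and each modern spectrum.

    fossil: (n_taxa,) pollen proportions summing to 1
    modern: (n_sites, n_taxa) pollen proportions, each row summing to 1
    """
    return np.sum((np.sqrt(fossil) - np.sqrt(modern)) ** 2, axis=1)

def mat_reconstruct(fossil, modern_pollen, modern_temp, k=3):
    """Modern analog technique: average the observed climate of the k closest modern sites.

    Returns the analog mean and the spread (standard deviation) among the analogs,
    a crude stand-in for the reconstruction's uncertainty.
    """
    d = squared_chord_distance(fossil, modern_pollen)
    nearest = np.argsort(d)[:k]          # indices of the k most similar modern sites
    analog_temps = modern_temp[nearest]
    return analog_temps.mean(), analog_temps.std()

# Hypothetical modern training set: 4 sites x 3 pollen taxa, with the observed
# mean annual temperature (°C) at each site.
modern_pollen = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])
modern_temp = np.array([-2.0, 3.0, 9.0, 14.0])

fossil_sample = np.array([0.6, 0.25, 0.15])  # hypothetical fossil spectrum
mean_temp, spread = mat_reconstruct(fossil_sample, modern_pollen, modern_temp, k=2)
```

Here the two closest analogs are the first two modern sites, so the sketch reconstructs 0.5 °C with a spread of 2.5 °C among the analogs.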

While this information is publicly available, it’s not all in one place. Researchers are left to pick one or more of these resources, work out how to combine them into their specific estimate, and then judge how reliable that estimate is.

# Goal

We want to build a consensus model of mean annual temperature over the global land mass, from the LGM to the MH, that:

- can produce estimates with uncertainties
- combines simulations and reconstructions from pollen proxies

# Team

We’ve assembled the dream team to tackle this project!

- Christian Sommer is a geographer who studies the role of landscape in early human expansion and is the archetypical user of this method.
- Nils Weitzel has worked extensively on statistical modeling of paleoclimate.
- Seth Axen, Alexandra Gessner, and Álvaro Tejero-Cantero from the Colab bring their expertise in statistical modeling and applied ML.

# Overview of Approach

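A Gaussian process (GP) model would be ideal! A GP is a probability distribution over functions. Here, a function maps a location (coordinates) and a time (year) to a climate value such as mean annual temperature, so each function drawn from the GP is one plausible climate history across space and time.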

In the GP framework, we encode our assumptions about what kind of climate is plausible, before looking at any data, as a GP *prior*.
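A minimal sketch of sampling from a GP prior, assuming a separable squared-exponential kernel over planar coordinates in kilometres and time in years before present. The kernel, its hyperparameters (5 °C standard deviation, 1000 km and 5000 yr length scales), the zero prior mean, and the planar-distance shortcut are all illustrative assumptions rather than the project's choices; a global model would need distances on the sphere. Each column of the result is one function drawn from the prior, evaluated at the query points.

```python
import numpy as np

def spacetime_kernel(X1, X2, variance=25.0, length_space=1000.0, length_time=5000.0):
    """Separable squared-exponential kernel over rows (x_km, y_km, years_bp).

    Illustrative assumptions: prior temperature standard deviation of 5 °C
    (variance 25), correlated over ~1000 km in space and ~5000 years in time.
    """
    d_space2 = np.sum((X1[:, None, :2] - X2[None, :, :2]) ** 2, axis=-1)
    d_time2 = (X1[:, None, 2] - X2[None, :, 2]) ** 2
    return variance * np.exp(-0.5 * d_space2 / length_space**2
                             - 0.5 * d_time2 / length_time**2)

def sample_prior(X, n_samples, mean=0.0, jitter=1e-6, seed=0):
    """Draw n_samples functions from the GP prior, evaluated at the rows of X."""
    rng = np.random.default_rng(seed)
    K = spacetime_kernel(X, X) + jitter * np.eye(len(X))  # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((len(X), n_samples))
    return mean + L @ z  # shape (n_points, n_samples); each column is one sampled climate

# Hypothetical query points: (x_km, y_km, years before present)
X = np.array([
    [0.0, 0.0, 21000.0],     # a site at the LGM
    [500.0, 0.0, 21000.0],   # 500 km away, same time
    [0.0, 0.0, 6000.0],      # first site, mid-Holocene
    [5000.0, 0.0, 6000.0],   # far away, mid-Holocene
])
samples = sample_prior(X, n_samples=4000)
corr = np.corrcoef(samples)  # empirical correlation between the four points
```

Under these hyperparameters, points 500 km apart at the same time come out strongly correlated (the kernel gives exp(-0.125), about 0.88), while points thousands of kilometres and 15,000 years apart are essentially independent.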

We then combine that prior with an *observational model*, which here accounts for the fact that both the proxy-based reconstructions and the simulations have errors, i.e. they disagree with each other and with the actual past climate. We use a Gaussian observational model.

The magic of GPs is that, with a GP prior and a Gaussian observational model, feeding in our data updates the GP prior to a GP *posterior* in closed form. So we can sample plausible alternative climates from the posterior or, at any point in space and time, compute the mean and variance of the climate variable consistent with all the information used.
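A minimal sketch of the exact closed-form update, assuming a zero prior mean (e.g. temperature anomalies), a squared-exponential kernel on toy 1-D inputs, and a per-observation noise variance so that proxies and simulations can carry different error levels. The kernel, hyperparameters, and data are hypothetical; this illustrates standard GP regression, not the project's model.

```python
import numpy as np

def rbf(X1, X2, variance=25.0, lengthscale=1.0):
    """Squared-exponential kernel (illustrative hyperparameters)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, noise_var, X_new, kernel=rbf):
    """Exact GP posterior mean and variance at X_new.

    Gaussian observational model: y_i = f(x_i) + eps_i, eps_i ~ N(0, noise_var[i]).
    """
    K = kernel(X, X) + np.diag(noise_var)
    L = np.linalg.cholesky(K)                              # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y via the Cholesky factor
    K_s = kernel(X, X_new)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(X_new, X_new)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical data: a proxy reconstruction (-5 °C, noise variance 1) and a
# simulation (-8 °C, noise variance 4) at the same location x = 0.
X = np.array([[0.0], [0.0]])
y = np.array([-5.0, -8.0])
noise_var = np.array([1.0, 4.0])
X_new = np.array([[0.0], [10.0]])
mean, var = gp_posterior(X, y, noise_var, X_new)
```

At x = 0 the two sources are pooled by precision: the posterior mean (about -5.43) sits closer to the lower-noise proxy and the posterior variance (about 0.78) is smaller than either noise level. Far from the data, at x = 10, the posterior reverts to the prior (mean 0, variance 25).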

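There’s just one problem. To do anything with the GP posterior, we need to factorize (e.g. via a Cholesky decomposition) an $N \times N$ covariance matrix, an operation whose cost scales cubically ($\mathcal{O}(N^3)$) with the number of data points $N$; merely storing that matrix takes $\mathcal{O}(N^2)$ memory. For Europe alone, we have more than 600k data points, far too many for exact inference.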
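As a back-of-the-envelope check, assume $N = 6 \times 10^5$ (the lower bound for Europe), a dense covariance matrix stored in double precision (8 bytes per entry), and a Cholesky factorization, which costs roughly $N^3/3$ floating-point operations:

$$
\begin{aligned}
\text{memory} &\approx N^2 \times 8\ \text{bytes} = (6\times10^5)^2 \times 8\ \text{bytes} \approx 2.9\times10^{12}\ \text{bytes} \approx 2.9\ \text{TB},\\
\text{factorization} &\approx \tfrac{1}{3}N^3 = \tfrac{1}{3}(6\times10^5)^3 \approx 7.2\times10^{16}\ \text{flops}.
\end{aligned}
$$

Because the cost is cubic, every doubling of $N$ multiplies it by 8.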

But that’s where machine learning comes in! Instead of computing the exact GP posterior, we train a *doubly sparse variational* approximation to it. In short, this approximation reduces the cost so that it scales *linearly* with the number of data points!
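To give a feel for how sparse approximations avoid the cubic cost, here is a sketch of a standard *single*-sparse method, the inducing-point variational GP of Titsias (2009), written in the numerically stable "whitened" form. It is not the doubly sparse method described above; it only illustrates the shared idea of summarizing $N$ data points through $M \ll N$ inducing inputs $Z$, so that the only operations touching all the data cost $\mathcal{O}(NM^2)$. The kernel, data, and inducing inputs are hypothetical, and optimizing hyperparameters and $Z$ (via the variational lower bound) is omitted.

```python
import numpy as np

def rbf(X1, X2, variance=25.0, lengthscale=1.0):
    """Squared-exponential kernel (illustrative hyperparameters)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sparse_gp_predict(X, y, noise_var, Z, X_new, kernel=rbf, jitter=1e-6):
    """Predictive mean/variance of an inducing-point variational GP (Titsias, 2009).

    X: (N, d) inputs, y: (N,) observations, noise_var: (N,) per-point noise variances,
    Z: (M, d) inducing inputs with M << N, X_new: (S, d) prediction inputs.
    The operations touching all N points cost at most O(N M^2): linear in N.
    """
    M = len(Z)
    Kmm = kernel(Z, Z) + jitter * np.eye(M)
    Kmn = kernel(Z, X)                                  # (M, N)
    Kms = kernel(Z, X_new)                              # (M, S)
    L = np.linalg.cholesky(Kmm)                         # O(M^3), independent of N
    inv_sd = 1.0 / np.sqrt(noise_var)
    A = np.linalg.solve(L, Kmn) * inv_sd                # L^{-1} Kmn Lambda^{-1/2}, O(N M^2)
    B = np.eye(M) + A @ A.T                             # (M, M), O(N M^2)
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ (y * inv_sd))           # O(N M)
    tmp1 = np.linalg.solve(L, Kms)                      # L^{-1} Kms
    tmp2 = np.linalg.solve(LB, tmp1)                    # LB^{-1} L^{-1} Kms
    mean = tmp2.T @ c
    kss = np.diag(kernel(X_new, X_new))
    # k** - Q** (Nystrom residual) + uncertainty about the inducing values
    var = kss - np.sum(tmp1**2, axis=0) + np.sum(tmp2**2, axis=0)
    return mean, var

# Hypothetical 1-D data: N noisy observations summarized by M = 15 inducing inputs.
rng = np.random.default_rng(0)
N = 500
X = np.sort(rng.uniform(0.0, 10.0, N))[:, None]
y = 5.0 * np.sin(X[:, 0]) + rng.normal(0.0, 0.5, N)
noise_var = np.full(N, 0.25)
Z = np.linspace(0.0, 10.0, 15)[:, None]
X_new = np.array([[2.5], [7.5]])
mean, var = sparse_gp_predict(X, y, noise_var, Z, X_new)
```

With inducing inputs spaced closely relative to the kernel length scale, the sparse predictions closely track the exact GP posterior, while every $N \times N$ operation has been replaced by $M \times M$ ones and $M \times N$ products.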

For more details on the current approach and some exciting initial results, see our workshop poster, paper, and/or lightning talk below!