Yuansi Chen

Localization Schemes: A Framework for Proving Mixing Bounds for MCMC Sampling Algorithms

Our work is motivated by two recent and seemingly-unrelated techniques for proving mixing bounds for Markov chains:

(i) the framework of spectral independence, introduced by Anari, Liu and Oveis Gharan, and its numerous extensions, which have given rise to several breakthroughs in the analysis of mixing times of discrete Markov chains and

(ii) the stochastic localization technique which has proven useful in establishing mixing and expansion bounds for both log-concave measures and for measures on the discrete hypercube.

In this work, we present a framework which connects ideas from both techniques and allows us to unify proofs in the mixing time of MCMC algorithms on high dimensional distributions. In its center is the concept of a localization scheme which, to every probability measure, assigns a martingale of probability measures which localize in space as time evolves. This viewpoint provides tools for deriving mixing bounds for the dynamics through the analysis of the corresponding localization process.  Generalizations of concepts of spectral independence naturally arise from our definitions. In particular we show via our framework that it is possible to recover the main theorems in the spectral independence frameworks via simple martingale arguments, while completely bypassing the theory of high-dimensional expanders.  As applications, we discuss how to use it to obtain the first O(nlogn) bound for mixing time of the hardcore-model (of arbitrary degree) in the tree-uniqueness regime, under Glauber dynamics and to prove a KL-divergence decay bound for log-concave sampling via the Restricted Gaussian Oracle, which achieves optimal mixing under any exp(n)-warm start.

Sinho Chewi

The Proximal Sampler

The proximal sampler, introduced by Lee, Shen, and Tian in 2021, is the sampling analogue of the proximal point method in optimization. The proximal point method, which requires access to the proximal map, enjoys convergence guarantees similar to those enjoyed by the continuous-time gradient flow; furthermore, when the objective function is smooth, then computation of the proximal map becomes a strongly convex optimization problem and can thus be implemented efficiently. Similarly, the proximal sampler assumes sample access to a distribution, called the restricted Gaussian oracle (RGO). In this work, we provide an improved analysis of the proximal sampler that operates under weaker assumptions than previous works and highlights its remarkable similarity with the continuous-time Langevin diffusion. When the target distribution is log-smooth, the RGO can be implemented efficiently, and it yields new state-of-the-art sampling guarantees for log-smooth targets which are weakly log-concave or satisfy a functional inequality. This is joint work with Yongxin Chen, Adil Salim, and Andre Wibisono.

Bamdad Hosseini

Transport Maps for Conditional Simulation in High Dimensions

MCMC algorithms are the gold standard for sampling of high-dimensional probability measures and in particular for Bayesian inference. Their convergence properties and accuracy have been the subject of intense

research over the past few decades. However, MCMC methods suffer from certain drawbacks in high-dimensional applications specially as it pertains to inverse problems. In this talk I will discuss an alternative approach to conditional simulation based on transport maps. Broadly speaking, these methods fall under the category of variational inference but are inspired by recent advances in machine learning and uncertainty quantification. I will present some foundational theory as well as numerical experiments that demonstrate the feasibility of these new ideas.

Sung-Ha Kang

Identifying Differential Equations with Numerical Methods: Time Evolution, Subspace Pursuit and Weak Form

We consider identifying underlying partial differential equation (PDE) from one set of noisy observation.  We assume that the governing PDE can be expressed as a linear combination of different linear and nonlinear differential terms.  In this talk, starting from numerical time evolution, we consider robust identification methods for various PDE and equations with nonlocal potential.  This talk may include statistical analysis and using weak form for identification.

Michael Lindsey

Thermal State Sampling for Numerical Linear Algebra

How do you estimate the diagonal of an implicitly defined matrix with only O(1) matrix-vector multiplications, or in O(n) time? The best existing randomized approaches (based on the randomized SVD and/or Hutchinson-type estimators) fail when both the rank of the matrix and the off-diagonal contributions grow with n. Even if the off-diagonal contribution remains bounded, it may yield a variance that is unacceptably high. For certain applications, sparse direct methods can be applied, though the overall scaling of such approaches is not O(n) in general.
We introduce a new tool for estimating positive definite matrix diagonals as well as, more generally, `thermal averages’ of the form Tr[AX], where X is positive definite. The algorithm is adapted from the minimally entangled typical thermal state (METTS) approach for solving finite-temperature quantum many-body problems at finite temperature. While this method was originally introduced in an exponentially high-dimensional setting where vectors cannot be stored explicitly, it can be simplified and improved in more classical numerical linear algebra settings.
We discuss the theory of this approach as well as applications from graph processing, computer graphics, electronic structure theory, and Gaussian process regression where this technique can improve the scaling of existing algorithms. We also introduce a new entropic regularization of semidefinite programming problems that can be solved efficiently with a stochastic optimization approach based on thermal state sampling.
Akihiko Nishimura

Prior-preconditioned Conjugate Gradient Method for Accelerated Gibbs Sampling in “large n & large p” Bayesian Sparse Regression

In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of 10^5 ~ 10^6 and of 10^4 ~ 10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian methods based on shrinkage priors. In the “large n & large p” setting, however, posterior computation encounters a major bottleneck at repeated sampling from a high-dimensional Gaussian distribution, whose precision matrix $\Phi$ is expensive to compute and factorize. In this talk, I will present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrix-vector multiplications by $\Phi$; this involves no explicit factorization or calculation of $\Phi$ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day.

Kui Ren

Revisiting Least-Squares Formulations for Computational Inverse Problems

The classical least-squares formulation has provided a successful framework for the computational solutions of inverse problems. In recent years, modifications and alternatives have been proposed to overcome some of the disadvantages of this classical formulation in dealing with new applications. We will present some recent progress on the understanding of a general weighted least-squares optimization framework in such a setting. The main part of the talk is based on joint works with Bjorn Engquist and Yunan Yang.

Grant Rotskoff

Learning High-dimensional Functions for Rare Events Sampling In and Out of Equilibrium

The surprising flexibility and undeniable empirical success of machine learning algorithms have inspired many theoretical explanations for the efficacy of neural networks. Here, I will briefly introduce one perspective that provides not only asymptotic guarantees of trainability and accuracy in high-dimensional learning problems but also provides some prescriptions and design principles for learning. Bolstered by the favorable scaling of these algorithms in high dimensional problems, I will turn to the problem of variational rare events calculations that arise in chemical physics. With neural networks in the toolkit, I will show it is possible to generate high-dimensional representations of free energies and a nonequilibrium analog of the free energy for dynamical observables. I will describe a class of algorithms that combines stochastic gradient descent with importance sampling to accurately determine these solutions.

Daniel Sanz-Alonso

Auto-differentiable Ensemble Kalman Filters

Data assimilation is concerned with sequentially estimating a temporally-evolving state. This task, which arises in a wide range of scientific and engineering applications, is particularly challenging when the state is high-dimensional and the state-space dynamics are unknown. In this talk I will introduce a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters (AD-EnKFs) blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, AD-EnKFs leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Numerical results using the Lorenz-96 model show that AD-EnKFs outperform existing methods that use expectation-maximization or particle filters to merge data assimilation and machine learning. In addition, AD-EnKFs are easy to implement and require minimal tuning. This is joint work with Yuming Chen and Rebecca Willett.

Andrew Stuart

Bayesian Inference Using Surrogate Operators

Consider map F: U –> V. Given data pairs {u_j,F(u_j)} the goal of supervised learning is to approximate F. Neural networks have shown considerable success in addressing this problem in settings where X is a finite dimensional Euclidean space and where Y is either a finite dimensional Euclidean space (regression) or a set of finite cardinality (classification). Motivated by the need for surrogate modeling in Bayesian inversion, we focus on the design and analysis of algorithms which address supervised learning for settings where U and V comprise spaces of functions; thus F is an operator. The talk describes emerging methodology in this area, emerging theory which underpins the methodology and numerical experiments which elucidate the efficiency of different approaches. Various applications from continuum mechanics are described, including an inverse problem arising in incompressible fluid flow.

Nisheeth Vishnoi

Sampling Under Symmetry

Exponential densities on orbits of Lie groups such as the unitary group are endowed with surprisingly rich mathematical structure and, traditionally, arise in diverse areas of physics, random matrix theory, and statistics.

In this talk, I will discuss how the symmetries of Lie groups can be leveraged to design efficient algorithms for sampling from such distributions leading to new applications in quantum inference and differential privacy.

This talk will require no background in Lie theory and should be broadly accessible.
Hongkai Zhao

How Much Can One Learn a PDE from its Solution Data?

In this work we study a few basic questions for PDE learning from observed solution data. Using various types of PDEs, we show 1) how the approximate dimension (richness) of the data space spanned by all snapshots along a solution trajectory depends on the differential operator and initial data, and 2) identifiability of a differential operator from solution data on  local patches. Then we propose a consistent and sparse local regression method for general PDE identification. Our method requires minimal amount of local measurements in space and time from a single solution trajectory by enforcing global consistency and sparsity.


Register button