Stein’s Method for Practical Machine Learning

Stein's method, due to Charles M. Stein, is a set of remarkably powerful theoretical techniques for proving approximation and limit theorems in probability theory. For a long time it was known mostly among theoretical statisticians. Recently, however, it has been shown that key ideas from Stein's method can be naturally adapted to solve computational and statistical challenges in practical machine learning. This project aims to harness Stein's method for practical purposes, with a focus on developing new and efficient practical algorithms for learning, inference, and model evaluation of highly complex probabilistic graphical models and deep learning models.

Kernelized Stein Discrepancy

Kernelized Stein discrepancy (KSD), obtained by combining the classical Stein discrepancy with reproducing kernel Hilbert spaces (RKHS), allows us to assess the compatibility between empirical data and a probability distribution, and provides a powerful tool for developing algorithms for model evaluation (goodness-of-fit testing), as well as learning and inference in general. Unlike traditional divergence measures (such as the KL and chi-square divergences), KSD does not require evaluating the normalization constant of the distribution, and can be applied even to the intractable, unnormalized distributions widely used in modern machine learning.
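
A minimal NumPy sketch of the U-statistic KSD estimate with an RBF kernel is given below. It is illustrative rather than the released code linked under the paper; the function name ksd_u_statistic, the score_fn argument, and the fixed bandwidth h are assumptions made here for concreteness.

```python
import numpy as np

def ksd_u_statistic(x, score_fn, h=1.0):
    """U-statistic estimate of the (squared) kernelized Stein discrepancy
    with an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).

    x        : (n, d) array of samples from the unknown distribution q
    score_fn : callable returning grad_x log p(x) row-wise, shape (n, d);
               the normalization constant of p is never needed
    h        : kernel bandwidth (an illustrative fixed choice)
    """
    n, d = x.shape
    s = score_fn(x)                                # (n, d) score of the model p

    diff = x[:, None, :] - x[None, :, :]           # (n, n, d) pairwise differences
    sqdist = np.sum(diff ** 2, axis=-1)            # (n, n) squared distances
    k = np.exp(-sqdist / (2 * h ** 2))             # RBF kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term
    term1 = (s @ s.T) * k                          # s(x_i)^T s(x_j) k(x_i, x_j)
    grad_k_y = diff / h ** 2 * k[..., None]        # grad of k wrt its second argument
    term2 = np.einsum('id,ijd->ij', s, grad_k_y)   # s(x_i)^T grad_y k
    term3 = np.einsum('ijd,jd->ij', -grad_k_y, s)  # (grad_x k)^T s(x_j)
    term4 = k * (d / h ** 2 - sqdist / h ** 4)     # trace of the mixed second derivative
    u = term1 + term2 + term3 + term4

    # drop the diagonal for an unbiased U-statistic
    return (u.sum() - np.trace(u)) / (n * (n - 1))
```

For a standard Gaussian model, for instance, one would pass score_fn = lambda x: -x; a statistic that is large relative to its null distribution (obtained, e.g., by bootstrap) indicates lack of fit.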

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

Liu, Lee, Jordan; ICML, 2016 [A short note], [code: matlab, R]

[See more details here>>].

Stein Variational Gradient Descent

By exploiting an interesting connection between Stein discrepancy and KL divergence, we derive a new form of variational inference algorithm, called Stein variational gradient descent (SVGD), that combines the advantages of variational inference, Monte Carlo, quasi-Monte Carlo, and gradient descent (for MAP). SVGD provides a powerful new tool for attacking the inference and learning challenges in graphical models and probabilistic deep learning, especially when diverse outputs are needed to capture posterior uncertainty in the Bayesian framework.
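
As a concrete illustration, here is a minimal NumPy sketch of one SVGD particle update, not the released code linked below; the function name svgd_step, the score_fn argument, and the fallback bandwidth heuristic are assumptions made for the example.

```python
import numpy as np

def svgd_step(x, score_fn, stepsize=1e-2, h=None):
    """One SVGD update applied to a set of particles.

    x        : (n, d) array of particles
    score_fn : callable returning grad_x log p(x) row-wise, shape (n, d);
               only the unnormalized target density is needed
    h        : RBF bandwidth; if None, a median-style heuristic is used
    """
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d) pairwise differences
    sqdist = np.sum(diff ** 2, axis=-1)           # (n, n) squared distances

    if h is None:                                 # median heuristic: 2 h^2 ~ med^2 / log n
        h = np.sqrt(0.5 * np.median(sqdist) / np.log(n + 1))

    k = np.exp(-sqdist / (2 * h ** 2))            # kernel matrix k(x_j, x_i)
    grad_k = -diff / h ** 2 * k[..., None]        # grad_{x_j} k(x_j, x_i)

    # phi*(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (k.T @ score_fn(x) + grad_k.sum(axis=0)) / n
    return x + stepsize * phi
```

Iterating x = svgd_step(x, lambda z: -z) pushes the particles toward a standard Gaussian; replacing the score function with the gradient of the log of any unnormalized target turns the same loop into a general-purpose approximate inference routine.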

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Liu, Wang; NeurIPS, 2016 [code]

Stein Variational Gradient Descent as Gradient Flow

Liu; NeurIPS, 2017

Stein Variational Gradient Descent as Moment Matching

Liu; NeurIPS, 2018

[See more details here>>].

Slides

Probabilistic Learning and Inference Using Stein's Method [slides, slides]

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation [ICML 2016 slides]

Papers

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

Liu, Lee, Jordan; ICML, 2016 [code: matlab, R]

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Liu, Wang; NeurIPS, 2016 [code]

Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

Wang, Liu; preprint, 2016 [code]

Two methods for Wild Variational Inference

Liu, Feng; preprint, 2016

Black-box Importance Sampling

Liu, Lee; AISTATS, 2017

Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE

Liu, Wang; 2017

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

Feng et al.; UAI, 2017

Stein Variational Gradient Descent as Gradient Flow

Liu; NeurIPS, 2017

Stein Variational Policy Gradient

Yang et al.; UAI, 2017

Stein Variational Gradient Descent as Moment Matching

Liu; NeurIPS, 2018

Stein Variational Gradient Descent Without Gradient

Han, Liu; ICML, 2018

Stein Variational Gradient Descent with Matrix-Valued Kernels

Wang, Tang, Bajaj, Liu; NeurIPS, 2019

Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models

Wang, Liu; ICML, 2019

Learning Self-Imitating Diverse Policies

Gangwani et al.; ICLR, 2019

Stein Variational Inference for Discrete Distributions

Han et al.; AISTATS, 2020

Profiling Pareto Front With Multi-Objective Stein Variational Gradient Descent

Liu et al.; NeurIPS, 2021

Sampling with Trustworthy Constraints: A Variational Gradient Framework

Liu et al.; NeurIPS, 2021

Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent

Zhang et al.; NeurIPS, 2022

Goodness-of-Fit Testing for Discrete Distributions via Stein Discrepancy

Yang et al.; ICML, 2018

Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments

Anastasiou et al.; 2022

Informal Notes & Misc

A Short Note on Kernelized Stein Discrepancy

Liu, 2016

Stein Variational Gradient Descent: Theory and Applications

Liu; NeurIPS Workshop on Advances in Approximate Bayesian Inference, 2016

Learning to Sample Using Stein Discrepancy

Wang, Feng, Liu; NeurIPS Workshop on Bayesian Deep Learning, 2016