Lawrence Saul

I am a Senior Research Scientist in the Center for Computational Mathematics (CCM) at the Flatiron Institute. I am part of a large and growing research effort in machine learning, both within my own center (ML@CCM) and across all of Flatiron (ML@FI). I work broadly across the areas of high-dimensional data analysis, latent variable modeling, variational inference, and representation learning.

Within CCM, I am attempting to build a group with diverse backgrounds and interests. We interview seasonally for summer interns, three-year postdocs, and research scientists. This year we are also advertising a joint position as an associate research scientist in CCM and a tenure-track faculty member in the Computer Science Department at Cooper Union.

Before joining Flatiron, I was a tenured faculty member at UC San Diego and UPenn and a member of the technical staff at AT&T Labs. I also served previously as Editor-in-Chief of JMLR and as Program Chair of NeurIPS. Before my work in machine learning, I earned a bachelor’s degree in Physics from Harvard and a doctorate in Physics from M.I.T.

Recent Projects

Variational inference

Given an intractable distribution p, the problem of variational inference (VI) is to find the best approximation q from some more tractable family. Typically, q is found by minimizing the (reverse) Kullback-Leibler divergence, but in recent papers at ICML and NeurIPS, we have shown how to approximate p by minimizing certain score-based divergences. The first of these papers derives the Batch and Match algorithm for VI with multivariate Gaussian approximations, while the second describes an eigenvalue problem (EigenVI) for approximations based on orthogonal function expansions. In related work, this paper analyzes the inherent trade-offs that arise in VI when a factorized approximation q is used to model a target distribution p that does not factorize.
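The trade-off mentioned in the last sentence can be seen in a small textbook example. The sketch below is not taken from the papers above. It assumes a zero-mean bivariate Gaussian target p with unit variances and correlation rho (a value chosen purely for illustration), and it uses the standard result that the factorized Gaussian q minimizing the reverse KL divergence has variances equal to the reciprocals of the diagonal entries of p's precision matrix.

```python
import numpy as np

def reverse_kl_gaussians(cov_q, cov_p):
    """KL(q || p) for two zero-mean Gaussians with the given covariances."""
    d = cov_p.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_p, cov_q))  # tr(cov_p^{-1} cov_q)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term - d + logdet_p - logdet_q)

# Illustrative target: correlated Gaussian that does not factorize (assumed values).
rho = 0.9
cov_p = np.array([[1.0, rho], [rho, 1.0]])
precision_p = np.linalg.inv(cov_p)

# Factorized q that minimizes KL(q || p): variances are 1 / precision_ii.
q_var_vi = 1.0 / np.diag(precision_p)

# Alternative factorized q that instead matches the marginal variances of p.
q_var_marginal = np.diag(cov_p)

kl_vi = reverse_kl_gaussians(np.diag(q_var_vi), cov_p)
kl_marginal = reverse_kl_gaussians(np.diag(q_var_marginal), cov_p)

print("marginal variances of p:      ", np.diag(cov_p))
print("variances of the KL-optimal q:", q_var_vi)
print("KL(q || p), KL-optimal q:     ", kl_vi)
print("KL(q || p), marginal-matching:", kl_marginal)
```

In this example the KL-optimal factorized q has smaller variances than the marginals of p, so it attains the lower divergence only by underestimating the target's uncertainty.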

High-dimensional data analysis

Sparse matrices are not generally low rank, and low-rank matrices are not generally sparse. But can one find more subtle connections between these different properties of matrices by looking beyond the canonical decompositions of linear algebra? This paper describes a nonlinear matrix decomposition that can be used to express a sparse nonnegative matrix in terms of a real-valued matrix of significantly lower rank. Arguably the most popular matrix decompositions in machine learning are those (such as principal component analysis or nonnegative matrix factorization) that have a simple geometric interpretation. This paper gives such an interpretation for these nonlinear decompositions, one that arises naturally in the problem of manifold learning.
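To make the idea concrete, here is a minimal sketch under a stated assumption: the nonlinearity is taken to be an elementwise ReLU, max(0, ·), which the text above does not name. The construction (points on a circle, a cosine similarity shifted by a hypothetical threshold of 0.9) is chosen only for illustration and is not taken from the papers. It builds a real-valued matrix theta of rank 3 whose elementwise ReLU is a sparse nonnegative matrix of much higher rank.

```python
import numpy as np

n = 50
angles = 2.0 * np.pi * np.arange(n) / n
threshold = 0.9  # hypothetical value; larger thresholds give sparser matrices

# Rank-3 factorization of theta[i, j] = cos(angles[i] - angles[j]) - threshold,
# using cos(a - b) = cos(a) cos(b) + sin(a) sin(b).
left = np.column_stack([np.cos(angles), np.sin(angles), np.ones(n)])
right = np.column_stack([np.cos(angles), np.sin(angles), -threshold * np.ones(n)])
theta = left @ right.T

# Assumed nonlinearity: elementwise ReLU maps the low-rank matrix to a
# sparse nonnegative one (only nearby points on the circle stay nonzero).
x = np.maximum(theta, 0.0)

rank_theta = np.linalg.matrix_rank(theta, tol=1e-8)
rank_x = np.linalg.matrix_rank(x, tol=1e-8)
fraction_zero = np.mean(x == 0.0)

print("rank of theta:            ", rank_theta)
print("rank of x = relu(theta):  ", rank_x)
print("fraction of zeros in x:   ", fraction_zero)
```

The rows of the factor `left` are points on a circle, which hints at why a geometric reading of such decompositions connects to manifold learning.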

Learning with symmetries: weight-balancing flows

Gradient descent is based on discretizing a continuous-time flow, typically one that descends in a regularized loss function. But what if for all but the simplest types of regularizers we have been discretizing the wrong flow? This paper makes two contributions to our understanding of deep learning in feedforward networks with homogeneous activation functions (e.g., ReLU) and rescaling symmetries. The first is to describe a simple procedure for balancing the weights in these networks without changing the end-to-end functions that they compute. The second is to derive a continuous-time dynamics that preserves this balance while descending in the network's loss function. These dynamics reduce to an ordinary gradient flow for l2-norm regularization, but not otherwise. Put another way, this analysis suggests a canonical pairing of alternative flows and regularizers.
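The rescaling symmetry and the idea of balancing can be illustrated in the simplest setting. The sketch below assumes a single hidden layer of ReLU units, no biases, and the l2 norm; the function names are hypothetical, and the procedure in the paper covers more general networks and regularizers and may differ in detail. For each hidden unit, the incoming weights are multiplied by a positive factor and the outgoing weights are divided by it, with the factor chosen so that the two norms become equal.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def network(x, w_in, w_out):
    """One-hidden-layer ReLU network without biases: w_out @ relu(w_in @ x)."""
    return w_out @ relu(w_in @ x)

def balance(w_in, w_out):
    """Rescale each hidden unit so its incoming and outgoing l2 norms match.

    Assumes every hidden unit has nonzero incoming and outgoing weights.
    """
    in_norms = np.linalg.norm(w_in, axis=1)    # one norm per hidden unit (rows)
    out_norms = np.linalg.norm(w_out, axis=0)  # one norm per hidden unit (columns)
    scale = np.sqrt(out_norms / in_norms)      # positive factor per hidden unit
    return w_in * scale[:, None], w_out / scale[None, :]

# Deliberately unbalanced weights: large first layer, small second layer.
w_in = 3.0 * rng.normal(size=(8, 5))
w_out = 0.1 * rng.normal(size=(2, 8))
x = rng.normal(size=(5, 10))

w_in_bal, w_out_bal = balance(w_in, w_out)

penalty_before = np.sum(w_in**2) + np.sum(w_out**2)
penalty_after = np.sum(w_in_bal**2) + np.sum(w_out_bal**2)

print("outputs unchanged:", np.allclose(network(x, w_in, w_out),
                                        network(x, w_in_bal, w_out_bal)))
print("l2 penalty before:", penalty_before)
print("l2 penalty after: ", penalty_after)
```

Because ReLU is positively homogeneous, the rescaling leaves the network's outputs unchanged, and in this l2 setting it cannot increase the sum of squared weights.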

Recent papers

Google Scholar

Contact