American machine learning researcher who invented diffusion-based generative models in 2015 — the mathematical foundation beneath Stable Diffusion, DALL-E 2, Sora, and virtually every other major generative AI image and video system — after a career that began with navigating Mars rovers and included deep contributions to neural network theory, meta-learning, and LLM evaluation.
Profile
| Nationality | American |
| Current Institution(s) | Anthropic (Member of Technical Staff) |
| Research Areas | Generative Models, Diffusion Models, Statistical Physics, Neural Network Theory, Meta-Learning, Infinite-Width Networks, LLM Evaluation |
| Doctoral Advisor | Bruno Olshausen |
| Doctorate | PhD, Redwood Center for Theoretical Neuroscience, UC Berkeley (2012) |
| Website | sohldickstein.com |
| Blog | sohl-dickstein.github.io |
| X / Twitter | @jaschasd |
| GitHub | Sohl-Dickstein |
| Google Scholar | Jascha Sohl-Dickstein |
Overview
Jascha Sohl-Dickstein is an American machine learning researcher, currently a Member of Technical Staff at Anthropic, who is most widely known as the inventor of diffusion-based generative models. The 2015 ICML paper he first-authored, “Deep Unsupervised Learning using Nonequilibrium Thermodynamics,” introduced the framework of iterative forward noising and learned reverse denoising, borrowed from non-equilibrium statistical physics, that underlies the modern family of diffusion models, including DDPM, score-based generative models, latent diffusion models, Stable Diffusion, DALL-E 2, Imagen, and Sora. The paper was largely overlooked for five years until the DDPM paper by Jonathan Ho and colleagues (2020) demonstrated its practical potential for high-quality image synthesis, after which the field came to recognize Sohl-Dickstein’s 2015 formulation as the foundational contribution. Beyond diffusion models, he has made significant contributions to the theory of overparameterized and infinite-width neural networks (including co-creating the Neural Tangents library), meta-learning of learned optimizers, and the BIG-Bench benchmark for evaluating large language models. His background is unusually broad for a machine learning researcher: he studied physics and theoretical neuroscience, worked on the Mars Exploration Rover mission at JPL (where he briefly lived on Martian time), spent time as an academic resident at Khan Academy, and held a visiting scholar position in Surya Ganguli’s Stanford laboratory before a roughly eight-year tenure as a principal scientist at Google Brain and Google DeepMind.
Early Life & Education
Mars Exploration Rover — JPL (pre-PhD)
Before entering academic machine learning, Sohl-Dickstein worked at NASA’s Jet Propulsion Laboratory on the Mars Exploration Rover mission, the twin-rover project that landed Spirit and Opportunity on Mars in January 2004. His role required operating on “Mars time”: because a Martian solar day (sol) lasts approximately 24 hours and 40 minutes, mission controllers synchronized their sleep and wake cycles with Martian sunrise and sunset, shifting their Earth schedule by about 40 minutes each day. He has described this as the most unusual professional requirement of his career, and it persisted for three months.
PhD — Redwood Center for Theoretical Neuroscience, UC Berkeley (–2012)
Sohl-Dickstein completed his doctoral work at UC Berkeley’s Redwood Center for Theoretical Neuroscience — a center specializing in computational and mathematical models of biological neural systems — under the supervision of Bruno Olshausen. His dissertation drew on the intersection of physics, neuroscience, and machine learning, developing probabilistic models for data distributions with origins in statistical physics. A key early result was “Minimum Probability Flow Learning” (ICML 2011, with Peter Battaglino and Michael DeWeese), which proposed a new parameter estimation method that avoids computing intractable normalization constants or sampling from equilibrium distributions by exploiting non-equilibrium flow dynamics — foreshadowing the thermodynamic intuitions that would later define the diffusion model framework.
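The idea behind Minimum Probability Flow can be illustrated on a small Ising model. The sketch below is a simplified illustration, not the paper's code: the energy function, the single-spin-flip connectivity, the plain gradient descent loop, and all names and hyperparameters are assumptions. It also sums over every single-flip neighbor of each data point (including neighbors that happen to appear in the data) and drops the constant prefactor. Note that the fitting loop never computes a partition function; the exact distribution is enumerated only to generate toy data.

```python
import itertools
import numpy as np

def ising_energy(states, J):
    """Energy E(x) = -0.5 * x^T J x for rows of `states` with entries in {-1, +1}."""
    return -0.5 * np.einsum("ni,ij,nj->n", states, J, states)

def mpf_objective_and_grad(data, J):
    """Simplified MPF objective: mean over data of sum_i exp(0.5 * (E(x) - E(x with spin i flipped)))."""
    field = data @ J                      # field[s, i] = (J x_s)_i, using symmetric J with zero diagonal
    w = np.exp(-data * field)             # 0.5 * (E(x) - E(x_flipped_i)) = -x_i * (J x)_i
    objective = w.sum(axis=1).mean()
    grad = -(w * data).T @ data / len(data)
    grad = grad + grad.T                  # J[i, k] and J[k, i] are the same parameter
    np.fill_diagonal(grad, 0.0)           # no self-couplings
    return objective, grad

rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.normal(0.0, 0.5, (n, n)), k=1)
J_true = upper + upper.T                  # hypothetical ground-truth couplings

# Exact sampling by enumeration is only feasible because n is tiny; it is used
# here solely to create toy data, not by the estimator itself.
all_states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
p = np.exp(-ising_energy(all_states, J_true))
p /= p.sum()
data = all_states[rng.choice(len(all_states), size=5000, p=p)]

# Fit the couplings by gradient descent on the MPF objective (assumed step size).
J_hat = np.zeros((n, n))
for _ in range(2000):
    _, grad = mpf_objective_and_grad(data, J_hat)
    J_hat -= 0.02 * grad

off_diag = ~np.eye(n, dtype=bool)
final_objective, _ = mpf_objective_and_grad(data, J_hat)
correlation = np.corrcoef(J_hat[off_diag], J_true[off_diag])[0, 1]
print(f"objective at fit: {final_objective:.3f}, correlation with true couplings: {correlation:.3f}")
```

Because the objective only compares each data point's energy with the energies of its immediate neighbors, its cost scales with the dataset size and the number of neighbors rather than with the full state space.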
Post-PhD: Stanford and Khan Academy
After Berkeley, Sohl-Dickstein held a visiting scholar position in Surya Ganguli’s Theoretical Neuroscience lab at Stanford, working on the intersection of statistical physics and machine learning that would produce the diffusion models paper. He also spent time as an academic resident at Khan Academy, the education nonprofit, reflecting an interest in how machine learning could be applied to educational technology.
Career
Google Brain — Principal Scientist (c. 2015–2023)
Sohl-Dickstein joined Google Brain shortly after the submission of the diffusion models paper in 2015, becoming a principal scientist and remaining for approximately eight years through the Brain-DeepMind merger in 2023. His Google Brain work spanned several parallel research directions.
Diffusion models (2015, recognized 2020–). The ICML 2015 paper “Deep Unsupervised Learning using Nonequilibrium Thermodynamics,” co-authored with Eric Weiss, Niru Maheswaranathan, and Surya Ganguli, introduced the diffusion probabilistic model framework. The core insight, drawn from non-equilibrium thermodynamics, is elegant: a forward Markov chain gradually destroys structure in a data distribution by adding Gaussian noise step by step; a reverse Markov chain, parameterized by a deep neural network, is trained to undo this process, restoring structure from noise. Because each step in the reverse process is a small, well-behaved denoising operation, the model is tractable to train and to sample from despite the high flexibility of the learned distribution. The paper demonstrated the technique on MNIST, CIFAR-10, and other datasets, but received limited citation and community attention for five years. In 2020, Jonathan Ho, Ajay Jain, and Pieter Abbeel published “Denoising Diffusion Probabilistic Models” (DDPM) at NeurIPS, scaling the approach to high-quality image synthesis and explicitly crediting the Sohl-Dickstein 2015 formulation as the foundational prior work. The field’s subsequent explosion — Stable Diffusion, DALL-E 2, Imagen, score-based models, latent diffusion models, Sora — traces directly to the 2015 paper. By 2025, it had accumulated over 20,000 citations, making it one of the most consequential generative modeling papers in history.
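The forward (noising) half of this process is simple enough to sketch in a few lines of NumPy. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the linear noise schedule, its endpoint values, the number of steps, the toy one-dimensional data, and all function names are choices made here for brevity, and the learned reverse network is omitted entirely.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Assumed linear noise schedule, plus the cumulative fraction of signal kept."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def forward_step(x_prev, beta, rng):
    """One forward Markov step: slightly shrink the signal, then add Gaussian noise."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise

def forward_jump(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 using the closed form of the Gaussian chain."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()

# Toy "data": two tight clusters at -2 and +2, a clearly non-Gaussian distribution.
x0 = np.concatenate([rng.normal(-2.0, 0.1, 5000), rng.normal(2.0, 0.1, 5000)])

# Run the chain step by step; structure is gradually destroyed.
x = x0.copy()
for beta in betas:
    x = forward_step(x, beta, rng)

# The closed form reaches the same final distribution in a single draw.
x_jump, _ = forward_jump(x0, len(betas) - 1, alpha_bars, rng)
print(f"step-by-step: mean={x.mean():.3f}, std={x.std():.3f}")
print(f"closed form:  mean={x_jump.mean():.3f}, std={x_jump.std():.3f}")
```

After enough steps the two clusters are indistinguishable from a standard Gaussian. A generative model in this framework would train a network to reverse each small step, starting from that Gaussian noise.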
Neural network theory: overparameterization and infinite-width networks. Working primarily with Samuel Schoenholz, Sohl-Dickstein co-authored a series of papers on the theoretical properties of deep neural networks. These include “Deep Information Propagation” (ICLR 2017), which characterized the signal propagation and gradient flow properties that determine whether very deep networks can be trained, and several papers establishing the correspondence between infinitely wide neural networks and Gaussian processes (the NNGP correspondence): in the infinite-width limit, the outputs of a randomly initialized network are jointly Gaussian, with a covariance kernel fixed by the architecture and the weight and bias variances. This work contributed to the formal analysis of neural network function space at infinite width, alongside the Neural Tangent Kernel (NTK) framework, which describes how such networks evolve under gradient descent.
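The NNGP correspondence can be checked numerically on a small example. The sketch below is an illustration under assumptions made here, not code from the papers: it uses a fully connected ReLU network without biases, weight variance 2/fan-in, inputs normalized to squared norm equal to their dimension, and hypothetical function names. It compares the analytic infinite-width kernel (computed with the known closed form for ReLU, the arc-cosine kernel) against the output covariance estimated from many random finite-width networks.

```python
import numpy as np

def relu_nngp_kernel(x1, x2, depth, sigma_w=np.sqrt(2.0), sigma_b=0.0):
    """Analytic output covariance of an infinitely wide ReLU network with `depth` hidden layers."""
    d = x1.shape[0]
    k11 = sigma_b**2 + sigma_w**2 * (x1 @ x1) / d
    k22 = sigma_b**2 + sigma_w**2 * (x2 @ x2) / d
    k12 = sigma_b**2 + sigma_w**2 * (x1 @ x2) / d
    for _ in range(depth):
        norm = np.sqrt(k11 * k22)
        theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
        # E[relu(u) * relu(v)] for zero-mean jointly Gaussian (u, v)
        relu_cov = norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)
        k12 = sigma_b**2 + sigma_w**2 * relu_cov
        k11 = sigma_b**2 + sigma_w**2 * k11 / 2.0   # E[relu(u)^2] = Var(u) / 2
        k22 = sigma_b**2 + sigma_w**2 * k22 / 2.0
    return k12

def sample_relu_outputs(x1, x2, depth, width, num_nets, rng, sigma_w=np.sqrt(2.0)):
    """Scalar outputs of `num_nets` random finite-width ReLU networks (no biases) at x1 and x2."""
    inputs = np.stack([x1, x2])
    d = inputs.shape[1]
    outputs = np.empty((num_nets, 2))
    for n in range(num_nets):
        z = (inputs @ rng.standard_normal((d, width))) * sigma_w / np.sqrt(d)
        for _ in range(depth - 1):
            z = (np.maximum(z, 0.0) @ rng.standard_normal((width, width))) * sigma_w / np.sqrt(width)
        out = (np.maximum(z, 0.0) @ rng.standard_normal((width, 1))) * sigma_w / np.sqrt(width)
        outputs[n] = out[:, 0]
    return outputs

rng = np.random.default_rng(0)
d = 10
x1 = rng.standard_normal(d)
x1 *= np.sqrt(d) / np.linalg.norm(x1)
x2 = rng.standard_normal(d)
x2 *= np.sqrt(d) / np.linalg.norm(x2)

analytic = relu_nngp_kernel(x1, x2, depth=2)
outs = sample_relu_outputs(x1, x2, depth=2, width=128, num_nets=3000, rng=rng)
empirical = np.mean(outs[:, 0] * outs[:, 1])
print(f"analytic K(x1, x2) = {analytic:.3f}, Monte Carlo estimate = {empirical:.3f}")
```

The two numbers should agree up to Monte Carlo noise and a small finite-width correction, which shrinks as the hidden layers get wider.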
Neural Tangents (2020). Sohl-Dickstein co-created Neural Tangents, an open-source JAX library that computes the neural network Gaussian process (NNGP) kernel and the neural tangent kernel (NTK) in closed form for infinite-width networks and empirically for finite-width networks, across a wide range of depths and architectures. The library became a standard tool for studying overparameterized neural networks from a kernel and Bayesian perspective, and was widely used in empirical studies of neural network learning dynamics.
BIG-Bench: Beyond the Imitation Game (2022). Sohl-Dickstein was a principal contributor to BIG-Bench, a collaborative benchmark for evaluating language model capabilities and limitations, with 444 authors across more than 130 institutions. BIG-Bench assembled 204 tasks designed to probe aspects of reasoning, world knowledge, and language understanding beyond what standard benchmarks could assess, and tested models including OpenAI’s GPT-3 family, Google’s internal dense and sparse transformer models, and PaLM. The resulting paper provided one of the most comprehensive public characterizations of frontier LLM capabilities available at the time of publication and was widely used in subsequent scaling law and capability analysis work.
Learned optimizers (2022). Sohl-Dickstein co-authored “VeLO: Training Versatile Learned Optimizers by Scaling Up” (2022), which demonstrated that meta-trained neural-network-based optimizers could outperform Adam on both convergence speed and final model quality across a variety of tasks. The work extended earlier results on meta-learning of optimization algorithms and addressed practical obstacles that had previously prevented learned optimizers from being deployed at scale.
Linear mode connectivity. With Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin, Sohl-Dickstein co-authored “Linear Mode Connectivity and the Lottery Ticket Hypothesis” (ICML 2020), which characterized the geometric structure of neural network loss landscapes and showed that solutions found by stochastic gradient descent are linearly connected in weight space after a short initial training period — with implications for the lottery ticket hypothesis and model stability.
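Linear connectivity between two solutions is usually probed by evaluating the loss along the straight line joining them in weight space. The sketch below is a toy illustration under assumptions made here: the two-dimensional double-well function stands in for a network's loss, the function names are hypothetical, and the barrier is measured as the highest loss on the path minus the mean of the endpoint losses, in the spirit of the error barrier used in this line of work.

```python
import numpy as np

def interpolate(w_a, w_b, alpha):
    """Point on the straight line between two weight vectors."""
    return (1.0 - alpha) * w_a + alpha * w_b

def loss_barrier(loss_fn, w_a, w_b, num_points=21):
    """Highest loss along the linear path minus the mean loss of the two endpoints."""
    alphas = np.linspace(0.0, 1.0, num_points)
    path = np.array([loss_fn(interpolate(w_a, w_b, a)) for a in alphas])
    return path.max() - 0.5 * (path[0] + path[-1])

def double_well(w):
    """Toy loss with minima at (-1, 0) and (+1, 0); a stand-in for a real network loss."""
    return (w[0] ** 2 - 1.0) ** 2 + w[1] ** 2

# Two solutions in different basins: the straight path crosses a ridge.
across = loss_barrier(double_well, np.array([-1.0, 0.0]), np.array([1.0, 0.0]))

# Two solutions in the same basin: the path stays low.
within = loss_barrier(double_well, np.array([0.9, 0.0]), np.array([1.1, 0.0]))

print(f"barrier across basins: {across:.4f}")
print(f"barrier within a basin: {within:.4f}")
```

A barrier near zero indicates that the two solutions are linearly connected. For real networks, `loss_fn` would evaluate the model on held-out data with the interpolated weights loaded.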
Google DeepMind (2023)
Following the merger of Google Brain and DeepMind in 2023, Sohl-Dickstein held the combined institution’s principal scientist title briefly before departing for Anthropic.
Anthropic — Member of Technical Staff (c. 2024–present)
Sohl-Dickstein joined Anthropic as a member of the technical staff. His personal website, updated as of January 2024, confirms this affiliation. Given Anthropic’s focus on AI safety, interpretability, and frontier model development, his arrival brought the inventor of diffusion models and a decade of large-scale neural network theory to a safety-focused research environment.
Key Contributions
- Diffusion Probabilistic Models (ICML 2015): “Deep Unsupervised Learning using Nonequilibrium Thermodynamics,” with Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Introduced the framework of forward noising and learned reverse denoising as a generative modeling paradigm, drawing on non-equilibrium thermodynamics. The foundational paper for the family of diffusion models that now powers Stable Diffusion, DALL-E 2, Imagen, and Sora. Accumulated over 20,000 citations. Sohl-Dickstein himself calls it his “biggest claim to fame.”
- Minimum Probability Flow Learning (ICML 2011): Developed a parameter estimation method for probabilistic models that avoids computing partition functions by exploiting non-equilibrium dynamics. It was an early demonstration of Sohl-Dickstein’s distinctive approach of importing statistical physics concepts into machine learning, which later defined the diffusion model contribution.
- Neural Tangents Library (2020): Co-created the standard open-source library for computing NNGP and NTK kernels for infinite-width and finite-width neural networks in JAX. Became the reference tool for infinite-width neural network research.
- Deep Information Propagation (ICLR 2017): With Samuel Schoenholz. Characterized the mean-field theory of signal propagation in deep networks, providing a principled framework for analyzing trainability as a function of depth, initialization, and architecture, and directly influencing subsequent understanding of exploding and vanishing gradients in very deep networks.
- BIG-Bench: Beyond the Imitation Game (2022): Principal contributor to the 444-author collaborative benchmark, which provided one of the most comprehensive evaluations of frontier language model capabilities available at the time, across 204 diverse reasoning and knowledge tasks.
- Learned Optimizers (2022): Co-authored work demonstrating that meta-trained neural network optimizers can outperform Adam at scale, advancing the practical viability of learned optimization.
Awards & Recognition
- ICML 2015 Best Paper candidate — The diffusion models paper was presented at ICML 2015; its retroactive recognition as one of the most consequential ML papers of the decade is reflected in 20,000+ citations as of 2025.
- NeurIPS 2022 Outstanding Paper (BIG-Bench) — The BIG-Bench paper received outstanding paper recognition.
- Semantic Scholar identification — Consistently identified as among the top-cited researchers in generative modeling.
Key Relationships
- Bruno Olshausen — PhD supervisor at the Redwood Center for Theoretical Neuroscience at UC Berkeley; a pioneer in sparse coding and computational neuroscience who shaped Sohl-Dickstein’s physics-informed approach to machine learning.
- Surya Ganguli — Stanford theoretical neuroscience professor and collaborator; co-author of the original 2015 diffusion models paper; Sohl-Dickstein was a visiting scholar in his lab, and the two share a commitment to connecting statistical physics with machine learning theory.
- Eric Weiss and Niru Maheswaranathan — Co-authors on the original 2015 diffusion paper; both were part of the Ganguli lab or Redwood Center network.
- Jonathan Ho — Lead author of DDPM (2020), which built directly on Sohl-Dickstein’s 2015 paper and brought diffusion models to practical utility at scale; Ho’s paper is the link between Sohl-Dickstein’s foundational contribution and the modern diffusion model ecosystem.
- Samuel Schoenholz — Google Brain colleague and closest collaborator on neural network theory; the deep information propagation and overparameterization theory work was a sustained collaboration.
- Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin — Co-authors on the linear mode connectivity paper; a cross-institutional collaboration that linked Google Brain to MIT, UToronto, and academic lottery ticket hypothesis research.
Personal Style
Sohl-Dickstein’s intellectual profile is genuinely unusual: a researcher whose most celebrated work draws on late-20th-century non-equilibrium statistical mechanics (notably Jarzynski’s equality) to solve a 21st-century machine learning problem, and who spent months literally living on Martian time before pursuing a PhD in theoretical neuroscience. His personal website describes his interests as “machine learning, neuroscience, statistical physics, dynamical systems”; unlike many such lists, this one is accurate. His blog, which he describes as a space for “ideas that are too weird, incomplete, or off-topic to turn into an academic paper,” reflects a preference for sharing half-formed but potentially important ideas over waiting for polished results. His relationship to the diffusion model legacy is characteristically wry: his website says he is “most (in)famous for inventing diffusion models,” and his blog’s about page calls it his “biggest claim to fame” while noting that the blog itself is intended to cover things that are neither machine learning nor keeping strange hours.
References
- Personal website: sohldickstein.com
- Blog: sohl-dickstein.github.io
- Google Scholar: scholar.google.com
- Original diffusion models paper (arXiv, March 2015): arxiv.org/abs/1503.03585
- Neural Tangents library: github.com/google/neural-tangents
- BIG-Bench: github.com/google/BIG-bench
- Learned Optimizers: arxiv.org/abs/2211.09760
- Digg profile: digg.com/u/x/jaschasd