Inventor of DDIM and former Chief Scientist at Luma AI, whose work on accelerated diffusion sampling helped transform diffusion models from an academic curiosity into the engine of the generative AI industry.
| Field | Details |
| --- | --- |
| Born | c. 1994, China |
| Nationality | Chinese |
| Current Institution(s) | Independent (as of June 2026); formerly Luma AI (Chief Scientist) |
| Research Areas | Diffusion Models, Score-based Generative Models, Video & Multimodal Generation, Bayesian Optimization, Reinforcement Learning, Imitation Learning |
| Doctoral Advisor | Stefano Ermon |
| Doctoral Thesis | Compression, Generation, and Inference via Supervised Learning (Stanford University, 2021) |
| Website | tsong.me |
| X / Twitter | @baaadas |
| GitHub | jiamings |
| Google Scholar | Jiaming Song (35,900+ citations) |
Overview
Jiaming Song is a Chinese generative AI researcher best known for creating DDIM (Denoising Diffusion Implicit Models), the accelerated sampler that made diffusion models computationally feasible at production scale and became a standard component in systems including Stable Diffusion, DALL·E 2, and Imagen. Trained at Tsinghua University and Stanford, where he worked under Stefano Ermon, Song earned a reputation for combining deep probabilistic theory with high-impact engineering insight. After a postdoctoral period at Stanford and a stint at NVIDIA Research, he joined Luma AI as Chief Scientist, where he led the research team through three successive product phases, from 3D reconstruction to video generation to unified multimodal modeling, culminating in the Dream Machine (Ray) video model and the Uni-1 multimodal reasoning system. He departed Luma AI in June 2026, and his next direction is undisclosed.
Early Life & Education
Song received his undergraduate education at Tsinghua University, completing a Bachelor of Engineering in Computer Science and Technology between 2012 and 2016 and graduating with Outstanding Honor (Top 1% of his class). During his studies he received the Zhong Shimo Scholarship, the highest merit award in the CS department (Top 0.75%), as well as the Google Excellence Scholarship (awarded to 58 students across China) and the Qualcomm Scholarship for exceptional research. His competition record began earlier: he earned a Bronze Prize at the National Olympiad in Informatics in 2011, and was named an Outstanding Winner (Top 0.3%) in the Interdisciplinary Contest in Modeling in 2015. During his undergraduate years he was a visiting researcher at Duke University’s Information Initiative (summer 2015), where he worked on temporal sigmoid belief networks, signaling an early orientation toward probabilistic generative models.
In September 2016 Song enrolled at Stanford University for doctoral study in Computer Science, joining the research group of Professor Stefano Ermon in the Stanford AI Lab. His dissertation, Compression, Generation, and Inference via Supervised Learning, developed a unified framework for learning complex distributions without requiring explicit normalization, threading together score-based generative modeling, implicit probabilistic models, and their applications to inverse problems. During his PhD he interned at OpenAI (summer 2017), where he worked on interpretable skill abstraction from language, and at Facebook AI Research (summer 2018), contributing to large-scale object counting from satellite imagery. He completed his PhD in September 2021 and remained at Stanford as a Postdoctoral Scholar under Ermon for the rest of that academic year (through June 2022).
Career
Stanford University, Ermon Group (2016–2022)
Song’s most consequential graduate-level contribution came in October 2020 with the publication of Denoising Diffusion Implicit Models (DDIM) on arXiv, co-authored with Chenlin Meng and Stefano Ermon, and presented at ICLR 2021. At the time, Denoising Diffusion Probabilistic Models (DDPMs) required simulating a Markov chain of 1,000 or more steps to generate a single image, making them impractical for most production deployments. Song’s key insight was that the training objective of DDPMs is compatible with a broader family of non-Markovian diffusion processes whose reverse steps can be solved with far fewer iterations. DDIM reduced the required sampling steps by up to 50× while preserving image quality and introducing a new capability: deterministic sampling that enables semantic interpolation in the latent space. The paper became one of the most cited works in the history of generative AI, and the DDIM sampler was integrated almost universally into downstream systems including Stable Diffusion, DALL·E 2, Imagen, and Midjourney.
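The deterministic update described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' reference implementation: it assumes a linear beta schedule over 1,000 training steps, evenly spaced sampling timesteps, and the zero-noise (deterministic) variant of the update. The names `make_alpha_bar`, `ddim_step`, `ddim_sample`, and `make_oracle` are hypothetical, and the "oracle" noise predictor stands in for a trained network so the example is self-contained.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; alpha_bar[t] is the cumulative product of (1 - beta).
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    # Estimate the clean sample from the current noisy sample and the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Deterministic update: re-noise the estimate to the earlier noise level
    # using the same predicted noise, with no fresh randomness added.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(eps_model, x_T, alpha_bar, num_sample_steps=20):
    # Visit only a short, evenly spaced subsequence of the training timesteps
    # (20 of 1,000 here, i.e. 50x fewer network evaluations).
    T = len(alpha_bar)
    timesteps = np.linspace(T - 1, 0, num_sample_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last visited timestep, step to the noise-free level (alpha_bar = 1).
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), a_t, a_prev)
    return x

def make_oracle(x0, alpha_bar):
    # Toy stand-in for a trained network: returns the exact noise for a known x0.
    def eps_model(x_t, t):
        a = alpha_bar[t]
        return (x_t - np.sqrt(a) * x0) / np.sqrt(1.0 - a)
    return eps_model

alpha_bar = make_alpha_bar()
x0 = np.array([1.0, -2.0, 0.5])
x_T = np.random.default_rng(0).standard_normal(3)
sample = ddim_sample(make_oracle(x0, alpha_bar), x_T, alpha_bar, num_sample_steps=20)
```

Because no noise is injected during sampling, the same starting point `x_T` always maps to the same output, which is the property that makes interpolation between starting latents meaningful.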
Other significant contributions from his Stanford years include SDEdit (ICLR 2022), an image synthesis and editing method based on diffusion model priors that enabled stroke-guided image generation without adversarial training; DDRM (Denoising Diffusion Restoration Models, NeurIPS 2022), extending diffusion models to general linear inverse problems including super-resolution, deblurring, and inpainting; and D2C (Diffusion-Denoising Models for Few-shot Conditional Generation, NeurIPS 2021). His ICLR 2022 Outstanding Paper Award was earned for a separate line of work, “Comparing Distributions by Measuring Differences that Affect Decision Making,” demonstrating range beyond pure generative modeling.
As a postdoc (2021–2022) Song continued publishing at the intersection of Bayesian optimization and generative models, including “A General Recipe for Likelihood-free Bayesian Optimization” (ICML 2022 Long Oral, Top 2.2%).
NVIDIA Research (June 2022 – c. 2023)
Song joined NVIDIA Research as a Research Scientist, focusing on diffusion models for multimodal generation and foundation model research. There he co-authored eDiff-I: Text-to-Image Diffusion Models with Ensemble of Expert Denoisers (TMLR 2023), which demonstrated that different stages of the diffusion sampling process benefit from specialized model expertise and proposed a practical mixture-of-denoisers framework for high-resolution text-to-image synthesis.
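The idea of assigning different denoisers to different stages of sampling can be illustrated with a small routing sketch. This is an assumption-laden toy and not the eDiff-I architecture or code: the class name `TimestepExpertEnsemble`, the interval boundaries, and the stub denoisers are all hypothetical, chosen only to show how a timestep could select one expert.

```python
from bisect import bisect_right

class TimestepExpertEnsemble:
    """Hypothetical router that picks one denoiser per timestep interval."""

    def __init__(self, boundaries, experts):
        # boundaries are ascending timestep cut points; N boundaries define N + 1 intervals.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be ascending")
        if len(experts) != len(boundaries) + 1:
            raise ValueError("need exactly one more expert than boundaries")
        self.boundaries = list(boundaries)
        self.experts = list(experts)

    def expert_index(self, t):
        # Timesteps below the first boundary go to expert 0, and so on upward.
        return bisect_right(self.boundaries, t)

    def __call__(self, x_t, t):
        # Delegate the denoising call to the expert responsible for this timestep.
        return self.experts[self.expert_index(t)](x_t, t)

def make_stub(name):
    # Stand-in for a trained denoiser; it only reports which expert was used.
    def denoiser(x_t, t):
        return name
    return denoiser

ensemble = TimestepExpertEnsemble(
    boundaries=[300, 700],
    experts=[make_stub("low-noise"), make_stub("mid-noise"), make_stub("high-noise")],
)
```

In this sketch only one expert runs per step, so the per-step sampling cost matches that of a single model even though the total parameter count grows with the number of experts.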
Luma AI, Chief Scientist (c. 2023 – June 2026)
Song joined Luma AI as Chief Scientist as the company was pivoting from its origins in neural radiance field (NeRF) based 3D reconstruction toward generative video and multimodal AI. He led research across the full modeling stack—architecture, training infrastructure, and data pipelines—through three successive product phases.
Genie was Luma’s 3D-generation line, applying diffusion-based techniques to controllable object and scene synthesis. Song led the transition from this foundation into video generation.
Ray / Dream Machine (launched publicly in June 2024) is Luma’s video generation model family, focused on temporal coherence, camera-aware motion, and creative control from text or image prompts. Dream Machine attracted over one million users within four days of release. The model established Luma AI as a leading player in the AI video generation space alongside Sora (OpenAI), Gen-3 (Runway), and Kling (Kuaishou). For this work, Song was named to the MIT Technology Review Innovators Under 35 Asia Pacific list in 2024.
Uni-1 (released 2025) is Luma’s unified multimodal reasoning model for image generation and editing, built around intention understanding, spatial reasoning, reference-guided generation, and culturally aware visual creation—representing Luma’s move toward agentic, instruction-following multimodal AI.
Alongside his product work, Song continued publishing on fundamental generative modeling problems. In early 2025 he co-authored “Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms” (with Linqi Zhou), arguing against the false dichotomy between autoregression and diffusion and proposing that flow maps can enable inference-time computation to improve the quality of generative pre-training—a line of thinking he also elaborated in his blog post “Inference-Time Scaling for Generative Pre-Training.” He co-authored “Inductive Moment Matching” (with Linqi Zhou and Stefano Ermon) and “Terminal Velocity Matching,” both advancing the theory of efficient generative model training.
Song confirmed his departure from Luma AI in June 2026. His personal website describes him as building “multimodal AI systems for general intelligence,” and his next venture or role had not been publicly announced as of the time of writing.
Key Contributions
- DDIM (Denoising Diffusion Implicit Models) — Introduced a non-Markovian class of diffusion processes that reuses the existing DDPM training objective while enabling 10–50× faster sampling and deterministic latent interpolation; adopted nearly universally in production image generation systems including Stable Diffusion, DALL·E 2, Imagen, and Midjourney. Song’s Google Scholar profile lists 35,900+ citations in total, and DDIM is among the most cited papers in modern deep learning.
- SDEdit: Guided Image Synthesis via Stochastic Differential Equations (ICLR 2022) — Enabled stroke-guided and reference-guided image editing through diffusion priors without adversarial training or task-specific models, opening a line of controllable generation research with widespread downstream influence.
- DDRM: Denoising Diffusion Restoration Models (NeurIPS 2022) — Extended the diffusion framework to the family of linear inverse problems (deblurring, super-resolution, inpainting), outperforming prior unsupervised methods in reconstruction quality and perceptual fidelity while running about 5× faster than the nearest competing method.
- eDiff-I (TMLR 2023) — Proposed an ensemble-of-expert-denoisers architecture for text-to-image generation, demonstrating that different denoising timesteps benefit from specialized networks; contributed to NVIDIA’s generative AI roadmap.
- Dream Machine / Ray (Luma AI, 2024) — Led research on a video generation model widely adopted by creators and recognized as a step-change in camera-consistent, physically plausible AI video; reached 1M+ users within four days of launch.
- Uni-1 (Luma AI, 2025) — Led development of a unified multimodal model combining image understanding, generation, and editing under a single architecture guided by natural-language intent.
- Inference-time scaling for generative pre-training (2025) — An emerging research direction arguing that inference-time compute can systematically improve diffusion and flow-based pre-training, with implications analogous to chain-of-thought scaling in language models.
Awards & Recognition
- MIT Technology Review Innovators Under 35 — Asia Pacific (2024) — Recognized for leading development of Dream Machine and breakthrough contributions to large-scale AI video generation.
- ICLR 2022 Outstanding Paper Award — For “Comparing Distributions by Measuring Differences that Affect Decision Making,” one of the top recognized papers at the International Conference on Learning Representations.
- ICML 2022 Long Oral presentation (Top 2.2%) — For “A General Recipe for Likelihood-free Bayesian Optimization.”
- Qualcomm Innovation Fellowship (2018) — One of eight recipients nationally for the project “Safe Multi-Agent Imitation Learning for Self-Driving.”
- Qualcomm Scholarship, Tsinghua University (2016) — Awarded to the top 1% of Tsinghua undergraduates for exceptional research output.
- Google Excellence Scholarship (2015) — Awarded to 58 undergraduate and graduate students across China for academic and research distinction.
- Outstanding Winner, Interdisciplinary Contest in Modeling (2015) — Top 0.3% globally.
- Outstanding Undergraduate, China Computer Federation (2014) — One of two recipients at Tsinghua.
- Zhong Shimo Scholarship, Tsinghua CS Department (2013) — Highest departmental scholarship, Top 0.75%.
- Bronze Prize, National Olympiad in Informatics (2011) — National-level recognition in competitive programming.
Key Relationships
- Stefano Ermon — PhD and postdoctoral advisor at Stanford; Professor of Computer Science and leader of the Stanford AI Lab’s probabilistic modeling group. Ermon’s foundational work on score-based generative models directly enabled DDIM, and the pair have continued to co-author throughout Song’s career including the 2025 Inductive Moment Matching paper.
- Chenlin Meng — Closest PhD-era collaborator and co-author of both DDIM and SDEdit; now a researcher at Stanford and independent startup founder. The two were the primary driving force behind several of the most influential papers to emerge from the Ermon group.
- Yang Song — Overlapping researcher in the Ermon group whose work on score-based generative models via SDEs (ICLR 2021 Outstanding Paper Award) formed the continuous-time theoretical complement to Jiaming Song’s DDIM; the two worked in parallel on what became the dual foundation of the modern diffusion model literature.
- Linqi Zhou — Frequent recent collaborator (Terminal Velocity Matching, Inductive Moment Matching, inference-time scaling); a former Luma AI colleague and continuing research partner post-departure.
- Ambrish Rawat / Luma AI team — Collaborated across the Genie → Ray → Uni-1 product pivots; Song’s research leadership at Luma was complemented by a tight engineering team enabling model-to-product translation.
Personal Style
Song occupies a rare position in the generative AI landscape as someone who has made genuinely foundational theoretical contributions—DDIM rewrites the mathematics of diffusion sampling, not just its implementation—while also demonstrating the product instincts to guide a company through multiple complete strategy pivots. His published writing, including a March 2025 blog post on inference-time scaling, is notable for its willingness to challenge consensus framing: he argues that the opposition between autoregressive and diffusion approaches is a false dichotomy, and that flow-based objectives open new theoretical territory for pre-training. His X/Twitter presence under the handle @baaadas is sparse but pointed, consistent with a researcher who prefers to speak through work. The throughline from his Tsinghua competition prizes, through the spare elegance of the DDIM derivation, to his recent theoretical papers on moment matching and velocity matching suggests a persistent preference for finding the cleanest mathematical structure underneath a seemingly complex problem.
References
- Personal website: tsong.me
- Hello.cv resume: hello.cv/quchao-1
- Google Scholar: scholar.google.com/citations?user=6dP660cAAAAJ
- DBLP: dblp.org/pid/173/5104.html
- Semantic Scholar: semanticscholar.org/author/Jiaming-Song/51453887
- Stanford dissertation: purl.stanford.edu/zy983tp3399 (DBLP record for Compression, Generation, and Inference via Supervised Learning)
- DDIM paper (ICLR 2021): arxiv.org/abs/2010.02502
- MIT Technology Review Innovators Under 35 (2024): innovatorsunder35.com/the-list/jiaming-song
- Luma AI Uni-1: lumalabs.ai/uni-1
- Luma AI Ray: lumalabs.ai/ray
- Blog: “Inference-Time Scaling for Generative Pre-Training”: tsong.me/blog/inference-time-scaling