Sébastien Bubeck

ref · May 27, 2026, 7:13am

French-American computer scientist and mathematician who established minimax optimal bounds for multi-armed bandits, proved the law of robustness for neural networks, co-led the Phi small language model series at Microsoft, and authored the “Sparks of AGI” paper on GPT-4.

Profile


Born	April 16, 1985, France
Nationality	French-American
Current Institution(s)	OpenAI (Research Scientist, 2024–present)
Research Areas	Online Learning, Bandits, Convex Optimization, Metrical Task Systems, Deep Learning Theory, Large Language Models, Small Language Models
Doctoral Thesis	Applied Mathematics PhD (INRIA Nord Europe / University of Lille 1, 2010)
Website	sbubeck.com
X / Twitter	@SebastienBubeck
Blog	I’m a Bandit
Google Scholar	Sébastien Bubeck

Overview

Sébastien Bubeck is a French-American mathematician and computer scientist whose career has spanned theoretical machine learning, competitive analysis, and empirical AI. Trained at the École Normale Supérieure de Cachan and INRIA, he developed foundational theory for multi-armed bandits during his PhD and early faculty years at Princeton. At Microsoft Research he resolved open problems in bandit convex optimization, extended these methods to metrical task systems and convex body chasing in a celebrated sequence of papers with Yin Tat Lee and other collaborators, and proved the law of robustness linking neural network overparameterization to Lipschitz regularity. He then led the team that produced the Phi series of small language models (starting with the “Textbooks Are All You Need” paradigm) and co-authored “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” a 155-page paper that became one of the most widely read and debated AI documents of 2023. In 2024 he joined OpenAI. His Google Scholar profile reflects more than 25,000 citations. He maintains the blog “I’m a Bandit,” one of the longest-running and most technically rigorous blogs in the machine learning community.

Early Life & Education

Bubeck was born in France in 1985. He entered the École Normale Supérieure de Cachan (ENS Cachan, now ENS Paris-Saclay) in 2005 — one of the most selective grandes écoles in the French system, with a particularly strong mathematics program — and studied there through 2008. In the summer of 2006 he participated in the Research in Industrial Projects for Students (RIPS) program at the Institute for Pure and Applied Mathematics (IPAM) at UCLA.

He began his PhD at INRIA Nord Europe in Lille in 2007, specializing in applied mathematics under the supervision of Jean-Yves Audibert, and completed it in 2010. Audibert, a leading researcher at ENPC and INRIA who had developed foundational concentration inequalities and exploration-exploitation methods, was a formative intellectual influence; he died in 2011 at a young age. Bubeck also worked with Rémi Munos at INRIA. During the doctoral period he served as a teaching assistant at the University of Lille 1 (2008–2010). His dissertation was recognized as the best French PhD in Probability/Statistics (Jacques Neveu Prize, 2010), runner-up for the best French PhD in computer science (Gilles Kahn Prize, 2010), and runner-up for the AI thesis award (2011).

Career

Postdoc — Centre de Recerca Matemàtica, Barcelona (2010–2011)

Following his PhD, Bubeck spent one year as a postdoc at the Centre de Recerca Matemàtica in Barcelona, before moving to the United States.

Princeton University — Assistant Professor, ORFE (2011–2014)

Bubeck joined the Department of Operations Research and Financial Engineering at Princeton as an assistant professor. During this period he produced the survey “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems” (2012, with Nicolò Cesa-Bianchi), which became the primary reference text for the bandit learning community, covering UCB algorithms, Thompson sampling, contextual bandits, and adversarial bandits within a unified theoretical framework. He also wrote the widely used lecture notes on convex optimization that would later become the Foundations and Trends monograph, and he mentored multiple undergraduate researchers. In Fall 2013 he spent a semester as a visiting scientist at the Simons Institute for the Theory of Computing at UC Berkeley. He was subsequently awarded the Alfred P. Sloan Research Fellowship in Computer Science in 2015.

Microsoft Research — Researcher to Sr. Principal Research Manager (2014–2024)

Bubeck joined Microsoft Research in Redmond in 2014 as a Researcher in the Theory Group, progressing to Senior Researcher (2017–2019), then Senior Principal Research Manager leading the Machine Learning Foundations group (2020–2023), and finally VP AI and Distinguished Scientist (2024).

Bandit and convex optimization (2014–2019). His early Microsoft work resolved several long-standing open problems in online learning. “Kernel-Based Methods for Bandit Convex Optimization” (STOC 2017, with Ronen Eldan and Yin Tat Lee; journal version in the Journal of the ACM) gave the first polynomial-time algorithm for bandit convex optimization whose regret grows as √T in the time horizon T, with polynomial dependence on the dimension, a problem that had been open for over a decade. An earlier paper in this line of work received the COLT 2016 Best Paper Award.

k-Server and competitive analysis (2018–2019). Together with Michael B. Cohen, Yin Tat Lee, James R. Lee, and Aleksander Madry, Bubeck made major progress on a central question in competitive analysis with “K-Server via Multiscale Entropic Regularization” (STOC 2018). The paper gave an O((log k)^2)-competitive randomized algorithm for the k-server problem on hierarchically separated trees, and from it derived polylogarithmic guarantees on general metric spaces that also depend on the number of points or the aspect ratio of the metric, using a novel multiscale entropic regularization technique. A companion result, “Competitively Chasing Convex Bodies” (with Yin Tat Lee, Yuanzhi Li, and Mark Sellke), gave the first competitive algorithm for the convex body chasing problem, with a competitive ratio depending only on the dimension. Bubeck later returned to the k-server problem: “The Randomized k-Server Conjecture Is False!” (with Christian Coester and Yuval Rabani) received the STOC 2023 Best Paper Award.

Law of Robustness (2021). “A Universal Law of Robustness via Isoperimetry” (NeurIPS 2021 Best Paper, with Mark Sellke) proved a sharp mathematical theorem: for n noisy data points in dimension d, a model that fits the data below the noise level with a bounded (O(1)) Lipschitz constant must have on the order of nd parameters, that is, d times more than the roughly n parameters needed for mere interpolation. The result offers a formal explanation of why heavy overparameterization may be necessary for neural networks to be simultaneously well-fitting and smooth, linking parameter counting to isoperimetric (concentration) properties of the data distribution. Quanta Magazine and Nature covered the result as a breakthrough in the theoretical understanding of deep learning.
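Stated informally, the theorem can be sketched as the following bound. This is only a sketch: it suppresses the paper's technical assumptions (a smoothly parametrized function class with polynomially bounded weights, a data distribution satisfying isoperimetry, and label noise), and the notation is assumed here for illustration, with n the number of samples, d the input dimension, p the number of parameters, Lip(f) the Lipschitz constant of the fitted function f, and the tilde hiding logarithmic factors.

```latex
% Informal sketch of the law of robustness; constants and log factors are suppressed.
\[
  \mathrm{Lip}(f) \;\ge\; \widetilde{\Omega}\!\left(\sqrt{\frac{nd}{p}}\right)
  \quad \text{for any } f \text{ in the class that fits the } n \text{ noisy samples below the noise level.}
\]
% Consequence: asking for a smooth fit, \mathrm{Lip}(f) = O(1), forces
\[
  p \;\ge\; \widetilde{\Omega}(nd).
\]
```

Read this way, a model with only about n parameters can still interpolate, but the bound then forces its Lipschitz constant to grow like √d, which is the sense in which smooth interpolation costs a factor of d more parameters.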

Sparks of AGI: Early Experiments with GPT-4 (2023). With early access to GPT-4 during its development, Bubeck led a team of fourteen Microsoft Research authors in a 155-page empirical investigation of an early version of the model. The resulting preprint, “Sparks of Artificial General Intelligence,” posted in March 2023, argued that GPT-4 exhibited qualitatively more general intelligence than prior AI systems, demonstrating remarkable and unexpected competence in mathematics, coding, vision, medicine, law, and many other domains without special prompting. The paper was cautious in its claims but deliberately provocative in framing GPT-4 as an “early (yet still incomplete)” version of an AGI system. It became one of the most widely discussed AI documents of 2023, covered by the New York Times, Wired, This American Life, and many other outlets, and contributed to mainstream public understanding of the capabilities of large language models.

Phi: Textbooks Are All You Need (2023). Following the Sparks paper, Bubeck and collaborators, including Yuanzhi Li, asked whether a dramatically smaller model could perform comparably on key reasoning tasks by relying on high-quality data rather than scale. “Textbooks Are All You Need” introduced Phi-1, a 1.3B-parameter model trained on filtered “textbook-quality” web data together with synthetic textbooks and exercises generated with GPT-3.5. Phi-1 achieved roughly 50% pass@1 on HumanEval despite being orders of magnitude smaller than contemporary state-of-the-art models. This paradigm of curated and synthetic “textbook-quality” training data in an educational format was extended to Phi-1.5 (common-sense reasoning) and Phi-2 (general cognitive tasks), establishing the Phi family of small language models (SLMs) as a prominent line of efficient AI. Bubeck has spoken publicly about the vision of embedding Phi-class SLMs into everyday devices.

OpenAI — Research Scientist (2024–present)

In October 2024, Bloomberg reported that Bubeck would be leaving Microsoft to join OpenAI. He made the move that month, continuing work on small language models, theoretical foundations of AI, and frontier model understanding.

Key Contributions

Minimax Bandit Theory (COLT 2009; survey 2012): “Minimax Policies for Adversarial and Stochastic Bandits” (with Jean-Yves Audibert) established minimax optimal rates for the multi-armed bandit problem and introduced the MOSS algorithm. The 2012 survey with Nicolò Cesa-Bianchi, “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems,” became the canonical reference of the bandit literature.
Convex Optimization: Algorithms and Complexity (2015): A Foundations and Trends in Machine Learning monograph covering gradient descent, mirror descent, accelerated methods, and interior point methods, widely used as a graduate textbook and reference in theoretical machine learning.
Bandit Convex Optimization (STOC 2017): “Kernel-Based Methods for Bandit Convex Optimization,” with Ronen Eldan and Yin Tat Lee. First polynomial-time algorithm for bandit convex optimization with regret of order √T in the time horizon T (and polynomial in the dimension), resolving a long-standing open problem.
K-Server via Multiscale Entropic Regularization (STOC 2018): With Michael B. Cohen, Yin Tat Lee, James R. Lee, and Aleksander Madry. Gave an O((log k)^2)-competitive randomized algorithm for the k-server problem on hierarchically separated trees, with polylogarithmic guarantees on general metrics that also depend on the size or aspect ratio of the metric, using a novel multiscale entropic regularization approach. A later paper, “The Randomized k-Server Conjecture Is False!” (with Christian Coester and Yuval Rabani), received the STOC 2023 Best Paper Award.
Law of Robustness (NeurIPS 2021 Best Paper): “A Universal Law of Robustness via Isoperimetry,” with Mark Sellke. Proved that fitting n noisy data points in dimension d with a bounded Lipschitz constant requires on the order of nd parameters, a mathematically precise account of why smooth interpolation calls for heavy overparameterization.
Sparks of Artificial General Intelligence (arXiv 2023): “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” with thirteen Microsoft Research co-authors. A 155-page empirical study of early GPT-4 arguing that the model exhibits qualitatively general intelligence across diverse domains. One of the most widely read and cited AI papers of 2023.
Phi / Textbooks Are All You Need (2023): Led the development of the Phi series of small language models (Phi-1, Phi-1.5, Phi-2, Phi-3) based on the insight that high-quality “textbook-quality” data, much of it synthetic, enables dramatically smaller models to match or exceed much larger models on key reasoning benchmarks. The Phi-3-mini model achieves GPT-3.5-class performance at 3.8B parameters.

Awards & Recognition

STOC 2023 Best Paper Award: For “The Randomized k-Server Conjecture Is False!” (with Christian Coester and Yuval Rabani).
NeurIPS 2021 Best Paper Award: For the law of robustness paper.
NeurIPS 2018 Best Paper Award: For “Optimal Algorithms for Non-Smooth Distributed Optimization in Networks” (with Kevin Scaman, Francis Bach, Yin Tat Lee, and Laurent Massoulié).
COLT 2016 Best Paper Award: For work on bandit convex optimization.
Alfred P. Sloan Research Fellowship in Computer Science (2015): Awarded by the Sloan Foundation for early-career researchers with exceptional potential.
Best Student Paper Awards: COLT 2009 (minimax bandits); ALT 2018; ALT 2023.
Jacques Neveu Prize (2010): Best French PhD in Probability/Statistics.
Gilles Kahn Prize (2010): Second prize, best French PhD in Computer Science.
AI Thesis Award (2011): Second prize, best French PhD in Artificial Intelligence.

Key Relationships

Yin Tat Lee: The most sustained research collaboration of Bubeck’s career, beginning when Lee was an intern at MSR (2015–2016) and continuing across bandit convex optimization, k-server, convex body chasing, and related work. Lee is now a Principal Researcher at MSR.
Ronen Eldan: Long-running collaborator across bandit optimization, law of robustness, and the Sparks of AGI paper; probabilist and computer scientist at the Weizmann Institute and later MSR.
Yuanzhi Li: Intern-turned-collaborator who worked on convex body chasing and then co-led the Phi SLM initiative; Principal Researcher at MSR and formerly assistant professor at CMU.
Mark Sellke: Intern who co-authored the law of robustness paper and convex body chasing results; did his PhD at Stanford, where he worked with Andrea Montanari.
Michael B. Cohen: Exceptionally talented intern who co-authored the k-server paper and several other theoretical works; died in 2017 from undiagnosed Type 1 diabetes at age 25. Bubeck has spoken and written movingly about Cohen’s passing.
Jean-Yves Audibert: PhD advisor; an influential researcher in concentration inequalities and bandit learning who died prematurely in 2011. Audibert’s mathematically rigorous, probability-theoretic research approach shaped Bubeck’s foundational orientation.
Nicolò Cesa-Bianchi: Co-author of the canonical bandit survey; the leading figure in online learning with whom Bubeck defined the field’s standard reference.
Aleksander Madry: Collaborator on k-server; MIT professor known for adversarial robustness research.

Personal Style

Bubeck’s research is defined by an unusual trajectory: from foundational probability theory and competitive analysis at one end, through convex geometry, to empirical investigation of the most advanced AI systems at the other. The intellectual thread is a persistent focus on what can be proved — whether that means tight regret bounds for bandit algorithms, sharp parameter-count lower bounds for Lipschitz networks, or rigorous (if non-formal) characterizations of GPT-4’s capabilities. His blog, “I’m a Bandit,” started at Princeton and running for well over a decade, is consistently noted as one of the few technically rigorous personal blogs in theoretical ML, covering open problems, lecture notes, and original commentary in a style that is simultaneously authoritative and accessible. He has been a generous mentor: multiple former interns (Yin Tat Lee, Yuanzhi Li, Mark Sellke) have gone on to become leading researchers in their own right, and he has written publicly about the tragic loss of Michael B. Cohen in 2017 in a way that reveals his investment in the people as well as the mathematics.

References

Personal website: sbubeck.com
Biography: sbubeck.com/bio.html
Awards: sbubeck.com/awards.html
Google Scholar: scholar.google.com
Wikipedia: Sébastien Bubeck
Blog “I’m a Bandit”: blogs.princeton.edu/imabandit
Sparks of AGI paper: arxiv.org/abs/2303.12712
Bloomberg (join OpenAI, October 2024): bloomberg.com
Digg profile: digg.com/u/x/sebastienbubeck