Percy Liang

Associate Professor of Computer Science at Stanford, founding director of CRFM, the researcher who coined “foundation models,” co-creator of SQuAD and HELM, and creator of Marin — an open science initiative rebuilding foundation model training in public view.


Profile

Field | Detail
Nationality | American
Current Institution | Stanford University, Department of Computer Science (courtesy appointment in Statistics)
Other Roles | Director, Stanford Center for Research on Foundation Models (CRFM); Co-founder, Together AI; Co-founder, Simile AI; Creator, Marin
Research Areas | Foundation Models, Natural Language Processing, Semantic Parsing, Machine Learning Theory, Robustness, Evaluation, Reproducibility
PhD Advisors | Michael I. Jordan; Dan Klein
PhD Dissertation | Computational Linguistics / Probabilistic Models (UC Berkeley, 2011)
Academic Website | cs.stanford.edu/~pliang
Marin | marin.community
X / Twitter | @percyliang
GitHub | @percyliang
Google Scholar | scholar.google.com

Overview

Percy Liang is an American computer scientist and Associate Professor of Computer Science at Stanford University, where he is the founding director of the Center for Research on Foundation Models (CRFM). He coined the term “foundation models” through the landmark 2021 CRFM paper, which assembled over a hundred Stanford researchers to map the opportunities and risks of large pretrained models. He has since built a widely used evaluation framework for language models (HELM), co-created one of NLP’s most influential benchmarks (SQuAD), and launched Marin, an open lab that trains and shares foundation models with full programmatic provenance from raw data to results. He co-founded Together AI, an inference and research platform for open models, and Simile AI. Beyond his technical and institutional contributions, Liang has supervised an exceptionally productive laboratory: his alumni list reads as a roster of the current generation of top ML faculty and research scientists, with graduates holding faculty positions at Berkeley, Princeton, CMU, NYU, USC, Washington, UChicago, Columbia, and ETH Zurich. He describes his research orientation simply: “I am drawn to simple things, want to understand things deeply, and like to build useful systems.”


Early Life & Education

Liang’s early competitive record in programming and mathematics is among the most decorated in this wiki series. He won a silver medal at the International Olympiad in Informatics (IOI) in 2000, and his team placed second at the ACM ICPC World Finals in 2002; these are two of the most competitive international programming contests. He also won the Phoenix Young Musicians Competition in piano in 2000, establishing an early dual identity as both a competitive programmer and a serious musician that has persisted throughout his career. He later won the MIT Concerto Competition (2004) and the KDFC Classical Star Search (over-21 division, 2008).

B.S., Computer Science — MIT, 2004
Liang completed his undergraduate degree at MIT.

M.Eng., Computer Science — MIT, 2005
His master’s thesis was supervised by Michael Collins, a leading researcher in statistical NLP.

Ph.D., Computer Science — UC Berkeley, 2011
Liang’s doctoral research was jointly supervised by Michael I. Jordan (one of the most-cited researchers in machine learning) and Dan Klein (a leading computational linguist and parsing researcher). His dissertation work focused on probabilistic models for natural language processing, combining statistical learning theory with structured prediction. This laid the foundation for his subsequent work on semantic parsing and language grounding.

Postdoc, Google — 2012
After completing his PhD, Liang held a short postdoctoral position at Google before joining Stanford.


Career

Stanford University — Associate Professor (2012–present)

Liang joined Stanford as an assistant professor and was promoted to associate professor. He holds appointments in Computer Science and Statistics, and is affiliated with Stanford HAI, SAIL, and the NLP and ML groups.

Semantic Parsing and Grounding (2012–2018)
Liang’s early Stanford research focused on semantic parsing — teaching machines to translate natural language into formal programs that can be executed against structured databases or environments. His group developed methods for learning semantic parsers from distant, noisy supervision (executing programs rather than annotating logical forms), and for compositional generalization. This work produced several highly cited papers at ACL, EMNLP, and ICML and trained the first generation of his PhD students.

SQuAD — Stanford Question Answering Dataset (2016)
With PhD student Pranav Rajpurkar and others, Liang co-created SQuAD, a reading comprehension benchmark built from Wikipedia passages. The original SQuAD dataset (2016) and its extension SQuAD 2.0 (2018, with Robin Jia), which adds adversarially written unanswerable questions, became two of the most widely used benchmarks in NLP. The 2016 paper has accumulated tens of thousands of citations, making it one of the most-cited papers in the field. SQuAD’s combination of scale, format, and clear evaluation protocol established the template for subsequent NLP benchmarks and catalyzed a wave of machine reading comprehension research.
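The evaluation protocol mentioned above rests on two answer-level metrics, exact match (EM) and token-level F1, each taken as the maximum over the available reference answers. The sketch below is written in the spirit of that protocol and is not the official evaluation script: the function names are hypothetical, and the normalization steps (lowercasing, dropping punctuation and English articles, collapsing whitespace) are stated here as assumptions.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(prediction: str, gold_answers: List[str]) -> Dict[str, float]:
    """Score one prediction against several references, keeping the best."""
    return {
        "em": max(exact_match(prediction, g) for g in gold_answers),
        "f1": max(token_f1(prediction, g) for g in gold_answers),
    }
```

Under these assumptions, a prediction that differs from a reference only by a leading article counts as an exact match, while a prediction that contains the reference plus extra words receives partial F1 credit.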

Adversarial Examples for Reading Comprehension (2017)
With Robin Jia, Liang demonstrated that adding distracting sentences to SQuAD passages caused a dramatic drop in model performance — a finding that predated and helped frame the subsequent field-wide focus on robustness and distribution shift in NLP.

Influence Functions for Machine Learning (2017)
Co-authored with Pang Wei Koh (and later expanded), this work applied classical statistical influence functions to neural networks to identify which training examples most influenced a given prediction — a foundational tool for data attribution, model debugging, and understanding training dynamics.
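For readers who want the underlying formula, the standard form of the influence of upweighting a training point z on the loss at a test point can be written as below. The notation (loss L, fitted parameters, empirical Hessian H) follows the usual presentation of influence functions and is introduced here for illustration; the expression assumes the loss is twice differentiable and the Hessian is invertible, conditions that deep networks satisfy only approximately.

```latex
\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})

\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\, \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
    \, H_{\hat{\theta}}^{-1} \,
    \nabla_{\theta} L(z, \hat{\theta})
```

A large negative value indicates that upweighting z would lower the test loss (a helpful training example); a large positive value indicates the opposite.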

CodaLab Worksheets
Liang has been an early and sustained advocate for reproducible research, developing and maintaining CodaLab Worksheets — a platform for managing computational experiments with full provenance, allowing papers to be “executable” in the sense that all code and data to reproduce results is linked directly to the publication.

CS336: Language Models from Scratch
Liang created CS336, Stanford’s course on building language models from first principles — covering data, architecture, training, and evaluation at the scale of actual language model development. The course is distinctive for requiring students to implement every component themselves and has been influential in establishing what graduate-level language model education looks like.

CS324 / CS221 / CS229T
Liang also teaches CS324 (Advances in Foundation Models), CS221 (Artificial Intelligence), and CS229T (Statistical Learning Theory), covering both cutting-edge applied and theoretical dimensions of AI.

Coining “Foundation Models” and Founding CRFM (2021)

In August 2021, Liang launched the Stanford Center for Research on Foundation Models (CRFM) as an initiative within Stanford HAI, and co-led the landmark report “On the Opportunities and Risks of Foundation Models” — a 200-page analysis involving over a hundred Stanford researchers covering the technical, social, legal, and ethical dimensions of large pretrained models. The paper coined the term “foundation models” to describe models trained on broad data at scale and adaptable to a wide range of tasks. Liang later explained the naming: he and his colleagues “went around Stanford University and saw who was interested in this phenomenon” and needed terminology that was descriptive without being evaluatively loaded in either direction.

The term gained rapid global adoption and replaced or competed with prior terminology (“large language models,” “pretrained models,” “base models”) across industry, government, and academic discourse. The CRFM report became one of the most-cited AI papers of 2021 and shaped the framing of AI policy discussions in the US, EU, and UK.

HELM — Holistic Evaluation of Language Models (2022)

With CRFM colleagues, Liang developed HELM (Holistic Evaluation of Language Models), a benchmarking framework built on three principles: a systematic taxonomy of evaluation scenarios rather than an ad hoc list, evaluation of all models on all scenarios for fair cross-model comparison, and measurement of multiple metrics beyond accuracy (including calibration, robustness, fairness, and efficiency). The initial release evaluated 30 models from 12 organizations, including AI21, Anthropic, Cohere, Google, Meta, Microsoft, NVIDIA, and OpenAI, making it the most comprehensive public comparative evaluation of language models to that point. HELM has since expanded to include vision-language models, code, and reasoning benchmarks, and is hosted at crfm.stanford.edu/helm as a continuously updated public resource.
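The "all models on all scenarios, with multiple metrics" principle can be pictured as filling in a dense three-way grid. The sketch below is a toy illustration of that idea only; it does not use or reproduce the HELM codebase, and every name in it (`evaluate_grid`, the two toy models, scenarios, and metrics) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]              # (prompt, reference answer)
Model = Callable[[str], str]           # maps a prompt to a prediction
Metric = Callable[[str, str], float]   # maps (prediction, reference) to a score


def evaluate_grid(
    models: Dict[str, Model],
    scenarios: Dict[str, List[Example]],
    metrics: Dict[str, Metric],
) -> Dict[Tuple[str, str, str], float]:
    """Score every model on every scenario under every metric (no gaps)."""
    results: Dict[Tuple[str, str, str], float] = {}
    for model_name, model in models.items():
        for scenario_name, examples in scenarios.items():
            # Run the model once per scenario and reuse predictions per metric.
            predictions = [model(prompt) for prompt, _ in examples]
            for metric_name, metric in metrics.items():
                scores = [
                    metric(pred, ref)
                    for pred, (_, ref) in zip(predictions, examples)
                ]
                key = (model_name, scenario_name, metric_name)
                results[key] = sum(scores) / len(scores)
    return results


# Toy stand-ins: real models, scenarios, and metrics would be far richer.
MODELS: Dict[str, Model] = {
    "echo": lambda prompt: prompt,
    "upper": lambda prompt: prompt.upper(),
}
SCENARIOS: Dict[str, List[Example]] = {
    "copy": [("abc", "abc"), ("Hi", "Hi")],
    "shout": [("abc", "ABC"), ("Hi", "HI")],
}
METRICS: Dict[str, Metric] = {
    "exact_match": lambda pred, ref: float(pred == ref),
    "case_insensitive_match": lambda pred, ref: float(pred.lower() == ref.lower()),
}

grid = evaluate_grid(MODELS, SCENARIOS, METRICS)
for key in sorted(grid):
    print(key, grid[key])
```

Because every cell of the grid is filled, two models can always be compared on the same scenario and metric, which is the point of the second principle; adding a metric adds a full slice of results rather than a one-off number.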

Generative Agents (2023)

With PhD student Joon Sung Park and Michael Bernstein, Liang co-authored “Generative Agents: Interactive Simulacra of Human Behavior” — a paper that demonstrated LLM-powered agents embedded in a virtual town could exhibit plausible, emergent social behaviors: forming memories, making plans, spreading information, and organizing events. One of the most-cited NLP papers of 2023, it established the genre of “LLM society simulation” research and influenced both AI research and game design communities.

Prefix Tuning (2021)

With Xiang Lisa Li, Liang co-authored Prefix Tuning, a method for parameter-efficient fine-tuning of language models by prepending trainable continuous embeddings to the input context rather than updating all model weights — achieving comparable results to full fine-tuning with a fraction of the parameter updates.
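The "fraction of the parameter updates" can be made concrete with a back-of-the-envelope count. The sketch below assumes a common formulation in which the trained prefix is stored as one key vector and one value vector per prefix position in every Transformer layer, and it ignores any auxiliary reparameterization network used only during training. The model dimensions are hypothetical round numbers, not figures from the paper.

```python
def prefix_parameter_count(prefix_len: int, num_layers: int, hidden_size: int) -> int:
    """Trainable parameters for a prefix stored as per-layer key/value vectors.

    Assumption: each prefix position contributes one key vector and one value
    vector (each of size hidden_size) in every layer.
    """
    return prefix_len * num_layers * 2 * hidden_size


def trainable_fraction(
    prefix_len: int, num_layers: int, hidden_size: int, total_model_params: int
) -> float:
    """Share of the frozen model's size that the prefix parameters represent."""
    prefix_params = prefix_parameter_count(prefix_len, num_layers, hidden_size)
    return prefix_params / total_model_params


# Hypothetical model: 24 layers, hidden size 1024, about 350M parameters.
prefix_params = prefix_parameter_count(prefix_len=10, num_layers=24, hidden_size=1024)
fraction = trainable_fraction(10, 24, 1024, total_model_params=350_000_000)
print(f"{prefix_params:,} trainable prefix parameters ({fraction:.4%} of the model)")
```

Under these assumptions the prefix amounts to roughly half a million parameters while the base model's weights stay frozen, so one copy of the base model can serve many tasks with only a small prefix stored per task.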

Together AI — Co-Founder (2022)

Liang co-founded Together AI (together.ai), a platform providing inference APIs and research infrastructure for open-source language models, enabling access to models including LLaMA variants and custom-trained models at competitive cost. Together AI is widely used in the ML research community for experiments requiring access to large open models.

Simile AI — Co-Founder

Liang co-founded Simile AI (simile.ai), a more recent venture that builds on the generative agents and AI-for-science research directions.

Marin — Open Science Foundation Model Lab (2025–present)

In May 2025, Liang launched Marin (marin.community), an open lab that trains foundation models from scratch with full public documentation of every step: the code, data, experiments, mistakes, and results are all shared in real time via GitHub issues (for experiment preregistration), pull requests (for code), and Weights & Biases reports (for results). Marin-8B-Base beat Llama 3.1 8B on 14 of 19 standard benchmarks; Marin-32B-Base became the best open-source model as of October 2025 on the same benchmark set. Marin’s speedrun leaderboard competition, inspired by the nanoGPT speedrun, invites researchers to find faster training methods at a given compute budget, with free compute offered to top performers. The project runs on Google TPU Research Cloud resources and is explicitly organized as a community research effort.


Key Contributions

  • Coined “foundation models” — The 2021 CRFM paper introduced and defined the term; it became the dominant vocabulary across AI research, policy, and industry for describing large pretrained adaptable models.

  • SQuAD / SQuAD 2.0 — Co-created the Stanford Question Answering Dataset (2016) and its adversarial extension (2018); among the most-cited NLP papers ever written; shaped machine reading comprehension as a subfield and drove a wave of benchmarking work in NLP.

  • HELM — Built the most comprehensive comparative evaluation framework for language models, evaluated across dozens of models and hundreds of scenarios; established transparency and multidimensional evaluation as standards for responsible model comparison.

  • Generative Agents — Demonstrated emergent social behavior in LLM-powered virtual societies; one of the most-cited AI papers of 2023; defined a new research direction in AI simulation.

  • Prefix Tuning — A parameter-efficient fine-tuning method that became a widely used technique for adapting large language models without full fine-tuning.

  • Influence Functions for ML — Foundational work on attributing model predictions back to training data; widely used for understanding model behavior and data debugging.

  • Marin — A novel model for open AI science: training foundation models in public with full programmatic documentation, treating science itself as a community process rather than a sequence of published results.

  • Adversarial SQuAD — Showed that adding distracting sentences to reading comprehension passages caused a dramatic drop in model performance; one of the earliest demonstrations of the fragility of NLP systems under adversarially perturbed inputs.

  • CodaLab Worksheets — Open platform for reproducible ML experiments; reflects Liang’s sustained institutional investment in research infrastructure.

  • CRFM — Founded and directs Stanford’s Center for Research on Foundation Models, which has produced influential research, evaluation frameworks, and policy engagement around AI.


Awards & Recognition

  • Presidential Early Career Award for Scientists and Engineers (PECASE) (2019)
  • IJCAI Computers and Thought Award (2016) — Biennial prize for outstanding AI contributions by a researcher under 35; among the most prestigious early-career AI awards.
  • NSF CAREER Award (2016)
  • Sloan Research Fellowship (2015)
  • Microsoft Research Faculty Fellowship (2014)
  • AI2050 Schmidt Sciences Fellow (ongoing) — Fellowship for high-impact AI research.
  • ACM ICPC World Finals — 2nd Place (2002) — With MIT team.
  • International Olympiad in Informatics — Silver Medal (2000)
  • Multiple paper awards at ACL, EMNLP, ICML, COLT, ISMIR, CHI, UIST, and RSS.
  • Graduate fellowships: NSF, NDSEG, GAANN, Siebel Scholar.
  • Piano: KDFC Classical Star Search winner (over-21 division, 2008); MIT Concerto Competition (2004); Phoenix Young Musicians Competition (2000).

Academic Lineage

Liang has supervised an unusually large and well-placed group of PhD students and postdocs. Among the most prominent alumni:

  • Jacob Steinhardt — PhD 2018; now associate professor at UC Berkeley and founder of Transluce; leading researcher in robustness, AI safety, and evaluation.
  • Pranav Rajpurkar — PhD 2021 (co-advised with Andrew Ng); now associate professor at Harvard; co-created SQuAD and has led AI-for-medicine research.
  • Aditi Raghunathan — PhD 2021; now assistant professor at CMU; leading researcher in robust ML.
  • Pang Wei Koh — PhD 2022; now assistant professor at University of Washington; influential work on influence functions and distribution shift.
  • Rishi Bommasani — PhD 2025 (co-advised with Dan Jurafsky); lead author of the “On the Opportunities and Risks of Foundation Models” paper and led much of HELM; now senior research scholar at Stanford HAI.
  • Joon Sung Park — PhD 2025 (co-advised with Michael Bernstein); first author of Generative Agents; now founder of a startup.
  • Tatsunori Hashimoto — Postdoc; now assistant professor at Stanford; has co-supervised multiple Liang students.
  • Robin Jia — PhD 2020; now assistant professor at USC; created Adversarial SQuAD.
  • Mina Lee — PhD 2023; now assistant professor at University of Chicago.
  • Yuhuai (Tony) Wu — Postdoc; co-founder of xAI (Elon Musk’s AI company).

Key Relationships

  • Michael I. Jordan — PhD advisor at Berkeley; one of the most influential ML researchers of his generation; Jordan’s emphasis on statistical rigor and principled uncertainty quantification is visible in Liang’s research throughout his career.
  • Dan Klein — PhD co-advisor; leading computational linguist; shaped Liang’s grounding in NLP and structured prediction.
  • Michael Collins — MEng advisor at MIT; one of the pioneers of statistical NLP; earliest influence on Liang’s approach to language.
  • Fei-Fei Li — Co-founder of Stanford HAI and close colleague; their institutional work at Stanford HAI and CRFM overlaps significantly in the governance and evaluation of foundation models.
  • Christopher Manning — Senior Stanford NLP colleague; multiple Liang students co-advised with Manning; their shared position within the Stanford NLP ecosystem gives their relationship particular institutional weight.
  • Chelsea Finn — Current Stanford colleague; Liang co-advises students with Finn, reflecting an overlap between his language modeling work and her robotics perspective.

Personal Style

Liang’s research and institutional work share a consistent orientation toward clarity, rigor, and public accountability. The phrase he uses to describe his research ethos — “I am drawn to simple things, want to understand things deeply, and like to build useful systems” — applies equally to his approach to education (CS336 requires students to implement everything from scratch), evaluation (HELM attempts to be exhaustive and principled rather than selective), and open science (Marin documents every experiment including mistakes). His sustained investment in infrastructure — CodaLab, HELM, Marin — reflects a conviction that the cumulative reliability of science depends on how work is recorded and shared, not just what conclusions are reached. His co-founding of Together AI and Simile AI alongside his faculty role represents a pattern in his career of building institutions and platforms rather than limiting himself to research outputs. He has described the open science principles behind Marin as a direct response to a reproducibility crisis he sees in large-scale AI research: if the experiments that train frontier models are not documented in real time, the field loses the ability to build cumulatively on its own work.


References