Susan Zhang

Chinese-American research engineer who built one of the largest reinforcement learning systems at OpenAI (the training system behind its Dota 2 agent), co-led the training infrastructure and open release of OPT-175B at Meta AI, and co-authored LIMA, the paper showing that about 1,000 carefully curated examples can be enough to align a strong pretrained model.


Profile

Nationality: American (Chinese-American)
Current Institution(s): Google DeepMind (Principal Research Engineer)
Research Areas: Large-Scale ML Systems, LLM Training Infrastructure, Reinforcement Learning at Scale, Alignment, Multimodal Language Models
Education: BA, Mathematics, Princeton University
Website: suchenzang.github.io
X / Twitter: @suchenzang
GitHub: suchenzang
Google Scholar: Susan Zhang

Overview

Susan Zhang is a Chinese-American research engineer and distributed systems specialist currently serving as Principal Research Engineer at Google DeepMind in the San Francisco Bay Area. She is best known for two landmark projects: building one of the largest reinforcement learning training systems ever deployed, which powered OpenAI Five — the agent that defeated professional Dota 2 teams — and co-leading the development and open release of OPT-175B at Meta AI, the first 175-billion-parameter language model released with full weights, training code, and a 114-page operational logbook. The OPT release set an industry precedent for transparency in large language model development and directly influenced subsequent open-source LLM efforts. She also co-authored LIMA (Less Is More for Alignment), which demonstrated that 1,000 carefully selected examples can achieve alignment quality competitive with models trained on orders of magnitude more data. Her self-description — “I specialize in building big systems to crunch through big data and develop big models” — captures her career accurately: she operates at the boundary of systems engineering and research, where the infrastructure for training frontier models is itself a scientific and engineering challenge.


Education & Early Career

Zhang studied mathematics at Princeton University, earning a BA. Before entering AI systems work, she spent time at Los Alamos National Laboratory and in data infrastructure roles across various cloud providers, developing the distributed systems background that would later make her effective at the scale of modern LLM training pipelines. She also spent a period at Unity Games, working at the intersection of gaming and technical infrastructure, before pivoting fully into AI research systems.


Career

OpenAI — RL Systems Engineer (c. 2018–2021)

Zhang joined OpenAI during the development of OpenAI Five, the Dota 2 reinforcement learning agent. She built core components of the training system, one of the largest RL training pipelines in history, which ran asynchronous self-play across tens of thousands of CPU cores. The system enabled OpenAI Five to reach professional-level play and defeat a world-champion team in a live match in April 2019. It required solving fundamental problems in distributed RL at a scale far beyond prior work: managing thousands of game environments, synchronizing gradient updates across many parallel workers, and maintaining training stability over months of continuous self-play. Her engineering contribution was recognized with co-authorship of the main OpenAI Five technical paper (“Dota 2 with Large Scale Deep Reinforcement Learning,” arXiv 2019) and of the companion paper “Long-Term Planning and Situational Awareness in OpenAI Five.” She gave talks on the OpenAI Five system at the Computer History Museum and Harvard CS50 in early 2022, offering one of the most detailed public accounts of the engineering decisions behind the system.

Meta AI / FAIR — Research Engineer (c. 2021–2022)

Zhang moved to Meta AI’s Fundamental AI Research group, where she served as the primary engineer on the LLM training infrastructure project that produced OPT-175B. The project trained a 175-billion-parameter GPT-3-class decoder-only transformer on 992 80GB A100 GPUs, sustaining 147 TFLOP/s per GPU. The resulting model was comparable in quality to GPT-3 while incurring approximately one-seventh of the carbon footprint. The training took 56 days on new hardware, with repeated instabilities, hardware failures, and checkpoint rollbacks that required real-time engineering decisions.
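
To put the throughput figures above in context, the short sketch below works out the cluster-wide throughput and the fraction of hardware peak that 147 TFLOP/s represents. The GPU count and per-GPU throughput come from the text. The 312 TFLOP/s peak is an assumption not stated in this profile: it is NVIDIA's published dense BF16/FP16 tensor-core figure for the A100. The function and constant names are illustrative only.

```python
# Figures quoted in the text above.
NUM_GPUS = 992
ACHIEVED_TFLOPS_PER_GPU = 147.0

# Assumption (not stated in the text): NVIDIA's published dense BF16/FP16
# tensor-core peak for an A100 is 312 TFLOP/s.
A100_PEAK_TFLOPS = 312.0


def aggregate_pflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Total sustained throughput across the cluster, in PFLOP/s."""
    return num_gpus * tflops_per_gpu / 1000.0


def hardware_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of the theoretical peak that was actually sustained."""
    return achieved_tflops / peak_tflops


cluster_pflops = aggregate_pflops(NUM_GPUS, ACHIEVED_TFLOPS_PER_GPU)
utilization = hardware_utilization(ACHIEVED_TFLOPS_PER_GPU, A100_PEAK_TFLOPS)

print(f"Aggregate throughput: {cluster_pflops:.1f} PFLOP/s")  # 145.8 PFLOP/s
print(f"Per-GPU utilization:  {utilization:.1%}")             # 47.1%
```

Under that assumed peak, the run sustained a little under half of the hardware's theoretical throughput, which is roughly 146 PFLOP/s across the whole cluster.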

The May 2022 release set an industry precedent in three ways. First, it provided the full model weights under a research license — the first time a GPT-3-class model had been made publicly available. Second, it released the entire training codebase as metaseq, an open-source framework for training large transformer language models that subsequently became widely used. Third, and most distinctively, it published a 114-page operational logbook documenting every significant training incident day by day: hardware failures, loss spikes, hyperparameter changes, and engineering mitigations. This level of transparency had no precedent in frontier LLM development and provided the research community with a ground-truth record of what large-scale LLM training actually involves in practice. Zhang presented the OPT work at NeurIPS 2022 (Has It Trained Yet? Workshop), Scale Transform X (2022), Stanford MLSys Seminar (2023), and CMU’s LLM Seminar (2023).

Following OPT, Zhang contributed to “Scaling Laws for Generative Mixed-Modal Language Models” (2023), a study of how scaling laws extend to models that handle both images and text, and to the CM3Leon multimodal autoregressive model (“Scaling Autoregressive Multi-Modal Models,” 2023). She also contributed to two theory-adjacent papers characterizing training dynamics: “A Theory on Adam Instability in Large-Scale Machine Learning” and “Effective Theory of Transformers at Initialization,” both in 2023.

LIMA: Less Is More for Alignment (NeurIPS 2023). Zhang co-authored LIMA (with Chunting Zhou, Pengfei Liu, Luke Zettlemoyer, Omer Levy, and others), which fine-tuned a 65-billion-parameter LLaMA model on only 1,000 carefully hand-curated examples covering diverse tasks and formats, with no reinforcement learning or preference modeling. In human preference evaluations, LIMA’s responses were competitive with those of models trained with RLHF on vastly larger datasets, challenging the prevailing assumption that alignment requires large amounts of feedback-annotated training data. The “superficial alignment hypothesis” introduced in the paper (that a model’s knowledge and capabilities are established during pretraining, and that fine-tuning primarily adjusts output format and style) became an influential framing in the alignment and RLHF literature.

Luminous Computing (brief interlude, c. 2022–2023)

Zhang briefly worked at Luminous Computing, a photonic computing startup pursuing optical hardware for AI acceleration, in a systems engineering capacity before transitioning to Google DeepMind.

Google DeepMind — Principal Research Engineer (c. 2023–present)

Zhang joined Google DeepMind as a Principal Research Engineer, continuing work at the intersection of large-scale training systems and research. She is based in the San Francisco Bay Area.


Key Contributions

  • OpenAI Five RL Training System (2018–2019) — Co-built the distributed reinforcement learning infrastructure that trained OpenAI Five, one of the most computationally demanding RL pipelines in history. The system enabled an agent to defeat the reigning Dota 2 world champions after accumulating thousands of years of self-play experience over months of real training time. Co-author of the main OpenAI Five paper (arXiv 2019) and of the companion paper on long-term planning and situational awareness.

  • OPT-175B (arXiv 2022, first author) — Co-led development and open release of Open Pre-trained Transformers (OPT-175B), the first publicly released 175B-parameter language model, with performance comparable to GPT-3 at roughly one-seventh of the carbon cost. The release included full model weights, the metaseq training codebase, and a 114-page day-by-day training logbook, setting a new standard for transparency in large language model development at the time.

  • metaseq — Co-developed and released metaseq, Meta’s open-source large-scale language model training framework, which enabled training OPT-175B and has been adopted in subsequent research.

  • LIMA: Less Is More for Alignment (NeurIPS 2023) — Co-authored the demonstration that 1,000 high-quality alignment examples can produce instruction-following quality competitive with models trained using large-scale RLHF, introducing the “superficial alignment hypothesis” that challenged prevailing assumptions about alignment training data requirements.

  • Scaling Laws for Generative Mixed-Modal Language Models (2023) — Co-authored an empirical study extending scaling law analysis to multimodal language models covering text and image, characterizing how compute-optimal model and data allocation changes across modalities.

  • Adam Instability and Transformer Initialization Theory (2023) — Co-authored two papers providing theoretical and empirical grounding for instability phenomena observed during large-scale training: characterizing Adam optimizer failure modes and deriving effective theories for transformer weight distributions at initialization.


Awards & Recognition

  • NeurIPS 2022 Workshop presentation — OPT-175B presented at the Has It Trained Yet? Workshop at NeurIPS 2022.
  • Stanford MLSys Seminar (2023) — Invited speaker on OPT-175B training infrastructure, one of the most-viewed talks in the MLSys seminar series.
  • Harvard CS50 and Computer History Museum (2022) — Invited speaker on OpenAI Five, reaching broad public audiences with a technical account of large-scale RL systems.

Key Relationships

  • Stephen Roller and Naman Goyal — Equal-contribution co-first authors on the OPT-175B paper alongside Zhang; together represented the core engineering team on the project.
  • Luke Zettlemoyer — Meta AI research lead on the OPT project; provided the academic research direction that shaped the OPT release and the LIMA paper.
  • Myle Ott — Meta AI engineer and co-author on OPT; metaseq’s design owes significantly to the fairseq framework Ott had developed.
  • Christopher Berner, Christy Dennison — OpenAI engineering colleagues on OpenAI Five; Zhang worked within the RL engineering team that Berner and others led.
  • Chunting Zhou — Lead author of LIMA, on which Zhang is a co-author.

Personal Style

Zhang describes her professional identity as that of a jack-of-all-trades with deep expertise in a few complementary areas, a stance she has articulated publicly as a deliberate career strategy: staying broadly capable across systems, ML theory, and engineering while accumulating rare combinations of depth over time. Her published work spans RL systems, LLM infrastructure, training theory, and alignment, consistent with this philosophy. The OPT training logbook, which she championed as part of the release, reflects a commitment to transparency that is unusual in frontier model development: it shows not just the final model but every failure, debugging decision, and hardware incident. She maintains a Twitter following of over 44,000 and uses the platform to comment on AI research trends, infrastructure practices, and the dynamics of the tech industry more broadly. Her bio summarizes her career efficiently: “@ Google DeepMind. Past: @MetaAI, @OpenAI, @unitygames, @losalamosnatlab, @Princeton etc. Always hungry for intelligence.”

