Nathan Lambert

Berkeley-trained robotics PhD turned post-training lead at the Allen Institute for AI, whose open-source model work, RLHF book, and Interconnects newsletter have made him one of the most accessible public voices on LLM alignment and post-training.


Profile

Nationality: American
Current Institution(s): Allen Institute for AI, Ai2 (Senior Research Scientist, Post-Training Lead)
Research Areas: RLHF, Post-Training, Open Language Models, Model-Based Reinforcement Learning, Robotics Control
Doctoral Advisor: Kristofer S.J. Pister; Roberto Calandra (co-advisor, Meta AI Research)
Doctoral Thesis: Synergy of Prediction and Control in Model-based Reinforcement Learning (UC Berkeley, 2022)
Website: natolambert.com
Blog: interconnects.ai (60,000+ subscribers)
X / Twitter: @natolambert
GitHub: natolambert
Google Scholar: Nathan Lambert

Overview

Nathan Lambert is an American machine learning researcher whose career has evolved from micro-robotics control at Berkeley into one of the more prominent roles in open-source LLM post-training. As post-training lead at the Allen Institute for AI (Ai2), he has been a driving force behind OLMo, among the first fully open pretrained language models, and the Tülu post-training recipe series, which demonstrated that a small open team could match the instruction-following quality of Meta’s proprietary post-training on the same base model. In parallel, Lambert runs Interconnects, a Substack newsletter that has grown to over 60,000 subscribers and serves as one of the field’s more technically grounded public commentaries on LLM research, policy, and the open vs. closed model debate. He is the sole author of the RLHF Book (rlhfbook.com), which circulates as a freely available arXiv document, is forthcoming in print, and is widely used as a practitioner’s reference.


Early Life & Education

Lambert completed his undergraduate and early graduate work in electrical engineering and computer science. He pursued a PhD at UC Berkeley in the Department of Electrical Engineering and Computer Sciences, working in the Berkeley Autonomous Microsystems Lab under Professor Kristofer Pister and co-advised by Roberto Calandra of Meta AI Research. His dissertation, Synergy of Prediction and Control in Model-based Reinforcement Learning (2022), sits at the intersection of model-based RL and micro-robotics control — an unusual pairing that gave him early experience with both the theoretical underpinnings of RL and the engineering demands of real physical systems. During his PhD he interned at Facebook AI Research and DeepMind, both on model-based RL for control, and received the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for efforts to improve community norms and mentor junior students.


Career

UC Berkeley — PhD (2018–2022)

Lambert’s doctoral work addressed the challenge of building sample-efficient learned controllers for micro-robotic platforms, combining model-based prediction with closed-loop control. Internships at Facebook AI Research and DeepMind during this period broadened his scope from hardware-constrained robotics to large-scale RL systems. The experience left him with a dual perspective — an engineer’s respect for grounding claims in physical reality, and a researcher’s appetite for the theoretical machinery of RL — that would later shape his approach to RLHF and post-training.

Hugging Face (2022–2023)

After graduating, Lambert joined Hugging Face, where he helped build the company’s RLHF research function nearly from scratch. The role placed him at the centre of the emerging field just as ChatGPT was making RLHF a household acronym. He contributed to open-source tooling and educational resources around reward modelling and preference learning, and began Interconnects as a newsletter to make the rapidly evolving literature accessible to a broader audience. The Hugging Face period established him as a trusted explainer of RLHF mechanics at a moment of maximum public interest.

Allen Institute for AI — Ai2 (2023–present)

Lambert joined Ai2 as a Senior Research Scientist and was named post-training lead. His primary projects have been OLMo, Ai2’s fully open pretrained language model series (released with weights, training data, and training code), and Tülu, the corresponding post-training recipe. Tülu 3 (2024) attracted particular attention for demonstrating that open-recipe post-training could match Meta’s instruction-tuning quality on a shared Llama base, a concrete proof of concept for the viability of the open model ecosystem. He has described OLMo as the central reason he joined Ai2, viewing full openness of data, code, and weights as the most tractable lever for making AI more auditable and competitive. He also developed Tülu 3.1, which integrated reinforcement learning with verifiable rewards (RLVR) via Group Relative Policy Optimization (GRPO) and scaled up to OLMo 2 32B. In April 2026 he travelled to China to visit most of the leading AI labs, including Moonshot AI, Z.ai, 01.ai, Meituan, and Xiaomi, and published a widely circulated trip report on cultural and organisational differences between Chinese and American research environments.


Key Contributions

  • OLMo (Open Language Model) — Core contributor on Ai2’s flagship fully open pretrained language model series, releasing weights, training data (Dolma), and training code; the most comprehensive open-recipe large language model effort outside of a handful of academic consortia.
  • Tülu / Tülu 3 — Led the post-training recipe that matched Meta’s instruction-following quality on the same Llama base, with full reproducibility; Tülu 3.1 further incorporated RLVR/GRPO, making OLMo 2 32B the first fully open model to surpass GPT-3.5 Turbo on academic benchmarks.
  • Interconnects newsletter — Founded and writes a Substack newsletter covering LLM post-training, open-source AI, and the political economy of the field; grown to 60,000+ subscribers and ranked #39 in Technology on Substack, making it one of the more widely read technical ML newsletters.
  • RLHF Book (Reinforcement Learning from Human Feedback, rlhfbook.com / arXiv 2504.12501) — Single-authored book-length treatment of the full RLHF and post-training pipeline, covering instruction tuning, reward modelling, PPO, DPO, RLVR, and open research questions; freely available as a living arXiv document and forthcoming in print.
  • SAIL (Substack Artificial Intelligence Library) — Co-founded readsail.com, a curated reading resource for AI research.
  • Interconnects Interviews — Hosts a podcast series interviewing leading AI researchers on technical trends, complementing the written newsletter.
  • China AI Lab Trip Report (May 2026) — First-person account of visiting the leading Chinese LLM labs (Moonshot, Z.ai, 01.ai, Meituan, Xiaomi, Tsinghua), offering rare first-hand organisational and cultural analysis; widely read across policy and research communities.

Awards & Recognition

  • UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism — Awarded during his PhD for contributions to community norms and mentorship of junior students.
  • Lex Fridman Podcast appearances (February 2025, February 2026) — Invited twice to one of the highest-traffic AI podcasts: first to discuss DeepSeek and its implications for the US–China AI race, and again for a broad survey of the AI state of the art in 2026.
  • Interconnects — #39 in Technology on Substack — Ranking reflects organic subscriber growth driven entirely by technical and analytical content, without institutional backing or promotional spend.

Key Relationships

  • Kristofer S.J. Pister — PhD advisor; pioneer of smart dust and micro-robotics at Berkeley; gave Lambert his grounding in physical systems and hardware-constrained RL.
  • Roberto Calandra — PhD co-advisor from Meta AI Research; bridged Lambert’s micro-robotics work to the large-scale model-based RL literature.
  • Liam Fedus / OpenAI post-training community — Lambert’s work on Tülu directly benchmarks against OpenAI’s post-training work; his newsletter frequently analyses and contextualises OpenAI releases, and he has spoken about the community overlap in post-training methodology.
  • Yann Dubois & Hugging Face RLHF team — Colleagues during the Hugging Face period when open RLHF tooling was being built out.
  • Ai2 / OLMo team — Close collaborators on the full OLMo pipeline; the team deliberately operates at a smaller scale (~10–15 people) than frontier labs, which Lambert has cited as both a constraint and a source of agility.
  • Jordan Schneider (ChinaTalk) — Recurring collaborator and podcast host; Lambert’s China trip was organized in conjunction with the ChinaTalk ecosystem, bridging AI technical analysis with geopolitical framing.

Personal Style

Lambert’s voice is deliberately calibrated against the hype cycles that characterise much AI commentary: he tends to reach for precise technical definitions where others reach for marketing language, and he is openly skeptical of claims that cannot be tested against open benchmarks. His decision to base himself outside San Francisco — notable in a field where proximity to Noe Valley coffee meetings has become almost professionally obligatory — is something he has framed as protecting the independence of his analysis. His writing mixes tutorial-level technical exposition with political-economy commentary on who controls AI infrastructure, a combination rare enough in the field to have built a large cross-disciplinary readership. Outside research he is a competitive mountain runner, and his self-description (“mountain runner, dog dad”) appears in virtually every bio he writes — an unusually personal note in a field where researchers typically lead with affiliations.


References