John Schulman

ref · May 25, 2026, 5:35am

Co-founder of OpenAI and architect of ChatGPT, widely recognized as one of the principal figures behind reinforcement learning from human feedback and modern policy optimization algorithms.

Profile

Field	Detail
Born	1987 or 1988, United States
Nationality	American
Current Institution	Thinking Machines Lab (Chief Scientist, 2025–)
Research Areas	Reinforcement Learning, Policy Optimization, RLHF, AI Alignment
PhD Advisor	Pieter Abbeel
PhD Dissertation	Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs (UC Berkeley, 2016)
Personal Website	joschu.net
X / Twitter	@johnschulman2
GitHub	@joschu

Overview

John Schulman is an American AI researcher best known as a co-founder of OpenAI and the primary architect of ChatGPT’s training methodology. His policy optimization algorithms, Trust Region Policy Optimization (TRPO) and its simpler successor Proximal Policy Optimization (PPO), are among the most cited works in modern AI, and PPO in particular became the de facto standard for fine-tuning large language models with Reinforcement Learning from Human Feedback (RLHF). After nearly nine years at OpenAI, where he co-led the post-training team responsible for the GPT model family, he briefly joined Anthropic’s Alignment Science team in 2024 before becoming Chief Scientist at Thinking Machines Lab in early 2025. His career combines foundational theoretical contributions with direct impact on widely used products.

Early Life & Education

Schulman grew up on Long Island, attending Great Neck South High School, where his early interests spanned science, mathematics, and science fiction, particularly the works of Isaac Asimov. In seventh grade, a fascination with the television program BattleBots prompted what he has described as his first sustained self-directed study: he read widely in engineering and physics in hopes of building a better combat robot, though the robot was never completed. In 2005, he was selected for the U.S. Physics Olympiad Team.

B.S., Physics — California Institute of Technology (Caltech), 2010
Schulman completed his undergraduate degree at Caltech, where a series of physics research internships left him more curious about neuroscience and AI than about physics proper.

Initial graduate study, Neuroscience — UC Berkeley
Upon arriving at Berkeley, Schulman enrolled in the neuroscience program and completed several lab rotations. His last rotation was with Professor Pieter Abbeel, whose work on helicopter control and towel-folding robots convinced him to move into robotics and AI.

Ph.D., Electrical Engineering and Computer Sciences (EECS) — UC Berkeley, 2016
Switching departments after his rotation with Abbeel, Schulman pursued robotics and deep reinforcement learning, with Abbeel as his advisor throughout. His dissertation, Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, brought together his work on TRPO, Generalized Advantage Estimation (GAE), and stochastic computation graphs.

Career

UC Berkeley — Abbeel Lab (2010–2015)

As a PhD student, Schulman initially worked on robotic manipulation: trajectory optimization, suturing tasks, and deformable object tracking. In 2013 he published TrajOpt, which uses sequential convex optimization to find collision-free trajectories, and co-authored Tracking Deformable Objects with Point Clouds, which won Best Vision Paper at ICRA 2013. Over time, his focus shifted toward policy gradient methods and the problem of making reinforcement learning stable and sample-efficient. This culminated in TRPO (ICML 2015), which constrains each policy update to a trust region to prevent destructive policy changes, and GAE (ICLR 2016), which reduces the variance of the advantage estimates used in policy gradients.
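
The GAE estimator mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a single finite trajectory with a value estimate for every visited state plus one bootstrap value for the state after the last step; the function name and the defaults gamma = 0.99, lam = 0.95 are illustrative choices, not taken from the paper's code. Each advantage is an exponentially weighted sum of one-step temporal-difference residuals, with weight (gamma * lam)^k on the residual k steps ahead.

```python
from typing import List, Sequence


def generalized_advantage_estimates(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """GAE(gamma, lambda) advantages for one finite trajectory (hypothetical helper).

    `values` holds V(s_0), ..., V(s_T): one entry more than `rewards`. The last
    entry bootstraps the value after the final step (0.0 if the episode ended).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Iterate backwards so each step reuses the weighted sum from the step after it.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting lam=0 reduces each advantage to the single TD residual (lower variance, but biased by any error in the value function); lam=1 recovers the discounted return minus the baseline V(s_t) for a terminated episode (no bias from the value function beyond the baseline, but higher variance). Intermediate values trade between the two.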

OpenAI (December 2015–August 2024)

Schulman co-founded OpenAI in December 2015, before completing his PhD, alongside Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman, Andrej Karpathy, Wojciech Zaremba, and others. At OpenAI he led the reinforcement learning research team, which produced a steady stream of foundational work:

PPO (2017): A simplified successor to TRPO using a clipped surrogate objective, PPO became the standard algorithm for large-scale policy optimization, finding widespread adoption in robotics, game-playing, and language model fine-tuning.
OpenAI Gym (2016): Co-authored the benchmark toolkit that standardized RL research environments across the field.
RLHF for language (2017–2022): Schulman identified the potential of Paul Christiano’s early RLHF work on non-language tasks and led its application to large language models, culminating in InstructGPT and ultimately ChatGPT.
ChatGPT (2022): Schulman led the reinforcement learning and post-training teams responsible for ChatGPT, released in November 2022, and has been widely described as its “architect”. GPT-4 had already finished training before ChatGPT launched, yet the public reception of ChatGPT still surprised the internal team.
Post-training co-lead (2022–2024): From 2022 until his departure, Schulman co-led OpenAI’s post-training team, overseeing the development of models for the ChatGPT product and the OpenAI API.
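
PPO’s clipped surrogate objective, as described in the 2017 paper, is L^CLIP = E_t[min(r_t * A_t, clip(r_t, 1 - epsilon, 1 + epsilon) * A_t)], where r_t is the ratio of the new policy’s probability of the sampled action to the old policy’s. The sketch below only evaluates that quantity for a batch of plain Python floats; a practical implementation would compute it on tensors and differentiate it with an autodiff library. The function name is hypothetical, and epsilon = 0.2 is an assumed, commonly used default.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Batch average of PPO's clipped surrogate objective L^CLIP (to be maximized).

    Hypothetical helper: inputs are per-action log-probabilities under the
    current policy (new) and the policy that collected the data (old).
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # clip(r_t, 1 - epsilon, 1 + epsilon)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the smaller (pessimistic) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Taking the minimum makes the objective a pessimistic bound: once the ratio moves beyond 1 ± epsilon in the direction the advantage favors, the clipped term stops rewarding further change. This plays roughly the role of TRPO’s explicit KL constraint while requiring only first-order optimization.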

Anthropic — Alignment Science Team (August 2024–February 2025)

Schulman announced his departure from OpenAI in August 2024, stating his motivation as a desire to deepen his focus on AI alignment and return to more hands-on technical research. He joined Anthropic’s Alignment Science team, working on safety-oriented research. His tenure was brief; by February 2025 he departed to join a new venture.

Thinking Machines Lab (February 2025–present)

Schulman joined Thinking Machines Lab as Chief Scientist shortly after its founding by Mira Murati, former CTO of OpenAI. The startup, which also counts Lilian Weng and (initially) Barret Zoph among its founding team, focuses on advanced AI systems development. His stated research interests at the lab continue to center on reinforcement learning and AI alignment.

Key Contributions

Trust Region Policy Optimization (TRPO) — Published at ICML 2015, TRPO constrains each policy update so that the KL divergence between the old and new policies stays within a small trust region, preventing the destabilizing updates common in earlier policy gradient methods. It became one of the most influential papers in deep RL, enabled subsequent work on continuous control, and led directly to PPO.
Proximal Policy Optimization (PPO) — Published in 2017, PPO replaced TRPO’s constrained second-order optimization with a first-order clipped objective that is far easier to implement at scale. It became the dominant RL algorithm in the field and the backbone of the RLHF pipelines for InstructGPT, ChatGPT, and many subsequent instruction-tuned models; it has accumulated tens of thousands of citations.
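Generalized Advantage Estimation (GAE) — Published at ICLR 2016, GAE estimates advantages as an exponentially weighted average of k-step advantage estimators, with a parameter λ that controls the trade-off between bias and variance; it is widely used in RL implementations, including alongside PPO.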
OpenAI Gym — Co-authored in 2016, this standardized benchmark toolkit fundamentally shaped how the RL research community evaluates algorithms, enabling reproducible comparisons across hundreds of environments.
ChatGPT and RLHF at scale — Schulman led the research effort that applied RLHF to GPT-class language models, producing InstructGPT (2022) and then ChatGPT, which demonstrated that alignment techniques could simultaneously improve model helpfulness, safety, and public accessibility.
Concrete Problems in AI Safety — Co-authored with Dario Amodei, Chris Olah, and others in 2016, this paper set out five concrete research problems (avoiding negative side effects, reward hacking, scalable oversight, safe exploration, and robustness to distributional shift) that shaped the early agenda of the AI safety field.
“Let’s Verify Step by Step” (2023) — Co-authored work comparing outcome supervision with process supervision for multi-step mathematical reasoning; it found that process reward models (PRMs) trained on step-level human feedback outperformed outcome-supervised reward models, informing how to supervise chain-of-thought in large language models.
Stochastic Computation Graphs — Published at NeurIPS 2015, this framework unified policy gradients and backpropagation through stochastic nodes, providing the theoretical foundation for his PhD dissertation and for a range of subsequent gradient estimation techniques.
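
One of the gradient estimators the stochastic computation graph framework covers is the score-function (likelihood-ratio, or REINFORCE) estimator, which is also the basis of policy gradient methods. The sketch below applies it to the simplest possible stochastic node, a single Bernoulli variable with probability sigmoid(theta), and includes the closed-form gradient for comparison. The Bernoulli setup and all function names are illustrative assumptions, not an example from the paper.

```python
import math
import random
from typing import Callable


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score_function_gradient(
    theta: float, f: Callable[[int], float], num_samples: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of d/dtheta E[f(x)] for x ~ Bernoulli(sigmoid(theta)).

    Uses the score-function identity grad E[f(x)] = E[f(x) * grad log p_theta(x)],
    which needs samples of x and the gradient of log p_theta, but never df/dx.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    total = 0.0
    for _ in range(num_samples):
        x = 1 if rng.random() < p else 0
        # d/dtheta log p_theta(x) is (1 - p) for x = 1 and -p for x = 0.
        score = (1.0 - p) if x == 1 else -p
        total += f(x) * score
    return total / num_samples


def exact_gradient(theta: float, f: Callable[[int], float]) -> float:
    """Closed-form gradient for comparison: (f(1) - f(0)) * p * (1 - p)."""
    p = sigmoid(theta)
    return (f(1) - f(0)) * p * (1.0 - p)
```

With enough samples the two functions agree to within Monte Carlo noise. Because the estimator only differentiates log p_theta, f can be a non-differentiable reward; in a policy gradient, x becomes a sampled trajectory and f its return.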

Awards & Recognition

Mark Bingham Award for Excellence in Achievement by Young Alumni (2025) — Awarded by UC Berkeley’s College of Computing, Data Science, and Society; recognizes outstanding early-career alumni achievement.
MIT Technology Review Innovators Under 35 (2018) — Recognized as a pioneer for contributions to deep reinforcement learning and AI research.
ICRA Best Vision Paper (2013) — Awarded for Tracking Deformable Objects with Point Clouds, co-authored with Pieter Abbeel’s group.
U.S. Physics Olympiad Team (2005) — Selected as a member of the national team while still in high school.

Key Relationships

Pieter Abbeel — PhD advisor at UC Berkeley; Abbeel’s robotics lab was the direct catalyst for Schulman’s transition from neuroscience to AI, and their collaboration produced TRPO, GAE, and several robotics papers.
Sam Altman — Co-founder and CEO of OpenAI; Altman served as co-chair at the founding and led the company as it grew into a product-focused organization during Schulman’s tenure.
Ilya Sutskever — Co-founder of OpenAI; a close collaborator on reinforcement learning and language model research, with co-authorship on papers including RL² and “Let’s Verify Step by Step”.
Paul Christiano — Former OpenAI safety researcher whose early RLHF work on non-language tasks Schulman identified as the seed of the ChatGPT training methodology; founder of the Alignment Research Center.
Mira Murati — Former OpenAI CTO, current founder and CEO of Thinking Machines Lab; Schulman joined her startup as Chief Scientist in February 2025.
Andrej Karpathy — Fellow OpenAI co-founder and one of Schulman’s most prominent peers in the AI research community.
Lilian Weng — Former OpenAI VP of AI Safety; co-founding team member at Thinking Machines Lab alongside Schulman.
Dario Amodei — Co-author of Concrete Problems in AI Safety; co-founder and CEO of Anthropic, the organization Schulman briefly joined in 2024.

Personal Style

Schulman’s research philosophy is characterized by a preference for principled theoretical foundations, most notably trust regions and KL-divergence constraints, applied to problems at the frontier of practical scale. His intellectual trajectory, from physics to neuroscience to robotics to language models, reflects a disposition toward following the most tractable path to understanding intelligence rather than committing to a single methodology. In public appearances he is notably candid about uncertainty, including about how much ChatGPT’s reception surprised even its creators. Outside research, his stated interests include birdwatching and jazz music.