Pieter Abbeel

ref · May 25, 2026, 7:42am

Professor of EECS at UC Berkeley, co-founder of Covariant and Gradescope, recipient of the 2021 ACM Prize in Computing, and one of the most consequential PhD advisors in modern AI — whose students co-founded OpenAI, Physical Intelligence, Perplexity, and a dozen other companies.

Profile

Field	Detail
Born	1977, Antwerp, Belgium
Nationality	Belgian-American
Current Roles	Professor, UC Berkeley EECS; Amazon Scholar (AGI / LLM); Co-director, BAIR
Research Areas	Robot Learning, Deep Reinforcement Learning, Apprenticeship Learning, Imitation Learning, Foundation Models for Robotics
PhD Advisor	Andrew Ng (Stanford)
PhD Dissertation	Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control (Stanford, 2008)
Academic Website	people.eecs.berkeley.edu/~pabbeel
X / Twitter	@pabbeel
GitHub	@pabbeel
Google Scholar	scholar.google.com

Overview

Pieter Abbeel is a Belgian-American professor of Electrical Engineering and Computer Sciences at UC Berkeley, co-director of the Berkeley Artificial Intelligence Research (BAIR) Lab, and one of the most influential figures in robot learning over the past two decades. He received the 2021 ACM Prize in Computing, one of the field’s most prestigious early-to-mid-career honors, for pioneering apprenticeship learning and deep reinforcement learning for robotic control. His methodological contributions span both the classical and deep learning eras of RL: from apprenticeship learning via inverse RL (2004) through TRPO (2015), MAML (2017), Soft Actor-Critic, Domain Randomization, and Hindsight Experience Replay. He co-founded Gradescope (acquired by Turnitin, 2018) and Covariant, a robotics foundation model company whose models Amazon licensed in 2024 while hiring its founders; in December 2025 he was appointed head of Amazon’s LLM efforts within its AGI organization. His most lasting institutional contribution may be his student lineage: alumni of his lab have co-founded OpenAI, Physical Intelligence, Perplexity, Skild, Ideogram, Genmo, and more than a dozen other AI companies.

Early Life & Education

Abbeel was born in Antwerp in 1977 and grew up in the nearby suburb of Brasschaat. As a high school student at Sint-Michielscollege, he played on the club basketball team — a sport he continued at university level. He has cited an early recognition that AI could serve as a universal tool across disciplines, and that intelligence is what distinctively separates humans from other species, as the motivation for entering the field.

B.S. and M.S., Electrical Engineering — KU Leuven, Belgium, 2000
Abbeel completed both degrees at KU Leuven, one of Belgium’s leading research universities, playing on the university basketball team throughout.

Ph.D., Computer Science — Stanford University, 2008
Abbeel was the first doctoral student of Andrew Ng, who was himself a first-year professor at Stanford when Abbeel joined. His dissertation, Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control, established the theoretical and empirical foundations of learning from demonstration — particularly the framework of inferring reward functions from expert behavior through inverse reinforcement learning. The dissertation demonstrated that a helicopter could be trained to match the aerobatic skill level of expert human pilots by observing their flight rather than being hand-programmed with the requisite control rules. He originally intended only a master’s degree but stayed for the PhD due to the concentration of AI projects at Stanford under Ng.

Career

UC Berkeley — Professor (2008–present)

Abbeel joined Berkeley as an assistant professor in EECS in 2008, founded the Berkeley Robot Learning Lab immediately upon arrival, and was promoted to full professor in 2017. In 2016 he became co-director of BAIR.

Apprenticeship Learning and Robotic Manipulation (2008–2015)
Abbeel’s Berkeley group extended the helicopter apprenticeship learning results to a wide range of manipulation tasks. The most publicly celebrated was cloth and laundry folding, which demonstrated that robots could perceive and manipulate deformable objects by combining novel visual perception, physics-based tracking, and learning from demonstration. The laundry-folding robot became an iconic image of robot learning in the popular press, with coverage by outlets including the BBC, the New York Times, and Rolling Stone. Other early Berkeley results included surgical suturing and knot-tying.

OpenAI — Concurrent Research Role (2016)
During a period of overlap with his Berkeley faculty position, Abbeel was affiliated with OpenAI and co-authored research on reinforcement learning and control with John Schulman and other OpenAI researchers. His group’s deep RL contributions from just before this period, particularly TRPO and GAE (both 2015), came out of his Berkeley lab and fed directly into the early OpenAI research environment.

TRPO and GAE (2015)
Trust Region Policy Optimization (TRPO), co-authored with John Schulman, Sergey Levine, Philipp Moritz, and Michael Jordan, introduced a theoretically principled constraint on policy gradient updates: each update is limited by a bound on the KL divergence between the old and new policies, which enabled stable training of neural network policies on simulated locomotion and Atari tasks. Generalized Advantage Estimation (GAE), also from this collaboration, provided a unified variance-reduction framework for advantage estimates and was demonstrated on 3D locomotion in simulated physics. Both papers became foundational to the deep RL era.
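The variance-reduction idea behind GAE can be illustrated with a small sketch. The function below is a minimal illustration and not code from the paper or from any library; the name gae_advantages, its argument layout, and the default values of gamma and lam are assumptions chosen for this example. It computes advantages as an exponentially weighted sum of one-step temporal-difference residuals.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Illustrative GAE computation (hypothetical helper, not a library API).

    rewards: list of T rewards.
    values:  list of T + 1 value estimates; the last entry is the
             bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each advantage reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Discount accumulated residuals by gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

Setting lam to 0 reduces the estimate to the one-step TD residual (lower variance, more bias), while lam of 1 recovers the discounted return minus the value baseline (higher variance, less bias).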

MAML (2017)
Model-Agnostic Meta-Learning (Finn, Abbeel, Levine; ICML 2017) introduced a gradient-based meta-learning algorithm enabling few-shot adaptation; one of the most cited ML papers of the decade.
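The gradient-based meta-learning loop can be sketched on a toy problem. The example below is an illustrative assumption and does not reproduce the paper's experiments: each task has a scalar optimum a and a quadratic loss c * (w - a)^2, so the inner adaptation step and the gradient through it can be written by hand. The names loss_grad and maml_train and all hyperparameter values are hypothetical.

```python
def loss_grad(w, a, c=1.0):
    # Gradient of the toy task loss L_a(w) = c * (w - a)^2.
    return 2.0 * c * (w - a)


def maml_train(task_optima, w=0.0, inner_lr=0.1, outer_lr=0.05,
               steps=500, c=1.0):
    """Meta-learn an initialization w for a set of toy scalar tasks."""
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_optima:
            # Inner loop: one gradient step of task-specific adaptation.
            w_adapted = w - inner_lr * loss_grad(w, a, c)
            # Outer gradient via the chain rule:
            # dL(w_adapted)/dw = L'(w_adapted) * d(w_adapted)/dw,
            # and d(w_adapted)/dw = 1 - 2 * c * inner_lr for this loss.
            meta_grad += loss_grad(w_adapted, a, c) * (1.0 - 2.0 * c * inner_lr)
        # Outer loop: update the shared initialization.
        w -= outer_lr * meta_grad / len(task_optima)
    return w


if __name__ == "__main__":
    print(maml_train([1.0, 3.0]))
```

In this symmetric toy setting the learned initialization settles at the point from which one inner step does equally well on every task, which here is the midpoint of the task optima. Real uses replace the hand-written derivative with automatic differentiation through the inner step.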

Domain Randomization, SAC, Hindsight Experience Replay, Decision Transformer
Further deep RL contributions from the Abbeel group: Domain Randomization (training across randomized simulations to enable sim-to-real transfer), Soft Actor-Critic (now one of the most popular continuous control RL algorithms), Hindsight Experience Replay (enabling RL in sparse-reward, goal-oriented settings), and Decision Transformer (framing RL as sequence modeling with a transformer).
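The relabeling idea in Hindsight Experience Replay can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation: transitions are plain tuples, the substituted goal is the final state the episode actually reached, and the reward is a sparse 0 / -1 signal. The names her_relabel and sparse_reward are hypothetical.

```python
def sparse_reward(next_state, goal):
    # 0 when the goal is reached, -1 otherwise (assumed reward convention).
    return 0.0 if next_state == goal else -1.0


def her_relabel(episode, reward_fn):
    """Relabel an episode with the goal it actually achieved.

    episode: list of (state, action, next_state, goal) tuples.
    Returns (state, action, next_state, new_goal, reward) tuples.
    """
    # Pretend the last state reached was the goal all along.
    achieved_goal = episode[-1][2]
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        reward = reward_fn(next_state, achieved_goal)
        relabeled.append((state, action, next_state, achieved_goal, reward))
    return relabeled


if __name__ == "__main__":
    # A 1-D agent aimed for position 5 but only reached position 2.
    failed_episode = [(0, 1, 1, 5), (1, 1, 2, 5)]
    for transition in her_relabel(failed_episode, sparse_reward):
        print(transition)
```

Under the original goal every transition in this episode earns -1, so there is no learning signal; after relabeling, the final transition earns 0. In practice the relabeled transitions would typically be stored in the replay buffer alongside the originals.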

Deep Unsupervised Learning (CS294-158)
Abbeel developed and has taught Deep Unsupervised Learning at Berkeley — a graduate course covering generative models including VAEs, normalizing flows, GANs, diffusion models, and self-supervised learning. Lecture videos from multiple editions have been released publicly and are widely used as reference material.

Gradescope — Co-Founder (2014–2018)

In 2014, Abbeel co-founded Gradescope with Berkeley-affiliated engineers Arjun Singh, Sergey Karayev, and Ibrahim Awwal. Gradescope is an online grading platform that uses computer vision and AI to streamline the grading of handwritten assignments and exams; it is now used in over 500 universities across the United States. The company was acquired by Turnitin in 2018.

Covariant — Co-Founder (2017–2024)

In October 2017, Abbeel co-founded Covariant (originally named Embodied Intelligence) with three of his PhD students: Peter Chen, Rocky Duan, and Tianhao Zhang. The company’s mission was to build a universal AI that enables robots to perceive and manipulate objects in warehouse and factory environments using deep imitation and reinforcement learning. Covariant launched publicly in January 2020 and raised approximately $147 million across multiple funding rounds. Its flagship product, RFM-1 (Robotics Foundation Model), applies foundation model pretraining to robot manipulation tasks — positioning Covariant at the intersection of large-scale AI and physical automation.

In August 2024, Amazon agreed to license Covariant’s robotics foundation models and hired the company’s founders, including Abbeel, into Amazon’s broader AI organization. A subsequent Washington Post report described the arrangement as leaving Covariant itself as a “zombie startup” following the acquisition of its key assets and team.

Amazon — Amazon Scholar and AGI/LLM Role (2024–present)

Following the Covariant licensing deal, Abbeel joined Amazon. In December 2025, he was appointed to lead Amazon’s LLM efforts within its AGI organization while continuing to work on robotics, reflecting Amazon’s integration of foundation model capabilities with its physical logistics and automation systems. He remains on the Berkeley faculty and works with Amazon as an Amazon Scholar.

AIX Ventures — Investment Partner (2021)

Abbeel joined AIX Ventures, an AI-focused venture fund, as an Investment Partner in 2021; Percy Liang and Richard Socher are also partners there.

The Robot Brains Podcast

Abbeel hosts The Robot Brains, a weekly podcast featuring conversations with AI and robotics researchers and practitioners. The podcast has become one of the primary public-facing venues for long-form discussion of robot learning and its frontier.

Key Contributions

Apprenticeship Learning via Inverse Reinforcement Learning (ICML 2004, with Andrew Ng) — Introduced the framework of inferring a reward function from expert demonstrations and using it to train agents, enabling helicopter aerobatics at human expert level; foundational to the entire imitation learning field.
Laundry Folding and Cloth Manipulation — Berkeley lab’s demonstration that robots could perceive and manipulate deformable objects using a combination of new visual, physics, and learning methods; a landmark in robotic manipulation research and a high-profile proof of concept for learning-based manipulation.
TRPO (Trust Region Policy Optimization) (ICML 2015, with Schulman, Levine, Moritz, Jordan) — Principled policy gradient update with a trust region constraint; enabled stable deep RL at scale and, together with GAE, early 3D locomotion results in simulation; see also the John Schulman Wiki.
MAML (Model-Agnostic Meta-Learning) (ICML 2017, with Finn and Levine) — Gradient-based meta-learning algorithm enabling rapid few-shot adaptation; one of the most cited ML papers of the decade.
Soft Actor-Critic (SAC) — One of the most widely used deep RL algorithms for continuous control; combines off-policy learning with maximum entropy RL for sample efficiency and stability.
Domain Randomization — Framework showing that training across diverse randomized simulation conditions enables policies to generalize to the real world without explicit sim-to-real transfer engineering; now standard practice in robotic RL.
Hindsight Experience Replay (HER) — Enables RL in sparse-reward, goal-conditioned settings by relabeling failed trajectories with the goals that were achieved; practically enabled manipulation learning in realistic goal-oriented settings.
Decision Transformer — Framed reinforcement learning as a sequence modeling problem using a transformer, enabling offline RL by conditioning on return-to-go; influential in bridging the RL and foundation model research communities.
Diffusion Models (through students/collaborators) — Abbeel’s group contributed to the development of diffusion models for robotics and generative AI contexts.
RFM-1 (Robotics Foundation Model) — Covariant’s model applying large-scale pretraining to industrial robot manipulation across diverse objects and settings.
Gradescope — AI-powered grading platform now deployed at 500+ universities; demonstrates Abbeel’s consistent orientation toward building useful systems that extend beyond pure research.
Deep Unsupervised Learning course (CS294-158) — Publicly released lecture series covering generative models that has become a standard resource for graduate ML education.

Awards & Recognition

ACM Prize in Computing (2021) — Awarded for contributions to robot learning, including apprenticeship learning and deep reinforcement learning for robotic control; comes with a $250,000 prize.
Presidential Early Career Award for Scientists and Engineers (PECASE)
NSF CAREER Award
Office of Naval Research Young Investigator Program (ONR-YIP)
DARPA Young Faculty Award (DARPA-YFA)
MIT Technology Review 35 Under 35 (TR35)
IEEE Fellow
ACM Fellow (to be verified)

Academic Lineage

Abbeel’s student advising record is among the most impactful in AI history by the metric of company-founding:

John Schulman — PhD; co-founded OpenAI; developed TRPO and PPO; previously at Anthropic; now at Thinking Machines Lab.
Chelsea Finn — PhD; developed MAML; co-founded Physical Intelligence; now associate professor at Stanford.
Aravind Srinivas — PhD; co-founded Perplexity AI, the AI-native search engine.
Sergey Levine — Postdoctoral researcher in Abbeel’s lab (PhD from Stanford); co-founded Physical Intelligence; now associate professor at UC Berkeley.
Peter Chen, Rocky Duan, Tianhao Zhang — PhD students who co-founded Covariant with Abbeel.
Deepak Pathak — Co-founded Skild.
Jonathan Ho — Co-founded Ideogram; key contributor to diffusion model research.
Ajay Jain — Co-founded Genmo.
Misha Laskin — Founded Reflection AI.
Roshan Rao — Co-founded EvolutionaryScale (protein language models).

Key Relationships

Andrew Ng — PhD advisor at Stanford; Abbeel was Ng’s first-ever PhD student when Ng was a first-year professor; the apprenticeship learning framework they developed together became Abbeel’s signature early contribution.
John Schulman — PhD student and longest-running research collaborator; their joint work on TRPO and GAE is among the most cited in deep RL; Schulman went on to co-found OpenAI, where Abbeel later held a concurrent research role alongside him.
Chelsea Finn — PhD student; MAML is a three-way collaboration between Finn, Abbeel, and Levine; Finn’s subsequent career at Stanford and Physical Intelligence reflects the research agenda Abbeel’s lab developed.
Sergey Levine — Former postdoc and now Berkeley colleague; Physical Intelligence co-founders Finn and Levine both came from Abbeel’s academic family; their ongoing co-advising relationship at Berkeley continues the collaboration.
Michael I. Jordan — Senior Berkeley colleague and co-author on TRPO; Jordan’s influence on the mathematical foundations of Abbeel’s RL work connects the two generations of Berkeley ML.
Percy Liang — AIX Ventures co-investor; their shared position within the Berkeley/Stanford AI community and AIX Ventures reflects the dense institutional network around Bay Area AI research.

Personal Style

Abbeel’s research agenda is characterized by a consistent drive to make robots genuinely useful in the physical world rather than merely impressive in controlled laboratory demonstrations. His group’s milestones (helicopter aerobatics, laundry folding, surgical suturing, warehouse manipulation) were deliberately chosen for their combination of public recognizability and technical difficulty; they demonstrated that learning-based methods could handle the messy, deformable, partially observable reality of physical tasks rather than only clean, rigid, well-specified ones. His institutional investments (BAIR, Covariant, Gradescope, the Deep Unsupervised Learning course, The Robot Brains podcast) reflect a parallel commitment to infrastructure and access: building the platforms through which research happens and through which the broader community engages with it. The breakdown of his X/Twitter posts (35.8% “Informing,” 21.5% “Announcing,” 10% “Teaching”), together with their dominant topic, promotion of The Robot Brains podcast, suggests a communicator who sees himself as much an ecosystem builder as a primary researcher.