Research scientist at OpenAI and foundational contributor to the o-series reasoning models, whose decade-long research program on strategic reasoning under imperfect information produced three landmark AI systems (Libratus, Pluribus, and CICERO) across poker and the strategy game Diplomacy, before converging on the problem of reasoning in large language models.
Profile
| Field | Detail |
|---|---|
| Full name | Noam Brown |
| Date of birth | Not publicly available |
| Nationality | American |
| Current institution | OpenAI |
| Current role | Research Scientist |
| Research areas | Reasoning, reinforcement learning, self-play, multi-agent AI, imperfect-information games, computational game theory |
| PhD thesis | Equilibrium Finding in Large Adversarial Imperfect-Information Games (CMU, 2020) |
| PhD advisor | Tuomas Sandholm |
| Personal website | noambrown.com |
| X / Twitter | @polynoamial |
| GitHub | noambrown |
| Google Scholar | RLDbLcUAAAAJ |
Overview
Noam Brown is a research scientist at OpenAI and one of the most consequential figures in the modern history of AI and games. Working from a unified theoretical foundation, equilibrium finding and search in imperfect-information settings, he co-created three systems that each represented a first-of-its-kind milestone: Libratus (2017), the first AI to defeat top human professionals in heads-up no-limit Texas Hold’em; Pluribus (2019), the first to beat top players in six-player no-limit Texas Hold’em, with results published on the cover of Science; and CICERO (2022), the first AI to achieve human-level performance in the natural-language strategy game Diplomacy, also published in Science. After joining OpenAI in 2023, Brown became a foundational contributor to the o-series reasoning models (o1, o3), carrying an insight drawn directly from his game-playing research over to large language models: performance scales with additional compute at inference time. His CMU thesis received three dissertation awards; he has also received the Marvin Minsky Medal and a NeurIPS Best Paper Award and was named one of MIT Technology Review’s 35 Innovators Under 35.
Early Life & Education
Brown studied at Rutgers University from 2005 to 2008, graduating summa cum laude with a B.A. in Mathematics and Computer Science as a member of the Rutgers College Honors Program. During and immediately after Rutgers, he worked as an algorithmic trading engineer at MJM Trading Group in New York (2006–2010), an early grounding in quantitative decision-making under uncertainty that would shape his later research interests. From 2010 to 2012 he worked at the Federal Reserve Board of Governors in the International Financial Markets section, researching algorithmic trading in financial markets.
He enrolled at Carnegie Mellon University in 2012, completing an M.S. in Robotics (2012–2014) and a Ph.D. in Computer Science (2014–2020), both advised by Tuomas Sandholm. His doctoral thesis, Equilibrium Finding in Large Adversarial Imperfect-Information Games, developed a suite of algorithmic advances — including faster counterfactual regret minimization (CFR) variants, safe and nested subgame solving, and depth-limited search — that together enabled, for the first time, an AI to defeat top human professionals in full-scale poker. The thesis received the CMU School of Computer Science Distinguished Dissertation Award, the AAAI ACM-SIGAI Dissertation Award, and the IFAAMAS Victor Lesser Distinguished Dissertation Award.
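CFR, the algorithm family the thesis accelerated, applies a simple rule called regret matching at every decision point: play each action in proportion to how much you regret not having played it so far. The sketch below shows that rule on its own, in self-play on a one-shot game, which is the degenerate case of CFR with a single decision point per player. The weighted rock-paper-scissors payoffs are a hypothetical toy chosen so that the equilibrium, (1/4, 1/2, 1/4), differs from the uniform starting strategy. None of this is code from Libratus or Pluribus, which apply the update to counterfactual values across very large game trees.

```python
"""Regret matching in self-play on a weighted rock-paper-scissors game.

Illustrative toy only: the payoffs are hypothetical and this is not code
from Libratus, Pluribus, or any published CFR implementation.
"""

# Row player's payoffs (hypothetical): rock beats scissors for 2, every other
# win pays 1. PAYOFF[a][b] == -PAYOFF[b][a], so the game is symmetric and
# zero-sum; its equilibrium strategy is (1/4, 1/2, 1/4).
PAYOFF = [
    [0, -1, 2],   # rock
    [1, 0, -1],   # paper
    [-2, 1, 0],   # scissors
]
N = len(PAYOFF)


def strategy_from_regrets(regrets):
    """Regret matching: mix actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def action_values(opponent):
    """Expected payoff of each pure action against a mixed opponent strategy.

    Because PAYOFF is antisymmetric, the same formula serves both players.
    """
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(N)) for a in range(N)]


def self_play(iterations):
    """Run simultaneous regret matching for both players; return average strategies."""
    regrets = [[0.0] * N for _ in range(2)]
    strategy_sums = [[0.0] * N for _ in range(2)]
    for _ in range(iterations):
        # Both strategies are fixed before either player's regrets are updated.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            values = action_values(strategies[1 - p])
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(N):
                regrets[p][a] += values[a] - expected  # regret for not playing a
                strategy_sums[p][a] += strategies[p][a]
    # The average strategy, not the last iterate, carries the convergence guarantee.
    return [[s / iterations for s in sums] for sums in strategy_sums]


def nash_gap(avg):
    """Sum of both players' best-response gains against the average profile (0 at equilibrium)."""
    return max(action_values(avg[1])) + max(action_values(avg[0]))


avg = self_play(50_000)
print("player 1 average strategy:", [round(x, 3) for x in avg[0]])
print("Nash gap:", round(nash_gap(avg), 4))
```

Standard regret-matching theory bounds each player's average regret by roughly Δ·√(|A|/T), where Δ is the payoff range, |A| the number of actions, and T the number of iterations, so the Nash gap of the average strategies shrinks like 1/√T.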
Career
MJM Trading Group — Algorithmic Trading Engineer (2006–2010)
Worked in algorithmic trading in New York, developing quantitative strategies and gaining early exposure to decision-making under uncertainty in adversarial financial markets.
Federal Reserve Board of Governors — Research Assistant (2010–2012)
Conducted research in the International Financial Markets section, studying algorithmic trading and financial market microstructure. Also organized outreach through FedEd, teaching financial literacy and monetary policy to DC-area high school students.
Carnegie Mellon University — Research Assistant, Ph.D. (2012–2020)
Under Tuomas Sandholm, Brown built a sequence of poker-playing AIs of increasing capability: Tartanian7 (2014 Annual Computer Poker Competition champion), Claudico, Baby Tartanian8 (2016 champion), and ultimately Libratus and Pluribus. His wins in the 2014 and 2016 Annual Computer Poker Competitions and the two landmark competitions against professional players (January 2017 for Libratus, July 2019 for Pluribus) made him a central figure in meeting poker as a grand challenge for AI. He also completed a summer research internship at DeepMind in London in 2017.
Facebook AI Research (FAIR / Meta) — Research Scientist (2018–2023)
Joined FAIR in New York while completing his Ph.D., then continued full-time. At FAIR, Brown extended his imperfect-information game research from two-player zero-sum settings toward cooperative and mixed-motive games. He co-developed a series of Diplomacy-playing AIs, culminating in CICERO — the first AI to achieve human-level performance in Diplomacy, a game requiring both strategic planning and natural language communication to form coalitions, negotiate, and deceive. CICERO was published in Science in November 2022 as a joint paper of Meta’s Fundamental AI Research Diplomacy Team. He also co-developed ReBeL (NeurIPS 2020), a general framework combining deep reinforcement learning and search for imperfect-information games.
OpenAI — Research Scientist (2023–present)
Joined OpenAI in San Francisco in 2023 to work on reasoning, reinforcement learning, self-play, and multi-agent AI. Brown became a foundational contributor to the o-series reasoning models, the first of which, o1 (codenamed “Strawberry”), was publicly released as o1-preview in September 2024. He confirmed his involvement publicly on X at launch and has since presented the o-series work at venues including the Simons Institute at Berkeley, Harvard’s Kempner Institute, CMU’s Katayanagi Distinguished Lecture series, and NUS. The core idea underlying the o-series, that LLM performance on complex reasoning tasks consistently improves with more reinforcement-learning (train-time) compute and with more inference-time compute, is a direct extension of the scaling-with-search insight Brown had demonstrated across his game-playing work.
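This profile does not describe how the o-series spends its extra inference compute, and the sketch below makes no claim about it. As a generic illustration of how additional samples at inference time can raise accuracy, it computes the accuracy of majority voting over k independent samples under a toy binomial model. The assumptions are strong ones: each sample is independently correct with probability p, and every answer is simply right or wrong.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent samples is correct.

    Toy model (assumption): every sample is independently correct with
    probability p, and each answer is simply right or wrong. k must be odd so
    that ties cannot occur.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd number")
    need = k // 2 + 1  # smallest strict majority
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


# More samples (more inference compute) per question, same per-sample accuracy.
for k in (1, 5, 25, 65):
    print(f"k={k:>2}: accuracy {majority_vote_accuracy(0.6, k):.3f}")
```

In this binary toy model, accuracy climbs toward 1 as k grows when p > 0.5 but falls toward 0 when p < 0.5, so extra samples help only when each one is better than chance. Real samples from one model are correlated, so the curve is an optimistic upper sketch rather than a prediction.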
Key Contributions
- Libratus (Brown and Sandholm; Science, 2017): the first AI to defeat top human professionals in heads-up no-limit Texas Hold’em, winning by about 14.7 big blinds per 100 hands over 120,000 hands against four top-ranked professionals in January 2017. Built on safe and nested subgame solving and enhanced CFR algorithms. Won HPCWire’s “Best Use of AI” Award and was named one of 12 finalists for Science’s Breakthrough of the Year 2017. Brown and Sandholm received the Marvin Minsky Medal for Outstanding Achievements in AI for the work.
- Pluribus (Brown and Sandholm; Science, 2019, cover article): the first AI to defeat top human professionals in six-player no-limit Texas Hold’em, achieving superhuman performance in games where it faced five of the world’s top professionals at the same table. Applied depth-limited search to multiplayer imperfect-information games. Published on the cover of Science and named one of nine runners-up for Science’s 2019 Breakthrough of the Year, placing it among the journal’s top 10 scientific breakthroughs of that year.
- CICERO (Meta FAIR Diplomacy Team including Brown as lead research scientist; Science, 2022): the first AI to achieve human-level performance in the board game Diplomacy, which requires natural-language negotiation, coalition building, and strategic deception across multiple players. CICERO combined a language model for communication with strategic reasoning via equilibrium search, finishing in the top 10% of human players in online games against human opponents who were not told they were playing an AI.
- Safe and Nested Subgame Solving for Imperfect-Information Games (Brown and Sandholm; NeurIPS 2017): NeurIPS 2017 Best Paper Award. Introduced the subgame-solving technique that was a core component of both Libratus and Pluribus, enabling real-time, theoretically grounded planning in large imperfect-information games.
- Deep Counterfactual Regret Minimization (Brown, Lerer, Gross, Sandholm; ICML 2019): extended counterfactual regret minimization to deep neural network function approximation, so that CFR can be run on the full game rather than on a hand-crafted, explicitly enumerated abstraction of its game tree.
- ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Brown, Bakhtin, Lerer, Gong; NeurIPS 2020): a general algorithm combining deep RL with tree search at both training and test time for imperfect-information games, bridging the gap between the perfect-information game paradigm (AlphaZero-style) and the imperfect-information setting.
- OpenAI o-series reasoning models (o1, o3) (OpenAI; 2024–present): Brown is a foundational contributor to OpenAI’s o-series, LLMs trained via reinforcement learning to generate a hidden chain of thought before responding. OpenAI’s September 2024 announcement of o1 (first released as o1-preview) reported that the model scored 83% on a qualifying exam for the International Mathematical Olympiad, versus 13% for GPT-4o, and exceeded human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). The o-series established inference-time compute scaling as a new axis of AI capability improvement, a direct intellectual descendant of Brown’s work on search at test time in games.
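The Libratus margin quoted in the list above can be restated in the milli-big-blinds-per-hand (mbb/hand) unit common in computer-poker research and turned into an approximate total over the 120,000-hand match. This is simple arithmetic on the figures in the entry, not an independently reported total:

$$
14.7\ \tfrac{\text{bb}}{100\ \text{hands}} = 0.147\ \tfrac{\text{bb}}{\text{hand}} = 147\ \tfrac{\text{mbb}}{\text{hand}},
\qquad
0.147\ \tfrac{\text{bb}}{\text{hand}} \times 120{,}000\ \text{hands} \approx 17{,}640\ \text{bb}.
$$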
Awards & Recognition
- AAAI ACM-SIGAI Dissertation Award (2020) — For the best dissertation in AI
- IFAAMAS Victor Lesser Distinguished Dissertation Award (2020) — Best dissertation in multi-agent systems
- CMU School of Computer Science Distinguished Dissertation Award (2020)
- Marvin Minsky Medal for Outstanding Achievements in AI (2019) — Awarded by IJCAI for Libratus
- MIT Technology Review 35 Innovators Under 35 (2019)
- Science Breakthrough of the Year runner-up (2019) — Pluribus, one of nine runners-up
- AAAI Outstanding Paper Honorable Mention (2019) — One of four papers receiving special recognition from 7,095 submissions
- NeurIPS Best Paper Award (2017) — “Safe and Nested Subgame Solving for Imperfect-Information Games,” one of three Best Papers from 3,240 submissions
- Allen Newell Award for Research Excellence (2017) — CMU School of Computer Science
- Science Scientific Breakthrough of the Year finalist (2017) — Libratus, one of 12 finalists
- Open Philanthropy AI Fellowship (2018) — One of seven recipients
- Tencent AI Lab Fellowship (2018) — One of five recipients
- Annual Computer Poker Competition — 1st place, No-Limit Texas Hold’em (2014, 2016)
- HPCWire “Best Use of AI” Award — For Libratus (2017, 2018)
Key Relationships
- Tuomas Sandholm: Ph.D. advisor at CMU and co-creator of Libratus and Pluribus; Angel Jordan University Professor of Computer Science at CMU and CEO of Strategy Robot, Inc. Their collaboration produced Brown’s core methodology in imperfect-information game solving.
- Adam Lerer: primary collaborator at FAIR on the Diplomacy line of research, including CICERO, and on ReBeL; co-authored multiple NeurIPS, ICLR, and ICML papers with Brown across 2018–2023.
- Anton Bakhtin: co-lead on CICERO and Diplomacy work at FAIR; co-author on the Science 2022 paper and multiple related conference papers.
- Jakob Foerster: collaborator at FAIR on Hanabi and Diplomacy-adjacent problems; co-author on several papers on cooperative imperfect-information games.
- Hengyuan Hu: frequent collaborator at FAIR on human-AI coordination and Hanabi; co-author on multiple papers probing the distinction between self-play-optimal and human-compatible policies.
- Gabriele Farina: collaborator on equilibrium-finding theory (ICML 2019 with Sandholm; ICLR 2023 Best Paper Honorable Mention); represents the theoretical game-theory strand of Brown’s work.
Personal Style
Brown’s research career is defined by a single, progressively deepened question: how do you make an AI reason strategically in settings where information is hidden and other agents are actively trying to exploit you? Every major project — poker, Diplomacy, LLM reasoning — is a new domain where that question is tested at increasing scale and complexity. He has been unusually explicit in public talks and posts about the intellectual continuity connecting game-playing research to reasoning models, arguing that the scaling-with-inference-compute phenomenon discovered in poker (more search = better decisions) is the same phenomenon now observed in chain-of-thought reasoning models. On X (@polynoamial — a pun on “polynomial”), he posts substantively about benchmarks, scaling behavior, research careers, and the practical implications of AI progress, with a tone that is direct and empirically grounded. He has continued teaching game theory to gifted high school students through the Rutgers Young Scholars Program every summer since 2009, reflecting a long-standing interest in outreach that predates his research fame.
References
- Personal website: noambrown.com
- CV: noambrown.com/downloads/CV.pdf
- X profile: x.com/polynoamial
- GitHub: github.com/noambrown
- Google Scholar: scholar.google.com/citations?user=RLDbLcUAAAAJ
- LinkedIn: linkedin.com/in/noam-brown-8b785b62
- Digg profile: digg.com/u/x/polynoamial
- CMU dissertation record: csd.cmu.edu/academics/doctoral/degrees-conferred/noam-brown
- OpenAI blog, “Learning to Reason with LLMs” (September 2024): openai.com/index/learning-to-reason-with-llms
- Simons Institute talk, “Learning to Reason with LLMs” (September 2024): simons.berkeley.edu/talks/noam-brown-openai-2024-09-26
- Stanford EE seminar bio (2023): ee-www.stanford.edu/event/10-19-2023/cicero