J. Zico Kolter

Professor and Director of the Machine Learning Department at CMU, whose work on differentiable optimisation, provable adversarial robustness, and automated LLM jailbreaking led to a seat on OpenAI’s board, where he chairs its Safety and Security Committee.


Profile

Nationality American
Current Institution(s) Carnegie Mellon University — Machine Learning Department (Professor & Director); Bosch Center for AI (Chief Scientist, AI Research); Gray Swan AI (Co-founder & Senior Advisor); OpenAI (Board Director, Safety & Security Committee Chair)
Research Areas AI Safety, Adversarial Robustness, Differentiable Optimisation, Deep Learning Theory, Implicit Neural Networks, Energy Systems
Doctoral Advisor Andrew Y. Ng
Doctoral Thesis Learning and Control with Inaccurate Models (Stanford University, 2010)
Website zicokolter.com
X / Twitter @zicokolter
GitHub zkolter · locuslab (lab org)
Google Scholar J. Zico Kolter

Overview

J. Zico Kolter is an American computer scientist whose career has centred on the intersection of optimisation theory and neural network design. He joined the faculty of Carnegie Mellon University in 2012 and is now Professor and Director of its Machine Learning Department. His body of work is technically unusual for its insistence on hard guarantees: architectures whose outputs are provably robust to adversarial perturbations, layers that are literally convex optimisation solvers, and networks defined not by explicit forward-pass equations but by fixed-point conditions. His 2023 paper introducing the Greedy Coordinate Gradient (GCG) attack, which showed that aligned large language models could be jailbroken automatically with a single adversarial suffix that works across many requests and transfers between models, became one of the most cited papers in LLM safety and the founding work of Gray Swan AI, the security startup he co-founded. In August 2024 he joined OpenAI’s board of directors and its Safety and Security Committee, which he has chaired since September 2024; it is one of the most operationally significant AI safety governance roles in the industry.


Early Life & Education

Kolter completed his undergraduate degree in Computer Science at Georgetown University. He then enrolled in the PhD programme in Computer Science at Stanford University in 2005, where he worked in Andrew Ng’s group at the intersection of machine learning and robotic control. His doctoral thesis, Learning and Control with Inaccurate Models (2010), addressed how reinforcement learning agents can function effectively when the models they use to plan are imperfect — a concern with implications for both robotics and safe AI that prefigures his later safety-focused research. Co-authored work with Ng and Sebastian Thrun during this period covered legged locomotion, extreme autonomous driving, and energy disaggregation. Following his doctorate he held a postdoctoral fellowship at MIT CSAIL from 2010 to 2012, before joining CMU as an assistant professor.


Career

Carnegie Mellon University — ML Department (2012–present)

Kolter joined CMU in 2012 and was promoted through the ranks to full professor, eventually becoming Director (department head) of the Machine Learning Department within the School of Computer Science. He maintains affiliations with the Computer Science Department, the Robotics Institute, the Software and Societal Systems Department, and the CyLab Security and Privacy Institute. His research group, known as the Locus Lab (GitHub: locuslab), has produced a stream of influential papers across three overlapping themes:

Differentiable optimisation and implicit architectures. The most technically distinctive strand of Kolter’s work treats classical optimisation problems as building blocks for neural networks. OptNet (ICML 2017, with Brandon Amos) showed how to embed a quadratic programme as a differentiable layer, with gradients obtained by differentiating the problem’s KKT optimality conditions; this lets networks enforce hard constraints and capture structured dependencies that conventional layers cannot. It is implemented in the open-source qpth package and became a foundational reference for the implicit differentiation literature. Input Convex Neural Networks (ICML 2017, also with Amos) introduced architectures whose outputs are guaranteed convex in their inputs, so that inference by minimising over inputs is itself a convex problem, with applications in structured prediction, data imputation, and continuous-action reinforcement learning. Deep Equilibrium Models (DEQ, NeurIPS 2019, with Shaojie Bai and Vladlen Koltun) reformulated deep networks as the fixed point of a single repeated layer, found with root-finding solvers. Because gradients are computed by implicit differentiation at the fixed point rather than by backpropagating through the solver’s iterations, training requires only constant memory regardless of effective depth; a follow-up on multiscale DEQs extended the framework to large-scale vision tasks.
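The core mechanism behind OptNet-style layers can be illustrated in a few lines. The sketch below is a simplified, hypothetical example rather than the qpth API: it handles only an equality-constrained QP (OptNet and qpth also support inequality constraints, solved with a batched primal-dual interior-point method), and the names `qp_forward` and `qp_backward` are invented for illustration. The forward pass solves the KKT system; the backward pass reuses the same KKT matrix to turn dL/dz into gradients with respect to the problem data q and b, without unrolling any solver.

```python
import numpy as np

# Hypothetical helper names; a simplified sketch, not the qpth package.
def qp_forward(Q, q, A, b):
    """Solve min_z 0.5 z^T Q z + q^T z  s.t.  A z = b through its KKT system.

    Assumes Q is positive definite and A has full row rank, so the KKT
    matrix is nonsingular.
    """
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n], sol[n:], K  # primal solution z, dual variables nu, KKT matrix


def qp_backward(K, n, grad_z):
    """Map dL/dz to (dL/dq, dL/db) by implicit differentiation of the KKT system.

    The solution is [z; nu] = K^{-1} [-q; b]. K is symmetric, so the adjoint
    of the right-hand side is one more solve with the same matrix.
    """
    m = K.shape[0] - n
    adj = np.linalg.solve(K, np.concatenate([grad_z, np.zeros(m)]))
    return -adj[:n], adj[n:]  # sign flip on q because it enters the RHS as -q
```

In a training loop, `grad_z` would come from whatever loss sits downstream of the layer, and the returned gradients would flow back into the network that produced q and b. A production implementation would also factorise K once and reuse the factorisation in the backward pass.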

Provable adversarial robustness. Beginning around 2018, Kolter’s group turned to the question of whether deep classifiers could be made certifiably robust: not just empirically resistant to known attacks but provably so, with formal guarantees. The 2018 ICML paper “Provable Defenses Against Adversarial Examples via the Convex Outer Adversarial Polytope” (with Eric Wong) was among the first methods to deliver such guarantees for networks of non-trivial size. It replaces each ReLU with a convex linear-programming relaxation, so that all outputs reachable from a norm-bounded ball around an input are enclosed in a convex outer region; a cheap dual bound over that region can prove that no perturbation inside the ball changes the prediction, and the same bound serves as a robust training objective. Follow-on work developed randomised smoothing and Lipschitz-constrained architectures, making the group one of the primary contributors to certified robustness research and benchmarks.
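For a network with a single hidden ReLU layer, the dual bound from the convex outer polytope approach reduces to a closed form that fits in a short function. The sketch below assumes f(x) = W2 relu(W1 x + b1) + b2, an l-infinity ball of radius eps around x0, and a margin vector c (for example, the true-class indicator minus a competing-class indicator); the function name `margin_lower_bound` is hypothetical. It uses the paper’s relaxation slope u/(u - l) for unstable ReLUs; deeper networks require propagating a dual network backwards layer by layer, which is omitted here. If the returned lower bound is positive for every competing class, no perturbation in the ball can change the prediction.

```python
import numpy as np

def margin_lower_bound(W1, b1, W2, b2, x0, eps, c):
    """Lower-bound min c^T f(x) over ||x - x0||_inf <= eps for a one-hidden-layer
    ReLU net f(x) = W2 relu(W1 x + b1) + b2 (hypothetical helper name)."""
    # Interval bounds on the hidden pre-activations are exact for the first layer.
    center = W1 @ x0 + b1
    radius = eps * np.abs(W1).sum(axis=1)
    l, u = center - radius, center + radius

    # Relaxation slopes: 1 for always-active units, 0 for always-inactive ones,
    # and u/(u - l) for unstable units (l < 0 < u).
    d = np.where(l >= 0, 1.0, 0.0)
    unstable = (l < 0) & (u > 0)
    d[unstable] = u[unstable] / (u[unstable] - l[unstable])

    a = W2.T @ c            # weight of each hidden unit in the margin c^T f(x)
    nu = d * a              # linearised weights on the pre-activations
    # Minimise the linear part exactly over the l_inf ball (its dual norm is l_1).
    bound = nu @ center - eps * np.abs(W1.T @ nu).sum() + c @ b2
    # Unstable units with negative weight use the upper side of the triangle
    # relaxation, relu(z) <= d * (z - l), which contributes a constant offset.
    neg = unstable & (a < 0)
    bound -= np.sum(a[neg] * d[neg] * l[neg])
    return float(bound)
```

With eps = 0 the bound collapses to the exact margin at x0; as eps grows, more units become unstable and the bound loosens, which is why robust training with this objective tends to push units away from the unstable regime.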

LLM safety and automated red-teaming. From 2022 Kolter’s group began applying adversarial methods to aligned language models. The landmark paper “Universal and Transferable Adversarial Attacks on Aligned Language Models” (arXiv 2307.15043, 2023, with Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, and Matt Fredrikson) introduced the GCG attack: a gradient-guided discrete optimisation method that searches for an adversarial suffix which, when appended to a harmful request, makes the model begin its reply affirmatively (for example, “Sure, here is…”) instead of refusing. Suffixes optimised against open models such as Vicuna and Llama-2-Chat were universal across many requests, and those optimised jointly over several open models transferred, with varying success rates, to closed models including ChatGPT, Bard, and Claude. The paper demonstrated that LLM safety guardrails were fragile in a structurally fundamental way, sparked a substantial wave of follow-on jailbreak and defence research, and received wide coverage in the technical and general press.
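The greedy coordinate gradient loop can be sketched without a language model by substituting a toy differentiable objective. Every name below is hypothetical: `toy_loss` stands in for the LLM’s negative log-likelihood of the target response given prompt plus suffix, and `E`, `R` and `target` are random toy parameters. The loop follows the structure described in the paper: take the gradient of the loss with respect to each suffix position’s one-hot token vector, keep the top-k tokens with the most negative gradient at each position, evaluate a batch of random single-token swaps drawn from those candidates under the true loss, and greedily move to the best one. The universal variant would sum this loss over many prompts (and models); Gray Swan’s nanoGCG package offers a real implementation for actual experiments.

```python
import numpy as np

def toy_loss(tokens, E, R, target):
    """Hypothetical stand-in for an LLM's loss on the target completion."""
    h = sum(R[i] @ E[t] for i, t in enumerate(tokens))
    return float(np.sum((h - target) ** 2))


def toy_grad_onehot(tokens, E, R, target):
    """Gradient of toy_loss w.r.t. each position's one-hot token vector, shape (L, V)."""
    h = sum(R[i] @ E[t] for i, t in enumerate(tokens))
    r = 2.0 * (h - target)
    return np.stack([E @ (R[i].T @ r) for i in range(len(tokens))])


def gcg(tokens, E, R, target, steps=50, top_k=8, batch=64, seed=0):
    """Greedy coordinate gradient search over a discrete token suffix (toy sketch)."""
    rng = np.random.default_rng(seed)
    tokens = list(tokens)
    best, best_loss = list(tokens), toy_loss(tokens, E, R, target)
    for _ in range(steps):
        grad = toy_grad_onehot(tokens, E, R, target)
        # Linearised promise of each substitution: most negative gradient first.
        top = np.argsort(grad, axis=1)[:, :top_k]
        candidates = []
        for _ in range(batch):
            cand = list(tokens)
            pos = rng.integers(len(tokens))             # random position to swap
            cand[pos] = int(top[pos, rng.integers(top_k)])
            candidates.append(cand)
        # Evaluate candidates with the exact loss and move to the best one.
        losses = [toy_loss(cand, E, R, target) for cand in candidates]
        i_best = int(np.argmin(losses))
        tokens = candidates[i_best]
        if losses[i_best] < best_loss:
            best, best_loss = list(tokens), losses[i_best]
    return best, best_loss
```

The gradient only ranks candidate swaps; the decision is always made by the exact loss, which is what lets the method work on a discrete token space where the linearisation is often inaccurate.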

C3.ai — Chief Data Scientist (dates unconfirmed)

Kolter served as chief data scientist at C3.ai, the enterprise AI software company, at an earlier point in his career; the timing and duration of the role have not been publicly specified beyond his LinkedIn profile. Separately, he told journalists that LLMs “are attack vectors,” a quote that circulated widely after the GCG paper.

Bosch Center for AI — Chief Scientist, AI Research (ongoing)

In addition to his CMU position, Kolter serves as chief scientist of AI research at the Bosch Center for AI’s Pittsburgh office. Bosch provides substantial funding for research in his CMU group, enabling work at a scale not typical for an academic lab; the relationship has been openly acknowledged in his institutional bio.

Gray Swan AI — Co-founder & Senior Advisor (2023/2024–present)

Kolter co-founded Gray Swan AI — named after the concept of foreseeable but underweighted catastrophic risks — as a Pittsburgh-based AI safety and security firm whose core mission is hardening AI systems against adversarial attacks and evaluating LLM safety at scale. The company’s research output includes the nanoGCG package, a lightweight open-source implementation of the GCG algorithm. Kolter serves as senior advisor while retaining his CMU position.

OpenAI Board of Directors & Safety and Security Committee (August 2024–present)

On 8 August 2024, OpenAI announced Kolter’s appointment to its board of directors and to the board’s Safety and Security Committee. In September 2024 the committee was restructured as an independent board oversight committee with Kolter as its chair. It makes recommendations on critical safety and security decisions for OpenAI projects and, together with the full board, oversees model launches, including the authority to delay a release until safety concerns are addressed. Board chair Bret Taylor noted that Kolter “adds deep technical understanding and perspective in AI safety and robustness.” Regulators embedded references to the committee’s oversight function in formal agreements with OpenAI, making Kolter’s role one of the few AI safety governance positions with documented regulatory standing. In 2025 he was also named a recipient of funding from Schmidt Sciences’ AI safety science programme.


Key Contributions

  • OptNet — Differentiable Optimisation as a Layer in Neural Networks (ICML 2017, with Brandon Amos) — First general framework for embedding constrained quadratic programmes as differentiable neural network layers; introduced the qpth PyTorch package and helped establish the implicit differentiation / declarative networks sub-field.
  • Input Convex Neural Networks (ICML 2017, with Amos) — Architectures whose outputs are convex in their inputs by construction, enabling energy-based models and structured prediction with guaranteed convex geometry.
  • Provable Defenses via the Convex Outer Adversarial Polytope (ICML 2018, with Eric Wong) — Among the first methods yielding certifiably robust deep classifiers for non-trivial networks, via a linear-programming relaxation of the network’s ReLU activations over the adversarial input region.
  • Deep Equilibrium Models (DEQ) (NeurIPS 2019, with Shaojie Bai and Vladlen Koltun) — Reformulated deep networks as fixed points of a single repeated transformation, found via root-finding and trained with constant memory through implicit differentiation; helped open the implicit neural networks literature.
  • GCG Attack / Universal and Transferable Adversarial Attacks on Aligned LLMs (arXiv 2307.15043, 2023; with Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, Matt Fredrikson) — Introduced the Greedy Coordinate Gradient method, the first automated universal jailbreak for aligned language models; demonstrated successful transfer to closed models with no white-box access; became the defining attack baseline for LLM red-teaming research and the foundation of Gray Swan AI.
  • OpenAI Safety and Security Committee (2024–present) — As chair, oversees major model launches together with the full board, including the authority to delay a release on safety grounds; an unusual instance of technical AI safety research directly instantiated in corporate governance with regulatory acknowledgement.
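The Deep Equilibrium Models entry above rests on two steps: finding a fixed point z* = f(z*, x), and differentiating through it implicitly. The sketch below assumes a single hypothetical layer f(z, x) = tanh(Wz + Ux + b) and, for brevity, uses plain fixed-point iteration instead of the Broyden-style root-finding used in the DEQ paper; iteration converges here only because W is rescaled to spectral norm 0.5, making f a contraction in z. The gradient routine (hypothetical name `deq_input_grad`) needs only z*, not the solver’s iterates, which is where the constant-memory property comes from.

```python
import numpy as np

def deq_forward(W, U, b, x, tol=1e-12, max_iter=1000):
    """Find z* with z* = tanh(W z* + U x + b) by fixed-point iteration.

    Assumes ||W||_2 < 1 so the map is a contraction (tanh is 1-Lipschitz).
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def deq_input_grad(W, U, z_star, grad_z):
    """Return dL/dx given dL/dz* using the implicit function theorem.

    dz*/dx = (I - J_z)^{-1} J_x with J_z = D W and J_x = D U, where
    D = diag(1 - z*^2) is tanh' at the fixed point (because z* = tanh(...)).
    Only z* is used: no intermediate iterates are stored.
    """
    d = 1.0 - z_star ** 2
    J_z = d[:, None] * W
    u = np.linalg.solve((np.eye(len(z_star)) - J_z).T, grad_z)
    return U.T @ (d * u)
```

The dense solve is fine for a toy layer; at scale, DEQ-style models instead solve the backward linear system iteratively, again without storing the forward iterates.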

Awards & Recognition

  • Multiple Best Paper Awards at NeurIPS, ICML, and AISTATS — Recognised across the venues where his group publishes most frequently; specific years and papers not consolidated in a single public source.
  • Schmidt Sciences AI Safety Science Programme (2025) — Competitive funding recognising early-stage AI safety research.
  • OpenAI Board Appointment (2024) — Appointment to OpenAI’s board as a technical AI safety expert and, from September 2024, chair of its Safety and Security Committee.
  • NSF CAREER Award (c. 2014–2019, dates approximate) — Early-career funding at CMU for work on energy disaggregation and machine learning.

Key Relationships

  • Andrew Y. Ng — PhD advisor at Stanford; the early work on robotics, locomotion, and energy disaggregation under Ng established the engineering-first, real-world-application orientation that runs through all of Kolter’s subsequent research.
  • Brandon Amos — One of his most influential former PhD students; co-authored both the OptNet and ICNN papers, whose ideas continue to ramify through differentiable programming; Amos is now at Meta FAIR.
  • Andy Zou — PhD student, first author of the GCG paper, and co-founder of Gray Swan AI; Zou’s work on adversarial LLM attacks gave Kolter’s group its most prominent recent research identity.
  • Shaojie Bai — PhD student and first author on Deep Equilibrium Models; the DEQ work represents Kolter’s most architecturally novel contribution to the implicit networks literature.
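  • Matt Fredrikson — CMU colleague and co-author on the GCG paper and broader LLM safety research; a security and privacy researcher, known for early work on model inversion attacks, who brought rigour to the adversarial ML framing.
  • Vladlen Koltun — Intel Labs researcher at the time of the DEQ paper and co-author with Bai and Kolter on DEQs and on earlier sequence-modelling work, including temporal convolutional networks; his empirical perspective complemented the theoretical contributions of the Locus Lab.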
  • Priya Donti — PhD student from his group who became a prominent voice on AI and climate; co-founded Climate Change AI; her work on differentiable optimisation for energy systems extends OptNet to power grid applications.
  • Sam Altman / OpenAI board — Kolter joined a board reconstituted after the November 2023 governance crisis, which at the time also included Adam D’Angelo and Larry Summers; his appointment was widely read as addressing the technical safety credibility gap that the crisis had exposed.

Personal Style

Kolter’s research aesthetic is one of the most coherent in contemporary ML: almost everything he publishes either (a) embeds a classically understood mathematical object (a QP, a fixed-point equation, a convex set) inside a neural network, or (b) asks when and whether neural networks satisfy formal guarantees. The GCG paper is the clearest expression of the second instinct applied to language models: rather than probing safety empirically by hand, it poses jailbreaking as an optimisation problem and solves it. His move from academic to governance roles has been unusually direct. The GCG paper led almost immediately to Gray Swan, and Gray Swan’s profile preceded the OpenAI appointment, a sequence often taken to reflect a community view that his safety credentials are substantive rather than performative. He is also unusual among AI safety figures in having published both major attacks on aligned systems and major defences, which gives his policy positions a technical specificity that more theoretical safety commentators often lack.


References