Co-founder and Chief Science Officer of Hugging Face; creator of the Transformers library, architect of the BigScience/BLOOM initiative, and one of the most consequential figures in the democratization of open-source AI over the past decade.
Profile
| Field | Detail |
|---|---|
| Nationality | French |
| Current Institution | Hugging Face (Co-founder & Chief Science Officer) |
| Research Areas | Open-Source ML Infrastructure, Large Language Models, Multilingual AI, Robotics, AI for Science |
| PhD | Statistical and Quantum Physics, Pierre and Marie Curie University (UPMC / Sorbonne) |
| Personal Website | thomwolf.io |
| X / Twitter | @Thom_Wolf |
| GitHub | @thomwolf |
| Google Scholar | scholar.google.com (58,000+ citations) |
Overview
Thomas Wolf is the co-founder and Chief Science Officer of Hugging Face, the company that has become the central infrastructure layer of the open-source AI ecosystem. He is the original lead author of the Hugging Face Transformers library. The library provides unified, production-quality implementations of virtually every major pretrained model family and has become the de facto standard interface for applied NLP, vision, and multimodal AI worldwide. He led the BigScience Workshop (2021–2022), a year-long global collaboration of over a thousand researchers that produced BLOOM, the first openly licensed large language model to exceed GPT-3’s parameter count. Beyond Transformers, Wolf co-developed the Datasets library. As CSO he has also driven a broader family of Hugging Face libraries, including Diffusers, Accelerate, DataTrove, smolagents, and LeRobot, which together make up much of the open-source toolkit AI practitioners use. His professional trajectory is unusual. He began as a physicist, spent roughly five years as a patent attorney, and discovered machine learning through his legal clients. He co-founded Hugging Face in 2016 with no formal background in computer science, yet built one of the most widely used open-source libraries on GitHub and helped steer the company to a $4.5 billion valuation.
Early Life & Education
Wolf was born and raised in France. He pursued undergraduate studies in theoretical physics and mathematics at École Polytechnique in Paris, one of France’s elite engineering grandes écoles.
Ph.D., Statistical and Quantum Physics — Pierre and Marie Curie University (UPMC / Sorbonne)
Wolf’s doctoral research addressed quantum and statistical field theories. During his PhD he taught physics, an experience he has said he still misses. He also spent time as a research intern at Lawrence Berkeley National Laboratory, working on X-ray generation from laser-plasma accelerators. His PhD training bridged mathematics, statistical mechanics, and experimental physics. That interdisciplinary grounding proved unexpectedly relevant when he later discovered that many techniques in machine learning were essentially re-branded statistical physics.
Law Degree — Panthéon-Sorbonne University
Following his doctorate, Wolf pursued a second academic path and earned a law degree, also in Paris.
Intellectual Property — Centre d’Études Internationales de la Propriété Intellectuelle (CEIPI)
Wolf additionally studied intellectual property law at CEIPI, the specialist French institute for IP, completing the credentials needed to practice as a European patent attorney.
Career
Lawrence Berkeley National Laboratory — Research Intern
During his doctoral period, Wolf interned at Lawrence Berkeley National Laboratory in the United States, working on laser-plasma physics. This early exposure to American research culture and the international academic environment anticipated his later transatlantic career.
Cabinet Plasseraud — Patent Attorney, Paris (c. 2009–2015)
After completing his doctorate and law studies, Wolf joined Cabinet Plasseraud, a major Parisian intellectual property firm, where he worked for approximately five years as a European Patent Attorney. His client portfolio consisted largely of deep learning, machine learning, and AI startups — an unusual vantage point from which to observe the early commercial deployment of neural network methods. He has described the pivot to AI as partly accidental: advising technology companies on their IP exposed him to the underlying mathematics, which he recognized as familiar statistical physics in new notation. This recognition catalyzed a self-directed education in machine learning through books and online courses, conducted in parallel with his legal practice.
Hugging Face — Co-Founder and CSO (2016–present)
In 2016, Wolf joined Clément Delangue and Julien Chaumond to co-found Hugging Face in New York City. The company launched as a consumer chatbot application targeted at teenagers. Its name comes from the 🤗 “hugging face” emoji, which became the company’s logo and a permanent element of its identity. The chatbot attracted modest traction, but the inflection came from an adjacent decision: when the team open-sourced the NLP tools they had built internally, the developer community’s response was immediate and overwhelming. Wolf led the technical pivot toward infrastructure, and the Transformers library was born.
Transformers library (2019–present)
Wolf architected and released the Hugging Face Transformers library. It began with PyTorch implementations and grew to offer unified PyTorch, TensorFlow, and JAX implementations of BERT, GPT-2, T5, RoBERTa, and the hundreds of pretrained model architectures that followed. Before Transformers, working with each model required navigating the original research code, which was typically incomplete, inconsistently documented, and framework-specific. Transformers unified the interface, standardized the API, and paired it with the Hugging Face Hub for model hosting and sharing. The repository has accumulated well over 100,000 stars on GitHub and is widely used in academic ML papers involving pretrained models. It functions as the plumbing beneath much of what the world calls “AI.”
Datasets library (2020–present)
Wolf co-developed the Datasets library, providing standardized, memory-efficient access to thousands of ML datasets with a uniform API, Apache Arrow-backed fast loading, and integration with the Hugging Face Hub. It reduced the friction of dataset acquisition and preprocessing to a handful of lines of code, eliminating a major reproducibility and access bottleneck in applied ML research.
BigScience Workshop and BLOOM (2021–2022)
Wolf led the BigScience Workshop, a year-long open scientific collaboration he organized beginning in April 2021. It eventually involved over a thousand researchers from more than sixty countries. The project drew significant compute from French public infrastructure: the Jean Zay supercomputer, operated by IDRIS and allocated through GENCI. Its output was BLOOM, a 176-billion-parameter multilingual language model trained on 46 natural languages and 13 programming languages. BLOOM was released in July 2022 under the Responsible AI License (RAIL). At the time of its release it was the largest openly accessible multilingual language model. The BigScience process established a template for open, collaborative development of frontier AI and served as a precedent for the later wave of open-weight releases, including Falcon, Mistral, and Llama.
Diffusers, Accelerate, DataTrove, smolagents (2022–present)
Under Wolf’s scientific leadership, Hugging Face continued expanding its library ecosystem:
- Diffusers (2022) became a standard library for diffusion-model inference and training.
- Accelerate provided hardware-agnostic abstractions for distributed training.
- DataTrove (2024) addressed large-scale data processing for pretraining.
- smolagents (2024–2025) provided a lightweight framework for building AI agents.
LeRobot and the robotics pivot (2024–present)
In 2024, Wolf began directing a significant portion of Hugging Face’s open-source effort toward robotics. He explicitly followed the Transformers playbook: build open infrastructure, release datasets, and lower the barrier to entry for hardware. LeRobot, the open-source robot learning library, quickly became one of the most widely used open robotics projects on GitHub after its release. In April 2025, Hugging Face acquired Pollen Robotics, adding open-source hardware (the Reachy 2 humanoid robot) to the software stack. Wolf has described the robotics bet as “the same inflection point for physical AI that we were at for LLMs just a few years ago.” Wolf also promoted the SO-100 robotic arm, a low-cost companion hardware project, alongside LeRobot. It was designed to be built for roughly $100 per arm, embodying the democratization ethos at the hardware level.
Hugging Face Hub and platform growth
Under Wolf’s scientific direction alongside CEO Clément Delangue and CTO Julien Chaumond, Hugging Face has grown from a consumer app to the world’s largest repository of public AI models, datasets, and demos. The Hub hosts millions of models and datasets, serves over seven million users, and has attracted investments valuing the company at $4.5 billion (Series D round, 2023). The company has approximately 250 employees as of 2025.
Key Contributions
- Hugging Face Transformers — The library that unified the fragmented pretrained-model landscape into a single, maintained, documented Python package. It is now the default interface for working with neural language, vision, and multimodal models worldwide. Its open release made state-of-the-art NLP accessible without a PhD or an enterprise budget.
- BLOOM and the BigScience Workshop — Wolf organized and led the first large-scale open collaborative development of a frontier LLM (176B parameters, 46 natural languages). The effort produced both a model artifact and a process template for community-led open science at scale.
- Hugging Face Hub — The platform for sharing models, datasets, and demos that Wolf helped build into the central infrastructure of the global AI practitioner community. Observers have called it a “GitHub for machine learning.”
- Datasets library — Unified dataset access with a memory-efficient, reproducibility-first design. It removed a major practical barrier for researchers and developers without large storage or compute infrastructure.
- Diffusers — A standard open-source library for diffusion-model inference and training. It enabled the widespread adoption of Stable Diffusion variants and later image and audio generation models.
- LeRobot — An open-source robot learning library designed to bring to physical AI the same accessible, community-driven infrastructure that Transformers brought to NLP. It became one of the leading open robotics platforms within its first year.
- FineWeb and the Ultra-Scale Playbook — FineWeb is a high-quality open dataset for LLM pretraining (15 trillion tokens); the Ultra-Scale Playbook is an open educational resource on training large models efficiently. Both reflect Wolf’s philosophy of releasing the working knowledge behind frontier AI, not just artifacts.
- Natural Language Processing with Transformers (O’Reilly, with Lewis Tunstall and Leandro von Werra) — A widely used practitioner’s reference on transformer-based models, adopted in courses and by developers in industry.
- smolagents — A lightweight framework for building AI agents. It reflects Wolf’s current focus on AI that acts in the world rather than merely predicting text.
Awards & Recognition
- World Economic Forum — Speaker and participant; recognized as a leading voice on open AI and democratization of technology.
- TED Talk (March 2025) — Delivered a TED talk advocating for open-source AI.
- Google Scholar citations — Over 58,000 citations, primarily driven by the Transformers library paper, the BLOOM paper, and the BigScience dataset papers — an unusual citation profile for someone who did not complete a traditional ML doctorate.
Key Relationships
- Clément Delangue — CEO and co-founder of Hugging Face; while Delangue leads business and strategy, Wolf leads scientific direction; their complementary roles have defined the company’s dual identity as both commercial platform and open-science institution. Delangue is among Wolf’s closest professional connections.
- Julien Chaumond — CTO and co-founder; the third pillar of the founding team, responsible for engineering architecture while Wolf drove the scientific and open-source agenda.
- Yann LeCun — Among Wolf’s most prominent professional followers. LeCun’s advocacy for open AI development at Meta aligns closely with Wolf’s philosophy at Hugging Face, and the two are among the industry’s most visible advocates for open-source AI as a counterweight to closed development.
- Rémi Cadène — Former Tesla Optimus scientist who joined Hugging Face to lead the LeRobot initiative; Wolf’s key collaborator on the robotics pivot.
- The BigScience community — Over a thousand researchers co-created BLOOM under Wolf’s organizational direction; this relationship defines his unusual mode of scientific collaboration — organizing a distributed scientific workforce rather than leading a traditional lab.
Personal Style
Wolf’s career follows a trajectory that resists the usual AI biography: theoretical physicist, then patent lawyer, then self-taught machine learning engineer, then open-source infrastructure architect. He has described the transition from law to ML as recognizing that the mathematical substrate of deep learning was the statistical physics he already knew — a pattern-matching act that speaks to a habit of finding structural similarities across apparently unrelated domains. His commitment to open science is not rhetorical: he has consistently chosen to release working, maintained code and data rather than papers about planned releases, and he has organized his largest initiatives — BigScience, FineWeb, LeRobot — as collaborative community efforts rather than proprietary research programs. His Digg vibe profile (primarily “Informing” and “Hopeful,” rarely “Provocative”) captures a communicator more interested in sharing what he’s building than in scoring debating points, though he has been direct about his views on closed AI development. He has noted that he still misses teaching from his time as a PhD instructor, and his blog posts, educational playbooks, and book co-authorship reflect a sustained parallel commitment to explaining, not just building.
References
- Personal website — thomwolf.io
- X / Twitter — @Thom_Wolf
- GitHub — @thomwolf
- Digg profile
- Google Scholar
- Wikipedia — Hugging Face
- Sequoia Capital podcast — “Training Data: Thomas Wolf” (2025)
- TechCrunch Disrupt 2025 profile
- London Tech Week speaker bio
- Founders File — Thomas Wolf
- Hugging Face blog — Pollen Robotics acquisition (2025)
- World Economic Forum profile