Belgian computer vision researcher who co-authored the Vision Transformer, MLP-Mixer, SigLIP, and PaliGemma at Google Brain and DeepMind, co-founded OpenAI’s Zürich office, and now continues multimodal AI research at Meta in Zürich.
Profile
| Born | La Calamine, Belgium |
| Nationality | Belgian |
| Current Institution(s) | Meta (Member of Technical Staff, Zürich) |
| Research Areas | Computer Vision, Multimodal AI, Vision-Language Models, Representation Learning, Neural Architecture Design, Transformers |
| Doctoral Advisor | Bastian Leibe |
| Doctoral Thesis | Deep Learning for Computer Vision on Mobile Robots (RWTH Aachen University, 2018) |
| Website | lucasb.eyer.be |
| X / Twitter | @giffmana |
| GitHub | lucasb-eyer |
| Google Scholar | Lucas Beyer |
Overview
Lucas Beyer is a Belgian computer vision and multimodal AI researcher currently at Meta in Zürich, where he continues the research program he developed across six years at Google Brain and Google DeepMind. A self-described “self-taught hacker and studied scientist” who grew up in Belgium with French and German as mother tongues and originally wanted to make video game AI, he earned a Dipl.Ing. and PhD in computer vision and robotics perception at RWTH Aachen University under Bastian Leibe before joining Google Brain Zürich in 2018. There, alongside Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Alexey Dosovitskiy, and others, he co-authored a sequence of papers (ViT, BiT, MLP-Mixer, Scaling ViT, SigLIP, and PaliGemma) that collectively established the Vision Transformer as the dominant paradigm for computer vision and produced some of the most widely used open-source vision-language encoders. In December 2024 he co-founded OpenAI’s Zürich office with Zhai and Kolesnikov; by June 2025 he had moved to Meta’s Zürich research team. His website, blog, and public GitHub embody the “hacker” side of his identity: he has authored over a dozen open-source libraries across C++, Go, and Python, and co-maintained the big_vision codebase that underlies much of the Zürich team’s vision research.
Early Life & Education
Beyer grew up in La Calamine, a small municipality in the German-speaking community of eastern Belgium, near the German border — a bilingual upbringing that gave him French and German as mother tongues and Dutch and English as working languages. He attended the Athénée César Franck secondary school and developed early interests in game development, programming, and computer-rendered AI. He began his undergraduate studies in 2006 at RWTH Aachen University in Germany, studying Computational Engineering Science — a technically demanding interdisciplinary program spanning mathematics, physics, engineering, and computing. He graduated in July 2012 as a Dipl.Ing. (the pre-Bologna German equivalent of an M.Sc.) with a grade of 1.3, the highest band in the German system. His Diplom thesis, Exploiting Graphics Accelerators for Computational Biology, applied GPU-accelerated computation to genome-wide association studies (GWAS) and was graded 1.0 (perfect).
He briefly began a doctoral program in High-Performance Computing at the AICES institute in late 2012, working on high-performance density functional theory. After recognizing that quantum physics was not his calling, he transferred in mid-2013 into the computer vision group of Prof. Bastian Leibe at RWTH’s Visual Computing Institute, where he completed a PhD in computer vision in 2018. His doctoral research — funded by the EU STRANDS and SPENCER service robot projects — developed deep learning methods for robot perception under low-annotation constraints: head pose estimation (Biternion Nets), pedestrian detection in laser scans (DROW), re-identification (In Defense of the Triplet Loss), and long-term robot scene understanding. During the PhD he interned twice at Google Venice (Los Angeles) — on image-gaze prediction in summer 2016 and FaceNet disentanglement in summer 2017 — and spent one semester at Kindred AI in Toronto working on robot learning from human demonstration.
Career
Google Brain / Google DeepMind, Zürich — Staff Research Scientist (2018–2024)
Beyer joined Google Brain Zürich in June 2018 immediately after defending his PhD, and spent six years there through the merger with DeepMind in 2023, ultimately holding the title of Staff Research Scientist. Over this period he co-led the multimodal (vision-language) research effort and the big_vision codebase — the shared research infrastructure underlying the team’s published and internal vision models.
Scaling vision models and Big Transfer (BiT, 2020). His first major result was Big Transfer (“Big Transfer (BiT): General Visual Representation Learning,” ECCV 2020), which characterized the recipe for transferable visual representations: you must scale model capacity, pretraining dataset size, and training duration together in proportion (“diagonally”) to reap the benefits of scale. BiT set state-of-the-art on a wide range of transfer benchmarks and established the paradigm for visual pretraining that ViT would subsequently inherit.
Vision Transformer (ViT, ICLR 2021). The publication with the most lasting impact was “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” (ICLR 2021), co-authored with Alexey Dosovitskiy, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. The paper demonstrated that a standard Transformer encoder applied directly to patches of an image (with no convolutions) matches or exceeds convolutional networks on ImageNet and other classification benchmarks when pretrained at sufficient scale. ViT became the dominant backbone for computer vision research and practice within two years of publication. Beyer was a central contributor to the paper and the implementation.
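The patch-based input described above can be illustrated with a minimal sketch. It is not the paper's or big_vision's implementation: the function name `patchify` and the nested-list image layout are assumptions made for illustration, and the learned linear projection and position embeddings that follow this step in a real model are omitted. With 16x16 patches, a 224x224 image would yield 14 x 14 = 196 tokens; the toy example uses a 4x4 image and 2x2 patches to keep the output readable.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors.

    Patches are read in row-major order; each patch becomes one "token"
    of length patch * patch * C.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            for row in image[top:top + patch]:
                for pixel in row[left:left + patch]:
                    token.extend(pixel)  # append all channels of this pixel
            tokens.append(token)
    return tokens


# Toy 4x4 image with 3 channels; each pixel stores (row, column, 0).
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens), len(tokens[0]))  # number of tokens, values per token
```

In a full model, each flattened token would next be mapped to the Transformer's hidden size by a learned projection before entering the encoder.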
ImageNet labeling (arXiv 2020). Before ViT, the group published “Are we done with ImageNet?” proposing new “ReaL” (Reassessed Labels) annotations for ImageNet, fixing systematic errors in the validation labels. This infrastructure work became a standard tool for evaluating vision models honestly.
MLP-Mixer (NeurIPS 2021). “MLP-Mixer: An All-MLP Architecture for Vision” (with Ilya Tolstikhin, Neil Houlsby, and others) demonstrated that competitive results on image classification can be achieved with an architecture that contains neither attention nor convolution, only MLPs. It was a technically provocative result that showed the architectural flexibility of the patch-based paradigm established by ViT.
Knowledge distillation and efficiency. “Knowledge distillation: A good teacher is patient and consistent” (CVPR 2022) introduced a protocol (distill for an exceptionally long schedule, and feed teacher and student identical inputs) that produced the best-performing ResNet-50 model (roughly 83% ImageNet top-1) demonstrated to that point, showing that careful distillation can extract far more capability from modest architectures than standard training.
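As a rough illustration of the "consistent inputs" idea, the sketch below computes a KL-divergence distillation loss in which teacher and student are evaluated on the very same view of an image. This is a simplified stand-in, not the authors' code: the helper names (`softmax`, `distillation_loss`, `consistent_step`) are hypothetical, and the `teacher` and `student` arguments are assumed to be plain callables that return class logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, subtracting the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over class probabilities for one example."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistent_step(view, teacher, student):
    """Score teacher and student on the SAME augmented view (the "consistent" part)."""
    return distillation_loss(teacher(view), student(view))


# Hypothetical toy models: the student only partly matches the teacher.
teacher = lambda view: [3.0 * v for v in view]
student = lambda view: [1.0 * v for v in view]
loss = consistent_step([0.2, 0.5, 1.0], teacher, student)
```

The "patient" half of the recipe is simply running such steps for far longer than usual, which the sketch does not attempt to show.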
Scaling ViT (CVPR 2022). “Scaling Vision Transformers” (with Zhai, Kolesnikov, and Houlsby) demonstrated that ViTs scale reliably with model size and data, achieving a new state-of-the-art ImageNet top-1 accuracy of 90.45% with ViT-G/14, a model of roughly two billion parameters and one of the largest vision models trained at the time. A later paper led by Mostafa Dehghani, on which Beyer was also a co-author, extended the architecture to 22 billion parameters.
SigLIP (ICCV 2023). “Sigmoid Loss for Language Image Pre-Training” proposed replacing the softmax contrastive loss used in CLIP with a sigmoid loss computed independently per image-text pair. Because each pair is scored on its own, the loss needs no batch-wide normalization and therefore no global gathering of similarities across devices, which makes contrastive vision-language training substantially more scalable. SigLIP models outperformed CLIP at comparable scale. Beyer’s group open-sourced the resulting vision encoders and image-text models, which became widely adopted as plug-in vision backbones for multimodal language model research.
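A minimal sketch of a pairwise sigmoid loss of this kind is shown below. It treats every image-text combination in a batch as an independent binary decision: matching pairs are labeled +1 and all others -1. The function names are illustrative, the embeddings are assumed to be L2-normalized already, and the default `temperature` and `bias` values are assumptions chosen for the example; in training these would be learnable parameters.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_pair_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Average pairwise sigmoid loss over a batch of n image-text pairs.

    img_emb[i] and txt_emb[i] are assumed to describe the same item.
    Each (i, j) term depends only on that one pair, so no softmax
    normalization across the batch is required.
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(img_emb[i], txt_emb[j]))
            logit = temperature * sim + bias
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    return total / n


# Toy batch of two orthogonal unit vectors.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = sigmoid_pair_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = sigmoid_pair_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # correctly paired text should give the lower loss
```

Because the double loop has no term that couples different pairs, the sum can be split into chunks and computed piece by piece, which is the property the paragraph above credits for the improved scalability.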
PaliGemma (2024). Beyer co-led the development of PaliGemma, a 3-billion-parameter vision-language model combining a SigLIP vision encoder with Gemma-2B, designed as a transfer model — fine-tunable to a wide range of vision-language tasks. PaliGemma and its successor PaliGemma 2 were released as open weights under a permissive license and became reference models for multimodal transfer learning research.
big_vision. Throughout the Google period, Beyer co-maintained big_vision, the JAX-based research codebase underlying the team’s publications. It was released publicly and became one of the most widely used non-PyTorch vision research frameworks.
OpenAI Zürich — Co-Founder, Member of Technical Staff (December 2024 – June 2025)
In December 2024, Beyer, Xiaohua Zhai, and Alexander Kolesnikov jointly left Google DeepMind to co-found OpenAI’s first European research office in Zürich. The departure of three of Google Brain Zürich’s most senior researchers — responsible for ViT, SigLIP, and PaliGemma — attracted considerable media attention. Beyer described the mandate as “fundamental research towards AGI.” The office expanded rapidly in early 2025 with additional hires from the European research community. In June 2025, Beyer departed after approximately six months, publicly commenting on reporting about Meta’s $100 million signing bonuses, describing the figures as exaggerated and not reflective of his own experience.
Meta Zürich — Member of Technical Staff (June 2025 – present)
Beyer joined Meta’s Zürich research team in summer 2025, continuing fundamental research on multimodal AI. His personal website notes he is “now at Meta in Zürich, where he continues to research multimodal AI.”
Key Contributions
- Vision Transformer (ViT, ICLR 2021) — Co-authored one of the most widely cited papers in computer vision: demonstrated that a pure Transformer encoder operating on flattened image patches achieves state-of-the-art image classification at scale. Triggered a wholesale shift of the computer vision field from convolutional to attention-based architectures.
- Big Transfer (BiT, ECCV 2020) — Established the scaling recipe for transferable visual representations (scale model, dataset, and duration together), setting state-of-the-art on a broad set of transfer tasks and defining the pretraining paradigm that ViT inherited.
- MLP-Mixer (NeurIPS 2021) — Co-authored the demonstration that competitive image classification can be achieved with architectures containing neither attention nor convolution, using only MLPs, which showed the generality of the patch-based paradigm.
- Scaling Vision Transformers (CVPR 2022) — Demonstrated that ViTs scale reliably to roughly two billion parameters (ViT-G/14), achieving 90.45% top-1 ImageNet accuracy, a state-of-the-art result at the time of publication.
- SigLIP (ICCV 2023) — Replaced softmax contrastive loss in CLIP with a sigmoid formulation, enabling more scalable training and producing a family of open-source vision encoders that became a widely used plug-in backbone for vision-language model research.
- PaliGemma (2024) — Co-led the development of a 3-billion-parameter open-weight vision-language model combining SigLIP and Gemma-2B, designed for broad fine-tuning transfer. Widely adopted for multimodal research and downstream applications.
- ReaL ImageNet Labels / “Are we done with ImageNet?” (2020) — Produced corrected multi-label annotations for the ImageNet validation set, providing an honest evaluation protocol that exposed overstatements of progress in vision benchmarking.
- big_vision codebase — Co-maintained the JAX-based research infrastructure underlying the Google Brain/DeepMind Zürich vision research program, released publicly and adopted by the broader research community.
- Open-source hobby libraries — Created Go-Colorful (Go color manipulation), PyDenseCRF (Python wrapper for dense CRFs), libheatmap (high-performance C heatmap library used in at least four commercial products), and DeepFried2 (a Theano-based deep learning library), among over a dozen open-source tools across C++, Python, and Go.
Awards & Recognition
- ICLR 2021 Oral / Spotlight — Vision Transformer paper presented at ICLR 2021 with high recognition.
- AICES Doctoral Fellowship — RWTH’s fellowship for “extremely well-qualified students” during PhD.
- Bildungsfonds Scholarship — Awarded to most promising students during undergraduate studies.
- National Data Science Bowl — Top 10% — Finished in the top decile among 1,000+ participants in a Kaggle data science competition.
- Google Developer Group Aachen Hackathon Winner — Won with Alexander Hermans.
- 50+ publications at top-tier venues — CVPR, NeurIPS, ICCV, ICLR, ECCV and others; Google Scholar citation profile in the tens of thousands.
Key Relationships
- Xiaohua Zhai — The closest long-term research partner of Beyer’s career; co-lead of the multimodal team at Google Brain/DeepMind Zürich; co-founded OpenAI Zürich together; co-author on ViT, SigLIP, scaling ViT, and many other papers.
- Alexander Kolesnikov — Third member of the Google Brain Zürich trio who co-founded OpenAI Zürich; co-author on BiT, ViT, and other scaling papers.
- Neil Houlsby — Senior Google Brain researcher and close collaborator; co-author on ViT, MLP-Mixer, and other works; one of the founding members of the Zürich vision team.
- Alexey Dosovitskiy — Lead author of ViT; Beyer was a co-lead contributor on the paper and subsequent work; Dosovitskiy is a co-founder of Recursive.
- Bastian Leibe — PhD advisor at RWTH Aachen; head of the Visual Computing Institute and one of Germany’s leading computer vision researchers; provided the robotics perception environment in which Beyer’s research career began.
- Andreas Steiner — Consistent collaborator across big_vision, SigLIP, and scaling papers.
- Mostafa Dehghani — Co-author on scaling ViT and other large-scale vision papers.
Personal Style
Beyer describes himself as a “self-taught hacker and studied scientist”, a characterization that maps accurately onto his output. The hacker side manifests in a GitHub profile spanning over a dozen open-source libraries across multiple programming languages (C++, Go, Python, JavaScript), a personal website written with dry wit and occasional prompt-injection gags for AI systems trained on its content, and a career trajectory that started from wanting to make video games. The scientist side manifests in a continuous string of foundational vision papers and a serious interest in getting evaluation right: the “Are we done with ImageNet?” paper is characteristic of someone who cares about whether progress is real. He plays DOTA2 seriously enough that both his website and his personal bio list it as a primary leisure activity, and he coached his university’s ice hockey team for two years. He has a child and lives in Zürich, Switzerland, where he has been based since joining Google Brain. His public commentary on the Meta signing bonus reports, in which he poured cold water on the $100 million figures, reflects a candor about industry dynamics that is unusual for a researcher of his seniority.
References
- Personal website: lucasb.eyer.be
- Google Scholar: scholar.google.com
- RWTH Aachen VCI profile: vision.rwth-aachen.de
- IQ.wiki profile: iq.wiki
- Fortune article (June 2025): fortune.com
- OpenAI Zürich launch coverage: Google Search
- big_vision repository: github.com/google-research/big_vision
- Digg profile: digg.com/u/x/giffmana