ML researcher at Cursor working on post-training for coding AI; former professor at Harvard (2016–2021) and Cornell Tech (2021–2026); co-founder of COLM; author of the Annotated Transformer and GPU-Puzzles and co-developer of OpenNMT. A prolific producer of open-source NLP tools and educational code.
Profile
| Field | Detail |
|---|---|
| Full Name | Alexander “Sasha” Rush |
| Nationality | American |
| Current Role | ML Researcher, Cursor |
| Former Roles | Associate Professor, Cornell Tech (2021–2026); Assistant Professor, Harvard SEAS (2016–2021); Researcher, Hugging Face (2019–2024) |
| Research Areas | Post-Training, Language Models, Text Generation, Efficient Inference, Controllable Generation, Structured Prediction, Educational ML Tools |
| PhD Advisor | Michael Collins (MIT) |
| Personal Website | rush-nlp.com |
| YouTube | @srush_nlp |
| X / Twitter | @srush_nlp |
| GitHub | @srush |
| Google Scholar | scholar.google.com |
Overview
Sasha Rush (Alexander M. Rush) is an American ML researcher known both for technical research contributions and for building the educational and open-source infrastructure through which a generation of practitioners learned to work with transformers and language models. From 2016 to 2026 he was a faculty member at Harvard and then Cornell Tech. During that decade he wrote the Annotated Transformer, a widely read PyTorch walkthrough of the original Transformer paper; built GPU-Puzzles and Tensor-Puzzles as interactive curricula for GPU and tensor programming; co-led OpenNMT, one of the first production-quality open-source neural machine translation systems; and worked part-time as a researcher at Hugging Face, contributing to early open-source LLM work. He co-founded COLM (Conference on Language Modeling) in 2024 and serves as its president. He left academia in 2026 to join Cursor as a researcher focusing on post-training of AI systems for coding. He received his PhD from MIT under Michael Collins and was a postdoctoral fellow at Facebook AI Research under Yann LeCun.
Education
Ph.D., Computer Science — MIT, 2014
Rush completed his doctorate at MIT under Michael Collins, one of the field’s leading researchers in statistical NLP and structured prediction. His dissertation work focused on decoding algorithms for structured prediction in NLP, using combinatorial optimization methods such as dual decomposition and Lagrangian relaxation for tasks including parsing and machine translation. During his PhD he received a NAACL 2012 Best Paper Award (with Slav Petrov, for work on vine pruning for efficient multi-pass dependency parsing) and several honorable mentions at major NLP venues.
Postdoctoral Fellow — Facebook AI Research (FAIR), New York, 2014–2016
Rush joined Facebook AI Research under Yann LeCun as a postdoctoral fellow. This period coincided with the early wave of deep learning applications to NLP and produced some of his most-cited early work, including the 2015 neural attention summarization paper that helped establish sequence-to-sequence learning as a tool for text generation beyond translation.
Career
Harvard School of Engineering and Applied Sciences (2016–2021)
Rush joined Harvard SEAS as an assistant professor in 2016, founding the HarvardNLP group (harvardnlp.github.io). His research at Harvard spanned neural text generation, structured attention, visualization of neural networks, and the early transformer era. Key outputs from this period include the OpenNMT open-source translation toolkit (2017), the Annotated Transformer educational resource (2018), and LSTMVis (2017), a visualization tool for analyzing hidden states in recurrent networks that won a Best Paper at the IEEE InfoVis conference.
During his Harvard years Rush developed a distinctive approach to research communication — writing heavily commented, executable implementations alongside technical papers — that became a signature of his public output.
Cornell Tech (2021–2026)
Rush moved to Cornell Tech in New York City as an associate professor, affiliated with the Cornell Ann S. Bowers College of Computing and Information Science and the Cornell NLP Group. He received the Cornell Tech Student Choice Award for excellence in teaching. His research increasingly focused on efficient and generative language modeling, including pretraining without attention (BiGS, EMNLP 2023), diffusion language models (NeurIPS 2024), and contextual document embeddings (ICLR 2025). He also continued releasing educational tools for GPU and tensor programming, including GPU-Puzzles and Tensor-Puzzles.
Hugging Face — Researcher (2019–2024)
Alongside his faculty role, Rush worked part-time as a researcher at Hugging Face from approximately 2019 to 2024. He contributed to several early Hugging Face projects: he was an author on the original Transformers system paper (Wolf et al., EMNLP Demos 2020), contributed to the BigScience project and the T0 multitask prompted model (ICLR 2022) through PromptSource, and co-authored Zephyr (COLM 2024), a lightweight instruction-tuned model trained through direct distillation of LM alignment that became widely used as an open reference model. This dual role made him a prominent bridge between academic NLP research and the open-source Hugging Face ecosystem.
COLM — Co-Founder and President (2024–present)
In 2024, Rush co-founded and launched the Conference on Language Modeling (COLM), a venue dedicated specifically to language model research and the first major conference focused exclusively on the area that had come to dominate NLP. He serves as its president. Before COLM, LM-specific work was spread across venues with broader scope (NeurIPS, ICML, ICLR, ACL) and lacked a dedicated home. Zephyr was among the papers published at the inaugural COLM.
Rush also served as Secretary and General Chair of ICLR during his academic period, developing the software infrastructure that ran virtual conference operations during the COVID-19 period (2020–2021).
Cursor — Researcher (2026–present)
In 2026, Rush left academia to join Cursor, an AI-native code editor and development tool. His personal website describes his current focus as “post-training of AI systems for coding and related tasks” and improving model reasoning for long-horizon coding problems. He co-authored “Composer 2” (arXiv 2026) with the Cursor team — a paper on large-context code generation. He noted on his site: “From 2016–2026, I was a Professor at Harvard and then Cornell.”
Key Contributions
- The Annotated Transformer (ACL NLP-OSS workshop, 2018) — A heavily commented, executable PyTorch implementation of “Attention Is All You Need” that walks through every component of the Transformer architecture with inline code and visualizations. Among the most widely read implementations and tutorials of the Transformer, it has been the entry point through which many in the ML community first understood the architecture in code. Updated and maintained on GitHub.
- GPU-Puzzles (GitHub, 2021) — A collection of 14 interactive CUDA puzzles written in Python with Numba, designed to teach GPU programming from first principles. One of the most-starred educational ML repositories on GitHub and used in courses to teach parallel programming for deep learning. A companion to Tensor-Puzzles, which applies the same puzzle format to tensor programming in Python.
- OpenNMT (ACL Demo, 2017) — Co-developed with Guillaume Klein, Yoon Kim, Yuntian Deng, and Jean Senellart; one of the first production-quality open-source neural machine translation systems, first released in Lua Torch and later in PyTorch (OpenNMT-py) and TensorFlow (OpenNMT-tf) versions, and used in both research and deployment. It became a reference implementation for seq2seq models and influenced the design of subsequent NLP frameworks.
- A Neural Attention Model for Abstractive Sentence Summarization (EMNLP 2015) — With Sumit Chopra and Jason Weston; one of the first papers to apply neural attention to abstractive summarization, demonstrating that the encoder-decoder attention mechanism could generate novel summary text rather than merely extracting spans. Helped establish neural abstractive summarization as a research direction.
- Sequence-Level Knowledge Distillation (EMNLP 2016) — With Yoon Kim; introduced distilling a sequence-to-sequence teacher model into a smaller student model at the sequence level, training the student on the teacher’s beam-search outputs rather than matching the teacher’s per-token (word-level) distributions. The technique became widely used for model compression in NLP.
- Zephyr: Direct Distillation of LM Alignment (COLM 2024) — Co-authored with the Hugging Face research team; showed that a lightweight open model (7B parameters) could be aligned to instruction following through distilled supervised fine-tuning followed by direct preference optimization (DPO) on AI feedback from stronger models, reaching performance competitive with much larger models while using a simpler training pipeline than full RLHF. Became a widely adopted open reference model for instruction-following research.
- T0 / PromptSource, Multitask Prompted Training (ICLR 2022) — Co-authored with Victor Sanh and others from BigScience; showed that multitask fine-tuning of a pretrained encoder-decoder model on a diverse collection of datasets recast as human-authored prompts enabled zero-shot generalization to unseen tasks without in-context examples.
- LSTMVis (IEEE InfoVis 2017) — With Hendrik Strobelt, Sebastian Gehrmann, and Hanspeter Pfister; a visualization tool for analyzing hidden state dynamics in recurrent networks, awarded Best Paper at InfoVis.
- The Annotated S4 — Rush’s interactive, executable implementation of the Structured State Space Sequence (S4) model, continuing his tradition of producing heavily annotated implementations of key architectural papers that serve as community educational resources.
- YouTube channel (@srush_nlp) — A series of technical lectures and courses covering language model internals, GPU programming, and deep learning systems; a widely watched, practitioner-facing technical video resource in the ML community.
- NAACL 2012 Best Paper — “Vine Pruning for Efficient Multi-Pass Dependency Parsing” (with Slav Petrov); one of the earliest of his several best paper awards at NLP venues.
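The word-level versus sequence-level distinction in the Sequence-Level Knowledge Distillation entry can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not code from the paper: the two-token vocabulary, the hand-written `teacher` and `uniform_student` distributions, and all helper names are hypothetical, and exhaustive search over the four possible outputs stands in for the beam search the paper uses to approximate the teacher’s most likely sequence. Word-level distillation matches the teacher’s per-token distribution along the gold prefix; sequence-level distillation trains the student on the teacher’s single best output as a hard target.

```python
import math
from itertools import product

# Hypothetical toy setup: two-token vocabulary and a fixed output length.
VOCAB = ("a", "b")
LENGTH = 2


def teacher(prefix):
    """Hypothetical teacher: next-token distribution given the prefix."""
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"a": 0.45, "b": 0.55}
    return {"a": 0.9, "b": 0.1}


def uniform_student(prefix):
    """Hypothetical untrained student: uniform over the vocabulary."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def seq_prob(model, seq):
    """Probability of a full sequence under an autoregressive model."""
    p = 1.0
    for t, tok in enumerate(seq):
        p *= model(seq[:t])[tok]
    return p


def teacher_mode():
    """Teacher's highest-probability sequence. Exhaustive search replaces
    beam search here only because the toy space has four sequences."""
    return max(product(VOCAB, repeat=LENGTH), key=lambda s: seq_prob(teacher, s))


def word_level_kd_loss(student, reference):
    """Cross-entropy from the teacher's per-token distribution to the
    student's, at each position along the gold reference prefix."""
    loss = 0.0
    for t in range(len(reference)):
        q, p = teacher(reference[:t]), student(reference[:t])
        loss -= sum(q[k] * math.log(p[k]) for k in VOCAB)
    return loss


def sequence_level_kd_loss(student):
    """Negative log-likelihood of the teacher's mode used as a hard target."""
    return -math.log(seq_prob(student, teacher_mode()))


print(teacher_mode())                           # ('b', 'a')
print(sequence_level_kd_loss(uniform_student))  # -log(0.5 * 0.5) = log 4
```

In this toy teacher, greedy decoding would pick “a” first (0.6 > 0.4) and finish at (“a”, “b”) with probability 0.33, while the mode (“b”, “a”) has probability 0.36. Searching beyond greedy decoding when building distillation targets can therefore change what the student is trained to imitate.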
Awards & Recognition
- Sloan Research Fellowship (c. 2018)
- NSF CAREER Award
- Presidential Early Career Award for Scientists and Engineers (PECASE)
- Cornell Tech Student Choice Award for Excellence in Teaching
- Best Paper Awards at NAACL (2012), InfoVis (2017), and hardware-focused venues
- ICLR Secretary and General Chair — Institutional leadership of one of the field’s flagship conferences; developed virtual conference infrastructure.
- COLM Co-founder and President (2024–present)
Key Relationships
- Michael Collins — PhD advisor at MIT; one of the most influential NLP researchers of the 2000s-2010s in structured prediction and parsing; Rush’s work on optimal decoding and efficient parsing reflects Collins’s rigorous probabilistic NLP tradition.
- Yann LeCun — Postdoc supervisor at FAIR; the deep learning orientation of Rush’s subsequent work on neural text generation was shaped by the FAIR environment.
- Yoon Kim — Rush’s PhD student at Harvard and long-term collaborator; co-authored Character-Aware Neural Language Models, Sequence-Level Knowledge Distillation, and Compound PCFG with Rush, making this one of Rush’s most sustained and productive collaborations.
- Thomas Wolf — Hugging Face collaborator; co-authored the Transformers system paper and Zephyr; their shared time at Hugging Face overlapped with the key years of open LLM development.
- Albert Gu — Collaborator on pretraining without attention (BiGS) and related state space model work; Gu’s S4 architecture was the subject of Rush’s Annotated S4 tutorial.
- Stuart Shieber — Harvard colleague and collaborator on template-based text generation; also a touchstone for Rush’s interest in literate programming and clearly documented research.
Personal Style
Rush’s public persona is built almost entirely around a single conviction: that making complex technical ideas maximally clear and executable is not just pedagogy but a form of research contribution in its own right. The Annotated Transformer, GPU-Puzzles, Tensor-Puzzles, the Annotated S4, and the YouTube channel all enact a philosophy of “literate programming,” in which code and explanation are interleaved so that understanding an algorithm means being able to run it, not just read about it. He has cited Ken Shan and other literate programming advocates as influences, and has described his interest in this mode of communication as predating his NLP career. His Digg profile classifies his posts as mostly “Informing” (31.6%), “Teaching” (21.7%), and “Announcing” (12.6%), and his self-description, “tweets and blogs, mostly about coding and ML,” fits a communicator primarily interested in building shared understanding rather than staking positions. His 2026 move from academia to Cursor is consistent with a career that was never purely academic: the open-source infrastructure work, the Hugging Face appointment, and the COLM founding all point toward someone who values building functional, widely used things over accumulating scholarly output.
References
- Personal website — rush-nlp.com
- X / Twitter — @srush_nlp
- GitHub — @srush
- YouTube — @srush_nlp
- Digg profile
- Google Scholar
- Cornell Tech profile
- Simons Institute profile (2024-2025)
- The Annotated Transformer — nlp.seas.harvard.edu
- GPU-Puzzles — GitHub
- OpenNMT — opennmt.net
- COLM — colmweb.org
- CS224U podcast with Chris Potts (2022)