Japan-born Chinese Canadian researcher whose career spans Bayesian inference, sample-efficient deep reinforcement learning, robot learning, and LLM reasoning — most publicly recognized as co-author of the “Let’s think step by step” zero-shot chain-of-thought paper.
Profile
| Born | Japan (date not publicly disclosed) |
| Nationality | Chinese Canadian (born in Japan) |
| Current Institution(s) | Google DeepMind (Senior Staff Research Scientist, Gemini Thinking team) |
| Research Areas | Reinforcement Learning, Deep Learning, Robotics, Probabilistic Machine Learning, Large Language Models, Reasoning |
| Doctoral Advisors | Richard E. Turner; Zoubin Ghahramani; Bernhard Schölkopf |
| Doctoral Studies | PhD in Machine Learning, University of Cambridge & Max Planck Institute for Intelligent Systems (Cambridge-Tübingen Fellowship) |
| Website | sites.google.com/view/gugurus |
| X / Twitter | @shaneguML (English); @shanegJP (Japanese) |
| Google Scholar | Shixiang Shane Gu — 72,000+ citations |
Overview
Shixiang Shane Gu is a Senior Staff Research Scientist at Google DeepMind, currently working on the Gemini Thinking team. A Japan-born Chinese Canadian fluent in English, Japanese, and Mandarin, he has worked across an unusually broad range of subfields: Bayesian inference and probabilistic machine learning during his PhD; sample-efficient deep reinforcement learning and large-scale robot learning at Google Brain, where he was a founding member of the Google Brain Robotics team; LLM alignment and reasoning on the ChatGPT team at OpenAI, including contributions to GPT-4; and multilingual post-training and reasoning at Google DeepMind. He is best known to the general AI community as a co-author of “Large Language Models are Zero-Shot Reasoners” (NeurIPS 2022), the paper that introduced the prompt “Let’s think step by step” and showed that zero-shot chain-of-thought reasoning is an emergent capability of large language models. He also co-invented Gumbel-Softmax, now a standard differentiable relaxation for discrete latent variables, and developed NAF, Q-Prop, and related algorithms that shaped the sample-efficiency agenda in deep RL. He has held visiting positions at the University of Tokyo (Visiting Associate Professor) and Stanford University, and led OpenAI’s Japan market entry.
Early Life & Education
Gu was born in Japan to a Chinese family and later settled in Canada, identifying as a Japan-born Chinese Canadian. He studied Engineering Science as an undergraduate at the University of Toronto, where his supervisor was Geoffrey E. Hinton. Hinton’s Toronto group was a center of the deep learning revolution, and training there connected Gu to that research network from the start of his career.
For his doctoral work, Gu received a Cambridge-Tübingen PhD Fellowship, a competitive joint program between the University of Cambridge Machine Learning Group and the Max Planck Institute for Intelligent Systems, and completed a PhD in machine learning supervised by Richard E. Turner and Zoubin Ghahramani at Cambridge and Bernhard Schölkopf at MPI. His doctoral research focused on probabilistic machine learning and Bayesian inference, including Neural Adaptive Sequential Monte Carlo (NIPS 2015) and MuProp, an unbiased gradient estimator for stochastic neural networks (ICLR 2016). He also received an NSERC (Natural Sciences and Engineering Research Council of Canada) Scholarship during this period.
Career
Google Brain — Research Scientist, Founding Member of Google Brain Robotics (c. 2016–2022)
Gu joined Google Brain as a research scientist, working between the US and Japan, and became a founding member of the Google Brain Robotics team, one of the first dedicated industrial research groups applying deep learning to physical robot control. During this period he pursued two parallel research threads, each of which had lasting impact.
Sample-efficient deep reinforcement learning. Gu’s RL work targeted what he identified as the core bottleneck for real-world robotics: the prohibitive sample complexity of deep RL algorithms. Working primarily with Timothy Lillicrap, Sergey Levine, Zoubin Ghahramani, and Richard Turner, he developed a sequence of increasingly effective algorithms. NAF (Normalized Advantage Functions, ICML 2016) introduced a continuous-action Q-learning method that represents the advantage as a quadratic function of the action, so the greedy action is available in closed form and off-policy training works in continuous action spaces. Q-Prop (ICLR 2017 Oral) combined on-policy stability with off-policy efficiency by using a first-order Taylor expansion of an off-policy critic as a control variate, substantially improving sample efficiency over TRPO and stability over DDPG on MuJoCo benchmarks. IPG (Interpolated Policy Gradient, NIPS 2017) interpolated between on-policy and off-policy gradient estimates, with theoretical bounds on the resulting bias and empirical gains on continuous control. The asynchronous multi-robot deep RL work (ICRA 2017) showed that parallelizing off-policy learning across multiple physical robots could learn complex manipulation skills such as door opening. It was featured in MIT Technology Review and a Google Research Blog post, and was among the first demonstrations of deep RL trained directly on real robotic hardware at scale.
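For readers who want the estimator behind the Q-Prop description, here is a sketch of its basic, non-adaptive form as we understand it from the ICLR 2017 paper. The notation is our own shorthand: $\hat{A}$ is an on-policy advantage estimate (e.g. from GAE), $Q_w$ is the off-policy critic, $\mu_\theta(s_t)$ is the mean of a Gaussian policy $\pi_\theta$, and $\rho_\pi$ is the state distribution under $\pi$.

$$
\begin{aligned}
\bar{A}_w(s_t, a_t) &= \nabla_a Q_w(s_t, a)\big|_{a=\mu_\theta(s_t)}\,\bigl(a_t - \mu_\theta(s_t)\bigr),\\
\nabla_\theta J(\theta) &= \mathbb{E}_{\rho_\pi,\pi}\Bigl[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(\hat{A}(s_t, a_t) - \bar{A}_w(s_t, a_t)\bigr)\Bigr]
+ \mathbb{E}_{\rho_\pi}\Bigl[\nabla_a Q_w(s_t, a)\big|_{a=\mu_\theta(s_t)}\,\nabla_\theta \mu_\theta(s_t)\Bigr].
\end{aligned}
$$

Subtracting the linear Taylor term $\bar{A}_w$ reduces the variance of the likelihood-ratio term, and the second, deterministic-gradient term adds its expectation back analytically, so the estimator stays unbiased. The paper’s adaptive variants scale $\bar{A}_w$ by a state-dependent coefficient, which is omitted here.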
Gumbel-Softmax (ICLR 2017). Alongside RL, Gu co-invented the Gumbel-Softmax reparameterization with Eric Jang and Ben Poole; a closely related method, the Concrete distribution, was developed concurrently and independently by Maddison et al. The technique provides a differentiable approximation to categorical sampling, enabling backpropagation through discrete latent variables, and it quickly became a standard tool in variational autoencoders, discrete generative models, and neural architecture search.
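The relaxation described above can be sketched in a few lines of plain Python. This is a minimal, hypothetical forward-pass illustration (the function name and parameters are ours, not from the paper): it adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. In practice the same computation runs inside an autodiff framework so gradients flow back to the logits; plain Python here only shows the arithmetic.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax_sample(
    logits: Sequence[float], temperature: float, rng: random.Random
) -> List[float]:
    """Draw one relaxed (soft one-hot) sample from Categorical(softmax(logits))."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1).
    # rng.random() lies in [0, 1); clamp away from 0 so log(u) is finite.
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    # Perturb the logits, divide by the temperature, then take a stable softmax.
    scaled = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a relaxed sample over three categories (entries sum to 1).
sample = gumbel_softmax_sample([1.0, 2.0, 0.5], temperature=0.5, rng=random.Random(0))
```

Because the softmax preserves ordering, the argmax of each relaxed sample is an exact draw from the categorical distribution (the Gumbel-Max trick). Lowering `temperature` pushes samples toward one-hot vectors at the cost of higher-variance gradients.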
Zero-Shot Chain-of-Thought (“Let’s think step by step,” NeurIPS 2022). During his Google Brain period, Gu co-authored “Large Language Models are Zero-Shot Reasoners” with Takeshi Kojima, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. The paper showed that adding a single task-agnostic prompt, “Let’s think step by step,” after the question and before the model’s answer elicits multi-step reasoning without task-specific examples or fine-tuning. With InstructGPT (text-davinci-002), zero-shot accuracy rose from 17.7% to 78.7% on MultiArith and from 10.4% to 40.7% on GSM8K, with improvements of similar magnitude for the 540B-parameter PaLM. The paper became one of the most-cited works in the LLM prompting literature and is widely regarded as a foundational contribution to the chain-of-thought research program.
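The two-stage prompting procedure behind Zero-Shot-CoT (first elicit reasoning, then extract the answer) can be sketched as follows. This is an illustrative sketch under stated assumptions: `generate` stands in for any text-completion call and is hypothetical, not a specific API; the templates follow the paper’s arithmetic setup as we understand it; and answer cleansing is simplified to taking the first number in the model’s final output.

```python
import re
from typing import Callable, Optional

REASONING_TRIGGER = "Let's think step by step."
# Answer trigger for arithmetic tasks; other task types would use other phrasings.
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """Run two-stage zero-shot chain-of-thought with a hypothetical `generate` model call."""
    # Stage 1 (reasoning extraction): place the trigger after "A:" so the
    # model writes intermediate steps before committing to an answer.
    reasoning_prompt = f"Q: {question}\nA: {REASONING_TRIGGER}"
    reasoning = generate(reasoning_prompt)
    # Stage 2 (answer extraction): feed back the prompt plus the generated
    # reasoning, followed by a format-specific answer trigger.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{ANSWER_TRIGGER}"
    raw_answer = generate(answer_prompt)
    # Simplified answer cleansing: keep the first number in the stage-2 output.
    match = re.search(r"-?\d+(?:\.\d+)?", raw_answer.replace(",", ""))
    return match.group(0) if match else None
```

Splitting reasoning and answer extraction into two calls lets the same reasoning trigger serve every task, while only the short answer trigger changes with the expected answer format.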
OpenAI — Senior Researcher, ChatGPT Team; Japan Market Entry Lead (c. 2022–2023)
Gu spent a period at OpenAI as a senior researcher on the ChatGPT team and as a contributor to the GPT-4 technical report. He also led OpenAI’s Japan market entry initiative — the strategic effort to establish OpenAI’s commercial and research presence in Japan, which culminated in OpenAI’s Tokyo office opening in April 2024. His trilingual capability and Japan-based professional relationships made him a natural lead for this effort.
Google DeepMind — Senior Staff Research Scientist (2023–present)
Following the merger of Google Brain and DeepMind into Google DeepMind in 2023, Gu rejoined the combined organization. He led the Multilinguality team for Gemini Post-Training — the team responsible for extending Gemini’s language capabilities beyond English across the post-training pipeline — before moving to his current role on the Gemini Thinking team, which focuses on reasoning capabilities of frontier models.
University of Tokyo — Visiting Associate Professor (ongoing)
In parallel with his industry roles, Gu has held a Visiting Associate Professor (adjunct) position at the University of Tokyo, maintaining research collaborations with the Matsuo Lab and affiliated groups. Several of his NeurIPS and ICLR papers during 2021–2023 were co-authored with University of Tokyo PhD students and researchers.
Stanford University — Visiting Scholar
Gu has also held a visiting scholar position at Stanford University’s Department of Computer Science, contributing to a productive interface between his industry research and the academic community.
Key Contributions
- Gumbel-Softmax (ICLR 2017) — “Categorical Reparameterization with Gumbel-Softmax,” with Eric Jang and Ben Poole. Provided a differentiable, reparameterizable approximation to discrete categorical distributions, enabling end-to-end gradient-based training through categorical latent variables. Now standard in variational inference, discrete generative models, and differentiable architecture search.
- Q-Prop (ICLR 2017 Oral) — “Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic.” Unified on-policy and off-policy policy gradient estimation by using a Taylor expansion of an off-policy critic as a control variate, substantially improving sample efficiency over TRPO and stability over DDPG while keeping the gradient estimator unbiased.
- Asynchronous Multi-Robot Deep RL (ICRA 2017) — “Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates.” Demonstrated that off-policy deep RL can train directly on real physical robots by parallelizing updates across multiple machines; one of the first practical demonstrations of deep RL on real robotic hardware without prior demonstrations or manually designed representations. Featured in MIT Technology Review.
- Zero-Shot Chain-of-Thought / “Let’s think step by step” (NeurIPS 2022) — “Large Language Models are Zero-Shot Reasoners,” with Takeshi Kojima, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Showed that a single universal zero-shot prompt elicits multi-step reasoning in LLMs and substantially improves accuracy across diverse arithmetic, symbolic, and commonsense reasoning benchmarks. One of the most-cited papers in the LLM reasoning literature; the phrase “Let’s think step by step” became a widely reproduced artifact of AI culture.
- NAF (ICML 2016) — “Continuous Deep Q-Learning with Model-Based Acceleration.” Introduced the Normalized Advantage Function representation for continuous action Q-learning, enabling stable off-policy training in continuous control tasks without actor-critic architectures.
- IPG (NIPS 2017) — “Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning.” Provided a theoretical unification of on/off-policy policy gradient methods and empirical improvement across continuous control benchmarks.
- Contributions to GPT-4 and Gemini — Listed as a contributor to the GPT-4 Technical Report (OpenAI, 2023) and to both the Gemini and Gemini 1.5 model papers (Google DeepMind, 2023–2024), reflecting his participation in two of the most significant frontier model development efforts.
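The quadratic advantage behind the NAF entry above can be illustrated with a small sketch. In NAF, a neural network maps the state x to a value V(x), an action mean μ(x), and the entries of a lower-triangular matrix L(x), with P(x) = L(x)L(x)ᵀ. Here those outputs are passed in as plain numbers, and the helper names are hypothetical. The sketch shows that Q(x, u) is maximized at u = μ(x), where it equals V(x), so the greedy action needs no inner optimization.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def precision_from_entries(raw: Matrix) -> Matrix:
    """Build P = L L^T from raw network outputs (hypothetical helper)."""
    n = len(raw)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Exponentiate the diagonal so L has a positive diagonal and P is
            # positive definite; keep the strictly lower triangle as-is and
            # ignore the upper triangle of `raw`.
            L[i][j] = math.exp(raw[i][j]) if i == j else raw[i][j]
    # P[i][j] = sum_k L[i][k] * L[j][k]
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


def naf_q_value(value: float, mu: Vector, P: Matrix, action: Vector) -> float:
    """Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu)^T P (u - mu)."""
    d = [a - m for a, m in zip(action, mu)]
    quad = sum(d[i] * P[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return value - 0.5 * quad
```

Because the maximizer is known in closed form, the Q-learning target max over u of Q(x′, u) reduces to V(x′), which is what makes off-policy Q-learning tractable in continuous action spaces.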
Awards & Recognition
- Best Paper Award, CoRL 2019 — For “A Divergence Minimization Perspective on Imitation Learning Methods” (with Seyed Kamyar Seyed Ghasemipour and Richard Zemel), awarded to the top 0.25% of submissions at the Conference on Robot Learning.
- Google Focused Research Award — Awarded during his Google Brain period.
- Cambridge-Tübingen PhD Fellowship — Competitive joint fellowship between the University of Cambridge and the Max Planck Institute for Intelligent Systems.
- NSERC Scholarship — Canadian national doctoral scholarship for outstanding graduate research.
- MIT Technology Review feature — His multi-robot deep RL work was highlighted in MIT Technology Review as demonstrating a novel approach to robotic skill acquisition.
Key Relationships
- Geoffrey E. Hinton — Undergraduate supervisor at the University of Toronto; the intellectual lineage connecting Gu to the deep learning tradition runs directly through Hinton’s group, which produced Sutskever, Krizhevsky, and many others.
- Richard E. Turner — Primary PhD supervisor at the University of Cambridge Machine Learning Group; shaped Gu’s probabilistic and Bayesian foundations.
- Zoubin Ghahramani — PhD co-supervisor at Cambridge; a leading figure in Bayesian machine learning whose probabilistic perspective influenced Gu’s early research.
- Bernhard Schölkopf — PhD co-supervisor at MPI Tübingen; kernel methods and causal inference pioneer.
- Sergey Levine — Principal collaborator at Google Brain on deep RL and robotics; multiple co-authored papers including Q-Prop, IPG, and several RL robotics works.
- Timothy Lillicrap — Google DeepMind researcher and key co-author on the NAF, Q-Prop, and IPG papers.
- Eric Jang — First author of the Gumbel-Softmax paper, co-authored with Gu and Ben Poole at Google Brain; the closely related Concrete distribution (Maddison, Mnih, and Teh) was developed independently and also appeared at ICLR 2017.
- Yutaka Matsuo — University of Tokyo professor; the research bridge enabling Gu’s collaborations with Tokyo-based PhD students on the Zero-Shot CoT and related LLM papers.
- Takeshi Kojima — University of Tokyo PhD student and first author of the “Let’s think step by step” paper; a representative of the Japan–Google Brain research collaboration.
Personal Style
Gu occupies a rare position in the AI research landscape as someone who has made substantive contributions in at least three distinct technical paradigms (Bayesian inference, deep RL, and LLM reasoning) rather than deepening a single specialty. His research arc tracks the field’s own generational transitions: from probabilistic machine learning (2014–2016) to the deep RL and robotics era (2016–2020) to the LLM era (2020–present), with productive output at each stage. He maintains separate Twitter accounts for English- and Japanese-speaking professional communities, a practice that reflects his trilingual, tricultural background: Japan-born, Canadian-educated, and working across the US, Japan, and the UK. His collaborations with academic groups in Tokyo and his role in OpenAI’s Japan market entry suggest a consistent interest in bridging the East Asian and Western AI ecosystems. He also serves the academic community as an Action Editor for TMLR and as an Area Chair for NeurIPS and ICML.