
Caglar Gulcehre
Staff Research Scientist, DeepMind
Google Scholar: Click here!
Twitter: @caglarml
GitHub: github.com/caglar ***Not up to date!***
Email: ca9lar At Gmail
Location: London, UK

Bio
I am a staff research scientist at DeepMind working at the intersection of Reinforcement Learning, Deep Learning, Representation Learning, and Natural Language Understanding.
I am interested in building agents that can learn from a feedback signal (often weak, sparse, and noisy in the real world) while utilizing unlabeled data available in the environment. I am interested both in improving our understanding of existing algorithms and in developing new ones to enable real-world applications with positive social impact. I am particularly fascinated by the scientific applications of machine learning algorithms. I enjoy multidisciplinary and cross-disciplinary work and am often inspired by neuroscience, biology, and the cognitive sciences when working on algorithmic solutions.
I finished my Ph.D. under the supervision of Yoshua Bengio at MILA.
I defended my thesis "Learning and time: on using memory and curricula for language understanding" in 2018 with Christopher Manning as my external examiner. Currently, the research topics that I am working on include, but are not limited to, reinforcement learning, offline RL, large-scale deep architectures (or foundation models, as they are called these days), and representation learning (including self-supervised learning, new architectures, causal representations, etc.). I have served as an area chair and reviewer for major machine learning conferences such as ICML, NeurIPS, and ICLR, and for journals like Nature and JMLR. I have published at numerous influential conferences and journals such as Nature, JMLR, NeurIPS, ICML, ICLR, ACL, and EMNLP. My work has received the best paper award at the Nonconvex Optimization workshop at NeurIPS and an honorable mention for best paper at ICML 2019.
I have co-organized the Science and Engineering of Deep Learning workshops at NeurIPS and ICLR.
Recent Updates
- Our paper "On integrating a language model into neural machine translation" received the best research paper award at Interspeech 2022.
- Our paper "An Empirical Study of Implicit Regularization in Deep Offline RL" is on arXiv.
- We are organizing the ML Evaluation Standards workshop at ICLR 2022.
- We presented our paper "StarCraft II Unplugged: Large Scale Offline Reinforcement Learning" at the Deep RL workshop at NeurIPS 2021.
- Our paper "Active Offline Policy Selection" was accepted to NeurIPS 2021.
- I presented the Intro to RL (part 1 slides) and Offline RL (part 2 slides) lectures at the DeepLearn 2021 Summer School.
- We have released DeepMind Lab and bsuite datasets for offline RL under RL Unplugged.
- Our paper "On Instrumental Variable Regression for Deep Offline Policy Evaluation" is on arXiv.
- Our paper "Regularized Behavior Value Estimation", on a single-step policy improvement method, is on arXiv.
- Our paper "Addressing Extrapolation Error in Deep Offline Reinforcement Learning" was selected for an oral presentation at the Offline RL Workshop at NeurIPS 2020.
- We released the hard-eight task suite that was used in the "Making Efficient Use of Demonstrations" paper.
Selected Publication
Regularized Behavior Value Estimation

Authors
Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas
Abstract
Offline reinforcement learning restricts the learning process to rely only on logged-data without access to an environment. While this enables real-world applications, it also poses unique challenges. One important challenge is dealing with errors caused by the overestimation of values for state-action pairs not well-covered by the training data. Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning. To overcome this challenge, we introduce Regularized Behavior Value Estimation (R-BVE). Unlike most approaches, which use policy improvement during training, R-BVE estimates the value of the behavior policy during training and only performs policy improvement at deployment time. Further, R-BVE uses a ranking regularisation term that favours actions in the dataset that lead to successful outcomes. We provide ample empirical evidence of R-BVE's effectiveness, including state-of-the-art performance on the RL Unplugged ATARI dataset. We also test R-BVE on new datasets, from bsuite and a challenging DeepMind Lab task, and show that R-BVE outperforms other state-of-the-art discrete control offline RL methods.
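The idea in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation: the function names, the toy dataset, the hyperparameters, and the hinge-style form of the ranking term are all assumptions made for illustration. The sketch fits the value of the behavior policy with a SARSA-style update on logged transitions (bootstrapping only from actions that appear in the data), optionally applies a ranking penalty on transitions from successful episodes, and takes the greedy action only at deployment time.

```python
import numpy as np

def fit_behavior_q(transitions, n_states, n_actions, gamma=0.9, lr=0.1,
                   rank_weight=0.0, margin=0.1, epochs=200):
    """Estimate Q of the behavior policy from logged data (hypothetical helper).

    Each transition is (s, a, r, s_next, a_next, done, success), where a_next
    is the action the behavior policy actually took next, so the target never
    queries an action outside the dataset.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s_next, a_next, done, success in transitions:
            # SARSA-style target: evaluates the behavior policy, no max over actions.
            target = r if done else r + gamma * q[s_next, a_next]
            q[s, a] += lr * (target - q[s, a])
            if rank_weight > 0 and success:
                # Assumed hinge ranking term: sum over b != a of
                # max(0, Q(s, b) + margin - Q(s, a)); take one gradient step on it.
                viol = q[s] + margin > q[s, a]
                viol[a] = False
                n_viol = int(viol.sum())
                q[s, viol] -= lr * rank_weight
                q[s, a] += lr * rank_weight * n_viol
    return q

def greedy_action(q, s):
    """Policy improvement applied only at deployment time."""
    return int(np.argmax(q[s]))

# Toy logged dataset (made up): 2 states, 2 actions.
logged = [
    (0, 1, 1.0, 1, 0, False, True),   # successful episode, step 1
    (1, 0, 1.0, 0, 0, True, True),    # successful episode, step 2 (terminal)
    (0, 0, 0.0, 0, 0, True, False),   # unsuccessful one-step episode
]
q_plain = fit_behavior_q(logged, n_states=2, n_actions=2)
q_ranked = fit_behavior_q(logged, n_states=2, n_actions=2, rank_weight=0.5)
```

Calling `greedy_action(q_plain, 0)` then selects the action with the highest estimated behavior value in state 0. A real implementation would use a neural network and the losses defined in the paper rather than this table.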
Work Experience


DeepMind (2017-)
Research Scientist

MSR (2016)
Part-time researcher

IBM Research (2015-2016)
Research Intern

DeepMind (2014)
Research Intern

Maluuba (2015)
Part-time researcher

Tubitak (2008-2011)
Researcher

MILA (2012-2017)
PhD and Research Assistant

METU (2008-2010)
Developer