Selected Papers and Projects
Acme: A research framework for distributed reinforcement learning
We introduce a flexible RL framework for implementing offline, online, off-policy, and on-policy RL algorithms. We also release implementations of several RL agents built with the framework.
Making efficient use of demonstrations to solve hard exploration problems
We propose a novel imitation learning algorithm called R2D3 that makes efficient use of demonstrations to solve hard-exploration problems with sparse rewards.
On Instrumental Variable Regression for Deep Offline Policy Evaluation
We show that the popular reinforcement learning (RL) strategy of estimating the state-action value function (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding: the inputs and the output noise are correlated. As a consequence, direct minimization of the Bellman error can yield significantly biased Q-function estimates.
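The bias can be seen in a toy example. Below is a minimal sketch (not from the paper; the MRP, states, and values are illustrative) in which gradient descent on the sampled squared Bellman residual converges to a value estimate that differs from the true value, because the bootstrapped target's noise is correlated with the regression input:

```python
import numpy as np

# Tiny Markov reward process: state A transitions to B or C (reward 0,
# each transition appears once in the dataset); B then yields reward +1
# and C yields reward -1 before terminating.
# True values: V(A) = 0, V(B) = +1, V(C) = -1.
A, B, C, TERM = 0, 1, 2, 3
transitions = [(A, 0.0, B), (A, 0.0, C), (B, 1.0, TERM), (C, -1.0, TERM)]
gamma = 0.9

# Gradient descent directly on the mean squared Bellman residual.
V = np.zeros(3)
lr = 0.1
for _ in range(5000):
    grad = np.zeros(3)
    for s, r, s2 in transitions:
        v_next = 0.0 if s2 == TERM else V[s2]
        delta = r + gamma * v_next - V[s]    # sampled Bellman residual
        grad[s] += -2.0 * delta              # d(delta^2) / dV[s]
        if s2 != TERM:
            grad[s2] += 2.0 * gamma * delta  # gradient also flows into V[s']
    V -= lr * grad / len(transitions)

# The minimizer satisfies V(B) = 1 / (1 + gamma^2) ~ 0.55, well below the
# true value 1.0: a biased estimate despite exact minimization.
print(V)
```

The bias here is exactly the confounding effect: the target `r + gamma * V[s']` is built from a random next state, so its noise is not independent of the quantity being regressed on.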
Regularized Behavior Value Estimation
To overcome the challenges of offline RL, we introduce Regularized Behavior Value Estimation (R-BVE). Unlike most approaches, which perform policy improvement during training, R-BVE estimates the value of the behavior policy during training and performs policy improvement only at deployment time.
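The train-then-improve split can be sketched as follows. This is a minimal tabular illustration of the one-step idea only, with an illustrative toy dataset and without the regularization term; it is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (s, a, r, s', a') tuples logged by a behavior
# policy in a 2-state, 2-action MDP (rewards and dynamics are made up:
# reward is 1 when the action matches the state, next state and next
# action are uniformly random).
n_states, n_actions, gamma = 2, 2, 0.9
dataset = [(s, a, float(a == s), rng.integers(n_states), rng.integers(n_actions))
           for s in range(n_states) for a in range(n_actions) for _ in range(50)]

# Training: SARSA-style evaluation of the behavior policy. The target
# uses the action a' actually taken in the data, so no max over actions
# (and hence no policy improvement) happens during training.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    for s, a, r, s2, a2 in dataset:
        target = r + gamma * Q[s2, a2]
        Q[s, a] += 0.05 * (target - Q[s, a])

# Deployment: the single step of policy improvement -- act greedily
# with respect to the learned behavior value estimate.
policy = Q.argmax(axis=1)
print(policy)
```

Because the training targets never bootstrap through a max over out-of-distribution actions, the value estimate stays anchored to actions the behavior policy actually took, which is the property that motivates deferring improvement to deployment.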