Selected Papers and Projects
Acme: A research framework for distributed reinforcement learning
We introduce a flexible RL framework for implementing offline, online, off-policy, and on-policy RL algorithms. We also release implementations of several RL agents built with the framework.
Making efficient use of demonstrations to solve hard exploration problems
We propose a novel imitation learning algorithm called R2D3 that makes efficient use of demonstrations to solve hard-exploration problems with sparse rewards.
On Instrumental Variable Regression for Deep Offline Policy Evaluation
We show that the popular reinforcement learning (RL) strategy of estimating the state-action value function (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding: the inputs and the output noise are correlated. As a consequence, direct minimization of the Bellman error can yield significantly biased Q-function estimates.
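The bias can be seen in a toy example. Below is a minimal sketch (not from the paper; the MRP, states, and values are illustrative) in which gradient descent on the sampled squared Bellman residual converges to a value estimate that differs from the true value, because the bootstrapped target's noise is correlated with the regression input:

```python
import numpy as np

# Tiny Markov reward process: state A transitions to B or C (reward 0,
# each transition appears once in the dataset); B then yields reward +1
# and C yields reward -1 before terminating.
# True values: V(A) = 0, V(B) = +1, V(C) = -1.
A, B, C, TERM = 0, 1, 2, 3
transitions = [(A, 0.0, B), (A, 0.0, C), (B, 1.0, TERM), (C, -1.0, TERM)]
gamma = 0.9

# Gradient descent directly on the mean squared Bellman residual.
V = np.zeros(3)
lr = 0.1
for _ in range(5000):
    grad = np.zeros(3)
    for s, r, s2 in transitions:
        v_next = 0.0 if s2 == TERM else V[s2]
        delta = r + gamma * v_next - V[s]    # sampled Bellman residual
        grad[s] += -2.0 * delta              # d(delta^2) / dV[s]
        if s2 != TERM:
            grad[s2] += 2.0 * gamma * delta  # gradient also flows into V[s']
    V -= lr * grad / len(transitions)

# The minimizer satisfies V(B) = 1 / (1 + gamma^2) ~ 0.55, well below the
# true value 1.0: a biased estimate despite exact minimization.
print(V)
```

The bias here is exactly the confounding effect: the target `r + gamma * V[s']` is built from a random next state, so its noise is not independent of the quantity being regressed on.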
Regularized Behavior Value Estimation
To overcome the challenges of offline RL, we introduce Regularized Behavior Value Estimation (R-BVE). Unlike most approaches, which perform policy improvement during training, R-BVE estimates the value of the behavior policy during training and performs policy improvement only at deployment time.
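The train-then-improve split can be sketched as follows. This is a minimal tabular illustration of the one-step idea only, with an illustrative toy dataset and without the regularization term; it is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (s, a, r, s', a') tuples logged by a behavior
# policy in a 2-state, 2-action MDP (rewards and dynamics are made up:
# reward is 1 when the action matches the state, next state and next
# action are uniformly random).
n_states, n_actions, gamma = 2, 2, 0.9
dataset = [(s, a, float(a == s), rng.integers(n_states), rng.integers(n_actions))
           for s in range(n_states) for a in range(n_actions) for _ in range(50)]

# Training: SARSA-style evaluation of the behavior policy. The target
# uses the action a' actually taken in the data, so no max over actions
# (and hence no policy improvement) happens during training.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    for s, a, r, s2, a2 in dataset:
        target = r + gamma * Q[s2, a2]
        Q[s, a] += 0.05 * (target - Q[s, a])

# Deployment: the single step of policy improvement -- act greedily
# with respect to the learned behavior value estimate.
policy = Q.argmax(axis=1)
print(policy)
```

Because the training targets never bootstrap through a max over out-of-distribution actions, the value estimate stays anchored to actions the behavior policy actually took, which is the property that motivates deferring improvement to deployment.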