Selected Papers and Projects
Acme: A research framework for distributed reinforcement learning
We introduce Acme, a flexible RL framework for implementing offline, online, off-policy, and on-policy RL algorithms. We also release implementations of several RL algorithms built with this framework.
Active Offline Policy Selection
This paper addresses the problem of policy selection in domains with abundant logged data but a very restricted interaction budget.
Critic regularized regression
We propose a novel offline RL approach, based on the principle of selective imitation, that achieves state-of-the-art results.
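A minimal sketch of the selective-imitation idea behind critic regularized regression: logged actions are imitated only when the critic estimates they are at least as good as the current value estimate. The function and variable names here are illustrative, and the value baseline (mean over Q-values) is a simplifying assumption, not the paper's exact objective.

```python
import numpy as np

def crr_weights(q_values, actions, mode="binary"):
    """Selective-imitation weights: imitate a logged action only when its
    estimated advantage over the state value baseline is non-negative."""
    v = q_values.mean(axis=1)                          # simple baseline V(s)
    q_a = q_values[np.arange(len(actions)), actions]   # Q(s, a) of logged action
    advantage = q_a - v
    if mode == "binary":
        return (advantage >= 0).astype(float)          # imitate or ignore
    return np.minimum(np.exp(advantage), 20.0)         # exp-weighted variant, clipped

# The weights multiply a behavioral-cloning loss, -w * log pi(a|s),
# so only advantageous logged actions shape the learned policy.
q = np.array([[1.0, 2.0],    # logged action 0 is worse than average here
              [3.0, 0.0]])   # logged action 0 is better than average here
w = crr_weights(q, np.array([0, 0]))
```

With the binary filter, the first transition is dropped (weight 0) and the second is imitated (weight 1).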
Grandmaster level in StarCraft II using multi-agent reinforcement learning
We develop a state-of-the-art RL agent for StarCraft II using multi-agent RL, self-play, and imitation learning.
Improving the gating mechanism of recurrent neural networks
We propose a novel gating mechanism that makes the training of gated RNNs much easier.
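For context, a hedged sketch of the standard gated update that this line of work modifies: the gate interpolates between the previous hidden state and a candidate update, so the gate's parameterization and initialization control how easily long-term dependencies are learned. This is the generic mechanism, not the paper's specific improvements.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_update(h_prev, candidate, gate_logit):
    """Leaky-integrator update used by gated RNNs: the forget gate f
    interpolates between the previous state and the candidate state."""
    f = sigmoid(gate_logit)                 # gate value in (0, 1)
    return f * h_prev + (1.0 - f) * candidate

# A large positive gate logit saturates the gate near 1, so the state
# is mostly retained; how gates reach (or escape) this regime is what
# gate initialization and parameterization changes affect.
h = gated_update(np.array([1.0]), np.array([0.0]), np.array([4.0]))
```

Here sigmoid(4) is roughly 0.982, so about 98% of the previous state is carried forward.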
Making efficient use of demonstrations to solve hard exploration problems
We propose a novel imitation learning algorithm, R2D3, that learns to solve hard-exploration problems with sparse rewards.
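A minimal sketch of the mixed-replay idea used by this style of agent: training batches are drawn from a demonstration buffer with a small probability (the demo ratio) and from the agent's own experience otherwise. The function name and signature are illustrative, not the paper's implementation.

```python
import random

def sample_batch(demo_buffer, agent_buffer, batch_size, demo_ratio=1 / 256,
                 rng=random):
    """Mixed sampling: each batch element comes from the demonstration
    buffer with probability demo_ratio, else from agent experience."""
    return [
        demo_buffer[rng.randrange(len(demo_buffer))]
        if rng.random() < demo_ratio
        else agent_buffer[rng.randrange(len(agent_buffer))]
        for _ in range(batch_size)
    ]

# A small demo ratio keeps demonstrations as a persistent, rare learning
# signal without drowning out the agent's own experience.
batch = sample_batch(["demo"], ["agent"], batch_size=4, demo_ratio=1.0)
```

Setting `demo_ratio=1.0` (as above) draws every element from demonstrations; setting it to 0 ignores them entirely.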
Offline learning from demonstrations and unlabeled experience
We design an offline RL algorithm that can learn from demonstrations without any reward labels.
On Instrumental Variable Regression for Deep Offline Policy Evaluation
We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding: the inputs and the output noise are correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates.
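The confounding problem can be illustrated with a toy one-dimensional regression, as an analogy to the deep RL setting described above (this is a generic instrumental-variable example, not the paper's estimator): when the input shares noise with the output, ordinary least squares is biased, while regressing through an instrument that is independent of the noise recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)   # instrument: drives x, independent of the noise
u = rng.normal(size=n)   # confounding noise shared by input and output
x = z + u                # the regressor is confounded by u
y = 2.0 * x + u          # true coefficient is 2; output noise correlates with x

ols = (x @ y) / (x @ x)  # direct least squares: biased (converges to 2.5 here)
iv = (z @ y) / (z @ x)   # instrumental-variable estimate: consistent (near 2)
```

In the Bellman-error analogy, the next-state randomness plays the role of `u`, entering both the regression target and the bootstrapped input.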
Regularized Behavior Value Estimation
To overcome the challenges of offline RL, we introduce Regularized Behavior Value Estimation (R-BVE). Unlike most approaches, which perform policy improvement during training, R-BVE estimates the value of the behavior policy during training and performs policy improvement only at deployment time.
Social influence as intrinsic motivation for multi-agent deep reinforcement learning
We propose a novel cooperative multi-agent RL algorithm that uses an agent's social influence on other agents as an intrinsic reward, and show that it trains stably.
Stabilizing transformers for reinforcement learning
We propose GTrXL, which introduces a gating mechanism that stabilizes the training of transformers for RL.