Deep Reinforcement Learning for Natural but Adversarial Behavior

Stacey Svetlichnaya

How robust are deep RL agents trained with self-play?

Adversarial examples are a well-known problem in image classification, and deep reinforcement learning policies are similarly vulnerable to adversarial manipulation of their observations. An attacker usually cannot modify another agent's observations directly. In a shared multi-agent environment, however, the attacker may be able to choose its own actions so that the other agent receives observations that look natural but are adversarial. The Adversarial Policies project by Adam Gleave et al. demonstrates exactly this by construction in simulated zero-sum games between two humanoid robots with basic proprioception (for example, two wrestlers, or a kicker and a goalie, in MuJoCo-based environments).
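To make the idea concrete, here is a minimal toy sketch, not the project's training procedure or environments. It assumes a hypothetical one-step zero-sum game in which a frozen victim policy observes its own state plus the opponent's position, and the attacker can only pick a legal position in [-1, 1]. The weights, reward function, and all function names are invented for illustration, and plain random search stands in for reinforcement learning.

```python
import math
import random

# Hypothetical frozen victim: fixed, made-up weights over
# (own_state, opponent_position). The attacker cannot change these.
VICTIM_WEIGHTS = (0.5, 2.0)


def victim_policy(own_state, opponent_pos):
    """Map the victim's observation to a bounded action in (-1, 1)."""
    w_state, w_opp = VICTIM_WEIGHTS
    return math.tanh(w_state * own_state + w_opp * opponent_pos)


def victim_reward(victim_action, target=0.8):
    """Toy reward: the victim does best when its action is near `target`."""
    return -(victim_action - target) ** 2


def play(own_state, opponent_pos):
    """One round of the game; returns the victim's reward."""
    return victim_reward(victim_policy(own_state, opponent_pos))


def find_adversarial_action(own_state, n_trials=200, seed=0):
    """Random search over the attacker's legal actions.

    The attacker never edits the victim's observation directly. It only
    picks its own position, which the victim then observes.
    """
    rng = random.Random(seed)
    best_action, best_reward = 0.0, play(own_state, 0.0)
    for _ in range(n_trials):
        candidate = rng.uniform(-1.0, 1.0)  # any legal position
        reward = play(own_state, candidate)
        if reward < best_reward:  # zero-sum: lower victim reward is better
            best_action, best_reward = candidate, reward
    return best_action, best_reward


if __name__ == "__main__":
    baseline = play(1.0, 0.0)
    action, attacked = find_adversarial_action(1.0)
    print(f"victim reward vs. neutral opponent: {baseline:.3f}")
    print(f"victim reward vs. adversarial opponent at {action:.3f}: {attacked:.3f}")
```

Every position the attacker tries is a legal move, so each resulting observation is one the victim could plausibly encounter in normal play. The attack works only because the frozen policy responds badly to some of those observations.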
