Evolved Policy Gradients

In Openai IA, Research

We’re releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, like learning to navigate to an object on a different side of the room from where it was placed during training.

Gotta Learn Fast: A new benchmark for generalization in RL

AI safety via debate

Related Posts

Questions for the Record

AMD and OpenAI announce strategic partnership to deploy 6 gigawatts of AMD GPUs

Harness engineering: leveraging Codex in an agent-first world

Early methods for studying affective use and emotional well-being on ChatGPT