
Ben Eysenbach, CMU: Designing simpler, more principled RL algorithms

Ben Eysenbach is a Ph.D. student at Carnegie Mellon University and a student researcher at Google Brain. He is co-advised by Sergey Levine and Ruslan Salakhutdinov. His research focuses on developing RL algorithms that achieve state-of-the-art performance while being simpler, more scalable, and more robust. Recent problems he’s tackled include long-horizon reasoning, exploration, and representation learning. In this episode, we discuss designing more principled RL algorithms and much more.

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Highlights

“If we take all the states we’ve seen so far and look at their representations, let’s imagine that those representations have a length of one, so we can think about them as points on a sphere. Then, after we put each of these points on the sphere, we can turn the sphere around and say, okay, where are most of the points, and where are we missing points? And say, you’re missing points down near Antarctica. And then we can say, okay, let’s try to get down to Antarctica. And then, because we’re learning a goal-conditioned policy, we can say, okay, try to get here or try to get to a state that has this representation.”
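As a rough sketch of the exploration idea in this quote: normalize the representations of visited states onto the unit sphere, sample candidate directions, and pick the one farthest from anything seen so far as the next goal for the goal-conditioned policy. The function name, the uniform candidate sampling, and the cosine-similarity criterion below are illustrative assumptions, not the exact procedure from the episode.

```python
import numpy as np

def pick_exploration_goal(representations, num_candidates=1000, rng=None):
    """Pick a goal representation in an under-visited region of the unit sphere.

    representations: array of shape (N, d), embeddings of visited states.
    Illustrative sketch only; the sampling and scoring scheme are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Normalize visited-state embeddings so they lie on the unit sphere.
    reps = representations / np.linalg.norm(representations, axis=1, keepdims=True)
    d = reps.shape[1]
    # Sample candidate goal directions uniformly on the sphere.
    candidates = rng.normal(size=(num_candidates, d))
    candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
    # For each candidate, find its cosine similarity to the nearest visited point;
    # a low value means that part of the sphere ("Antarctica") is sparsely covered.
    similarity_to_nearest = (candidates @ reps.T).max(axis=1)
    # Hand the least-covered direction to the goal-conditioned policy as its target.
    return candidates[similarity_to_nearest.argmin()]
```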

“One thing that I’m really excited about is thinking about how we can leverage this idea of connecting contrastive learning to reinforcement learning to make use of advances in contrastive learning in other domains like NLP and computer vision. In NLP, we’ve seen really great uses of contrastive learning for things like CLIP that can connect image ideas with language using contrastive learning. And in our contrastive project, we saw how we can connect the states and the actions to the future states. As you might imagine, maybe there’s a way of plugging these components together, and indeed, you can feel that mathematically there is. And so one thing I’m really excited about exploring is saying, well, ‘can we use this to specify tasks?’ Not in terms of images of what you would want to happen, but rather language descriptions.”
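The connection between states, actions, and future states that he mentions can be sketched as an InfoNCE-style contrastive loss: embed each (state, action) pair and each future state, and train the embeddings so that a pair scores the future state it actually reached higher than future states from other trajectories in the batch. The encoder modules and loss below are a minimal illustration of that idea, not the exact objective from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_rl_loss(sa_encoder, goal_encoder, states, actions, future_states):
    """InfoNCE-style loss pairing (state, action) with the future state reached.

    sa_encoder and goal_encoder are assumed to be torch modules that return
    embeddings of the same dimension; a sketch, not the paper's exact loss.
    """
    # Embed each (state, action) pair and each future state.
    sa_emb = sa_encoder(torch.cat([states, actions], dim=-1))  # (B, d)
    g_emb = goal_encoder(future_states)                        # (B, d)
    # Similarity matrix: entry (i, j) scores how well (s_i, a_i) predicts future state j.
    logits = sa_emb @ g_emb.T
    # Positives sit on the diagonal: each (state, action) should score its own
    # future state above the future states sampled from other trajectories.
    labels = torch.arange(len(states), device=logits.device)
    return F.cross_entropy(logits, labels)
```

In the spirit of the quote, a language encoder could then play the role of the goal encoder, which is what would let tasks be specified with language descriptions rather than goal images.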

“One of the reasons why I’m particularly excited about these problems is that these language models, they’re trained to maximize the likelihood of the next token. That draws a really strong connection to this way of treating reinforcement learning problems as predicting probabilities and as maximizing probabilities. And so I think that these tools are actually much, much more similar than they might seem on the surface.”
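One loose way to make this parallel concrete (the notation below is illustrative, not from the episode): a language model maximizes the log-likelihood of the next token, while goal-conditioned RL can be framed as maximizing the probability that the policy’s future state matches a commanded goal.

```latex
% Next-token objective of a language model (illustrative notation):
\[ \max_\theta \; \mathbb{E}\big[\log p_\theta(x_{t+1} \mid x_{1:t})\big] \]
% Goal-conditioned RL framed as probability maximization: make the
% policy's future state match a commanded goal g.
\[ \max_\pi \; \mathbb{E}\big[\log p^\pi\left(s_{\text{future}} = g \mid s_t, a_t\right)\big] \]
```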

“I don’t know how controversial it is, but I would like to see more effort on taking even existing methods and applying them to new tasks, to real problems. I think part of this will require a shift in how we evaluate papers: evaluating them not so much on algorithmic novelty as on ‘did you actually solve some interesting problem?’”

References

Thanks to Tessa Hall for editing the podcast.


About Imbue

Imbue is an independent research company developing a better way to build personal software. Our mission is to empower humans in the age of AI by creating computing tools controlled by individuals.
