Martín Arjovsky (Google Scholar) completed his Ph.D. at NYU with Léon Bottou. His best-known work includes the Wasserstein GAN and a paradigm called Invariant Risk Minimization. In this episode, we discuss out-of-distribution generalization, geometric information theory, and the importance of good benchmarks.
Highlights
“Now everyone is starting to solve robustness without having a benchmark that shows that robustness is a problem… There’s many, many anecdotal reports of these problems - and on deployed systems, things that really daily affect people! […] I would just like to have a benchmark, things where I can test algorithms on this.”
“It’s this very counter-intuitive problem where throwing away data points is a form of regularization. […] You throw away things you already know.”
“Information theory for machine learning is mostly useless in its current form. Let me elaborate on this. I do think that it can be fixed, though, and would be totally fine spending five years of my life to do that. […] All of these information theoretic tools that they use, like for example entropy, really make sense only in discrete spaces. Continuous entropy does not make sense in most cases.”
“I’m tired of seeing papers that have an introduction about A, an experiment about B, and a theory about C.”
References
Yoshua Bengio, who introduced Martin to some interesting papers:
Neural Machine Translation by Jointly Learning to Align and Translate by Bahdanau et al. (the attention paper)
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization by Dauphin et al.
Martin reached out to Yann Dauphin with his 1.5-page LaTeX document
Martin’s paper Unitary Evolution Recurrent Neural Networks
The DCGAN paper by Radford et al.
Martin’s WGAN paper
Causal inference by using invariant prediction: identification and confidence intervals by Peters et al. (published in the Journal of the Royal Statistical Society)
Shiori Sagawa at Stanford
GAIT: A Geometric Approach to Information Theory by Gallego et al.
Geometrical Insights for Implicit Generative Modeling by Bottou et al.
Understanding the Failure Modes of Out-of-Distribution Generalization by Nagarajan et al. (Martin’s favorite paper of the last two years!)
Uniform convergence may be unable to explain generalization in deep learning by Nagarajan & Kolter
An investigation of why overparameterization exacerbates spurious correlations by Sagawa et al.
Just Train Twice: Improving Group Robustness without Training Group Information by Liu et al.
Alison Gopnik and her work studying how children learn
Thanks to Tessa Hall for editing the podcast.