Imbue
Generally Intelligent
Jamie Simon, UC Berkeley: Theoretical principles for deep neural networks


Jamie Simon is a fourth-year physics Ph.D. student at UC Berkeley, advised by Mike DeWeese, and a Research Fellow with us at Imbue. He uses tools from theoretical physics to build a fundamental understanding of deep neural networks so they can be designed from first principles. In this episode, we discuss reverse engineering kernels, the conservation of learnability during training, infinite-width neural networks, and much more.

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Highlights

“I do think that the deeper idea of reverse engineering kernels is powerful and probably holds across architectures. The central message isn’t really: here’s the particular theory on fully-connected networks. The central message is: let’s think about the inductive bias of architectures in kernel space directly and see if we can do our design work in kernel space instead of in parameter space.”

“At first glance, the idea of an infinite-width neural network as a useful object of study sounds insane. Why should this be a reasonable limit to take? If we want to understand a neural network, which obviously has to be finite to do anything useful, how could we hope to learn anything by making it infinite? That’s especially baffling from the viewpoint of classical statistics, where you hope to find a parsimonious model; you want to wield Occam’s razor like a sword. So it seems baffling at first that this should be useful, but it turns out that a number of breakthrough results, especially around the early part of my PhD, found that some really non-trivial, insightful behavior emerges when you take this infinite-width limit.”
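One way to make this limit concrete: for a one-hidden-layer ReLU network with standard Gaussian weights, the infinite-width kernel has a known closed form (the degree-1 arc-cosine kernel), and the empirical kernel of a wide finite network converges to it. Below is a minimal NumPy sketch illustrating that convergence; the function names are my own and this is not code from the episode:

```python
import numpy as np

def relu_nngp_kernel(x1, x2):
    """Closed-form infinite-width kernel of a one-hidden-layer ReLU
    network with i.i.d. N(0, 1) weights (the arc-cosine kernel)."""
    n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
    cos = np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0)
    theta = np.arccos(cos)
    return n1 * n2 * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

def empirical_kernel(x1, x2, width=200_000, seed=0):
    """Monte Carlo estimate of the same kernel: average the product of
    hidden-unit activations over many random ReLU units."""
    d = x1.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, d))          # one row per hidden unit
    h1 = np.maximum(W @ x1, 0.0)             # ReLU activations on x1
    h2 = np.maximum(W @ x2, 0.0)             # ReLU activations on x2
    return h1 @ h2 / width                   # empirical expectation
```

As the width grows, the empirical kernel concentrates around the closed form, which is what makes the infinite-width limit analytically tractable.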

“In the case of infinite width: if the neural tangent kernel has only trivial, chance-level alignment with the target function of the data, it won’t generalize on it. But in practice, we see very good alignment between this kernel object and the target function.”
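The notion of kernel-target alignment here can be made concrete as the cosine similarity between the kernel matrix and the rank-one target kernel built from the labels. A minimal illustrative sketch; the toy kernels and function name are my own, not from the episode:

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Cosine similarity between kernel matrix K and the rank-one
    target kernel y y^T. Higher alignment means the kernel's dominant
    directions overlap with the target function."""
    return (y @ K @ y) / (np.linalg.norm(K, "fro") * (y @ y))

rng = np.random.default_rng(0)
n = 100
y = rng.choice([-1.0, 1.0], size=n)          # toy binary labels

# A kernel whose top eigendirection matches the labels...
K_aligned = np.outer(y, y) + 0.5 * np.eye(n)

# ...versus a generic PSD kernel with only chance alignment.
G = rng.normal(size=(n, n))
K_random = G @ G.T / n
```

On this toy example, the aligned kernel scores near 1 while the random kernel scores near chance level, mirroring the point in the quote: good generalization requires the kernel to be non-trivially aligned with the target.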

“A question you could ask is: why do convolutional networks do better than fully-connected networks on image data? Well, it turns out their kernels have better alignment with image data.”

“Interestingly, though, people have shown that if you take the neural tangent kernel of a network after training, the real neural network after training looks a lot as if it had always had its final neural tangent kernel. So you don’t have to worry so much about the kernel’s evolution over time, only about where it ended up.”

References

Thanks to Tessa Hall for editing the podcast.


About Imbue

Imbue is an independent research company developing a better way to build personal software. Our mission is to empower humans in the age of AI by creating computing tools controlled by individuals.
