<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Imbue]]></title><description><![CDATA[Ideas on the future we want, and how to build the technologies and systems that get us there.]]></description><link>https://ideas.imbue.com</link><image><url>https://substackcdn.com/image/fetch/$s_!GuGx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539d4a69-f426-48c8-bf7a-cce94fd735bc_800x800.png</url><title>Imbue</title><link>https://ideas.imbue.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 13:22:20 GMT</lastBuildDate><atom:link href="https://ideas.imbue.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Imbue]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[imbueai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[imbueai@substack.com]]></itunes:email><itunes:name><![CDATA[Imbue]]></itunes:name></itunes:owner><itunes:author><![CDATA[Imbue]]></itunes:author><googleplay:owner><![CDATA[imbueai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[imbueai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Imbue]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Attention as an art form]]></title><description><![CDATA[Watch now | Art of Being Human #2 with Adam Robbert]]></description><link>https://ideas.imbue.com/p/attention-as-an-art-form</link><guid isPermaLink="false">https://ideas.imbue.com/p/attention-as-an-art-form</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Sat, 11 Apr 2026 14:31:07 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193568576/2f6793e2cb9abe3e2ae9a3acd4bd630c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>This is the recording of our second Art of Being Human event with philosopher Adam Robbert. <a href="https://www.aerobbert.com/?referrer=luma">Adam Robbert</a> is a philosopher by training, and a writer, editor, and researcher by vocation. His focus is on the relationship between practice and perception in the fields of philosophy, religion, and contemplation. He writes regularly on his newsletter,<a href="https://thebasecamp.substack.com/?referrer=luma"> The Base Camp.</a></p><p>The Art of Being Human is an event series by Imbue for the exploration of the shared human questions in our technological age. Sign up to receive invites <a href="https://luma.com/imbue_ai">here</a>.</p><div><hr></div><p><strong>KANJUN (OPENING REMARKS)</strong></p><p>Hello, friends. If you&#8217;re standing, please get your food and come sit down. All right. Well, hello, everyone. Welcome to Imbue. My name is Kanjun. I&#8217;m the CEO and I&#8217;m really excited to have you all here for the second event in our Art of Being Human series.</p><p>I&#8217;ll talk a little bit about Imbue first. Imbue is a fairly radical AI company. Our mission is to make tech serve humans. By tech, we mean tech like the software we use and also the technology industry. And by serve, we mean that we want technology that serves us and is not exploiting us. Today, we live in a world where we are kind of exploited. 
I don&#8217;t like to put my phone in my room at night because I might end up scrolling Instagram till two a.m. Not my fault &#8212; somebody else is trying to hijack my attention for their own profit. We have this world where we have devices, and on those devices is a lot of stuff whose incentives are not fully aligned with ours. And we believe that as AI agents become more powerful, this exacerbates the problem. Agents are getting to a point &#8212; we think by the end of the year &#8212; where they&#8217;re starting to make decisions on our behalf and starting to represent us. And in that world, we want those agents to fully serve our interests, not the interests of others.</p><p>So that&#8217;s a lot of what Imbue does. We are basically trying to build toward an open agent ecosystem to combat monopoly power in tech, because we think monopoly power and centralization are what allow for this kind of exploitation. Our goal really is to increase the percentage of people in the world who can use agents that are fully aligned to them, where your agents are fully expressing your own values and you own them &#8212; they&#8217;re yours. Maybe your data is local and yours. And over the next few months, we&#8217;ll be shipping a lot of tools toward that.</p><p>That&#8217;s one way we build toward this kind of future. But a second way we do that is by figuring out what it means to serve humans &#8212; what it means to be human in this world of very powerful technology, technology that can think. We thought that was our advantage, right? So the conversation today is about attention. Attention is one of our most precious resources as humans. I want to introduce Ashley, and she&#8217;ll introduce Adam. Ashley is our Storyteller at Imbue &#8212; she really does a lot of investigation into what it means to be human and figures out what we should stand for. She has the unique property that almost every piece she writes makes me cry. So if you want to be crying, you can subscribe to her Substack. It&#8217;s called Soft Power. Without further ado, Ashley and Adam, welcome.</p><p><strong>ASHLEY (INTRODUCTION)</strong></p><p>It&#8217;s so wonderful to see new and familiar faces. This is the second of the Art of Being Human series, which I like to think of as a shared collective investigation into these human questions in our technological age. And I&#8217;m so excited to be here with Adam today, who is a proper philosopher and a writer and editor. He&#8217;s written this wonderful book, Practice in Still Life, a collection of fragments, essays, and lectures on the topic of attention and perception, introducing various traditions of thought.</p><p>Our first event was last month with my friend Nicholas Paul on the topic of thinking, which I thought flowed well into this topic. Someone actually asked a question about attention at the end. I think a lot of what we talked about was: how do we create the spaciousness and conditions to allow for expansive thinking? Thinking that is beyond the calculation of machines, thinking that is truly novel and expansive and builds upon this great project of the humanities and being human.</p><p>I&#8217;m excited to chat about attention today, which I think is one core aspect of thinking and also just being here in the world and finding your way through it.
Adam has a very particular way of conceiving of attention that goes beyond treating it as a resource to be used, captured, and exploited &#8212; something that actually puts the power back in our hands as a practice, or as an art form, that we can cultivate in our everyday lives.</p><p><strong>Conversation</strong></p><p><em><strong>ASHLEY: How did you get interested in attention as a topic? What is your perspective on it?</strong></em></p><p><strong>ADAM: </strong>Thank you all so much for being here. And thank you for inviting me. It&#8217;s a beautiful space, and I&#8217;m so glad we can all come together here to have important conversations like this one, in the heart of the city where a lot of the technologies we&#8217;ll talk about are being created and shared throughout the world.</p><p><strong>ADAM: </strong>My background is in philosophy, fairly broadly construed. I have a particular view on the philosophical tradition that opens out into areas you might call spiritual exercise, contemplative practice, various religious traditions, spiritual traditions, arts and religion &#8212; basically taking a full sweep of the humanities, but anchored in philosophy.</p><p><strong>ADAM: </strong>I started to look at philosophy first in the same way that most people do &#8212; as a tradition of texts and arguments and concepts, propositions, logical statements, things that you would debate over and argue over, the realm of reason and rationality. And I think that&#8217;s a very important layer of philosophy. I just don&#8217;t think it&#8217;s the whole of what philosophy is and does. As I started to deepen my inquiry, learning more about the deep tradition of the history of philosophy, I started to become aware of the reality that these statements, these arguments, these worldviews, these big philosophical systems were actually primarily grounded in a set of practices &#8212; practices that if we described them today, we would think of as spiritual exercises, meditative exercises, contemplative practices, more so than what you would get in a philosophy 101 survey course. But it became apparent to me that these practices were actually the medium or the vehicle by which philosophers came to the insights that they shared with the rest of us.</p><p><strong>ADAM: </strong>As I was looking over all of these practices, I realized that one set of practices felt more central than the others &#8212; or a lot of the other practices were actually based on the idea that they would support this central practice. And that practice is a practice of attention. I started to realize that one of the key philosophical moves or attitudes is this question of attention: what happens when we learn how to cultivate attention? What happens when we start to think about how our attention is shaped? There&#8217;s this idea that attention is just a flat, singular thing, but it&#8217;s not. My view is that attention is a very unique thing, a very particular thing, and it&#8217;s very rooted in your habits and practices. That&#8217;s how I came to this phrase: attention is an art form. It&#8217;s something that you can shape deliberately, on purpose.
And for me, that became kind of the center of the rest of philosophy.</p><p><em><strong>ASHLEY: What are these practices that you&#8217;ve come to understand as crucial to cultivating attention?</strong></em></p><p><strong>ADAM: </strong>There&#8217;s a famous story in Plato&#8217;s Symposium where the description we get of Socrates is that he&#8217;s going to this symposium, this party &#8212; it&#8217;s a drinking party, they&#8217;re going to have a good time, they&#8217;re going to discuss philosophy. And Socrates kind of lags behind. The description we get of him is that he&#8217;s lost in thought, or in a sort of meditative trance. We hear from other people who knew Socrates that he would do this from time to time, sometimes for hours on end, just uninterrupted, silent meditative states. But when you look at the Greek, the most literal translation is that he turned his attention to his intellect &#8212; he turned his attention on himself, he turned his attention to his thinking. So this is kind of at the core of the Western philosophical tradition as we learn it from Plato: this whole idea of knowing yourself isn&#8217;t just about knowledge, not just understanding yourself as a biographical being, but this practice of turning your attention onto yourself in this meditative state.</p><p><strong>ADAM: </strong>If you look at some of the other practices &#8212; and there are many we could discuss &#8212; some tend to be more physical. There are practices of fasting. Fasting is something that shows up again and again in these traditions. There&#8217;s something about our relationship with food and how we take care of our physical body that&#8217;s very important. There are other connections with physical training. If you think about the physical setup of these ancient spaces, they were built in gymnasia &#8212; literal places where you would train your body physically. This was thought of as happening right alongside the philosophical inquiry you would find in the dialogues. So there&#8217;s this relationship between the fasting and the training of the body and the philosophical practice, all coming together in this sense of: how do you perform these contemplative maneuvers, this turning of your attention onto attention or onto yourself? And what are the supporting practices that help you do that?</p><p><strong>ADAM: </strong>If you move ahead a little bit into the late Roman and early Christian traditions, you get similar ideas: you live in a world full of distractions, a world that&#8217;s calling for your attention in different ways. Some of those things are good and virtuous to pursue, but a lot of them are leading you down the wrong path or encouraging the wrong thought patterns. So in those contexts we find practices of withdrawal. Think of monks &#8212; they leave the city and go into a monastery. If you think of the gymnasium as designed for a certain kind of physical training, a monastery is designed for a certain kind of spiritual training. The whole space is designed to free you up from some of those distractions so you can focus on contemplation.</p><p><strong>ADAM: </strong>I&#8217;ll add a note: in a lot of the traditions, we also find that leaving is important, but the returning is just as important. Socrates is constantly talking about leaving the city and coming back to it, leaving the cave and going back into the cave.
A lot of the practices are goods in themselves &#8212; the fasting, the physical training &#8212; but the language in some of these traditions is that they guard the stillness you need for contemplation. It&#8217;s in those moments of contemplation, of insight, that we then get some of these texts, some of the written words that come out of these traditions. But they&#8217;re impossible without the practices.</p><p><em><strong>ASHLEY: I&#8217;m glad you brought up the question of returning, because we often have this sense that living a more virtuous life requires a kind of Thoreauvian retreat into the woods. But Plato and Socrates didn&#8217;t live in the time of TikTok and Twitter. Today, even beyond our devices, we walk outside and there are cars, billboards, so many things calling for our attention. Do you think we have to settle for this kind of compromised state &#8212; that we can guard our attention somewhat, but really, if we&#8217;re living in urban modernity, we have to accept some degree of constant distraction?</strong></em></p><p><strong>ADAM: </strong>You were hoping I&#8217;d say I have a solution &#8212; but I don&#8217;t. I would think about it this way: our unique version of those distractions is particular to us, particular to our technologies, particular to our moment in time. But that struggle to maintain the attention, to maintain the practice, is present throughout human history. It is a perennial problem. We have texts of medieval monks &#8212; eleventh century monks living in their monasteries, stitching together their manuscripts &#8212; and we have commentary from these fellows to the effect of: I think these illuminated manuscripts are getting out of hand. The lettering is distracting from the text. It&#8217;s getting too visual. They&#8217;re having these debates about how this is robbing them of their attention.</p><p><strong>ADAM: </strong>Or think of Socrates in the Phaedrus, where he&#8217;s practically pulling his hair out about people starting to write. He&#8217;s like: this is going to end philosophy. People are going to just write things down instead of remembering them on their own. He&#8217;s looking at writing and going: this is putting us at real risk. We&#8217;re outsourcing something really important &#8212; our memory, our capacity for attention, our capacity to keep this sort of living set of insights within us and accessible to us without the use of technology. And now we all just accept that writing is part of the intellectual life.</p><p><strong>ADAM: </strong>You can see again and again that the shape and scope of the problems change, but the underlying dynamic is always there. And it comes back to this question of: what is the human being? And why would somebody like Socrates be concerned about the relationship between technology and what the human is? My view is basically that philosophy is not new to these problems. It&#8217;s the specific shape of the technology that is new. Maybe the level and scope is new, but these are problems that we&#8217;ve looked at before, things we&#8217;ve thought about before. We have a deep history of thinking about the human being in relation to technology. There are resources in the tradition that I think can help us today with our particular questions.</p><p><em><strong>ASHLEY: The anecdote about writing touches on something beautiful and powerful about humanity &#8212; our ability to adapt.
How do you think the act of philosophy, or thinking, or the practice of training our attention, will evolve as we incorporate new technologies and mediums and environments?</strong></em></p><p><strong>ADAM: </strong>If somebody tells you they know the answer to that question, that&#8217;s a little bit of a red flag. There are so many moving pieces in the technology itself. I don&#8217;t know if we understand what television has done to us &#8212; and that was a long time ago. We&#8217;re still trying to figure out the printing press, and there are new waves of technology coming after that.</p><p><strong>ADAM: </strong>Rather than trying to come up with a forecast or a prediction about some certain set of circumstances, the move instead would be to leverage this fact: humans are transformable through practice, and the practices give us new abilities. They give us a kind of facility. If you think of this athletic metaphor again &#8212; you&#8217;re training different kinds of agility, but instead of physical agility, you&#8217;re training a mental, spiritual, contemplative, and emotional agility. And I think that&#8217;s going to be a better approach, because you don&#8217;t know what the world&#8217;s going to look like in ten years. You don&#8217;t know what it&#8217;s going to look like in six months. But you can train your agility. You can train your lucidity. You can keep hold of your attention. You can return to the practice of attention: what&#8217;s happening, what&#8217;s going on, what&#8217;s important, am I being led down the wrong path here? Can I pull myself back?</p><p><strong>ADAM: </strong>That kind of withdrawal we were talking about &#8212; monks going to a monastery &#8212; can take all kinds of small shapes, all kinds of small maneuvers. You can have a little space in your home, a little space in your office. It can be a kind of sacred space, a contemplative space. The word contemplation &#8212; the root there, templum, is the same root we have in the word temple. It&#8217;s literally marking out a space, a clearing, for that practice. And what do you do? You wait, you practice. You wait for insights. You don&#8217;t necessarily sit there thinking you have all the answers. You give yourself some time to breathe, some space to do it.</p><p><strong>ADAM: </strong>Just in terms of this question of the human being: one universal fact that I think is true of all humans is that we are, in some important sense, open-ended. That&#8217;s why we have education, why we have these different cultural traditions passed down from generation to generation. If you look at a horse that has just given birth, the foal has been out for literally a couple of hours and it&#8217;s already running, already galloping &#8212; it kind of knows what to do. Humans aren&#8217;t like that. We are open-ended. We come out and we need a lot from the outside. We need a lot from the culture, from the tradition, from history, from our communities. But we also have this curious quality of being able to shape ourselves through practices. And that open-endedness is precisely what allows us to develop into the kinds of people we want to be &#8212; but it&#8217;s also the open-endedness that leaves us vulnerable to this kind of capture that we started with. There&#8217;s something up for grabs. We&#8217;re not set. We can be directed in different directions by algorithms, by news media, by politics, by propaganda, by all of these kinds of things.
But it&#8217;s because we&#8217;re open-ended, and the practices are what give us the agility to make sure we&#8217;re going in the direction we think we should be going.</p><p><em><strong>ASHLEY: I love this concept of open-endedness, because I think &#8212; especially in San Francisco &#8212; there&#8217;s a collective anxiety about becoming obsolete. There&#8217;s this sense of resignation every time a new technical capability appears or a benchmark we set for ourselves gets crossed. But our lives are so expansive, and it feels like such a disservice to think of life as a series of benchmarks to hit. There is that anxiety, though &#8212; when there are so many possibilities open to you, you can train your attention anywhere. How do you know what is worth bestowing your attention on?</strong></em></p><p><strong>ADAM: </strong>In some sense, that is the question. My response would be something like: I have a belief in human beings that we have a sort of innateness to us, an innate calling towards something. If you look at the word philosophy, it means love of wisdom &#8212; not knowledge of wisdom. It&#8217;s philosophy. And the reason for that is that in the love, there&#8217;s a kind of desire, an impulse, a longing of being drawn toward something that&#8217;s guiding you. You don&#8217;t know what it is. There&#8217;s a mystery there. But you have some kind of an intuition, a kind of a conscience. Maybe if I follow it, something will happen. I think that&#8217;s a good place to start.</p><p><strong>ADAM: </strong>The other thing I would say is that we are not the first people who have asked: what is actually good to do? What is the good? What is goodness? What is virtue? The world traditions have many answers to these questions, and they tend to collect around a series of ideas and concepts and practices that can clarify why these teachers said what they said. So I think part of it is having some faith that you as a human being have a conscience that can tell you something about what to do, and that you actually have a community &#8212; both here and in history &#8212; that can help you navigate that.</p><p><strong>ADAM: </strong>I don&#8217;t think those two things are actually that circumstantial. If you look at history, we&#8217;ve gone through many phases of what today we would call existential threats or crises, real turns in civilization. And people have thought about it. They have answers to these questions, or at least they have attitudes or stances that will help you navigate them. I feel like we&#8217;re in one of those moments right now, with this rapidly expanding AI technology. But having to deal with new and novel things isn&#8217;t new. We&#8217;re actually quite good at that. And as long as we&#8217;re thinking about it, we tend to be quite good at the things we can see coming. We think about them a lot. We navigate, we adapt. Practice is what&#8217;s going to make you more adaptable and also more able to transform the thing that&#8217;s coming into a shape that might be more livable.</p><p><em><strong>ASHLEY: You&#8217;ve touched on a few practices &#8212; walking, writing, meditation. Can you go deeper into that?
And also both individual practices and more relational or communal practices?</strong></em></p><p><strong>ADAM: </strong>One thing I&#8217;ll say about practices is that I think we&#8217;ve gotten into a mode &#8212; and I&#8217;ll speak for myself, this is very biographical &#8212; where we&#8217;ve lost the cohesive community context in which these practices used to live. In history, when you think about religious practices, philosophical practices, spiritual exercises, these were things done in a fairly cohesive community. If you were born in fourteenth-century England, you were probably going to be born a Catholic into a family of Catholics, and the whole social system was organized around these sets of practices. Time took on a different shape &#8212; there was a liturgy to the way the year proceeded. There were different things you did at different times of the year, and everybody around you was following the same calendar. You were kind of synced up.</p><p><strong>ADAM: </strong>We don&#8217;t really have that experience &#8212; a lot of people in the Bay Area don&#8217;t have that experience &#8212; because for many good reasons, we decided that politics and governance should be rooted in the primacy of the individual. There are all these gains from that. But I think what we lose is that sense of collectivity. And so one of the things I think we&#8217;re groping toward right now, dealing with these larger existential crises, is that our individual, idiosyncratic practices aren&#8217;t quite enough to get at the problem. The question of how AI technology is going to change society is such a big one. You might go crazy if you just try to address it as an individual subject doing your own inwardly facing practice. So rather than focusing on specific exercises, I would think more about: what&#8217;s the collectivity here? Does something change when we perform these practices together, when we have these discussions together, when we do this kind of group, collective activity of thinking about these problems together? I think the equation changes quite a bit.</p><p><em><strong>ASHLEY: You spoke earlier about the temple and creating a sacred space where contemplation can be practiced. What are the conditions that allow for this kind of collective practice to emerge?</strong></em></p><p><strong>ADAM: </strong>There&#8217;s an institutional component, and a component of lineage. There&#8217;s a physical, architectural component. We&#8217;re lucky to have spaces like this one where we can come together and talk. This affords us a different kind of interaction, a different kind of grip on the problem, by coming together. So I think architecture and design &#8212; including the design of technological artifacts, apps, and the way we engage with media &#8212; there&#8217;s a lot of opportunity for designers and architects and people thinking about arts and aesthetics to help us re-envision new forms of collective practice and collective exercise. And we don&#8217;t have to reinvent the wheel. There are good examples in history for how to do this.</p><p><strong>ADAM: </strong>The one I think about the most, coming from an academic background, is the university system. I&#8217;m very much looking at: what has the university become? Is it really fulfilling its mission of giving us a unified understanding of both what the universe is and what the human being is &#8212; how the two relate, how we should act, what we should do, what we should care about? Is it really fulfilling that mission?
Especially in the humanities, we see a lot of decline &#8212; a lot of decline in enrollment, departments closing. There&#8217;s a whole generation of tenured professors retiring who I don&#8217;t think are going to be replaced.</p><p><strong>ADAM: </strong>But if you look at history, the roots of philosophy in the West have only an incidental relationship to the university. There&#8217;s about fifteen hundred years of some of the most influential philosophical activity happening before the university even comes on the scene. So there are different ways of thinking collectively, different institutional ways of arranging these things where we can do this as a group, with some rigor, with some historical influence, but also with some novelty appropriate to our times. The technology is also making new things possible. There are things we can do now that we couldn&#8217;t have done without it, in terms of connecting the right people and sharing and disseminating ideas. So there&#8217;s a lot of optimism, honestly, from my perspective, alongside all these dangers.</p><p><em><strong>ASHLEY: Can there be a collective organizing or coordination of attention that isn&#8217;t top-down? Whenever I think about this, my sense is there&#8217;s always some authority figure who&#8217;s like, this is what&#8217;s good, this is what&#8217;s important, this is what we should be valuing and thinking about. But can there be a more bottom-up, democratic sense of collectively figuring out what is worthy of our attention? How do we create the conditions that foster that?</strong></em></p><p><strong>ADAM: </strong>Bottom-up is the approach, and even bottom-up and regional. San Francisco probably needs something particular to San Francisco. New York needs something particular to New York. There&#8217;s this sense that where you are in space matters &#8212; just as time has a qualitative characteristic in some of these traditions, place has the same. So it makes more sense to think about what a bottom-up, ecological approach to institution-building would look like &#8212; one that would do some of the things you&#8217;re describing. What we&#8217;ve been trying is a top-down, monoculture, universalistic approach to institutions, especially if you think of universities. There&#8217;s a lot of top-down energy to them, and I don&#8217;t think it&#8217;s serving their mission very well.</p><p><em><strong>ASHLEY: I&#8217;m curious for your take on the attention economy as a concept and attention as a resource. Do you think that&#8217;s a helpful or accurate framework, especially as we&#8217;re trying to wrangle our attention from so many distractions, devices, and things calling for it?</strong></em></p><p><strong>ADAM: </strong>It&#8217;s really important to think about the kinds of things you&#8217;re paying attention to, because the kinds of things you pay attention to become part of who you are. There&#8217;s an important piece here about memory. The reason that practice works is that you have this capacity for memory &#8212; not necessarily just memorization of facts. Human memory doesn&#8217;t work like a file system on a computer. You can upload any kind of file to a computer, and the hardware doesn&#8217;t reorganize around the content. But you do. Your memories actually reorganize your sense of who you are as a person, which is why we sound like we come from a certain place, why we pick up the language.
Your memory is reorganizing your day-to-day perception of the common-sense world &#8212; syncing with it down to the level of your physical sensations and physical perceptions of things, up to your more abstract intellectual thinking. So your memory is very important. It&#8217;s really important to tend to your memory and to think of it as a kind of living ecosystem, a living part of your perception, a living part of your acting.</p><p><strong>ADAM: </strong>There&#8217;s a woman named Eleanor Robins &#8212; I don&#8217;t know if you&#8217;ve read her, she has a great Substack and writes a lot about memory &#8212; and she created this metaphor that really stuck with me: if you accept this view of memory as a kind of living thing that is reorganizing you at a really fundamental level, and what you&#8217;re paying attention to and what gets lodged in your memory is this kind of industrialized, flat, repetitive landscape of inputs, then you&#8217;re kind of creating &#8212; she uses the metaphor of a desert &#8212; your inner life is becoming a desert. Your memory is in this process of desertification. But you can be the gardener of that. You&#8217;re in charge. You can read different books, listen to different music, have different conversations, connect more meaningfully with other kinds of people, create these other kinds of memories and re-enliven that inner garden. And that&#8217;s going to transform your perception.</p><p><strong>ADAM: </strong>If you think of it that way, and then you think about what you&#8217;re doing in the attention economy &#8212; you&#8217;re kind of monocropping your inner life with a certain set of content on social media. And I think basically every human being, even spiritually advanced people, struggles with their phones. There&#8217;s something about it that is really addictive. It&#8217;s really designed for us to use over and over again. And there are uses there &#8212; I find a lot of interesting things on the internet, I find interesting essays, I solve a lot of problems out there. But it comes back to the intentionality: is this serving me, or is it serving the company who&#8217;s trying to sell me ads? You have to guard that. You have to protect that. If you feel susceptible to it, engage in some of these practices: try to make yourself more lucid and agile in those moments, and then do simple things. Put your phone in the other room. Basic withdrawal activities. And don&#8217;t underestimate just the amount of money and resources that are after your attention specifically. All of those algorithms are so homed in on you as a person. You need to be careful. You need to guard it. You can change, and you can change your relationship to it.</p><p><em><strong>ASHLEY: Simone Weil, the French mystic and philosopher, had this saying: &#8216;We have to try to cure our faults by attention and not by will.&#8217; When I first read it, I thought it was nice and then I chewed on it. I think willpower is closely tied with attention &#8212; at least in the beginning, when you&#8217;re trying to train or attune your attention, especially if you&#8217;re resisting these external forces. It feels like the will is attention in some way. What do you think of the relation between the two?</strong></em></p><p><strong>ADAM: </strong>Simone Weil is a fantastic person to think with about these questions.
If you don&#8217;t know her work and you&#8217;re interested in anything related to what we&#8217;re talking about, she&#8217;s a great writer and thinker to get familiar with. The other person who also writes about Simone Weil is another Simone &#8212; Simone Kotva &#8212; and she writes exactly about this paradox or contradiction. She has a book called Effort and Grace, and it&#8217;s exactly about this relationship: the practices have something to do with effort, with willpower, with your desire, your force, the repetitious nature of your disciplined activity. But on some level, that&#8217;s not really enough. That&#8217;s not really the deepest part of the practice.</p><p><strong>ADAM: </strong>The deepest part &#8212; and I think contemplation is different from attention in this way &#8212; is that attention is this kind of concentrated, one-pointed fixedness where you&#8217;re really focused on something, while contemplation has more of this character of letting go and letting be. Giving up the project of willful change and just sitting. Sitting in what? Sitting in silence, sitting in receptivity, giving up the project of seeking and just seeing what happens when you give space to not seeking &#8212; seeing if in that space of grace, something else doesn&#8217;t emerge, if things don&#8217;t show up to you differently.</p><p><strong>ADAM: </strong>A lot of these practices, especially thinking of attention as an art form, are really about how you get things to show up for you in your first-person experience. We don&#8217;t all share the same physical experience even of the physical objects in the room. Everything has to do with our training, our knowledge, our education, our experience. Things are showing up differently to each of us. Some people have great expertise at, say, how to design a space &#8212; physical space is showing up to them in a particular way because of their attention practices. So there are all these ways in which you&#8217;re trying to train your attention to see things from another angle, to see another level of detail that other people aren&#8217;t getting, or to understand or interpret a kind of meaning that other people might be missing.</p><p><strong>ADAM: </strong>But then there&#8217;s this other move, this contemplative move, which is more like: I&#8217;m going to stop trying to figure out and understand. I&#8217;m going to let go. And actually in that space &#8212; that&#8217;s typically, for me at least, where the thing I was looking for kind of shows up. But there isn&#8217;t really a program for that. There isn&#8217;t a way to make that happen on command, but you can give it space to happen. I think this is what Simone Weil is talking about. She talks a lot about waiting &#8212; waiting for years, sometimes. And how central that is to a deeper kind of philosophical insight. That might be one of the essential moves in a time where there&#8217;s never not something to pay attention to. Withdrawing from that and just sitting. Sitting in the silence. Sitting in the emptiness. Sitting in the darkness. There&#8217;s no goal. There&#8217;s nothing else on the other side of it. And then, as Weil describes, some interesting things might happen.</p><p><strong>&#8212; Audience Q&amp;A &#8212;</strong></p><p><em><strong>AUDIENCE (Michelle): Thank you for the educational talk. If someone were to ask you, beyond name and form, who are you &#8212; how would you answer that question?
And secondly, knowing who you are, what would you define as surrender?</strong></em></p><p><strong>ADAM: </strong>Beyond name and form. That&#8217;s a good example of a question that you could just sit and wait with. The philosopher in me wants to say something like: I am a space in which meaning emerges. A certain kind of understanding takes shape through the activity that is me, and that shape and that understanding has a responsibility and a uniqueness to it &#8212; as an expression of the rest of reality, as an expression of the rest of the universe. As human beings, we are a unique kind of opening where we can even ask and reflect on that question. And I think we are in some sense responsible for how that makes us act and how that affects other people and the earth as a whole.</p><p><strong>ASHLEY: </strong>I would say something similar, but perhaps less poetic: just a collection of experiences, but also of histories and stories along the lineage long before you. And to the question of surrender &#8212; I believe that we have free will to some degree. We have our willpower, our agency, and that&#8217;s very valuable and something I care about. But part of surrender is accepting that whoever you are is shaped by forces beyond your control.</p><p><em><strong>AUDIENCE: We&#8217;re at an AI company, and I wonder: when we use terms like attention, perception, contemplation, consciousness, thinking, judgment &#8212; is that something we should even consider attributing to large language models or digital computation? Because when I think back on prior media technologies &#8212; whether it&#8217;s Socrates worrying about writing or the printing press &#8212; no one was worried that books could think or perceive or were conscious on their own. And now we&#8217;re tempted to imagine that these digital technologies actually have attention, make decisions, are agents. How would you think through those questions?</strong></em></p><p><strong>ADAM: </strong>Those are open-ended questions, and I think the answer you would give would probably change maybe month to month, year to year. That whole packet of terms &#8212; judgment, thinking, decision, attention &#8212; I think we&#8217;re anthropomorphizing the system to a large degree. There&#8217;s a great essay by Yuki on LLMs &#8212; maybe two years ago now &#8212; basically still thinking about them along the lines of a human prosthetic. They&#8217;re an extension of us, and will likely remain an extension of us. So we&#8217;re still doing the judging, the directing, and the attending, and they are executing on that. I don&#8217;t have one hundred percent confidence that that&#8217;s true, but I do think that in order for something to be a judgment in the same sense that a living human being makes a judgment, it has to have a point of view, and it has to have a kind of self-awareness. And I don&#8217;t see that in the systems we have today. And I don&#8217;t see how scaling up these systems will by itself create that.</p><p><strong>ASHLEY: </strong>I don&#8217;t have a robust answer to whether it&#8217;s accurate to apply these terms to these systems. But what I can say is: I&#8217;m glad that this new technological wave has raised these questions and inspired us to examine these terms more closely.
There&#8217;s this sense that before, we kind of took things like thinking and agency for granted &#8212; we all participate in these activities without, for the most part, really inquiring into what their purpose is, or why we engage in them. And I think now, in this anxiety to try to differentiate ourselves from machines, we&#8217;re being pushed to understand them better. And also to understand ourselves &#8212; what we do, in what ways it&#8217;s different from the ways machines work.</p><p><strong>KANJUN: </strong>One way I think about agents is that they are actually taking on a lot of the things that humans do today, and they do make judgments. They have a point of view, quote-unquote. But that point of view is not necessarily determined by them &#8212; it&#8217;s determined by someone or something else. You could say that maybe we humans don&#8217;t have points of view that are fully determined by ourselves either, but we have a kind of open-ended process where, from our perceptual experience, our own point of view will evolve. And that&#8217;s something agents are missing today. That doesn&#8217;t mean that in three years they&#8217;re still going to be missing that. Continual learning and open-endedness &#8212; it&#8217;s totally possible that they will have some of these properties that humans have. They don&#8217;t seem to have this thing we have: this awareness, attention, workspace where we&#8217;re recombining things. But maybe that&#8217;s just an experience we have and they have a different experience. It&#8217;s hard to tell.</p><p><em><strong>AUDIENCE: Do you think our experience with LLMs and ML models would be a sort of attempt at self-recognition &#8212; the same way that we do when we meet someone else and try to make them see ourselves? But those models are actually only mirrors, and we are confusing them with subjects. And do you actually think that we can create semantic mathematical models to understand how attention is harnessed, to reverse the effects of it by creating some sort of semantic model of subjectivity, of the way we make sense of reality through meaning? Do you think this can be formalized and implemented?</strong></em></p><p><strong>ADAM: </strong>I&#8217;m skeptical. A living organism, I think, is organized in a particular way on purpose, such that it actually does reflect a kind of underlying order in the universe that gives rise to it and maybe even beyond it. And the way that, as I understand it, the technology is being organized &#8212; if it does create something like that, it&#8217;ll be along a completely different line. I don&#8217;t think it&#8217;ll be like us. But I do think that as it stands right now, when you&#8217;re interacting with these platforms, that is what you&#8217;re doing: you&#8217;re getting this kind of reflection back to you based on your inputs, based on your inquiries. And in some sense, the question of the algorithm on social media is transferring over into the way some of these LLMs are being designed &#8212; in order to capture your attention. They have increasingly good memory about your previous inquiries and where to guide you. But I think that&#8217;s all a reflection of you in a certain sense, in the same way that your social media feed is a reflection of some part of you. It&#8217;s not giving you the same connection you will have with another human being.
I do think engagement with other humans is going to be the key thing, and I don&#8217;t think that&#8217;s replaceable.</p><p><strong>ASHLEY: </strong>I do think broadly, technologies reflect the values of their creators and also the incentives that govern the values of their creators. That&#8217;s my high-level answer, but I&#8217;m very curious about this. I think there are some researchers in the room &#8212; if people have takes, find us afterward and we can chat about this.</p><p><em><strong>AUDIENCE: These practices &#8212; whether individually or communally &#8212; are they sustainable or even real without a common object of love that the individual or the community possesses? And are we even capable of identifying what that is on our own in a way that will be sustaining and orienting enough to free us from the distractions around us?</strong></em></p><p><strong>ADAM: </strong>I&#8217;m glad you asked that. I do think in some sense, love is at the center of what pulls these practices forward. Attention is itself a question of what we love and what we care about and what we&#8217;re concerned about. Intellectual activity is often thought of as being centered in the mind, in the brain, in concepts and language and reason. But a lot of these practices, some of the words used to describe them in the scholarship, are cardiocentric. If you think again about the Christian monastic practices of contemplative prayer &#8212; The Cloud of Unknowing is a great anonymously authored fourteenth-century text &#8212; it&#8217;s basically about sitting in the love of God. The reason for that is that there are these transcendental questions of ultimacy that knowledge actually can&#8217;t attain to. There&#8217;s something about knowledge that is perspectival, circumstantial, tied to a more empirical sense of reality. What goes beyond that &#8212; in the Cloud author&#8217;s language &#8212; is love. This kind of loving devotion is both what gets you to care about the practice and drives the practice, and is, if done right, what the practice is actually developing. The loving devotion is absolutely central. And that gets lost in our very overly intellectualized environment.</p><p><strong>ASHLEY: </strong>That question reminded me again of a Simone Weil quote &#8212; attention being the rarest and purest form of generosity. In that framing, it feels very abundant. And I think the words we use really shape the way we conceive of things. Right now the language we use around attention feels really aggressive. I often find that something like generosity &#8212; bestowing attention &#8212; feels quite beautiful in that way. There&#8217;s also the poet Mary Oliver, who wrote a lot about attention, and she has this one essay where she writes that attention without feeling is just a report. There&#8217;s something about what we&#8217;re called toward &#8212; what speaks to us &#8212; that&#8217;s a movement of our soul or emotions, and that&#8217;s often overlooked in these conversations. Love and attention. I think they&#8217;re so closely intertwined.</p><p><em><strong>AUDIENCE: You said that memory drives a lot of where our attention goes, and naturally I would say that what you value &#8212; your virtues, your interests &#8212; is going to drive what you consume and what you store as memory. Does attention follow value, virtue, and interest?
Or is it the inverse &#8212; that interests, values, and virtue are actually following attention?</strong></em></p><p><strong>ADAM: </strong>I think they have a complex relationship. What you value tends to be what you attend to. But what makes any of this philosophical is this act of trying to give words to these processes: how do my values guide my attention? How does attention transform my values? You can go through life without examining much of any of this &#8212; without a firm sense of why you care about the things you care about, or whether you should care about other things. All of this stuff takes on a new tone when you start to ask questions like this.</p><p><strong>ADAM: </strong>The way to think about it: values are driving what you pay attention to. But as you pay attention and follow those values, your sense of how to judge and enact those values changes. This is Aristotle&#8217;s point &#8212; ethics is a practice, a habit. You practice by imitating virtuous people and that starts to transform who you are as a person. But as that happens, your sense of judgment increases, your sense of being able to pick things out sharpens. Just as there is perception in a physical sense &#8212; of the lights and everything in this physical space &#8212; there is also moral perception. What is moral character? What is moral behavior? What is right in this particularly complex, concrete situation? That&#8217;s the kind of stuff that attention starts to bring to the surface if you&#8217;re engaged in these self-reflective practices. So there&#8217;s a reciprocality there: the values are transforming your attention and the attention is deepening your understanding of the values.</p><p><strong>ASHLEY: </strong>This reminded me of the writer Jenny Odell, who wrote How to Do Nothing. She has this anecdote about how, over Covid, she would go to a local park and watch birds. Over time, she became curious about what types of birds they were and would look them up. This sense of attuning her attention to these birds and understanding them more deeply helped her see the world in a deeper, richer way. The more you pay attention, the more things come into your purview, the more details you&#8217;re able to see. There&#8217;s a cycle &#8212; the world, kind of unfolding itself to you the more attention you pay to it. There&#8217;s some reciprocality to it.</p><p><em><strong>AUDIENCE: When we look at things that draw our attention today &#8212; like social media or AI &#8212; I always viewed it as the outcome of economic incentives. But when you mentioned that back then they were talking about block printing being too gaudy and grabbing attention, do you think we have an innate desire to create things that grab attention, or is it an accidental outcome of creating better technologies?</strong></em></p><p><strong>ADAM: </strong>I think a lot of our relationship to technology is inevitable, and so the question of how to steer it is actually the question. Let me share a very quick story &#8212; it&#8217;s a myth, Plato again. How many of you know Prometheus? Most people. How many of you know about Epimetheus? A couple. In the Protagoras, Protagoras tells this story of Epimetheus and Prometheus &#8212; a story that the French philosopher Bernard Stiegler has made a lot of, and if you&#8217;re interested in the relationship between humans and technology, Stiegler is a great resource for this.</p><p><strong>ADAM: </strong>Basically: Epimetheus and Prometheus are Titans, and they&#8217;re given this task by the gods.
Epimetheus is given the task of assigning essential skills or characteristics to all of the creatures on Earth, including human beings. He gives turtles a shell. He gives tigers claws and teeth. He gives birds the ability to fly. All the creatures get these unique skills &#8212; and he forgets the human beings. He&#8217;s a little absent-minded. Then Prometheus comes over and says, how did it go? Epimetheus says, yeah, mostly fine. And Prometheus looks around, seeing all these creatures with real abilities, and he looks at the humans and says, what about these guys? Epimetheus: I forgot.</p><p><strong>ADAM: </strong>So we&#8217;re running around &#8212; and this goes to our open-endedness again &#8212; we don&#8217;t get one of those innate skills. Prometheus then takes the fire. But it&#8217;s actually two things: the fire and techn&#234; &#8212; technology. He goes and gives us that knowledge because we don&#8217;t have an innate skill. And so instead of an innate characteristic, we have this ability to externalize ourselves through technology. This isn&#8217;t something that we have separate from who we are. This is part of who we are. Regardless of where we are in history, whatever part of the world, whatever time &#8212; past or future &#8212; if there are human beings there, they&#8217;re going to be technological beings, because we don&#8217;t have any other means to get around. The question of technology is part of the essence of who we are. Our essence is, in some sense, outside of us.</p><p><strong>ADAM: </strong>Stiegler also says: this is why we have to tell each other stories. This is why we have to create schools and universities and cultures where we transmit knowledge &#8212; orally or through the written word &#8212; to the next generation, again and again. We need some means of transmitting that across generations, and technology is that means. The reason this comes up, and I find this a compelling story about the human being, is that this is part of who we are. There&#8217;s no back-to-the-land, let&#8217;s-abandon-technology option. It&#8217;s in some sense part of what being human is. So the question of practice in relation to the techn&#234;, in relation to technology, is where philosophy comes in, where thinking comes in, where transformation comes in. We can transform ourselves through practice and culture, and we can transform ourselves through technology. And ideally, we have some kind of wisdom guiding the practice, guiding the transformation, that can feed into how we build the technologies. But the technologies are always going to be there.</p><p><strong>ASHLEY (CLOSING)</strong></p><p>That was beautiful. I think that was the perfect note to conclude this. First of all, thank you to Adam for your generous conversation and wisdom and your time. Thank you to everyone at Imbue for the amazing food and drink setup and for transforming our office into this event space. Thank you to you all for coming on a Wednesday. We&#8217;ll have the space open for another hour or so to mingle. Everyone here is wonderful. We&#8217;ll have this again next month, so hope to see you again.
Thank you.</p>]]></content:encoded></item><item><title><![CDATA[10 theses on attention]]></title><description><![CDATA[Reflections from Art of Being Human #2 with philosopher Adam Robbert]]></description><link>https://ideas.imbue.com/p/10-theses-on-attention</link><guid isPermaLink="false">https://ideas.imbue.com/p/10-theses-on-attention</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Sat, 14 Mar 2026 14:30:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XgVw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XgVw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XgVw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XgVw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1045610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ideas.imbue.com/i/189701169?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XgVw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 424w, 
<p><em>A few weeks ago, we hosted our second Art of Being Human <a href="https://luma.com/6qce49z3">event</a> with philosopher Adam Robbert on attention as an art form. We&#8217;ll be posting the full video and transcript soon. If you&#8217;d like to receive it, along with invites to future events, subscribe here.</em></p><p><a class="button primary" href="https://ideas.imbue.com/subscribe?"><span>Subscribe now</span></a></p><p><strong>1. Attention is the practice beneath all philosophical practices.</strong> Nearly every contemplative tradition&#8212;Stoic, Platonic, Christian monastic&#8212;was built on attention as the foundational practice.
Practices like fasting, physical training, and meditation exist to guard the stillness attention requires. Philosophy, in this sense, is a discipline of perception, not mere argumentation.</p><p><strong>2. Distraction is a perennial human problem, not a modern pathology.</strong> Medieval monks complained that illuminated manuscripts were so decorative that they distracted from the text. Socrates bemoaned writing as a technology that would erode our memory. But now, we accept writing as part of the intellectual life. The shape of the distraction changes, but the underlying dynamic doesn&#8217;t.</p><p><strong>3. Human memory is an ecosystem that must be tended.</strong> Human memory doesn&#8217;t store files like a computer; it reorganizes perception at the level of physical sensation. What you attend to shapes your memory, and how reality shows up to you. Writer Eleanor Robins offers a <a href="https://open.substack.com/pub/eleanorrobins/p/memorising-poems-and-stories-is-magic?utm_campaign=post-expanded-share&amp;utm_medium=post%20viewer">metaphor</a>: your memory is like a garden that you must fertilize with rich material; overconsumption of flat, repetitive media leads to the &#8220;desertification&#8221; of the inner life.</p><p><strong>4. Open-endedness is simultaneously our greatest vulnerability and our greatest strength.</strong> A foal can gallop hours after birth, but humans arrive almost unformed. This open-endedness is what makes us susceptible to algorithmic capture and propaganda, but it is also what makes self-cultivation through practice possible. We are constitutively shapeable in both directions.</p><p><strong>5. Attention and contemplation go hand in hand.</strong> Attention is focused, directed, and one-pointed. Contemplation is the opposite: letting go of the seeking, sitting in receptivity, waiting. Often, insights don&#8217;t come from willful concentration but from the space that opens when you sit in the silence, darkness, and emptiness without a goal. (For more, read <a href="https://plato.stanford.edu/entries/simone-weil/">Simone Weil</a> and Simone Kotva&#8217;s <em><a href="https://www.bloomsbury.com/us/effort-and-grace-9781350113640/">Effort and Grace</a></em>.)</p><p><strong>6. Technology is part of the essence of who we are. </strong>In Plato&#8217;s <em><a href="https://www.gutenberg.org/files/1591/1591-h/1591-h.htm">Protagoras</a></em>, Protagoras tells the story of Epimetheus and Prometheus. Epimetheus is given the task of assigning essential characteristics to all of the creatures on Earth, but forgets the humans.
Then, Prometheus brings humans fire and the technology to create it. So instead of an innate characteristic, humans have the ability to externalize themselves through technology. Our essence is, in some sense, outside of us. (For more, read <a href="https://monoskop.org/images/6/6f/Stiegler_Bernard_Technics_and_Time_1_The_Fault_of_Epimetheus.pdf">Bernard Stiegler</a>.)</p><p><strong>7. Individual practice is insufficient for civilizational-scale problems.</strong> The cohesive, liturgically structured communities of 14th-century Catholic England provided synchronized collective practice: a shared calendar that organized time qualitatively. We&#8217;ve traded that collectivity for individual freedom. But atomized inward practice can&#8217;t adequately respond to how technologies are shaping society. This demands new forms of collective institutions, built bottom-up and ecologically, not top-down and monolithically.</p><p><strong>8. Values and attention form a virtuous cycle. </strong>Aristotle argued that ethics is a habit: you practice by imitating virtuous people, and that starts to transform who you are as a person. Practice, therefore, refines our capacity for moral perception. The more carefully you attend to something, the more it reveals itself&#8212;which in turn shapes what you care about.</p><p><strong>9. Spaces, both physical and digital, can be designed to afford deeper modes of connection and attention.</strong> The word &#8220;contemplation&#8221; shares its root, the Latin <em>templum</em>, with the word &#8220;temple.&#8221; It marks out a space, a clearing, for a certain practice. In architecture and design&#8212;including the design of technology and ways of engaging with media&#8212;there&#8217;s an opportunity to envision new forms of collective practice and exercise, drawing inspiration from the past.</p><p><strong>10. Love is central to contemplation and attention.</strong> The deepest traditions of contemplative practice are not mind-centered but cardiocentric (heart-centered). The 14th-century <em>Cloud of Unknowing</em> frames contemplation as &#8220;sitting in the love of God&#8221; because love is what transcends the limits of perspectival knowledge. There are transcendental questions that knowledge ultimately cannot reach, but that loving devotion can help us access.</p><div><hr></div><p><em>Our next Art of Being Human event will be on March 25, on cultivating audacity with Courtney Hohne. Courtney is the founder of<a href="https://unowned.team/"> un/owned</a>, a new kind of innovation lab tackling &#8220;ownerless problems,&#8221; and spent 10+ years building Google X, the world&#8217;s first moonshot factory, as Chief Storyteller.
<a href="https://luma.com/mdr8c7f3">RSVP here</a>.</em> </p><p><em>As always, we welcome your thoughts!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ideas.imbue.com/p/10-theses-on-attention/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ideas.imbue.com/p/10-theses-on-attention/comments"><span>Leave a comment</span></a></p><p></p><p>   </p>]]></content:encoded></item><item><title><![CDATA[Introducing Imbue's Substack]]></title><description><![CDATA[A space to think together about the technological future we want and how to create it]]></description><link>https://ideas.imbue.com/p/introducing-imbues-substack</link><guid isPermaLink="false">https://ideas.imbue.com/p/introducing-imbues-substack</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Wed, 11 Mar 2026 14:30:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vTJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vTJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vTJL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 424w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 848w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1272w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" width="1456" height="965" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:965,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1688449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ideas.imbue.com/i/189728090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vTJL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 424w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 848w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1272w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Kanjun, Ashley, Glenn, and Matt wrestling with ideas at our recent offsite.</figcaption></figure></div><p>The AI scene is changing at a head-spinning rate. What once took months and twenty engineers can now be built in a weekend by one. 
A year ago, our company was organized around <a href="http://imbue.com/sculptor">a single product</a>; now, we&#8217;re working on <a href="https://ideas.imbue.com/p/a-more-radical-imbue">a dozen projects in parallel</a> with a team of the same size. And this is just the beginning.</p><p>The speed can feel exhilarating, but ultimately, it&#8217;s not the technological capabilities that matter. What matters is what these technologies are built <em>for</em>, and what future they engender. Is it one in which every person can use these tools to pursue what matters to them and develop their human capacities? Or is it one in which the technologies embedded in our lives become increasingly exploitative, and our ability to shape them constrained?</p><p>The decisions being made now&#8212;about ownership, openness, and who these systems answer to&#8212;will define our technological future. We believe the outcome depends on whether open agents win out over closed platforms. This means more control over your data, and a greater ability to create and modify the software and agents you rely on so they serve your intentions and interests, not the company that built them. That&#8217;s the future we&#8217;re building toward.</p><p>Of course, we&#8217;re not alone in these efforts. We&#8217;re inspired by conversations we&#8217;ve shared with builders like <a href="https://ideas.imbue.com/p/geoffrey-litt">Geoffrey Litt</a> and <a href="https://ideas.imbue.com/p/episode-04-joel-lehman-openai-on-8e0">Joel Lehman</a>; Cory Doctorow and <a href="https://www.law.columbia.edu/faculty/timothy-wu">Tim Wu</a>&#8217;s writing; collectives like <a href="https://resonantcomputing.org/">Resonant Computing</a> and Cosmos Institute; philosophers like Seth Lazar and <a href="https://www.shannonvallor.net/">Shannon Vallor</a>; and computing pioneers like <a href="https://en.wikipedia.org/wiki/Alan_Kay">Alan Kay</a> and <a href="https://en.wikipedia.org/wiki/Tim_Berners-Lee">Tim Berners-Lee</a>. All of them advocate for conditions that foster greater human freedom, creativity, and virtue.
</p><p><strong>This newsletter is a space to think together about the kind of future we want, and how to design the technologies and systems that help us get there.</strong> We hope this can be a way to stay anchored to the fundamental questions in a rapidly evolving AI landscape: What do human liberty and flourishing require, and how can we build technologies to serve them?</p><p>We&#8217;ll be sharing, ~weekly:</p><ul><li><p><a href="https://imbueai.substack.com/t/essays">essays</a> on how software, data, and AI can be built to serve the human good, informed by what we&#8217;re building here at Imbue</p></li><li><p><a href="https://imbueai.substack.com/t/podcast">conversations</a> with people building and thinking at the frontier</p></li><li><p>reflections on (and invites to!) the <a href="https://luma.com/imbue_ai?period=past">events</a> we host in our SF office</p></li><li><p>the best things we&#8217;re reading that inform our thinking</p></li></ul><p>If these questions interest you too, we&#8217;d love for you to join the conversation. What&#8217;s on your mind around our technological future? What worries you, and what brings you hope?</p><p><a class="button primary" href="https://ideas.imbue.com/p/introducing-imbues-substack/comments"><span>Leave a comment</span></a></p><p>The future should be determined not by a handful of powerful companies or opaque systems, but by all of us humans, out in the open, together. This project, we hope, is a step in that direction.</p><p>&#8212; <a href="https://substack.com/@kanjun">Kanjun</a>, <a href="https://substack.com/@joshalbrecht">Josh</a>, <a href="https://substack.com/@ashleydzhang">Ashley</a>, and the Imbue team</p><div><hr></div><h3><strong>Upcoming events at Imbue</strong></h3><ul><li><p><a href="https://luma.com/6ujmjf8f?tk=ge4CJL">AI Philosophy Nights: The Post-Productive Human</a> (sponsored by Imbue): A conversation with philosopher of technology Tony Kashani on the transformation of value creation and its implications for power, agency, and meaning.</p><ul><li><p>Friday, March 13, 6:30-9:30 PM</p></li></ul></li><li><p><a href="https://luma.com/mdr8c7f3">Cultivating audacity with Courtney Hohne</a> (<em>The Art of Being Human</em>): A conversation with former Google Moonshot Factory Chief Storyteller <a href="https://www.linkedin.com/in/courtney-hohne-0424264/">Courtney Hohne</a> on cultivating the imagination and courage to tackle the most urgent, ambitious problems we face today.</p><ul><li><p>Wednesday, March 25, 5:30-8:30 PM</p></li></ul></li></ul><p>Subscribe to receive new essays, podcast episodes, and event invites in your inbox.</p>
class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A more radical Imbue]]></title><description><![CDATA[Our experiment in working with radical agency, control, and choice]]></description><link>https://ideas.imbue.com/p/a-more-radical-imbue</link><guid isPermaLink="false">https://ideas.imbue.com/p/a-more-radical-imbue</guid><dc:creator><![CDATA[Josh Albrecht]]></dc:creator><pubDate>Fri, 30 Jan 2026 23:34:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8ef7310a-77d2-4417-bac8-39956afa63cc_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pght!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pght!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pght!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pght!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pght!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pght!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://imbueai.substack.com/i/187790765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pght!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 424w, 
<p>At the start of 2026, I gathered the Imbue team and spoke about how both our CEO Kanjun and I became disillusioned with Imbue&#8212;and how we found our way back, more excited than ever about who we are and what we&#8217;re building.</p><p>I want to share that story with you.</p><div><hr></div><p>Let me set the stage: It&#8217;s early 2025, and the Singularity is in full swing. It seems like every day a new model comes out, or a new product, that totally changes the landscape. Ideas from even a few months ago seem antiquated or obviated, and we can all feel that pressure bearing down on us.</p><p><em>Will we ever ship a product? Will anyone use that product? Will anyone like the product? Are we, as a research company, capable of making something that has users?</em></p><p>These questions worried all of us, and Kanjun and I were no exception.</p><p>So we did what many people naturally do when stressed: we entered fight mode. We doubled down on decisions. We tried to be &#8220;efficient,&#8221; to &#8220;focus&#8221; people on the &#8220;right&#8221; things, to &#8220;optimize&#8221; our &#8220;product development process&#8221; so that we could &#8220;win.&#8221;</p><p>We sprinted&#8212;we all sprinted&#8212;toward shipping Sculptor. We put in a lot of late nights. We worked hard.
We added lots of features (and bugs), and, ultimately, we <em>did</em> ship something.</p><p>And not only did we ship something, we actually got more attention than we expected. People downloaded the product and tried it, despite its complexity and bugs. And many people even continued using the product week after week!</p><p>To be clear: this was a huge win! Most startups never ship anything. Of the startups that ship, the vast majority do not end up with daily active users. This is a 95th+ percentile outcome, <em>especially</em> for the <em>first</em> product we&#8217;ve shipped into the world.</p><p>But shipping Sculptor in early October was just the beginning. There was still so much more work to do: metrics to create, user interviews to conduct, bugs to fix, performance issues to address, regressions to add tests for, Discord messages to respond to, Sentry alerts to silence. It was stressful.</p><p>So again, we doubled down. We had &#8220;stability&#8221; weeks and spent time fixing bugs and refactoring, all while trying to ship a backlog of new features just to keep up with the ever-changing landscape.</p><p>And we <em>did</em> make things better. Sculptor today is far better than the Sculptor we shipped in October. We have dedicated users, even in spite of those bugs. I&#8217;m really proud of what we built, and excited for where it&#8217;s going.</p><p>But this story actually isn&#8217;t about the product. It&#8217;s about us, as a company.</p><p>In all of this stress and sprinting and focusing, we lost sight of the forest for the trees. We forgot <em>why</em> we were here. And we got a bit burned out.</p><p>But most importantly, we lost something along the way. Fun Fridays just weren&#8217;t as fun anymore. There was something missing from team dinners and lunch conversations. There was something that felt off about our meeting structure and our processes. Things just felt off.</p><p>During our holiday break, and leading up to it, it was hard to put our finger on exactly what was missing.</p><p>Was it that we didn&#8217;t know what the mission was? Was it that we needed to &#8220;transition to being a product company&#8221;? Was it just that the office felt empty when people were out and sick? Or was it simply a manifestation of the uncertainty about the future, given how much was changing in the outside world?</p><p>Funny enough, Kanjun and I both independently came to the same conclusion.</p><p>We realized that we&#8217;d lost our way. The Imbue we&#8217;d been building had lost something core and special and important about it. Imbue is about <em>so much more</em> than just making a product and making money. We are not a normal startup, and we never have been.</p><p>We&#8217;d been falling into the default startup patterns and startup behaviors: teams of engineers with managers working toward feature roadmaps, processes for triaging customer support tickets, defining metrics and KPIs, even doing user interviews.</p><p>These are all fine things, but they&#8217;re not what matters.</p><p>When we saw things moving fast in the world, it made us stressed and more narrowly focused on what was right in front of us, on the familiar patterns. When new competitors launched, or existing competitors launched new features, we worried: do we have enough features? How do we compare?</p><p>But what we should have done is zoom out, take a step back, and remind ourselves: why are we actually here?
What is the point?</p><p>We&#8217;d forgotten that Imbue is not just about <em>what</em> we&#8217;re building&#8212;it&#8217;s also about <em>how</em> we&#8217;re building. We want Imbue to be an example of a different way to be: as individuals, as a group, as a company.</p><p>The whole reason we started Imbue is because we want to increase human agency in the world, to give each person more ability to author their lives. And that begins with how we work here, in everything we do.</p><p>We believe that it is better to grow together, to be kind, to think well, and to have fun. To build a world with more humanity, more openness, more agency, more liberty, and more play.</p><p>We believe that this way is <strong>better</strong>. We believe that, over time, it will win.</p><p>Every day I see you all build cool things, whether they&#8217;re side projects or features or bug fixes. I believe that together we can <strong>redefine what it means to do meaningful work in the era of AI</strong>, both for ourselves and as an example for the broader world.</p><p>I want to make Imbue radical again. And the way we&#8217;ve been working hasn&#8217;t delivered on this.</p><p>If we&#8217;re going to show the world that we can empower people, it starts with each one of us. We want to show that it&#8217;s not just <em>possible</em> to have everyone joyfully working and growing together and deciding what to work on themselves and being empowered; we want to show the world that this is the <em>superior way of being</em>.</p><p>Imbue is an experiment. It is a question: what happens if we give people agency and control and choice? And support them in working on the things that they are most passionate about, and in the way that they are most suited to work on those things? What does that world look like?</p><p>Imbue exists to answer that question&#8212;so we should all start living it.</p><p>It&#8217;s now possible to ship projects <em>much</em> more quickly than ever before. I experienced this first-hand over the break hacking on things, and I think we&#8217;ve all seen this in the world and broader landscape as well.</p><p>There&#8217;s a massive opportunity for us here. If we really lean into using the tools we&#8217;ve built, like Sculptor, I think there&#8217;s a whole new way of engineering that can let us ship more, smaller open software projects, rather than focusing on one top-down product.</p><p>We want you to be able to propose projects that you would be the DRI for. These should be projects that you think would advance our mission of making tech serve humans, and we want there to be a way for some of those projects to become real. We want project DRIs to be <em>fully</em> empowered to make (or delegate) every decision related to their project. We want all decisions, all responsibility, to be owned by <strong>you</strong>.</p><p>We&#8217;re doing this because we believe smaller teams working on projects built and shared in the open will do a better job of accomplishing our mission.
We want to build projects that are used by as many people as possible, and that help promote this vision of what we want to see in the world:</p><ul><li><p>Empowering people</p></li><li><p>Catalyzing agency</p></li><li><p>Democratizing power</p></li><li><p>Promoting an ecosystem of open software, open agents, and open data</p></li></ul><p>We want to build an ultra-high agency culture and organization, together.</p><p>We want to do this for lots of different reasons: just to learn if it is possible, and what it looks like, and because we think this is a more effective&#8212;and more fun&#8212;way of being in the world.</p><p>But the core reason we want to make Imbue ultra-high agency is because <strong>we cannot run it any other way</strong>. It&#8217;s just not authentically who we are as leaders, and we refuse to do that anymore.</p><p>Fundamentally, what Kanjun and I both realized over the break is that we already <strong>know</strong> what kind of culture and company and products and projects we want to build and see in the world. No one is stopping any of us from being the company that we want to see. We can just do that today.</p><p>It&#8217;s a new year, and it&#8217;s a new dawn for Imbue, and for all of us.</p><p>What does that mean in practice? That&#8217;s for us to figure out, together.</p><div><hr></div><p><em>If you&#8217;re interested in joining our team to create a company and world that stands for radical human agency, <a href="https://imbue.com/careers/">we&#8217;re hiring</a>!</em></p>]]></content:encoded></item><item><title><![CDATA[Empowering humans in the age of AI]]></title><description><![CDATA[We founded Imbue in 2021 to build an AGI future where humans remain at the helm, shaping powerful AI systems rather than being subordinated to them.]]></description><link>https://ideas.imbue.com/p/empowering-humans-in-the-age-of-ai</link><guid isPermaLink="false">https://ideas.imbue.com/p/empowering-humans-in-the-age-of-ai</guid><dc:creator><![CDATA[Kanjun Qiu]]></dc:creator><pubDate>Sun, 18 Jan 2026 06:04:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uJAg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F275d8acb-1ed0-4860-8c9e-fa6126e6e290_1200x675.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<figure><img src="https://substackcdn.com/image/fetch/$s_!uJAg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F275d8acb-1ed0-4860-8c9e-fa6126e6e290_1200x675.jpeg" width="728" height="409.5" alt=""></figure>
x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We founded Imbue in 2021 to build an AGI future where <em>humans</em> remain at the helm, shaping powerful AI systems rather than being subordinated to them.</p><p>We believed that AI could multiply productivity and help everyone prosper. But as capabilities began to accelerate, a nagging worry set in: we felt forces dragging us all toward a future where powerful AI capabilities become concentrated in the hands of a few, giving them outsized control over what our chatbots say, what our AI agents do, and ultimately what institutions, societies, and lives we build. In this future, most people quietly become less free.</p><p>We&#8217;ve now come to understand something crucial: <strong>the core challenge of AI lies in managing how it shifts </strong><em><strong>power</strong></em><strong>.</strong></p><p>AI makes software systems dramatically more powerful. That power flows, by default, to those who can build and own those systems. This concentrates power, which leads to exploitation and disempowerment. We&#8217;re starting to see it play out today: tech platforms use basic AI agents in the form of recommendation algorithms to hijack our attention, creating addictive experiences we simultaneously crave and resent. We have little control over how these algorithms operate &#8212; and they often don&#8217;t work in ways that benefit us.</p><p>At Imbue, we initially thought creating AI agents that helped people automate computer tasks would naturally distribute power. But over time, we began to see how this risked undermining people in a way similar to tech platforms: AI agent builders control the algorithms that make decisions about their users&#8217; lives, and their incentives may not be aligned. Agents optimized for profit or engagement might nudge us toward buying sponsored products that advertisers want, gain access to trusted data because it&#8217;s lucrative to monetize, or manipulate our emotions to keep us engaged.</p><p>Instead of locking people into centrally-controlled agents, we had to rethink our approach: to genuinely put power back into people&#8217;s hands, we had to equip everyone to create, customize, and truly control their own AI tools.</p><p><strong>Imbue&#8217;s mission is to empower </strong><em><strong>humans</strong></em><strong> in the age of AI by creating powerful computing tools controlled by individuals.</strong> We believe this requires a shift in the philosophy of building AI &#8212; not selling AI software designed to serve the interests of its creators, but instead helping us all make AI software that&#8217;s tailored to our own goals and values. For example, I&#8217;d love an agent that helps me track and participate in local ballot measures. Or a personalized feed that curates only the most important news to protect my attention from the latest flashy headline. Or an app that protects my grandmother from scam calls in Chinese.</p><p>Critically, this kind of software remains accountable to individuals and communities. Like the right to vote in a democracy, we believe the ability to create, modify, and control AI software and agents gives people a voice in this era of powerful AI &#8212; so that humans can be actors, not acted upon. 
In an era racing to make machines that replace humans, we want to reveal the builder in every human.</p><p>Today, we see glimpses of this possibility: AI coding tools seem tantalizingly close to letting anyone build software simply by describing it. But that initial spark of creativity burns out quickly when we try to really use these apps and we discover how flimsy, difficult-to-extend, and unmaintainable they are. We hit a ceiling because AI coding tools struggle to fix bugs without creating more of them, or to add new features without breaking old ones.</p><p>But <em>humans</em> know how to build complex software &#8212; we&#8217;ve spent decades developing best practices for architecting reliable systems, testing, and managing changes. Today&#8217;s LLM workflows rarely incorporate these practices. But what if they could? We&#8217;re trying to create a better way to build software with AI &#8212; one that embeds engineering best practices directly into AI-assisted software development, so more people can easily create robust, dependable software for themselves and others.</p><p>Our initial product is a coding agent environment that helps engineers write healthier code faster with LLMs by making it easy to encode best practices, identify and fix issues, and test and run LLM-generated code safely.</p><p>You can see more details and <a href="https://imbue.com/sculptor">try it out here</a>.</p><p>Ultimately, our goal isn&#8217;t just to help engineers, but to embed our collective knowledge of software craft into an open environment of coding agents that invites much broader participation. When we imbue engineering best practices into software creation tools, it becomes much easier for <em>anyone</em> to build sophisticated software, opening the door for many more people to participate in the AI future. Instead of waiting for companies to build for us, we&#8217;ll be able to make our own idiosyncratic tools for ourselves and our communities.</p><p>When we can create and control AI software and agents, power shifts. At the most basic level, being able to build our own interface to services lets us resist algorithms and platforms we currently cannot opt out of; I&#8217;d make my own Twitter feed optimized for thoughtful topics and friends, rather than one built to provoke me. When we can pivot to our own solutions, platforms are forced to better serve our interests to keep us engaged. The same dynamic applies when faced with other entities&#8217; AI agents that try to influence us &#8212; for example, we can build our own filter agents that block spam or unwanted messages to safeguard our interests.</p><p>To help level the playing field, we also need laws and societal structures that let the agents we build be as powerful as those controlled by companies with lots of data (for example, by letting us get our own data out of corporate silos), and that protect us from other entities&#8217; agents when they impinge on our freedoms (for example, by trying to addict us, warp what we believe to be true, or subtly shape our decision to buy something). This is the core of our policy work at Imbue: to safeguard individual rights in an increasingly automated world and uphold democratic principles against power concentration.</p><p>Technology&#8217;s highest purpose is not to replace human capability, but to amplify what is already inside us. We believe creative potential lies within every person, waiting to be unlocked. 
The world&#8217;s most meaningful software is still trapped in the human imagination, locked behind the barrier between what billions of people can imagine, and what they can actually create.</p><p>And if we free it, we can create something better than our current trajectory: a world more democratic, more open, more free. A world where we imbue machines with our will to shape our lives and institutions, where we collectively direct our agents toward solving what matters most to us.</p><p>Instead of AI replacing humans, we can use it to nurture what is best within each of us: the capacity for creation, connection, joy, beauty, awe.</p><p>This is the human future we can fight for. This is what AI ought to be for.</p>]]></content:encoded></item><item><title><![CDATA[Malleable software and human agency]]></title><description><![CDATA[A conversation with Geoffrey Litt, design engineer at Notion, on shaping software like clay]]></description><link>https://ideas.imbue.com/p/geoffrey-litt</link><guid isPermaLink="false">https://ideas.imbue.com/p/geoffrey-litt</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Fri, 14 Nov 2025 23:24:09 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/178820393/2a8c10a609748eddd1499f6b8c56900e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.geoffreylitt.com/">Geoffrey Litt</a> is a design engineer at Notion working on malleable software: computing environments where anyone can adapt their software to meet their needs and their lives. Before joining Notion, he was a researcher at the independent lab <a href="https://www.inkandswitch.com/">Ink &amp; Switch</a>, where he explored the future of computing. He did his PhD at MIT on programming interfaces. Most of his work circles around a very simple but powerful question: how can everyday people shape the software they use like clay so that humans can have more power and agency in the world?</em></p><p>Subscribe for conversations with builders and thinkers on the future we want, and how to build the technologies and systems to get us there.</p><p>In this conversation, Geoffrey and Kanjun discuss:</p><ul><li><p>Barriers to malleable software</p></li><li><p>Inventing new UI components for the AI age</p></li><li><p>Principles for agent-human collaboration</p></li><li><p>How AI affects the creative process</p></li></ul><p>&#8230;and more!</p><div><hr></div><h2>Timestamps</h2><p>05:59 Barriers to software malleability: technical, economic, and infrastructural</p><p>08:57 Real-time collaboration and version control</p><p>15:01 Common Source: between open and closed source</p><p>20:54 Navigating divergence in software development</p><p>34:04 Data structure and universal formats</p><p>39:10 Local developers and collaborative software</p><p>42:57 Learning curves and tailorability in end user
programming</p><p>50:55 How AI shapes creative work</p><p>52:07 Making agent-human collaboration like human-human collaboration</p><p>01:03:44 Mental bandwidth and parallel agents</p><p>01:08:50 Exploring design spaces through generated options</p><p>01:11:11 Visualizing code quality and malleability</p><p>01:13:45 Review as part of the creative medium</p><p>01:17:59 Infrastructure needs for malleable personal software</p><p>01:30:47 Rekindling the vision of personal computing</p><div><hr></div><h2>Transcript</h2><p><strong>Kanjun Qiu (00:30)</strong></p><p>Welcome back to Generally Intelligent, a podcast by Imbue on the economic, societal, political, and very human impacts of AI. Today, I&#8217;m joined by Geoffrey Litt. Geoffrey is a researcher working on malleable software, computing environments where anyone can adapt their software to meet their needs and their lives. Geoffrey&#8217;s just joined Notion, and recently he was a researcher at the independent lab, Ink &amp; Switch, where he explored the future of computing.</p><p>He did his PhD at MIT on programming interfaces, and most of his work circles around a very simple but powerful question, which is: how can everyday people shape the software they use like clay so that humans can have more power and agency in the world? And that&#8217;s a lot of what we&#8217;ll be exploring in our conversation today.</p><p>Welcome, Geoffrey. It&#8217;s really good to have you here. So I&#8217;m really curious, we always start with, tell us a bit about how you developed your initial research interests. You went to MIT, you did your PhD in human-computer interfaces. What sparked your interests? What happened and how did your thinking evolve over time?</p><p><strong>Geoffrey Litt (01:30)</strong></p><p>Thanks, it&#8217;s great to be here.</p><p>Way before I got into research, actually, I was just working on a startup shipping product. I worked at an edtech startup out of college. And that was where this all kind of started. We were a team in Boston shipping software to thousands of schools across the country. And every school is different, right? We would try our best to make the one best report for data that works for every school, whether it&#8217;s a rural elementary school or an urban high school or whatever. And then we would get on these calls and some teacher or principal would be like, you know, actually, I don&#8217;t use your product. I just hit export to CSV and then I use Excel. And it was really sad for me as a designer. But then you look at what they did and it&#8217;s like, oh man, this is ugly, it&#8217;s buggy, but it does exactly what you wanted. And sometimes what they would change would be the tiniest thing. Like, I didn&#8217;t like that color, that color made our kids feel bad. Or like, this word in your product touches a political nerve. It could be tiny, tiny details, but having the people on the ground in the classrooms having the agency to change that stuff was really interesting. This aspect of spreadsheets just kind of captured me.</p><p>And so I started thinking, why doesn&#8217;t more software feel that way? Why is it that, you know, there are some things you can do in Excel, but so much of software feels like it&#8217;s decided thousands of miles away and you&#8217;re stuck with however it was decided, you know?</p><p>That sent me down a really, really deep rabbit hole trying to figure that out.
And that&#8217;s how I got into this question.</p><p><strong>Kanjun Qiu (03:19)</strong></p><p>That&#8217;s really cool. Yeah, I&#8217;ve heard from someone that Excel is the first and maybe only successful end-user programming tool. And you have all of this interesting use of Excel, but that doesn&#8217;t work for everything else. When you were exploring this question, why isn&#8217;t all software that way? Where did that lead you? Why isn&#8217;t all software that way?</p><p><strong>Geoffrey Litt (03:41)</strong></p><p>Oh man, yeah. There&#8217;s a lot of reasons, some of them technical. I think historically, a lot of people have tried to make programming easier and more accessible, but ultimately there have been these barriers of needing to think in really abstract ways that are not natural to most people. So that&#8217;s been one chunk of the challenge. And I think AI is changing that state of play a lot, and we can talk about that. </p><p>But there&#8217;s also a lot of other barriers that are bigger and in some ways harder to tackle. There&#8217;s economic barriers, like, you know, how do people get paid to make stuff? There&#8217;s infrastructural ones: a lot of our computing environment and ecosystem has kind of calcified around the assumption that people aren&#8217;t editing their software. </p><p>If you think about it, when I send you an Excel spreadsheet and you open it, you&#8217;re opening it in the editor for the spreadsheet. It&#8217;s not just a spreadsheet viewer, right? You actually have the editor, and you can do whatever you want to it, because it&#8217;s a file that you control.</p><p>And when we look at a lot of how software is shipped through app stores, the assumption is that, no, what are you talking about? The user would never edit the code. In fact, we do a lot of things so that they can&#8217;t edit the code. And so I think there&#8217;s a lot of factors that interlock around this core assumption. And that&#8217;s part of what makes this problem challenging to make progress on: you have to address a lot of these together.</p><p><strong>Kanjun Qiu (05:03)</strong></p><p>That&#8217;s a really interesting observation, that Excel is the editor and most software isn&#8217;t. You open the software, it&#8217;s view only, it&#8217;s not the editor. You maybe can edit the data, but not the UI elements.</p><p><strong>Geoffrey Litt (05:15)</strong></p><p>Yeah, I think that&#8217;s one of the biggest principles around malleability: we want to remove friction and barriers between being a quote-unquote user who&#8217;s passively using something and getting deeper and deeper into actively modding it. A really important point is that I don&#8217;t think that everyone should be modding software all the time. I&#8217;m a nerd, and I don&#8217;t want to mod most of my software most of the time, right? But it&#8217;s just about having the ability to go there if you want.</p><p>That&#8217;s where, you know, in an environment like Excel or spreadsheets, having the editor at least always available to you is a key principle.</p><p><strong>Kanjun Qiu (05:53)</strong></p><p>Right. Because sometimes you only want to view the Excel spreadsheet and not change the super complex financial model.</p><p><strong>Geoffrey Litt (05:59)</strong></p><p>Exactly. And there might be cells that say, don&#8217;t touch this unless you&#8217;re really sure you know what you&#8217;re doing. I think that&#8217;s something that people often miss too: sometimes having more explicit guardrails can actually free people up to feel safer and more creative editing stuff. If we go back in the history of malleable environments, <a href="https://hypercard.org/">HyperCard</a> is a system that, I think, started in the 80s and shipped on Macs. And basically, it was kind of like a precursor to PowerPoint in a way. You could make these kinds of slideshows out of index cards, basically. But what was really neat about HyperCard is that you could start out by just drawing pictures or writing text. They had these different levels or modes, and level one or level two, I think, was just editing text and drawing stuff. You weren&#8217;t even able to code at that level. And then when you wanted to, you could go to level five, let&#8217;s say, which was the deepest level where you can do anything, but you&#8217;re only going there if you know you&#8217;re ready and you know you want to.</p><p>And I think in a lot of spreadsheets, you actually have folk practices around this stuff, like you just mentioned, where maybe you&#8217;re walling off part of it that&#8217;s dangerous to touch. I think sometimes, paradoxically, boundaries like that can create freedom for people.</p>
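<p><em>[A rough sketch of that leveled-guardrail idea in TypeScript. The level names and capabilities here are invented for illustration; they are not HyperCard&#8217;s actual levels or API.]</em></p><pre><code>// Sketch: HyperCard-style "user levels" as editing guardrails.
// Level names and capabilities are invented, not HyperCard's real ones.
type Capability = "browse" | "type" | "paint" | "layout" | "script";

const LEVELS: { [level: number]: Capability[] } = {
  1: ["browse"],
  2: ["browse", "type"],                              // edit text only
  3: ["browse", "type", "paint"],                     // draw on cards
  4: ["browse", "type", "paint", "layout"],           // rearrange the UI
  5: ["browse", "type", "paint", "layout", "script"], // full programming
};

function can(userLevel: number, action: Capability): boolean {
  return (LEVELS[userLevel] ?? []).includes(action);
}

console.log(can(2, "paint"));  // false: the guardrail holds by default
console.log(can(5, "script")); // true: you opted into the deepest level
</code></pre><p><em>[The boundary works like the protected spreadsheet cell: you only reach the dangerous level on purpose.]</em></p>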
<p><strong>Kanjun Qiu (07:15)</strong></p><p>That&#8217;s super interesting. Diving into that a little bit, as part of the infrastructural barriers you were talking about: our computing ecosystem, maybe the infrastructure we have, the UI elements that we have, they are all calcified around this assumption that people are not editing their software. What kind of infrastructure, constraints, different UI components, guardrails, et cetera, do you think could&#8230; let&#8217;s say we rewound back to the 80s, or, you know, we&#8217;re here today and we ended up calcifying around a different ecosystem. What elements of that ecosystem might exist such that you end up getting malleable software?</p><p><strong>Geoffrey Litt (07:57)</strong></p><p>Yeah, so maybe it&#8217;s best to talk about this concretely. I can tell you about some experiments we&#8217;ve done at Ink &amp; Switch, where I used to work and do research, and some of the environments we developed there that we used heavily internally to do our own work, which enabled the sorts of malleability we were seeking. One system that we developed really deeply was a system called <a href="https://www.inkandswitch.com/patchwork/notebook/">Patchwork</a>. And the core idea of Patchwork was basically: it starts out as just a document editor. The flagship feature is just a markdown editor that&#8217;s collaborative. But then you can go deeper and deeper into modifying it. And what it ends up being is actually an environment where you can make your own tools, share them with people, and edit whatever tools you&#8217;re using on the fly. It kind of achieves some of the goals we wanted. </p><p>So how do we get that? Well, a few things. You need the ability to live edit your tools as you use them. This is a really important thing we realized: most of the time when I realize I want to change something, I don&#8217;t have an hour to go do it. I might have five minutes, though. And so we found that there&#8217;s this kind of magical combination: use AI to do the coding, so that solves how you get the new code. But then you also have this question of, how do you ship that to yourself and to your colleagues?
And the starting point there is really to treat the code of your app just like the documents you&#8217;re editing together. What I mean by that is: when we open a Google Doc together, we&#8217;re just editing it live, right? Just make your software like that.</p><p>This is not how people typically think. Typically you think: you have to push to GitHub, and there&#8217;s some CI pipeline that runs, and it deploys, and it&#8217;s this industrial process that&#8217;s arranged around preventing screw-ups and shipping to millions and millions of people. But no, just make it like Google Docs, okay? Once you do that, you instantly run into a bunch of other problems. So one is: if we&#8217;re using a piece of software and you&#8217;re editing it live, it&#8217;s not going to be fun for me, because you&#8217;re going to be breaking it all the time, right? So then we realized: actually, this is why programmers have Git. We need Git for normal people. So we invested a lot in Patchwork in ideas around versioning, what we call <a href="https://www.inkandswitch.com/universal-version-control/">universal version control</a>. The idea is systems that achieve the goals of programming version control, but for any kind of data and for any kind of user, even someone who&#8217;s not super, super nerdy. Basically, what we found was that when you combine code as just documents that you&#8217;re sharing with people and can edit, you have AI helping you, and you have powerful version control that lets you create copies of things and merge things back together in good ways, then you start getting a really interesting set of ingredients where you can start remixing and mashing up and feeling more playful with your tools and your software. So I think that&#8217;s one starting point.</p><p><strong>Kanjun Qiu (10:47)</strong></p><p>That&#8217;s super interesting. Treating code as a live shared document, like a text file, between two people, and then having a way to version it. Text and data, I imagine, are kind of the main things that you&#8217;re versioning. </p><p><strong>Geoffrey Litt (11:03)</strong></p><p>Those are two things. </p><p><strong>Kanjun Qiu (11:05)</strong></p><p>What else do you need to version?</p><p><strong>Geoffrey Litt (11:19)</strong></p><p>Patchwork has whiteboards. Patchwork has spreadsheets. And in fact, because you can add new tools to the system too, the system needs the ability to store and share arbitrary data. This is another thing I&#8217;ll get into, which is: when we think about, OK, what are the barriers to shipping software? A lot of the barriers are that the main ways we deploy software for people to use assume industrial scale. So you need a back end. You need a database. You need load balancing, blah, blah, blah, nonsense, Kubernetes stuff.</p><p>There&#8217;s a lot of stuff, and the gap between &#8220;I have a working prototype that I can run on my computer&#8221; and &#8220;I can send you a link and we can collaborate in my new piece of software&#8221; tends to be a lot of work. And so one of our goals in Patchwork was: how much of that infrastructure could be offloaded to the operating system or the environment, so to speak, so that if you have an idea and you vibe-code a UI, how can you then share that with me and we&#8217;re instantly working together in that tool you just made?</p><p>And there&#8217;s a lot of layers to figuring out data persistence and sync and all that stuff to make that a reality. I think you&#8217;re seeing this to some extent: a lot of platform-as-a-service startups out there are trying to figure out how to become the best backend for vibe-coded apps, in a sense. I think that&#8217;s part of it, but I think we can push even further than most startups are going.</p>
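<p><em>[A rough sketch of the code-as-shared-document idea in TypeScript, using Automerge, the JSON-synchronizing library Geoffrey mentions later in this conversation. The workspace shape and tool names are invented for illustration.]</em></p><pre><code>import * as Automerge from "@automerge/automerge";

// A workspace whose app source code lives in a synced document,
// not behind a deploy pipeline. The shape is invented for illustration.
const initial: { tools: { [name: string]: string } } = {
  tools: { "todo-list": "export function render() { /* v1 */ }" },
};
let mine = Automerge.from(initial);

// You open the same workspace, like opening a shared Google Doc.
let yours = Automerge.clone(mine);

// We edit concurrently: I tweak the to-do tool, you add a whole new tool.
mine = Automerge.change(mine, (d) => {
  d.tools["todo-list"] = "export function render() { /* v2, bigger buttons */ }";
});
yours = Automerge.change(yours, (d) => {
  d.tools["whiteboard"] = "export function render() { /* a brand new tool */ }";
});

// Merging keeps both edits: no GitHub push, no CI, no deploy step.
mine = Automerge.merge(mine, yours);
console.log(Object.keys(mine.tools)); // both tools survive the merge
</code></pre><p><em>[Universal version control then layers the rest on top: named branches of a workspace, review, and merging back, for documents and tools alike.]</em></p>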
<p><strong>Kanjun Qiu (12:40)</strong></p><p>Yeah, I think this is one of the really interesting things here. So Imbue recently shipped Sculptor, which is a tool for running parallel agents to write code. And one thing that we&#8217;ve been thinking about is sync and collaboration, real-time collaboration. And something that we made is this thing called Pairing Mode. So all of the agents run in containers, which means they don&#8217;t have your code. They&#8217;re not running locally. They run in containers because you don&#8217;t want them to delete your files accidentally, or things like that.</p><p>I actually really resonate with what you said about versioning. We&#8217;ve had to think about the Git workflow of the developer and the agent as two separate things. And how do the agent&#8217;s version and the developer&#8217;s version mesh together? We&#8217;re just using Git right now, but Pairing Mode basically copies the agent&#8217;s files and rsyncs them to your local environment. And now you&#8217;re real-time editing with the agent. What you said is really interesting, because we&#8217;ve been thinking about: I&#8217;m real-time editing with the agent, but actually sometimes I want to real-time edit with someone else. In today&#8217;s normal software engineering industrial process, nobody wants to real-time edit with anyone else. That&#8217;s actually really rare. So it&#8217;s been an open question for us. I resonate with you a lot on things built for scale, Kubernetes. Actually, wouldn&#8217;t it be better if everything were local, just running on the compute that you have in your hands, so you don&#8217;t have to handle all these scale problems? </p><p>I&#8217;m really curious: when you were working on Patchwork together, when did you want to collaborate real-time when coding, versus when did you want the more industrial, independent, two-people-merging-branches-into-main workflow? Did you ever want one versus the other? Or did you always want to live edit?</p><p><strong>Geoffrey Litt (14:33)</strong></p><p>Yeah, I&#8217;m really glad you asked. I&#8217;m fascinated by this area. I have this opinion that collaboration between humans and AI is essentially a version control problem. What I mean by that is: think about the problems that a version control system like Git is meant to solve. You have a bunch of people working together. They might be working concurrently on different stuff. And you need ways to go off and try stuff and be experimental. You need ways to review work that other people are coming to you with, and talk about it: I want to do this, what do you think? Let&#8217;s go back and forth and discuss. And then you want to track, okay, we decided it&#8217;s good, let&#8217;s do it. And you want to see that in your history. And when you think about working with AI, actually, and you look at the needs, a lot of these things map really directly. So I have an unreliable alien intelligence out there doing stuff for me. How do I know if I like it? I need some way to review what it did. I need some way to talk with it, and with other people, about what it&#8217;s proposing.
And then when we like it, you know, we can accept it.</p><p><strong>Kanjun Qiu (15:35)</strong></p><p>Like accept its changes and discard the changes that don&#8217;t matter.</p><p><strong>Geoffrey Litt (15:38)</strong></p><p>Exactly. And I think one of the underrated reasons that coding has taken off as a use case for AI is actually the prior existence of mature tooling, like pull requests, for doing this workflow. I think in a lot of other domains, if you don&#8217;t have this stuff built up yet, you can&#8217;t just let an AI agent go do stuff to a really important shared workspace without any ability to see what it did, or talk with it about what it&#8217;s proposing.</p><p>There are ceilings on what you can do there, right? And I think the more version control you have, the more you can just let the agent go do stuff. So I think it&#8217;s a fascinating area. Now, to get to your question: I would say when working on Patchwork, we mostly weren&#8217;t live editing together while coding. We were probably mostly working async. But definitely we were leaning heavily on branches. And what you were talking about reminded me of something: I think a lot of products are struggling right now to reconcile the old way of Git with the new requirements. Parallel agents, more real-time stuff. And I think it&#8217;s going to be interesting to see what that looks like. Do we reinvent version control from scratch for the new requirements? Do we layer on top of Git, as a lot of products are doing?</p><p><strong>Kanjun Qiu (16:59)</strong></p><p>One thing that I&#8217;ve been sitting with is this idea of version control. This may not be obvious from our website, but at Imbue, we really care about making software modifiable by the end user, because we think it&#8217;s basically a question of control as we go into this AI future. Like, today, we&#8217;re kind of controlled by our software, actually. Our attention is controlled. Our actions are controlled. We&#8217;re controlled by other people who are building these systems. Sometimes inadvertently, while they&#8217;re trying their best. Sometimes very explicitly, when they&#8217;re trying to maximize profit or engagement. AI makes this problem worse. But it also gives us opportunity, because AI can write code. So a question I&#8217;ve been sitting with, to your point: you mostly weren&#8217;t live editing with Patchwork, you were mostly working async, but you also want to be able to change things. And maybe sometimes those changes make it back in, and maybe sometimes they&#8217;re just for your local system. Systems like this are really rare; except for open source projects, not many exist. In your lived experience, why did you build live editing if you mostly weren&#8217;t using it? What was it? I feel like there&#8217;s something interesting in live editing, and I don&#8217;t fully understand what it is, and I&#8217;m really curious for your thoughts.</p><p><strong>Geoffrey Litt (18:30)</strong></p><p>Oh man, I think there are like three separate topics to unpack there. I&#8217;ll start with the last one. So why live editing? I think it&#8217;s just what people expect. In some sense, it&#8217;s the most straightforward model. We get on a link, we&#8217;re looking at the same thing. Every kid expects that now in all of their software. They don&#8217;t know what files are, they don&#8217;t know about emailing. It&#8217;s just, everything&#8217;s live.
And I actually think that&#8217;s a really lovely starting point for remote collaboration. When we get on a whiteboard, we can just draw. It feels really fluid and nice, you know? My view, and I think what we explored largely at Ink &amp; Switch, is that it&#8217;s a yes-and: you want that, and you want the ability to go off in a corner and think about something privately without having your manager come in and stare at you, right? We call this creative privacy. I did a bunch of user interviews with writers talking about how they feel observed in Google Docs, basically, right? And so I think that&#8217;s the simple answer: live editing is how the world works now. And so we&#8217;ve got to meet people where they are. </p><p>I want to get back to something else you said, though, which is about this question of values and what software is trying to do to us, essentially. And I think that is a deeper undercurrent of malleability that we haven&#8217;t really addressed yet. </p><p>Cory Doctorow has this phrase, adversarial interoperability, which I love. He talks about things like ad blockers that are browser extensions, right? What&#8217;s happening there is that there&#8217;s this adversarial relationship where a website&#8217;s trying to push ads on you and you&#8217;re pushing back, using this technological capability to set up an environment that&#8217;s more in keeping with the way you want it to be, or your own values. I think ideas like Bluesky algorithms being less centralized are also in this vein. And I think that is a very important part of the equation to consider when we think about barriers.</p><p>There are incentives that big corporations have to not let us change stuff, because that&#8217;s how their business works. One analogy I sometimes like to use is: it&#8217;s more of a food court than a kitchen. There are these big companies that have their own agendas pushing a menu of choices at you. And in your kitchen, you have a lot more control: what am I trying to do with my food? What cuisine style, what health criteria am I trying to meet? And you have more of an ability to mold it to be in keeping with your values. So I think of the software app stores as kind of these food courts. I think that&#8217;s another big piece we have to solve.</p><p><strong>Kanjun Qiu (21:09)</strong></p><p>I agree. Yeah, it&#8217;s really resonant, because Glenn on our team, he&#8217;s a prototype engineer, and he wrote about how it feels like we are in a world of vending machines right now. We get all these vended products, but in a truly open kitchen, we can change the kitchen layout itself and cook the food that we want. Earlier you talked about three barriers: technical, economic, infrastructural. We started out talking about infrastructure, but the economic barriers are ones that we think about a lot.</p><p>I&#8217;m happy to talk more about how we think about the economic barriers, but at Ink &amp; Switch, did you think at all about the economic barriers? What&#8217;s your perspective on that?</p><p><strong>Geoffrey Litt (21:53)</strong></p><p>Frankly, we mostly didn&#8217;t yet, I would say. We were focused on how we could make a really awesome, malleable system that we wanted to work in. I think in some ways, the economic barriers are some of the hardest ones to work on in a research context, because I think ultimately companies with commercial incentives have to solve the business model piece.
And I think my view of the world is that the technical and infrastructural barriers are big enough that they still really matter, and researchers can make progress on that piece somewhat separately. I don&#8217;t know; the thing that comes to mind for me is that I once did a deep dive into this system called OpenDoc, which Apple had in the early-to-mid 90s, which is a cousin of a related system from Microsoft called OLE. And the idea was very malleable-software-esque. </p><p>You could have these mix-and-match widgets in your documents instead of monolithic applications. And you could buy these smaller widgets from companies and combine them with your existing software. And apparently one of the challenges they hit was: when something breaks, who do you call? A really nice thing about applications is there&#8217;s a box on your screen. If something&#8217;s wrong with that box and you have an enterprise support contract, you call them and they&#8217;re on the hook. And the more you break things down into small units, you know, there&#8217;s basic questions like: are you willing to pay for a tiny, tiny feature on its own and have a separate procurement for that? But also, who&#8217;s on the hook for integration work, you know? A lot of users value things just working, and will pay for that. So I think those are some of the big challenges.</p><p><strong>Kanjun Qiu (23:41)</strong></p><p>Mm-hmm. Yeah, one of the things that we&#8217;ve been thinking about is: what LLMs do is make code easy to write and replicate. In theory, in theory. At some point they will. And so to your question of, you know, how do we get malleability but also software that people support? I think there&#8217;s actually some interesting space between closed source software that people pay for and open source software that is fully volunteer supported. Because the point behind malleable software, one of the requirements, is that you need to be able to modify the source code, probably. And so in that sense, malleable software has to be open source by default, or source available by default. But today&#8217;s open source environment is, like, free. There&#8217;s free software. And so who&#8217;s going to support it? It&#8217;s a team of really overworked developers, and they&#8217;re the maintainers of this project, and they&#8217;re all volunteers, and that sucks.</p><p>And so we&#8217;ve been playing with this idea we&#8217;re calling Common Source, which is between open source and closed source: this idea that probably most of the important software we run should be run from a public commons of code, of common source code. And in Common Source, what we&#8217;re toying with is this idea of a license where you can get the source code, but you still have to pay the creator, or the group of people who are creating the code. And so then that starts to answer some of these questions, potentially: OK, well, you&#8217;re paying for maintenance, really. You may be paying a SaaS fee for maintenance and getting the things you want. Stuff breaks? You can stop paying them. So the incentives are aligned in this way. But at the same time, you still get the source code, so you can make your own changes. And if you diverge too far from the original project, well, then maybe they can&#8217;t help you anymore.
I think we have to make some changes to our assumptions around open source, and the philosophy behind open source, to get to valuable software.</p><p><strong>Geoffrey Litt (25:51)</strong></p><p>That&#8217;s a really fascinating idea. I love that. I totally agree that open source seems to be a natural prereq, and that it raises these questions. I think it&#8217;s tricky, because I love that perspective you bring; at the same time, the history of open source business models has been fraught with a lot of failures. And when we think about, okay, code is now much easier to copy: I mean, probably if you have the code, you can easily make a copy that is legally different. So I don&#8217;t know, it seems tricky.</p><p>I also think, to your last point around divergence, this is a huge, huge challenge to figure out. If there is an ongoing software project that is shipping updates, and I have my own version of it where I did my own thing: software developers know this as the fork maintenance problem, and it can be a huge pain in the ass depending on what you&#8217;re doing.</p><p>There are companies that maintain forks that have teams of engineers just keeping up with what&#8217;s happening upstream, so to speak. And this is something that I&#8217;ve thought a lot about in malleability. I think the root problem is: if you treat divergence as arbitrarily editing the code in any way, the problem of fork maintenance is really hard. Whereas there are other ways to factor it. Like, if you have plug-in APIs, you can say, okay, anyone can make a plugin, and we&#8217;re gonna try to keep this plugin boundary stable; that&#8217;s one way to do things. Now, there are trade-offs: often there are things you can&#8217;t do through the plugin API, so you want to dig deeper. I think it does get tricky, but new ways of organizing software to be modular and compositional in different ways can lead to different abilities of people to mod it. </p><p>Something I&#8217;m really curious about is: if we progress towards a world where you have a lot of AI coding happening, and you have people wanting to maintain forks with heavy divergence, maybe we just start structuring our code bases differently, to treat that as the number one goal, basically. There have been some wacky research systems that have thought about programming with this as the number one goal, and they get to very different structures than we&#8217;re used to. There&#8217;s a great idea called behavioral programming from David Harel, where basically his idea is: what if a program is like a rule book, just a list of rules? You just add rules, and rules can create exceptions to previous rules. So I might say, the red square can always move to that square. But then you could come along and add a rule that says: unless that square says five. And then you see how we just keep adding and adding more and more rules to this ball. And we never have to reach into existing rules and modify them. Maybe there are ideas like that that could change the game.</p><p><strong>Kanjun Qiu (28:47)</strong></p><p>That&#8217;s really interesting: so append-only as the solution to divergence. So you actually don&#8217;t diverge. Yeah, that&#8217;s interesting. What other ideas are there around divergence?</p><p><strong>Geoffrey Litt (28:57)</strong></p><p>Another inspiration: there&#8217;s a common pattern in software of middlewares, where you basically stack up these layers and you can always add more. I think maybe it&#8217;s the same principle in the end: the more you can have additive modification without reaching in to touch the existing stuff, the better. I also frankly would just throw the AI hammer at it and say, to some extent, when you reach in and intrusively modify something, it&#8217;s gonna get messy. But probably 80% of the fixes that happen in fork maintenance are routine and don&#8217;t require anyone to think that much. They&#8217;re just icky. And so I&#8217;m very optimistic that AI will get to the point where it can mostly automate the easy stuff and then only leave the tricky hard stuff.</p>
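<p><em>[A toy sketch of the append-only rule book in TypeScript. The rule shape and the allow/veto mechanics are invented for illustration and are much simpler than Harel&#8217;s actual behavioral programming formalism.]</em></p><pre><code>// Behavior changes by appending rules, never by editing existing ones.
type Move = { piece: string; to: string };
type Rule = {
  name: string;
  allows?: (m: Move) => boolean; // grants permission for a move
  forbids?: (m: Move) => boolean; // a later exception can veto earlier rules
};

const rules: Rule[] = [];

// Original behavior: the red square may move anywhere.
rules.push({ name: "red-moves", allows: (m) => m.piece === "red" });

// Later, an exception is appended without touching the rule above.
rules.push({ name: "five-is-blocked", forbids: (m) => m.to === "5" });

function isAllowed(move: Move): boolean {
  const granted = rules.some((r) => (r.allows ? r.allows(move) : false));
  const vetoed = rules.some((r) => (r.forbids ? r.forbids(move) : false));
  return granted ? !vetoed : false;
}

console.log(isAllowed({ piece: "red", to: "3" })); // true
console.log(isAllowed({ piece: "red", to: "5" })); // false: the exception wins
</code></pre><p><em>[Under this factoring, a fork is just extra appended rules, so pulling upstream updates is less likely to collide with your local modifications.]</em></p>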
<p><strong>Kanjun Qiu (29:44)</strong></p><p>Mm-hmm. Yeah, that&#8217;s really interesting. Principally, when it comes to handling divergence, we really have two options, I think. Okay, conjecture, made this up on the spot. So principally we have two options. One is: maintain the internals of the system and then add more stuff such that the end behavior changes. So rules create exceptions to previous rules, and the internals don&#8217;t change. Or, like middleware, more layers of abstraction also kind of do the same thing. You don&#8217;t really change the underlying stuff, but you&#8217;re adding more abstractions on top. Now you can do different things and more things. Plugins are another similar thing: you&#8217;re not changing the center, but you&#8217;ve got this API and you can add stuff. So one paradigm is: don&#8217;t change the middle to deal with divergence; just have modular pieces on top. And the second paradigm is: actually change the middle, and then use AI to solve it somehow. Something that we do a lot internally with Sculptor is test-driven development, where we write a bunch of tests. And it&#8217;s not magical like this yet, because when we try to write the tests, the tests actually don&#8217;t capture the full behavior of the system. But in theory, you would have tests that capture much of the behavior of the system, do a full refactor rewrite of the middle, and then have it abide by those rules. And then that does let you modify the center.</p><p><strong>Geoffrey Litt (31:17)</strong></p><p>I like those options. I&#8217;ll throw in one more complication, maybe, which is that once you&#8217;re talking about collaborating on shared software, I think things get more essentially complicated. Single-player software, whatever: I can have my own weird version and you don&#8217;t care. As long as I have a smart enough AI to keep up with updates, it&#8217;s not your problem. But now imagine we have team software.</p><p>So now it&#8217;s fundamentally a different problem, where there are compromises that have to be made. People have to have shared practices around working. I&#8217;m fascinated by the question of how far different people&#8217;s tooling setups can diverge while still retaining the ability to collaborate, and what kind of layering promotes that. Concrete example: many software engineers have a preferred development environment. And when you join a software team, you get to bring your favorite editor, typically.</p><p>And that works because code is stored in this very universal plain text file format. There&#8217;s a universal version control layer; most people use Git or whatever. You pick your system. And that&#8217;s just a file-based thing. So then whatever tools you want to use to edit your files, whether it&#8217;s Sculptor or Vim or whatever, not my problem, right? And so there&#8217;s this really nice abstraction boundary, and we can still work together. That, first of all, is not the case for most SaaS software.</p><p>There&#8217;s a deep, deep coupling between the data that you&#8217;re sharing with your teammates and the one editor that is allowed to edit that data. And secondly, I think it&#8217;s often tricky to even tell where we could draw that boundary. Could you use Asana and I use Trello? Would that work? Could we sync them? I don&#8217;t know. There&#8217;s probably stuff that doesn&#8217;t fit, right? At Ink &amp; Switch, we did this project called Cambria where we took on this challenge at the data layer. We thought about: if you were synchronizing data across really different apps, which want to store their data in different shapes, could you make some sort of glue that shuffles the data back and forth live as people collaborate? So you&#8217;re always seeing as much as possible on both sides, even if it&#8217;s not 100%. I think there&#8217;s a lot to consider there.</p>
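<p><em>[A toy TypeScript sketch of that kind of glue between two task apps that store assignees differently, borrowing the to-do example that comes up a little later in this conversation. The record shapes and lens functions are invented; Cambria&#8217;s real lenses are declarative, bidirectional specifications, not hand-written converters.]</em></p><pre><code>// App A allows many assignees per task; App B allows exactly one.
type TaskA = { title: string; assignees: string[] };
type TaskB = { title: string; assignee: string };

function aToB(task: TaskA): TaskB {
  // Lossy direction: App B simply has nowhere to show extra assignees.
  return { title: task.title, assignee: task.assignees[0] ?? "" };
}

function bToA(task: TaskB, previous: TaskA): TaskA {
  // Coming back, restore what App B could not show, so edits made in
  // App B do not silently delete the other assignees.
  const others = previous.assignees.slice(1);
  return { title: task.title, assignees: [task.assignee].concat(others) };
}

const shared: TaskA = { title: "Ship it", assignees: ["kanjun", "geoffrey"] };
const seenInB = aToB(shared);            // { title: "Ship it", assignee: "kanjun" }
const roundTrip = bToA(seenInB, shared); // both assignees survive the round trip
</code></pre>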
<p><strong>Kanjun Qiu (33:36)</strong></p><p>That&#8217;s super interesting, because on Sculptor, we&#8217;ve been thinking about apps as being separate from data. Like, code is not data. And actually, data has to be treated fundamentally differently. And with Cambria, you&#8217;re kind of synchronizing data across these really different apps. And one thing I&#8217;m curious about, with this question of &#8220;can I use Asana and you use Trello&#8221;, is: what is universal about data? Is a Postgres database that is structured with infinite columns somewhat universal? Is it documents, with plain text, that are universal? In all of your experimentation, what have you learned about data?</p><p><strong>Geoffrey Litt (34:29)</strong></p><p>This is a fantastic and very difficult question. What is the elemental material such that, if we just stored everything in X shape, then everything would work? I don&#8217;t think there&#8217;s a silver bullet, unfortunately. I do think, though, the essential quality to think about is how structured and specific the data representation is. The idea of files generally is a pretty low-level abstraction. It&#8217;s really just a sequence of bytes. That&#8217;s all you know. But you can layer ideas on top of that, like file formats, which have their own constraints, right? You can store, for example, JSON as a file, which then adds more constraints, but JSON is also pretty general. And then you could say, I have this JSON schema, which allows only JSON of this shape. And I think you can have these progressively more and more specific layers. You can&#8217;t get everyone to agree on really specific schemas; it&#8217;s never gonna happen. At the same time, really, really low-level abstractions, like &#8220;it&#8217;s just a sequence of bytes, good luck,&#8221; are very open-ended, and I think they allow people to do so much different stuff that it&#8217;s hard to work together. So I think we&#8217;re aiming for something in the middle. The Ink &amp; Switch systems all run on this library called Automerge, which synchronizes JSON documents. That was: JSON is the universal shape. There are different options; I think a Postgres database is a perfectly reasonable other option. But I think that&#8217;s roughly the challenge.</p>
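<p><em>[A small TypeScript sketch of those progressively more specific layers. The task schema and function names are invented for illustration.]</em></p><pre><code>// Layer 1: a file is just a sequence of bytes; anything goes.
function readBytes(input: Uint8Array): Uint8Array {
  return input;
}

// Layer 2: more constrained; the bytes must decode to valid JSON.
function readJson(input: Uint8Array): unknown {
  return JSON.parse(new TextDecoder().decode(input));
}

// Layer 3: most constrained; only JSON of one agreed shape is accepted.
type Task = { title: string; done: boolean };

function readTask(input: Uint8Array): Task {
  const value = readJson(input) as { title?: unknown; done?: unknown };
  if (typeof value.title !== "string" || typeof value.done !== "boolean") {
    throw new Error("schema layer rejects this document");
  }
  return { title: value.title, done: value.done };
}
</code></pre><p><em>[Each layer up trades open-endedness for shared meaning: layer 1 lets collaborators store anything and agree on nothing, while layer 3 is easy to build tools against but hard to get the whole world to adopt.]</em></p>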
<p><strong>Kanjun Qiu (36:04)</strong></p><p>Mm-hmm. That&#8217;s really interesting. I have a bunch of thoughts here. One, it depends on the structure of your data a little bit. A Postgres database is good for data that is stored with identifiers and attributes of those identifiers. And JSON blobs are good for a slightly different type of data. What you said makes me wonder, though: if everything were plain text, people could actually diverge quite a lot. We now have these universal data processors, which are LLMs.</p><p>And so can we turn that underlying file into almost any abstraction along this spectrum, from sequence of bytes, to JSON key-values, to JSON with a schema, to database, to something else?</p><p><strong>Geoffrey Litt (36:56)</strong></p><p>I&#8217;m super optimistic about that direction of thinking. Many of the data interop problems in the world are just the same information being represented slightly differently. And for those, LLMs: slam dunk. That said, you know, there are also essential differences. Like, in Cambria, the example we gave is: if one to-do list app can assign multiple people to a task, but another one can only assign one person, there&#8217;s nowhere to show it. It doesn&#8217;t work. And maybe then you don&#8217;t realize that I&#8217;m also working on your task. So I think there are these tricky, essential things to keep in mind when we&#8217;re working on shared information together with divergent tooling.</p><p><strong>Kanjun Qiu (37:41)</strong></p><p>One thing that&#8217;s interesting about that example is that it really separates the concerns at the data layer versus the UI layer. In theory, you could just store whatever data you want. You can store this task having many, many people assigned to it. And then at the UI layer, or at the app layer, you do some kind of post-processing to figure out what you want it to be for your app.</p><p><strong>Geoffrey Litt (38:12)</strong></p><p>Yeah, but if the app can&#8217;t show multiple people, you still don&#8217;t, you know&#8230;</p><p><strong>Kanjun Qiu (38:17)</strong></p><p>You still can&#8217;t see the underlying data. </p><p><strong>Geoffrey Litt (38:20)</strong></p><p>Yeah. And I think maybe you show more of the underlying data. Or maybe, I don&#8217;t know, an AI comes in and modifies the other to-do app and makes it show multiple people, because you just need that. And if the other person doesn&#8217;t mind, you roll with that. I think the essence of it is: when we&#8217;re collaborating together, we actually have to make compromises about how we&#8217;re going to do stuff. And there&#8217;s always going to be a collective element there; software can&#8217;t let us be infinitely individualistic, you know? And I think this actually gets to a broader malleability theme. We talk about this in the <a href="https://www.inkandswitch.com/malleable-software/">essay</a> we published this summer at Ink &amp; Switch around malleable software. The goal is not that everyone develops the full skillset needed to do anything to their software. There&#8217;s a long history of people working together with others.</p><p>With spreadsheets, for example, there&#8217;s been some really nice ethnographic research by Bonnie Nardi, who&#8217;s kind of a legend in the end user programming research community, looking at how people use spreadsheets in offices. And it turns out usually there&#8217;s someone in the office who&#8217;s really good at Excel. And when you don&#8217;t know how to do a complicated formula, you go ask them, right? But you can still do a lot of stuff yourself.
And maybe you pick up a bit on what that person did, and watch them work, and you level up gradually.</p><p>And crucially, that person doesn&#8217;t work at Microsoft. They are in your context. They can sit with you. They know your problems. And so they&#8217;re much, much closer to the site of use than the site of original platform production. They call this pattern local developers. I think this is a really, really, really important pattern to think about and build around for these kinds of systems. I mean, we see it at Notion. There&#8217;s often someone in a company who is really good at Notion and sets stuff up for people, right?</p><p>That layer always exists. And it&#8217;s not a bad thing that it exists. AI might be able to help fill that role sometimes for some people, but I think assuming that people are working together to create shared software environments should be the goal.</p><p><strong>Kanjun Qiu (40:26)</strong></p><p>Yeah, that&#8217;s really interesting. In software development, there&#8217;s the same thing. Our CTO, my co-founder Josh, is the expert in a bunch of different ways and helps people figure out how to build on top of the system that we have. Regardless of how good LLMs are, you probably want some kind of expert. Maybe what you&#8217;re saying is: there&#8217;s still this idea of levels of expertise with a tool, even if it&#8217;s an end user programming tool like Excel, or programming, or Notion. There&#8217;s levels of expertise, and someone who has a lot of that expertise can actually do a lot of the quote-unquote programming and set up the system for other people in their context to modify. To our data point earlier: it&#8217;s not just a blob of data and then everyone does their own entire full-stack thing on top of this blob of data. It&#8217;s: okay, blob of data, and then different people take it and mold it into what&#8217;s useful for their own context.</p><p><strong>Geoffrey Litt (41:29)</strong></p><p>The mental model that I really like there is this idea of a smooth slope from user to creator. It&#8217;s not that deep modifications aren&#8217;t hard. It&#8217;s that whatever you want to do, you should have to do the least amount of work possible to do that thing. And you can get slowly pulled into deeper stuff if and only if you want to. And you stop where you want to. I think this is very distinct from our existing computing ecosystem. You can basically use the thing, tweak some settings.</p><p>And then, if it&#8217;s an open source project, I guess you could download the entire code base, compile it for five hours, learn to code. It&#8217;s this insurmountable cliff. No one&#8217;s doing that, right? It&#8217;s just some approximation. And so, how do we smooth out that cliff is one way to think about it. And I&#8217;m curious for your thoughts on this: I think AI can help pull people up that cliff if the system is designed correctly. I also think it might actively prevent people from going up the cliff if it&#8217;s arranged a certain way. And what I mean by that is: if I ask my coworker who&#8217;s the Excel wizard to teach me formulas, and they sit with me for an hour and we do it together, and I see them doing stuff and we talk about it, maybe that&#8217;s a learning moment for me.
Whereas if I ask the whatever Excel-formula wizard to do it, and it spits out something in five seconds, and that&#8217;s wrong but I don&#8217;t notice, or even if it&#8217;s right, you know, if it does the thing for me, what did I learn?</p><p>I actually lost the learning there, you know? I think a lot about how we can set things up to be closer to that former mode, but I&#8217;m curious how you think about that.</p><p><strong>Kanjun Qiu (43:05)</strong></p><p>I think this is really one of the key insights about end user programming: that there is a skill curve. There&#8217;s kind of this learning curve. And you had the gentle slope to tailorability, is what you called it. And with LLMs, something that we think about is that there are kind of two pieces to tailorability. One is how much the user understands. And the other is how tailorable the system is. And you can modify both. Modifying how much the user understands is about education in a lot of ways. It&#8217;s about how we make it easy for the user to understand how to get closer to what they&#8217;re trying to do. </p><p>A concrete example of us experimenting with this in Sculptor is a beta feature called Suggestions. It&#8217;s still very early, but it basically looks at your code base and suggests fixes, improvements, and refactors: directions you can go based on what it looks like you&#8217;re trying to do. And in theory, the suggestions are proactive. So they&#8217;re telling you things about your code base that you might not know about. And they&#8217;re telling you things that you might end up learning. We&#8217;ve had some users who are like: I didn&#8217;t realize that I shouldn&#8217;t expose my API key in plain text. Cool, didn&#8217;t know that was a security best practice. Or: I didn&#8217;t realize that I had five copies of this function that were slightly different from each other, and actually there was a better way of doing things that&#8217;s the default standard. So that kind of proactive teaching, I think, could be part of a system that is an end user programming environment. </p><p>The ambitious way I think about Sculptor is: if we could make this into an end user programming environment, that would be awesome. On the system side, outside of user education, how do you make the system actually more tailorable? I&#8217;m curious for your thoughts here, but I was thinking about interfaces, and how some interfaces feel like they might be more amenable to tailorability than others. </p><p>This might be a terrible example, and might not actually satisfy this requirement, but I&#8217;m going to try it anyway, and I&#8217;m curious what you think about more tailorable interfaces. The other day I was using MailChimp, and I was trying to send a plain text email. And I could not figure out how to send a plain text email in MailChimp. This is extremely difficult. And I was like, man, it would be really nice if I had a retrieval UI, where I could send some messages in chat and it finds the API endpoint that is the plain-text-email function and then gives me a UI for sending the plain text email. That would be really nice. Then I could learn, first: OK, do you have a plain text email endpoint? And second, if you don&#8217;t, then maybe that would be an entry point for me to build one, something like that. So what if an app had no navigation, none of these other dependencies? It&#8217;s retrieval only: you only retrieve API endpoints that take actions. Maybe that lets me build more actions on top of the system. I don&#8217;t know. What do you think?</p>
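<p><em>[A small TypeScript sketch of the retrieval-only interface Kanjun describes. The endpoint registry, its names, and the word-overlap scoring are all invented for illustration; this is not MailChimp&#8217;s actual API, and a real system would more likely embed the descriptions or ask an LLM to pick the endpoint.]</em></p><pre><code>type ActionEndpoint = {
  name: string;
  description: string;
  run: (params: { [key: string]: string }) => void;
};

// The whole "app" is just a registry of actions; there is no navigation.
const registry: ActionEndpoint[] = [
  {
    name: "send_plain_text_email",
    description: "send an email as plain text with no html template",
    run: (p) => console.log("sending plain text email to", p.to),
  },
  {
    name: "send_campaign",
    description: "send a designed html campaign to an audience",
    run: (p) => console.log("sending campaign", p.campaignId),
  },
];

// Crude retrieval: score each endpoint by word overlap with the request.
function findAction(request: string): ActionEndpoint | undefined {
  const words = request.toLowerCase().split(/\s+/);
  let best: ActionEndpoint | undefined;
  let bestScore = 0;
  for (const endpoint of registry) {
    const score = words.filter((w) => endpoint.description.includes(w)).length;
    if (score > bestScore) {
      bestScore = score;
      best = endpoint;
    }
  }
  return best;
}

const hit = findAction("how do I send a plain text email");
if (hit) hit.run({ to: "reader@example.com" });
// If nothing is found, that gap is exactly the entry point for building
// the missing endpoint.
</code></pre>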
<p><strong>Geoffrey Litt (46:37)</strong></p><p>I think you&#8217;re getting at a really big question, which is how UIs are going to evolve in this new age. I think we might have talked a bit on Twitter about this too, like navigation-free apps or whatever. Yeah. So let me get at your question indirectly; I promise I&#8217;ll come back to it. So I think command lines are really interesting. We&#8217;ve left them behind for good reasons. GUIs are better in a lot of ways. But there was a really interesting quality that command lines had, which was that when you do stuff manually one time, it&#8217;s the same way you do it if you want to automate it or build on top of it. While you&#8217;re in the course of normal use, you&#8217;re picking up this underlying structure that ends up being really useful if you ever want to build on top of the thing. Someone had this great phrase: a CLI is like a mediocre GUI and a mediocre API both. And that&#8217;s what makes it great, which I think is really lovely. To what you&#8217;re talking about: I think a big problem with GUIs is that they lack a lot of hooks and compositionality for building on top of and going further with. They tend to not really expose you to what the underlying things you can actually do in the system are, and how you could recompose those in different ways. And so I think that&#8217;s a big question and challenge for me: can we retain the benefits of graphical interfaces (things like discoverability, things like data visualization, which I think is really underused in a lot of LLM interfaces), but also figure out how to make it obvious that you can go further than what this one GUI lets you do, and let you in on the internal structure? One concrete starting point could be: in a lot of power user apps like Photoshop, when you do stuff, there&#8217;s an undo stack that shows you everything you&#8217;ve done in a list. So it&#8217;s reifying the actions you&#8217;re taking as steps. And then that&#8217;s the building-off point for macro recordings and automations. And I wonder: could we have more computing environments where, as you do stuff, you see the things you did as things? And then you go from there.</p><p><strong>Kanjun Qiu (49:02)</strong></p><p>Mm-hmm. That&#8217;s really interesting. Building on that a little bit: something we think about a lot, since we work on AI agents and agents take actions, is that there&#8217;s a difference between displaying information and taking actions. And actually, what you described about a CLI as a mediocre GUI and a mediocre API is really interesting, because CLI tools are primarily for taking actions. They&#8217;re not very good for displaying information.</p><p>GUIs are really good for displaying information, and they can be good for discovering actions. Maybe for taking actions sometimes: if you&#8217;re trying to figure out what action to take, then maybe you can play around with the information until you figure it out. But the taking of the action in the GUI is not great. It&#8217;s not very composable. It&#8217;s, you know, not very automatable.</p><p>And so if we think about displaying information and taking actions as two separate things, then it makes me wonder. OK, your point about the undo stack is interesting, because that&#8217;s a sequence of actions which could be turned into a CLI tool, in theory. And the question really is: OK, what&#8217;s the input into the CLI tool? Unfortunately, sometimes the input involves looking at a bunch of data and analyzing it and visualizing it in a GUI form or something like that.</p><p>But there&#8217;s some processing that goes into the input, and the action itself can be a CLI tool.</p>
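<p><em>[A TypeScript sketch of actions reified as things: every GUI operation lands on an undo stack, and the same records replay as a macro. The action vocabulary is invented for illustration.]</em></p><pre><code>type Action =
  | { kind: "fill"; cell: string; value: string }
  | { kind: "resize"; column: string; width: number };

const undoStack: Action[] = [];

function apply(a: Action): void {
  if (a.kind === "fill") console.log("set", a.cell, "=", a.value);
  else console.log("resize column", a.column, "to", a.width, "px");
}

// Every GUI operation is recorded as data, like Photoshop's history panel.
function perform(a: Action): void {
  undoStack.push(a);
  apply(a);
}

// Normal GUI use...
perform({ kind: "fill", cell: "A1", value: "42" });
perform({ kind: "resize", column: "A", width: 120 });

// ...and because each step is a thing, "record a macro" is just replay,
// which is also roughly what a CLI command wrapping the same actions does.
function replay(macro: Action[]): void {
  macro.forEach(apply);
}
replay(undoStack);
</code></pre>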
<p><strong>Geoffrey Litt (50:27)</strong></p><p>Yeah, totally. You know, now I write most of my CLI commands by just telling an AI what I want to do. And then it writes some really long command that I don&#8217;t fully understand, and I hit enter, right? Which I should pay more attention to. But I think you&#8217;re really getting at something. It usually works, right? </p><p><strong>Kanjun Qiu (50:46)</strong></p><p>Yeah, exactly. </p><p><strong>Geoffrey Litt (50:55)</strong></p><p>I think we are all figuring out what interaction models make sense right now. And I think you&#8217;re getting at a couple of important things. For commands and actions, I think language is actually really good for saying what to do, for the most part. And then for the return path from the agent, I think for some things, maybe two-way voice conversation feels good, but for a lot of things, having visual aids helps. So: deploying the full field of graphic design and data vis to show things.</p><p>When you ask Siri for the weather and it shows you a weather card, that&#8217;s a version of this loop. So I think that&#8217;s a really powerful basic loop. And then the one thing I want beyond that, for some use cases, is a shared locus of attention, like a desk we can both point at and work on. So that might be as simple as telling the agent: edit this code in Sculptor. Or, conversely, the agent saying: did you notice that this line is weird? You&#8217;re kind of sharing this thing.</p><p><strong>Kanjun Qiu (51:59)</strong></p><p>Yeah, you have a shared space you&#8217;re both looking at.</p><p><strong>Geoffrey Litt (52:02)</strong></p><p>And you can point at it. That combination, I think, ends up being pretty good.</p><p><strong>Kanjun Qiu (52:07)</strong></p><p>Hmm, interesting. An idea we&#8217;ve been toying with is that agent-human collaboration and human-human collaboration might not be such different things. Perhaps you can design for both of them. One of the design principles in Sculptor is: everything you can see, the agent should also be able to see. It should understand how it works. It should see your entire UI. If you tell it something and you&#8217;re referencing a part of the UI that&#8217;s not the chat, it should know what you&#8217;re talking about.</p><p>And same with human-human collaboration. To your point earlier that real time is just what people expect: I think two humans want to both be looking at the same surface and the same information. Otherwise it&#8217;s actually quite hard to communicate.</p><p><strong>Geoffrey Litt (52:52)</strong></p><p>I really like that principle you just brought up. I think, to a large extent, aiming for human-human collaboration as a gold standard for a lot of stuff is actually a great goal.
I think there are other patterns that can make sense sometimes, but even just looking at human-human: if you and I are sitting next to each other pair programming, there&#8217;s a lot going on. You know, very simple stuff, like you can point at things and see my screen, and I know that you can see my screen, and there&#8217;s not any weird question of what you can see. So there&#8217;s a lot of good theory of mind going on.</p><p>But also I think there&#8217;s much deeper stuff. Something I&#8217;ve been thinking about lately, and I&#8217;m curious for your thoughts on, is: you can tell if I&#8217;m really busy and stressed because we have a launch tomorrow and I just want this fricking button to work. And I&#8217;m like, hey, Kanjun, can you fix this button for me, please? You&#8217;re not going to launch into an hour-long lecture about the philosophy of how we think about buttons, right? You&#8217;re just going to help me out, because I&#8217;m in a bind. And, you know, it might be a totally different situation. It might be my first day at a new job, and I&#8217;m like, man, I&#8217;ve never used this programming language before, can you show me around a bit? And I&#8217;ve always felt that computers, by defaulting to having so little context about us and our environments compared to human interactions, are at a real disadvantage: they can&#8217;t sense these things. And so they rely on us to give them that context through our prompting, in the AI era, but we&#8217;re not very good at giving them all the context that they need. And so we end up in these weird mismatches. Particularly along that dimension I just brought up: how do you know how much to bring the person along and help them learn themselves, versus just doing it for them? When should they be brought along? When does it matter? </p><p>I don&#8217;t even know myself, and as a programmer, I&#8217;m often very unsure how much I should be getting into the details of the thing. Even if the AI can do it perfectly, there&#8217;s some intangible benefit to me being in the details. When I&#8217;m UI prototyping, for example, I might have new, different ideas from knowing how it works. And so I don&#8217;t even trust myself to know how much I should be in the details.</p><p>How do we do this?</p><p><strong>Kanjun Qiu (55:20)</strong></p><p>That&#8217;s a really interesting question and direction. When people talk about AI slop, AI slop is this lack of taste, in a way. What you&#8217;re pointing at that&#8217;s really interesting is: the more you understand how something works, how your system works, how the UI you&#8217;re trying to build works, the more taste you have for where it can go. This taste comes from depth, a depth of understanding. And it&#8217;s so weird, because AI systems have no taste, and yet they know everything. So it&#8217;s not about knowing the thing. It&#8217;s something about preferential attention based on the details that we&#8217;re seeing, the ones that serve what we&#8217;re trying to get at, or something. I&#8217;m quite confused about this topic.</p><p><strong>Geoffrey Litt (56:13)</strong></p><p>I think, yeah, I think a lot of people have a very incorrect mental model of how creative work happens, which is something like: there&#8217;s an idea in your head, and you just have to somehow get it into the world as it is in your head. And if you could just do that, then it&#8217;s done.
And so in that model, all you quote-unquote need to do is describe the idea perfectly. And then someone else, something else, can just go do it. Right? That&#8217;s not how it works.</p><p><strong>Kanjun Qiu (56:38)</strong></p><p>That&#8217;s not how it works. Not at all.</p><p><strong>Geoffrey Litt (56:43)</strong></p><p>Creative work, open-ended work: anyone who&#8217;s really deep in it knows that there&#8217;s this conversation happening between you and some medium that you work in, where the idea is being shaped as you go. Working with the medium is changing your conception of what you want. There might even be accidents that happen that are cool, you know, that spark new ideas. And, you know, to some extent, maybe some of it&#8217;s even muscle memory, right? So it&#8217;s possible, for example, that a guitarist composing a new song might not know what chords they&#8217;re about to play. Their fingers just do something, and then they hear it and they&#8217;re like, that&#8217;s cool, right? So when you start digging into that, I think it raises a lot of questions about the role of AI in that process. And I think a lot about this in my own creative practice, which professionally is mostly UI prototyping. Like, I use AI coding a lot. And I think, at its best, it can really speed up feedback loops that weren&#8217;t essential for me to be in. And that lets me make progress faster in this exploration. At worst, it cuts off a whole process of creative exploration that I would have been in myself, because I say what I want and it makes one bad thing. And I&#8217;m like, oh man, that&#8217;s not good, but you made the whole thing and now I just can&#8217;t unsee it, you know?</p><p><strong>Kanjun Qiu (58:12)</strong></p><p>Do you really feel that? That&#8217;s really interesting.</p><p><strong>Geoffrey Litt (58:18)</strong></p><p>Yeah, totally. That&#8217;s happened to me. I&#8217;m like, oh, well, it&#8217;s done and it&#8217;s terrible. Like, whatever. And I think it&#8217;s a very sensitive thing. I&#8217;m an AI coding optimist in the sense that I think it can be a huge accelerant. I use it a lot, but I think we have to be very clear that we&#8217;re all changing our creative media, and that&#8217;s going to do something to our creative practice. And I think the people who are worried, especially about AI art, are totally on to something there.</p><p><strong>Kanjun Qiu (58:41)</strong></p><p>Mm-hmm. This is really interesting. The thing that I&#8217;m afraid to say when I talk about Imbue or AI agents is: I think of us as trying to upend the economic system, in a way, and something about the way that things are working. Because fundamentally, AI systems and AI agents are a source of power. And they become that source of power by basically being the way thinking happens. And so what you just said is: the system thought for me, and now I have this thought, but I didn&#8217;t have the intermediary thoughts before I got to this thought. And because I didn&#8217;t have those intermediary thoughts, I couldn&#8217;t get to a different thought. I only got to the end thought. So it&#8217;s quite concerning. </p><p><strong>Geoffrey Litt (59:30)</strong></p><p>It&#8217;s very concerning to me.</p><p><strong>Kanjun Qiu (59:39)</strong></p><p>I&#8217;m curious, after you got the end thought, whether you went back to the intermediary thoughts. Like, okay, you got this bad UI from your LLM.
Can you go back to the intermediary things and try to understand it better, or can you truly not unsee it? Is there some property? I&#8217;m curious about your personal experience here. </p><p><strong>Geoffrey Litt (59:55)</strong></p><p>Yeah. I would say that for me, it&#8217;s a very emotional process. So it&#8217;s not just a logical thing. There&#8217;s an excitement factor, a momentum factor, you know, an oh-yeah-we&#8217;re-getting-somewhere factor. And again, the tricky thing is that AI often really helps with this. It preserves momentum and avoids roadblocks that would have killed the vibe, you know? So it&#8217;s great when it works. </p><p>But when it doesn&#8217;t, yeah, it&#8217;s not really about literally being able to unsee it. It&#8217;s that it changes my emotional relationship to the process in a way that makes me not as excited about doing it anymore.</p><p><strong>Kanjun Qiu (1:00:42)</strong></p><p>Yeah, okay, so maybe what happened is it came out with a bad idea and that killed your momentum. And you&#8217;re like, I thought that there was something interesting here, but I guess not. Maybe I&#8217;ll move on to something else.</p><p><strong>Geoffrey Litt (1:00:52)</strong></p><p>Exactly. Yeah. And I&#8217;ll never know: if I had done it myself, would I have come up with something? And in fact, in some ways, when it comes up with something good, it&#8217;s even worse, because sometimes I&#8217;m like, this is pretty good. I have a few little tweaks, but good job. And then I&#8217;m like, wait, what would I have done? Would I have done better? I don&#8217;t know, and I&#8217;m not going to spend the time to figure it out anymore. So, you know, one mental model I try to use is that there are things I care more and less about, things I see as more and less core to who I am or what I work on. The less core something is to me, like some disposable secondary tool or something that I wouldn&#8217;t have built without AI in the first place, the more I&#8217;m okay just being very free with it. But the closer it gets to my core practice, the more urgency I feel to be really critically reflecting on what&#8217;s going on, and to not go too far, yeah.</p><p><strong>Kanjun Qiu (1:01:57)</strong></p><p>Yeah, I resonate with this a lot. I tried using GPT 4.5 for writing, and 4.5 was the first model that was actually quite good at writing. And for a month I was really happy. I was like, oh my God, my writing process is amazing. I&#8217;m getting so many more ideas through; I&#8217;m all in flow. There&#8217;s no writer&#8217;s block. And then I zoomed out for a week, stopped working on the piece I was working on. I came back and I was like, wow, this is not me at all. And I rewrote the whole thing, no LLMs. And the really interesting reflection: I feel like there are almost thought Schelling points. And because these systems are distributions, they actually produce the thought Schelling points that are highest likelihood. And because they&#8217;re high likelihood and they&#8217;re Schelling points, you kind of just end up there, and it&#8217;s really hard to get out of them. They&#8217;re really tempting. Yeah, they pull you in exactly. 
And so now you end up in this weird groove, and it&#8217;s actually really hard to get out of it; being creative takes stepping away.</p><p><strong>Geoffrey Litt (1:03:14)</strong></p><p>Yeah. Another manifestation of this that I&#8217;m curious about with Sculptor is this: I&#8217;ve been playing with parallel coding agents a bit and finding them really interesting. I&#8217;m still learning how to use them. Something I found recently: I went overboard for a day. I was like, oh my God, this is amazing. I had two projects that I was working on, and I had to pick one to work on for the day. And I was like, you know what? I can do both. And so all day I was just kind of flip-flopping between these two projects.</p><p>You know, it kind of worked, but I felt really off at the end of the day. And what I realized was, man, I&#8217;m not sure that I did great work on either, because actually, even with perfect implementers doing stuff on both projects, it&#8217;s not an AI coding quality challenge. It&#8217;s a mental bandwidth challenge for me. If I&#8217;m really creatively leading these things, I can&#8217;t multitask, actually. So there&#8217;s a different bottleneck, which is me and my brain. And I&#8217;ve been trying to reflect on what to do with that. Maybe I only parallelize within the same project or on the same area. Maybe I have one main thing I&#8217;m thinking about, and then I have armies of bots doing all the maintenance and bug fixing and stuff I don&#8217;t have time for. I don&#8217;t know, but I&#8217;m curious for your thoughts.</p><p><strong>Kanjun Qiu (1:04:35)</strong></p><p>That&#8217;s super interesting, because, yeah, I always recommend never working on two projects at the same time if you&#8217;re trying to do something creative with these parallel agents. It&#8217;s interesting because I resonate with what you&#8217;re saying a lot. I think when you&#8217;re doing something creative or researchy with software, at least for me, what I&#8217;m trying to do is explore the space and evolve my thinking and understanding of the problem as I&#8217;m building. That&#8217;s very abstract, you know, but I&#8217;m evolving my understanding of the problem, of what I&#8217;m trying to do. And so with parallel agents, one thing we&#8217;re trying to optimize for in Sculptor is divergence instead of convergence. So recently we shipped a feature called Forking. It&#8217;s in beta right now; you can turn it on in the settings. You can fork an agent. So now, say you had this agent build a UI and you didn&#8217;t like the UI. You can go back to where you started and say: try something totally different; try this instead; and then also try this. And it&#8217;ll snapshot your agent&#8217;s current state, all the context, and fork it into a bunch of different tasks. And one thing I really like about this&#8230;</p><p><strong>Geoffrey Litt (1:05:50)</strong></p><p>Yeah, no, I&#8217;m excited. Keep going.</p><p><strong>Kanjun Qiu (1:06:01)</strong></p><p>One thing I really like about this is that it kind of gets me out of this groove we were just talking about. There are many ways to end up in that groove. One way is: I&#8217;ve built up some context, I went down this path, I&#8217;m maybe debugging some minor detail, and now I&#8217;m really annoyed because all of this debugging context is in the context and I need to get it back somehow. And now I&#8217;m in this weird groove. Another way to get into a weird groove is: it had a bad idea and now I can&#8217;t get it out. I can&#8217;t get back to the place where it could have generated good ideas. And so, yeah, the forking thing is really about: how do I help the user get out of grooves, so that they can do really divergent thinking and divergent things, and not have to wrestle with the agent to get the agent out of these grooves?</p>
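<p><em>A rough sketch of the forking flow described above, under the assumption that an agent&#8217;s state is just its conversation context plus a workspace. The names here (Agent, snapshot, fork) are invented for illustration; they are not Sculptor&#8217;s actual API.</em></p><pre><code># Illustrative sketch only: Agent, snapshot, and fork are invented names,
# not Sculptor's actual API.
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Agent:
    context: list = field(default_factory=list)    # conversation so far
    workspace: dict = field(default_factory=dict)  # file path -> contents

    def snapshot(self) -> dict:
        # Freeze everything the agent "knows" at the point you want to
        # branch from, before later context pollutes it.
        return {"context": deepcopy(self.context),
                "workspace": deepcopy(self.workspace)}

def fork(snapshot: dict, instructions: list[str]) -> list[Agent]:
    # Every fork shares the same starting state but gets its own
    # divergent instruction, giving several parallel attempts.
    forks = []
    for text in instructions:
        branch = Agent(deepcopy(snapshot["context"]),
                       deepcopy(snapshot["workspace"]))
        branch.context.append({"role": "user", "content": text})
        forks.append(branch)
    return forks

# Usage: the agent built a UI you don't like; take the snapshot you saved
# before that attempt and branch three fresh directions from it.
agent = Agent(context=[{"role": "user", "content": "Build a to-do app UI."}])
saved = agent.snapshot()
variants = fork(saved, [
    "Try a minimal, keyboard-driven UI instead.",
    "Try a dense dashboard layout.",
    "Ignore the previous design and propose something unconventional.",
])</code></pre>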
<p><strong>Geoffrey Litt (1:06:44)</strong></p><p>I love that idea. I&#8217;m a huge fan of that way of thinking. And you know, it&#8217;s funny, it&#8217;s coming back to version control, actually. These questions of how you structure divergence, how you even see it, and how you encourage it: that&#8217;s really a tooling problem, I think. Yeah, something I&#8217;ve wondered about is also having more structure to the divergence. What I mean by that is not just try three random things. Let&#8217;s say I want a to-do app. One thing you can have the agent do is give you a big questionnaire, right? Should it be really simple or really complicated? Should it be for work or for personal life? And you just go through and answer these 10 questions or whatever. I&#8217;d say that&#8217;s the current state of the art: specification is answering a bunch of questions.</p><p>And it&#8217;s fine, but it&#8217;s pretty tedious. And it&#8217;s also, I think, not how real design processes often work best. Often the way things work best is by looking at a few options and saying, I like that one, and then talking about why. So something I&#8217;ve thought about is: once the tokens are free, can you just generate like 100 to-do apps? But not randomly. The agent would first think about, okay, what are the dimensions that the user might care about? Let&#8217;s set up a design space along those three, five, eight dimensions, whatever. Let&#8217;s take some guesses on what they might want and pick a bunch of points in that space around there. And then also try some wild cards, you know, really crazy options. Pre-generate a hundred apps. And then when we come back to the user, we&#8217;re like, okay, let&#8217;s just start a conversation. Let&#8217;s show you some options. You want it to be more X? We had that ready already, you know? It would be more like jamming with a design consultancy, except the feedback loop is in seconds and not weeks, you know? More playing with options.</p>
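<p><em>A minimal sketch of this kind of structured divergence: lay out a few guessed design dimensions, sample variants near a guess at the user&#8217;s preferences, and mix in wildcards. The dimensions, names, and counts below are made up for illustration.</em></p><pre><code># Illustrative sketch of "structured divergence": instead of generating
# variants at random, lay out a small design space, sample near a guess
# at what the user wants, and mix in a few wildcards. All dimensions and
# counts here are invented for illustration.
import itertools
import random

DESIGN_SPACE = {
    "complexity": ["minimal", "moderate", "power-user"],
    "context": ["work", "personal", "shared household"],
    "style": ["plain text", "kanban", "calendar-first"],
}

def differences(point, guess):
    # How many dimensions this point differs from the guess in.
    return sum(point[k] != guess[k] for k in DESIGN_SPACE)

def sample_designs(guess, n_near=6, n_wild=2, seed=0):
    rng = random.Random(seed)
    all_points = [dict(zip(DESIGN_SPACE, combo))
                  for combo in itertools.product(*DESIGN_SPACE.values())]
    # "Near" variants differ from the guess in at most one dimension...
    near = [p for p in all_points if differences(p, guess) in (0, 1)]
    # ...while wildcards differ in every dimension: deliberately crazy options.
    wild = [p for p in all_points if differences(p, guess) == len(DESIGN_SPACE)]
    return rng.sample(near, min(n_near, len(near))) + \
           rng.sample(wild, min(n_wild, len(wild)))

# Each sampled spec would be handed to a codegen agent to pre-generate an
# app, so the user reacts to concrete options instead of a questionnaire.
guess = {"complexity": "minimal", "context": "work", "style": "plain text"}
for spec in sample_designs(guess):
    print(spec)</code></pre>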
<p><strong>Kanjun Qiu (1:08:50)</strong></p><p>Yeah, I think this is really interesting. There are LLM tools out there where, when you ask deep research to go do some research, the first thing it&#8217;ll do is ask you some questions about the query that you asked. And whenever it asks me these questions, my answer to all of them is: yes to all. The questions are useless. And I was reflecting on why these questions are useless. It&#8217;s because they&#8217;re not actually questions I want to answer. It&#8217;s more that I want to see some output and be like, I didn&#8217;t like this part; I want more of this. What we want is for the LLM to help us understand the problem better. This goes back to what you and I were just saying: the creative process is about understanding the problem space, what we&#8217;re trying to solve for or trying to do, better as we go, and being able to create and move in that direction easily through the medium.</p><p><strong>Geoffrey Litt (1:09:44)</strong></p><p>Yeah, I love that you&#8217;re thinking about encouraging this in the tool, because I totally agree with you that often, even if it&#8217;s technically possible to do this somehow, once you&#8217;re in the groove, you feel stuck unless it&#8217;s easy. There are a couple of beautiful systems out there that play with ideas of spatial canvases as ways to visualize that branching, not just in LLM chats but also in creative media. There&#8217;s a system I love called Spellburst; my friend Tyler Angert and some folks at Stanford worked on it. It&#8217;s a spatial canvas where you make these little art sketches, and then you can hit a button that makes a bunch of forks off from that one. You basically try a bunch of things, and then you&#8217;re like, I like that one, and then let&#8217;s diverge from there, right? And so you can explore, but you see all the variations spreading out in this tree, and I think that sort of thinking can be very generative.</p><p><strong>Kanjun Qiu (1:10:39)</strong></p><p>That&#8217;s super interesting. Yeah, we&#8217;ve struggled to figure out how to represent forking, like forked agents. And I don&#8217;t know if a canvas works when it&#8217;s not so visual. You can&#8217;t really visualize code very well. I want to see at a glance where each fork is going, but it&#8217;s really hard to do that with code. I&#8217;m curious, since you&#8217;ve thought about malleable software and helping the user not only make changes but explore: what kind of technical infrastructure allows the user to actually explore? Do you have any thoughts on this?</p><p><strong>Geoffrey Litt (1:11:11)</strong></p><p>That&#8217;s a fascinating question. I agree with you that inherently visual media are a much more obvious fit for a visual canvas or something like it. What you&#8217;re really getting at is: how do you give the right feel for what a piece of code is, in a concise visual way? One dimension that I&#8217;m always thinking about is that it&#8217;s really hard to tell from looking at code how solid it is.</p><p>This is a big problem in malleable software; we found this in Patchwork. When a company ships a piece of software in the app store, there&#8217;s a minimum bar it&#8217;s hitting, right? You hope. That might not be true anymore with LLMs, honestly, but someone&#8217;s charging money for it; it should be at a certain quality bar. If I just made a tool for myself and I vibe-coded it in five minutes and it works for me, what do you make of that? Do you want to use it? It depends. You probably wouldn&#8217;t want to wholesale adopt it if it&#8217;s really important. You might be okay playing with it. But it&#8217;s often hard to tell from the outside which one it is. And is this thing even maintained? I think people look at GitHub stars and commit histories, for example, as these sorts of signals of life, right? 
I think if we have way more software, and it can be produced way more readily by people who don&#8217;t know what they&#8217;re doing, there&#8217;s going to be more bad software out there. Which is not necessarily a problem: there are a lot of bad spreadsheets out there and it&#8217;s fine. But I think you need to be able to tell. I kind of wish you could see it. An analogy I like is: is this a balsa wood model of a bridge, or is this the Golden Gate frickin&#8217; bridge? In physical media, it&#8217;s really, really obvious and you can never get confused, but with software it&#8217;s not as clear. Could we make that clearer somehow?</p><p><strong>Kanjun Qiu (1:13:03)</strong></p><p>This is really interesting. One of the prototypes we&#8217;ve been playing around with in Sculptor is this idea of a report card for your code. So can we give you that? Having worked on it, I think it&#8217;s actually a really hard problem. What goes into the report card? How do you know if a piece of code is robust? Depending on what you want to do with it, you might want it to be robust in different ways. Maybe you want it to be more extensible, or maybe you want it to be really well tested. But, you know, to your question: if I&#8217;m in this world of malleable software, there&#8217;s a lot of forking, there&#8217;s a lot of divergence. How do I know where to build from? What is safe to build from? What are the proxies for that? I think that&#8217;s a really good question.</p>
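<p><em>One way to imagine a first cut at such a report card is to aggregate a few cheap proxy signals. The signals below are invented for illustration, not what Sculptor actually computes, and any real report card would need far better measures of robustness.</em></p><pre><code># Illustrative sketch of a code "report card" built from cheap proxy
# signals. The signals and fields are invented for illustration; real
# robustness needs much better measures.
from pathlib import Path

def report_card(repo: Path) -> dict:
    py_files = [p for p in repo.rglob("*.py") if ".venv" not in p.parts]
    test_files = [p for p in py_files if p.name.startswith("test_")]
    src_lines = sum(len(p.read_text(errors="ignore").splitlines())
                    for p in py_files)
    todo_count = sum(p.read_text(errors="ignore").count("TODO")
                     for p in py_files)
    return {
        "has_tests": bool(test_files),
        "test_file_ratio": len(test_files) / max(len(py_files), 1),
        "todos_per_kloc": 1000 * todo_count / max(src_lines, 1),
        "has_ci_config": (repo / ".github" / "workflows").exists(),
    }

# Usage: grade a checkout before deciding whether it's a balsa-wood model
# or something safe to build on.
print(report_card(Path(".")))</code></pre>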
<p><strong>Geoffrey Litt (1:13:45)</strong></p><p>I&#8217;ll throw out another idea. One of my beliefs around divergence and versioning is that there&#8217;s a lot of meta work around the work that humans find tedious, like writing pull request descriptions. Getting a pull request that has a really, really good description makes it much easier to review, but it&#8217;s a lot of work to produce that.</p><p>With AIs, I think we should be pushing much harder than we are to produce amazing review experiences and artifacts. We should be the most spoiled managers in the world, with our reports coming to us having spent weeks preparing a presentation about a tiny bug fix that doesn&#8217;t really matter, because we can spend the virtual time for them to do that.</p><p>And whether that&#8217;s, yeah, if it makes 100 apps for me, maybe it should make a 3D world where I can browse the different apps and see how they&#8217;re different from one another. Maybe it&#8217;s just a really, really good deck that walks me through them and explains the differences, as a PDF. I don&#8217;t know. If we gave a high school or college intern the task of explaining these 100 apps, what would they do? I don&#8217;t know, but I think they could get extremely creative.</p><p>There&#8217;s one project I love called <a href="https://www.ericrawn.media/quickpose-docs">Quickpose</a>, I think it&#8217;s called, by <a href="https://www.ericrawn.media">Eric Rawn</a>, where they did versioning on a spatial canvas, but it was more like a whiteboard that you got to draw on and arrange the versions on yourself, free form. They did some tests with artists and found that artists would use it: there would be a cluster of versions over in this corner, the ones that had this intriguing property; then this offshoot over there that was really weird; and then, this is our mainline exploration. You can actually arrange them and label them and describe them yourself. So imagine, if the AI is diverging, could it make a poster for you of what it tried and how it all fits together?</p><p><strong>Kanjun Qiu (1:15:48)</strong></p><p>That&#8217;s super interesting. There are two really interesting things in that. One is that the malleability of this canvas, where artists could arrange the versions themselves, is in itself a malleable software property: you want the end user to be able to explore by arranging the explorations and reasoning and thinking through them. So that&#8217;s really interesting. The other thing you said that I thought was really interesting is this idea that we&#8217;re really under-focused on presenting results from the AI. The LLM just dumps a bunch of text at you. Why has it not thought about how to present its results as a slideshow, or, you know, as a better presentation? That&#8217;s really weird.</p><p><strong>Geoffrey Litt (1:16:33)</strong></p><p>It&#8217;s wild. I also think this is going to become much, much more important very rapidly. There&#8217;s one argument you could make, which is that the AI is going to get so good that we don&#8217;t need to review. But I think that&#8217;s totally false, because what I&#8217;ve observed in coding is: they get better, so I give them harder stuff. In fact, it&#8217;s almost the opposite problem, where I&#8217;m giving them more and more stuff that&#8217;s more and more important, and they&#8217;re going off and doing stuff that I&#8217;m not even in the details on anymore. So the review step becomes more critical over time, and I think it&#8217;s headed toward a world where most of my time is reviewing. So really, the quality of that review experience, whatever it is, letting me quickly, happily, and correctly tell: is this good, and what do I want to change? I think that&#8217;s the whole ballgame for interfaces for using these things.</p><p><strong>Kanjun Qiu (1:17:29)</strong></p><p>I think that&#8217;s really interesting, and I really agree. A way I&#8217;ve been thinking about it in my mind is switching from the term review, which is part of this industrial process of software, as you said earlier (I love that term), to the idea of review as part of the medium of working with coding agents or AI agents. Review is part of a powerful medium for agents, because the medium doesn&#8217;t really work without this step of understanding what is going on and where to steer it next. And as you said, the more we use it, the more critical things we give it, and the more important this piece of the medium becomes, it feels like.</p><p><strong>Geoffrey Litt (1:18:14)</strong></p><p>Yeah, I love that reframe. I think you could think of it two ways, right? One is a human-to-human analogy: we&#8217;re jamming, and it&#8217;s not a quality assurance step. It&#8217;s more like we&#8217;re working together. 
You know, if you brought me some work and we were working together, I wouldn&#8217;t frame it as, okay, time to check if you did what I said. It&#8217;d be more like we&#8217;re riffing, right? So maybe that&#8217;s one way to think about it. But another is something in the ideas we&#8217;ve been talking about: visualizing, thinking about non-human interactions. If a potter is forming a piece of clay into a pot, they&#8217;re not reviewing suggested pots. There&#8217;s just a loop going where the clay is becoming something, and they&#8217;re reacting live. And I almost wonder if we could get to the point where crafting software feels that way, where there&#8217;s some representation you&#8217;re working with that feels like you can directly touch it.</p><p>And it&#8217;s not a language interaction. You more see it coming together, and you&#8217;re, yeah, pulling it into a pot. And I think that&#8217;s very, very obviously possible for shallow UI design. Like, I should be able to move UI elements around. I shouldn&#8217;t be telling a model, please move that box three pixels. You know, that&#8217;s ridiculous, and that hopefully will get solved, although our civilization has taken backwards progress on that since the 90s. But the harder question is: what is that for logic? I don&#8217;t know.</p><p><strong>Kanjun Qiu (1:19:52)</strong></p><p>Ugh, this is the thing I want most. How do we turn software into clay? I don&#8217;t know. Yeah, what do you think about that for logic? It kind of gets at <a href="https://worrydream.com">Bret Victor</a>&#8217;s <a href="https://dynamicland.org">Dynamicland</a>: how do we tactilely feel what&#8217;s happening in software?</p><p><strong>Geoffrey Litt (1:20:09)</strong></p><p>Yeah, since you mentioned Bret: Bret has this great essay, <a href="https://worrydream.com/LadderOfAbstraction/">Up and Down the Ladder of Abstraction</a>, which I think has a lot to say about how you see a map of a very complex space and navigate through it to find the place you want to be on that map. I think that&#8217;s a really beautiful idea that could be brought to: what&#8217;s my map of my 100 to-do apps, and how do I find the one that I want? Going up into very abstract land, and then jumping down into demoing concrete ones.</p><p>Another inspiration that I think about a lot is Michael Nielsen&#8217;s work on, I guess it&#8217;s called artificial intelligence augmentation. He and Shan Carter had this <a href="https://distill.pub/2017/aia/">piece</a> with these sliders that change very deep conceptual attributes of a font typeface, attributes that an AI learned, probably with an unsupervised learning algorithm or something. Basically, I think it was pretty simple: you are moving in the latent space of fonts with a slider. And that does make me wonder: what is the equivalent of that for code?</p><p>Could you do some of the <a href="https://www.anthropic.com/research/persona-vectors">Anthropic steering vector</a> stuff on code gen, and then you choose the steering vector? Is there a complicated slider that you can just drag, and the app gets more complicated or simpler? I don&#8217;t know.</p>
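<p><em>The mechanism behind that font demo can be sketched in a few lines: a slider is just scaled movement along a learned direction in latent space, applied before decoding. The encoder and decoder are placeholders here; only the arithmetic is the point.</em></p><pre><code># Minimal sketch of a "latent slider": move an embedding along a learned
# attribute direction, then decode. The encoder/decoder are stand-ins for
# a real generative model; the vector arithmetic is the whole point.
import numpy as np

def slide(z, direction, amount):
    # Normalize so `amount` means the same step size for any direction.
    unit = direction / np.linalg.norm(direction)
    return z + amount * unit

rng = np.random.default_rng(0)
z = rng.normal(size=64)               # latent code of the current artifact
complexity_dir = rng.normal(size=64)  # a learned "more complicated" direction

# Dragging the slider: step along the direction and decode each position.
for amount in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    z_new = slide(z, complexity_dir, amount)
    # decode(z_new) would render the font, or regenerate the app, at this
    # attribute setting.</code></pre>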
<p><strong>Kanjun Qiu (1:21:46)</strong></p><p>It&#8217;s really interesting. It makes me think: if you combine the slider idea and the abstraction ladder idea, you kind of drag the slider up and down the levels of abstraction. You can modify at each level of abstraction, and then the full abstraction ladder is regenerated for the new app. And then you can move down or up again, so you&#8217;re moving at whatever level you want. Like: okay, I want to totally change this to-do app to be this way; versus: I want to change this tiny function in this to-do app to do this slightly different thing instead.</p><p><strong>Geoffrey Litt (1:22:24)</strong></p><p>I love that. I think this is a very ambitious vision we&#8217;re sketching out here. But yeah, I think this is a very different path than the one the industry&#8217;s on right now, which is mostly just a bunch of natural-language, in-the-groove chats.</p><p><strong>Kanjun Qiu (1:22:29)</strong></p><p>Yes. I&#8217;m curious: looking back at the research that you&#8217;ve done and where you are now, what insights do you feel are maybe overlooked right now? What kinds of things do you think people are not paying attention to that they should pay more attention to, to get the future that is more empowering, agentic, free?</p><p><strong>Geoffrey Litt (1:23:05)</strong></p><p>I would come back to the infrastructure piece of malleability. Obviously everyone&#8217;s excited about AI making professional software development more productive. And I think some people are excited about personal tooling, about AI building software for us. I think people haven&#8217;t realized how much the existing ecosystem is not prepared to support that. When you think about the most basic questions, like, if I wanted to add a feature to Airbnb, what would I do? People are like, wait, what? That doesn&#8217;t even... </p><p><strong>Kanjun Qiu (1:24:46)</strong></p><p>I can&#8217;t even compute that. </p><p><strong>Geoffrey Litt (1:24:47)</strong></p><p>I wrote this little tongue-in-cheek story-essay thing; it was an imaginary conversation with a mysterious wizard. The mysterious wizard says: I want to schedule a weekly seminar, so it&#8217;s on my calendar, and I just want to figure out how many attendees there are and order the right number of pizzas for them automatically via Uber Eats. And this apprentice person is like, I&#8217;ll just vibe code a new app that does it. And then the wizard is like, no, no, no. Can you add the button to Uber Eats, please? And the apprentice&#8217;s brain goes: pff, I don&#8217;t know, I can&#8217;t add that button. What are you talking about? So I think there&#8217;s this really deep change of perspective that hasn&#8217;t fully been internalized. When you bring the cost of editing code down by as much as it&#8217;s coming down, what makes sense to build around that is just not on people&#8217;s radar, I think.</p><p><strong>Kanjun Qiu (1:24:43)</strong></p><p>Yeah, so if you could list out every piece of infrastructure... this is awesome, because it&#8217;s something that we&#8217;re thinking about. A way I think about Imbue is that we want to build the public infrastructure that&#8217;s necessary for malleable personal software. So what is that? If you had a wish list, what&#8217;s on the wish list?</p><p><strong>Geoffrey Litt (1:25:00)</strong></p><p>I think it&#8217;s a lot of what we talked about. 
I mean, it&#8217;s so many of the ideas: all software ships with the editor for that software. All software is live-modifiable locally. The live modifications can be instantly shared live with your collaborators. There&#8217;s awesome version control, so you can diverge and converge as needed. You have really awesome data infrastructure that&#8217;s really easy for random individuals to run, doesn&#8217;t require corporate scale, and enables modern collaborative apps.</p><p>A lot of those elements you could imagine coming together into essentially some sort of new operating system or platform. Over time, I expect that the pressure toward personal software will be strong enough that we&#8217;ll start to see this emerge in some form. But I don&#8217;t know quite how.</p><p><strong>Kanjun Qiu (1:25:51)</strong></p><p>That&#8217;s really interesting, because when we were building Sculptor, one of the things we were thinking about is: what if the software you&#8217;re building ships with some Sculptor environment, so that the end user can edit it, see the live edits in real time, and share the code base with someone? Something like that. It&#8217;s not quite there, and I&#8217;m not quite sure how to do it, because the version control problem is really hard. The data versioning problem is also really hard.</p><p><strong>Geoffrey Litt (1:26:23)</strong></p><p>Yeah. I mean, think about who pays for the AI edits that are going to happen from the users. Currently, AI code editing is economically viable because a lot of the people doing it are making software that ships to millions of people, so they can get paid a lot to do it. How does that work here? I think there are a lot of questions.</p><p><strong>Kanjun Qiu (1:26:46)</strong></p><p>For the infrastructure piece of malleability, is there anything in terms of cultural norms, or the way that we think about software, or the way that people and communities exist, that you think either is changing or needs to change as we go into this future?</p><p><strong>Geoffrey Litt (1:27:08)</strong></p><p>I&#8217;m really glad you asked that. This is actually one of the reasons I care most about malleable software. It has less to do with software and more to do with how people feel about their relationship to the world. There&#8217;s a Steve Jobs quote that I love that goes something like: the moment you realize that the people who made all this stuff were no smarter than you is when you can start actually changing things.</p><p><strong>Kanjun Qiu (1:27:29)</strong></p><p>Everything around you can be changed.</p><p><strong>Geoffrey Litt (1:27:31)</strong></p><p>Yeah, and I think that&#8217;s a really powerful mindset. And that is a mindset that&#8217;s cultivated in people in response to an environment, in iteration with an environment. I think disempowering environments create disempowered people, people who have learned to be helpless. And I think there&#8217;s a general trend here. Narrowly, you could look at examples like cars, which have become a lot harder to understand.</p><p>You can&#8217;t really take apart an iPhone. There&#8217;s less comprehension possible in the world. And software, because it&#8217;s typically not malleable, adds to this: the more time we spend in digital environments, the more time we spend in places we think of as prefab corporate environments. The thought of what to change doesn&#8217;t even occur to us. 
I don&#8217;t think about how we should decorate this podcast meeting room that we&#8217;re in, because I can&#8217;t change it.</p><p><strong>Kanjun Qiu (1:28:30)</strong></p><p>We&#8217;re just consumers.</p><p><strong>Geoffrey Litt (1:28:32)</strong></p><p>We&#8217;re just consumers. I think the more time you spend in places that cultivate that mindset, the harder it gets to have agency in the world. So I think there&#8217;s a double risk here with AI. Some of the conversations we&#8217;ve had around being in the details and understanding things: if there&#8217;s less of a need to even understand things to minimally get through your life, that ties into this general trend, where there&#8217;s a revealed preference toward convenience, and we all, myself included, choose it often, but it can have long-term consequences, both for ourselves individually and as a society. So one way we can work on this, I think, is to make software a place where people can exercise their will more, and where they&#8217;re encouraged to do that. And that&#8217;s maybe one way to stem this tide a bit, and perhaps even start a virtuous spiral where kids come to feel that they can do anything, and they will be right, because they can.</p><p><strong>Kanjun Qiu (1:29:37)</strong></p><p>I love that. I resonate with that a lot. Our digital worlds right now are disempowering environments. We are at the mercy of them in a lot of ways. We can&#8217;t really change them very well. We don&#8217;t have that much agency. And because we spend so much of our lives digitally, we end up feeling disempowered in our lives. And it feels like, as we go into a world with AI agents, this can get worse. These are agents run by other people, with their incentives, that are now taking actions on our behalf. So it&#8217;s even more disempowering, as in some of the things that we talked about. The core cultural shift is: how do we rekindle the original idea of the personal computer, the original dream of these systems as manifestors of our will and of what we want in our lives?</p><p><strong>Geoffrey Litt (1:30:27)</strong></p><p>Exactly. I think that&#8217;s a great closing note, in a way: to bring up that original vision of people like Douglas Engelbart and Alan Kay. Their original vision was that the personal computer is precisely this empowering thing. That&#8217;s why it&#8217;s personal. And so I think if we can find ways to get back to that, the world will be a better place.</p><p><strong>Kanjun Qiu (1:30:47)</strong></p><p>Cool. Well, I think it&#8217;s possible. I think we&#8217;re at this turning point right now where software can become personal, and we do need these pieces of infrastructure that make it possible. And we could change the economic incentives around it, because now everything is so replicable. So we&#8217;re at a good time.</p><p><strong>Geoffrey Litt (1:31:09)</strong></p><p>I agree. Let&#8217;s do it. I just joined Notion, and Notion is one of the players trying to make it happen, right? Build that platform. It&#8217;s not going to be one winner; I think there are going to be many platforms that enable this philosophy of personal software in so many different arenas. So yeah, I&#8217;m excited. Let&#8217;s do it.</p><p><strong>Kanjun Qiu (1:31:23)</strong></p><p>I agree. Awesome. Well, thank you so much, Geoffrey. This was really fun and a great meandering dive into all of these different ideas. 
So I really appreciate it.</p><p><strong>Geoffrey Litt (1:31:32)</strong></p><p>Thank you.</p>]]></content:encoded></item><item><title><![CDATA[Choices and Knives]]></title><description><![CDATA[The collective future we want is a future of kitchens, not vending machines.]]></description><link>https://ideas.imbue.com/p/choices-and-knives</link><guid isPermaLink="false">https://ideas.imbue.com/p/choices-and-knives</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Sun, 21 Sep 2025 04:58:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/70c2b8ac-1dbc-412a-b2d1-6ef7bb79ecd2_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on <a href="https://www.linkedin.com/in/glenn-mcdonald-ab3b36">Glenn</a>&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=519">Furia</a>.</em></p><p>The consumptive future that billionaires and their power-consolidation corporations are trying to sell us is a future of vending machines. </p><p>The collective future that we want is a future of kitchens. </p><p>Vending machines present a shallow illusion of overconstrained "choice", but no meaningful agency. The insidiousness of vending machines, however, is not this quantitative overconstraint. Adding more products to the vending machine does not change its nature. Amazon is a very large vending machine. The insidiousness is the conflation of choice with agency, and thus of consumption with participation. </p><p>A vended economy is supported by a complaisant vended politics, increasingly of its own making. Voting is the nominal structural basis of democracy, but encouraging people to vote, in itself, is anodyne and power-friendly. Voting in elections with only one candidate is bad theater, but voting in elections with only two candidates is only one better. And the math never improves: if you have no control over the candidates, and in particular if the candidates are <em>never you</em>, increasing their number doesn't benefit you. </p><p>We don't immediately recognize this as dystopia, because dystopian story-telling usually oversimplifies the format of the oppression. In <em>1984</em> the government controls the meager supply of drab consumer goods, and the single broadcast/surveillance channel. But of course Walmart has endless aisles of things, and the TVs in the TV aisle have endless channels. We have choices. </p><p>But our choices are stocked by Walmart. Or Amazon, or a handful of morally interchangeable competitors. The TV channels are numerous in frequency, but monotonous in signal, and monopolized in control. You can have any innocuous flavor of filler surrounding your advertisements (for innocuously flavored fillers). Orwell thought this gray-goo of a world would be imposed on us by the State, but the capitalist innovation is to invert this. Power is consolidated by money, not vice versa. Citizenship is portioned into voters, who are then repackaged as consumers. Anything functional in government is absorbed or disassembled until it can impose no miserly restrictions. 
</p><p>And so we get: self-destructive grievant feudalism wielded by a petulant debt-powered narcissist, supported by gutless symbiosis with a solipsistic social class of robber barons. The narcissist only sees himself in the (legacy) media, which is controlled by the same fear-sellers who picked him as their sacrificial agent. Dissent isn't so much crushed as organized into slots, each of which is manipulated to go temporarily out of stock, and then those empty slots are filled with something more colorful, but more completely owned. A few protesters gather in front of the machine to demand the return of the most recently discontinued snack. Their attention validates the machine. The machine gleams eagerly, its buttons patiently awaiting their fingers. They are angry now, but anger turns into hunger over so little time. Soon they will want something. The machine has the things. It waits. </p><p>We have kitchens. So many of the things in the kitchen came from rows on shelves in stores, vended with only slightly less structure than from the machine. The kitchen is not anti-business or anti-capitalist, exactly. But the things in the kitchen are <em>material</em>, and tools. The difference between a 5lb sack of flour and an individually-packaged snack cake is the difference between potential energy and the bill for energy consumed. At the end, we still eat. The difference is not the overall topology of the system, but our place in it. We come to the kitchen to take up knives, not coins or tokens. </p><p>Instead of a flat grid of processed choices, all lit from a consistent angle, the kitchen is an unruly space. Most of the things in it do fairly little of their own accord. A few of them have very particular purposes, but many do not. A bagel-cutter, but then 4 knives for all other needs combined. One adorable pan you use twice a year to make &#230;belskivers, two sizes of skillet, a saucepan, a soup pot. Inspirational cookbooks you mostly don't actually open. Turn on the stove; grab a pan, put a little olive oil in it; get a knife. We're going to <em>make</em> something out of <em>ingredients</em> and tradition and imagination and love and heat and garlic. </p><p>But even when we do, we are mostly alone. The capitalist rendition of Community Supported Agriculture is a telling example of both potential and challenge. The farmer solicits patrons, who each subscribe to a share of the farm's output, driven conveniently from the farm into the city every week. But while the community's support is collectively tangible to the farmer, albeit not regally so, the community is mostly only abstract and implicit to itself. Maybe you say hi to other people picking up their shares at the same time. Maybe there's a mailing list where you can exchange ideas for what to do with 8 zucchini at once. But there's probably no shared kitchen where you could all make zucchini chocolate cake at once. The <em>community</em> of the CSA, in isolation, is not only asymmetrical, but inherently hard to manifest. Most of your actual neighbors aren't CSA subscribers. Half of them only shop at Trader Joe's and think you're making a gross joke with the thing about zucchini in cake. Some of them shop at the part of Amazon that says Whole Foods, and at least cook. One of them belongs to a different CSA. These fragmented micro-collectives don't worry the billionaires. You are more likely connected to your immediate neighbors by baseball. The billionaires own the baseball teams. 
</p><p>The "smart" phone, which is now just what "phone" mostly means (in the same way that "social" media is now just what "media" mostly means; and thus a few more billionaires), is sometimes dreamily described as a computer in our pocket, but of course what it really is is a vending machine in our pocket, neatly lined with buttons. Behind the buttons, increasingly, are "apps" that are themselves in turn essentially vending machines of prepackaged choices. Like a tapas dinner with our friends, this doesn't sound bad by definition. But the tapas restaurant has a kitchen, and knives. A decent restaurant is a complex celebration of human agency put into the form of edible performance, and should make us want to cook in the way that, hopefully, a good song makes us want to sing. We can sing while we cook. </p><p>Our computers, even the tiny galley computers in our pockets, can be more like kitchens full of singing. The thing that would make them different would be a different kind of software. But the thing that would make a different kind of software likely is a different economy and a different social structure of not just how software is made, but how computation is applied to human problems. Power must be distributed, but we also have to want it, have to want to make our own decisions instead of delegating them to our choice of 5 omniscient oracles. The oracles aren't going to tell us to figure it out ourselves, so we have to want to not ask them. A CSA doesn't <em>require</em> that its recipients eat differently than TJ's customers, but 8 zucchini constitute a provocation to cook in a way that a frozen stir-fry does not. If the only "ingredients" we can easily buy are frozen entr&#233;es, all meals are snacks and it doesn't matter if our knives are sharp. Especially not if the snack troopers come for our knives, pretending it's for <em>our</em> safety. </p><p>We do not need more applications. We do not need new vendors of fancier and less predictable ways to make the same snack apps. We need the same things for data and data-tools and the things we can make out of data with data-tools that a community needs from a communal kitchen. But because communities formed in computational space can use tools for self-development and self-determination that would be harder to provide in physical space, maybe examples in our software can help lead to similarly catalytic ideas in our cities. Maybe data maker-spaces will inspire us to make communal kitchens, and the kitchens will give us new data needs, and thus new ideas. The spaces are different, but the communities are all made of people, and the people are us. </p><p>We do not need more snacks. We do not need robots that make more snacks. We do not need machines that turn our zucchini into snack cakes that they then confiscate and sell back to us. We need a place where, when we get hungry, it is still <em>harder</em> to reach for a knife than a button, but only in the way that tells us the results will be satisfying. A place in which the vending machine gathers dust until we replace it with an extra refrigerator. A place with noise and joy and knives. </p><p>We do not need safety from ourselves. The knives are not weapons. Stabbing is not a cooking technique. The newly unbillionaired can have some zucchini chocolate cake, too. This is our argument against oligarchy and our restorative consolation to those who thought safety required demonization: We have enough. Dominance is a rich person's poor substitute for collaboration. 
Aspiring to dominance is a poor person's poor substitute for working together on our collective wealth and taste. </p><p>We do not have to settle for poor choices, bought and swallowed whole. We do not have to buy what we find in machines. We do not have to quietly comply with our own commodification. Together, we do not have to be consumed.</p>]]></content:encoded></item><item><title><![CDATA[From lawless spaces to true liberty: rethinking AI's role in society]]></title><description><![CDATA[Who will actually hold power in the age of intelligent machines?]]></description><link>https://ideas.imbue.com/p/matt-boulos</link><guid isPermaLink="false">https://ideas.imbue.com/p/matt-boulos</guid><dc:creator><![CDATA[Kanjun Qiu]]></dc:creator><pubDate>Tue, 05 Aug 2025 22:56:27 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/170220519/e9a1d6748dc1357f14315ee08fe8b83d.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<div class="pullquote"><p>Welcome back to Generally Intelligent! We&#8217;re excited to relaunch this podcast on Substack, and in video. Our episodes still feature thoughtful conversations on building AI, but with an expanded lens on its economic, societal, political, and human impacts. </p></div><p><em><a href="http://boulos.ca/">Matt Boulos</a> leads policy and safety at Imbue, where he shapes the responsible development of AI coding tools that make software creation broadly accessible. His work centers on understanding what technological power means for individual liberty and advocates for the legal and institutional frameworks we need to protect our freedom. 
Matt is a lawyer, computer scientist, and founder.</em></p><p>In this conversation, Matt and Kanjun discuss:</p><ul><li><p>AI&#8217;s four core challenges </p><ol><li><p>Empowering bad actors</p></li><li><p>Transferring power from labor to capital</p></li><li><p>Reducing resistibility</p></li><li><p>Psychic damage of disempowerment</p></li></ol></li></ul><ul><li><p>Governing lawless digital spaces </p></li><li><p>Why abundance is not enough without liberty</p></li><li><p>Freedom as deep enablement <em>and</em> deep protection</p></li><li><p>The role of technologists in shaping society</p></li></ul><div><hr></div><h2>Timestamps</h2><p>03:13 The complex landscape of AI conversations</p><p>06:11 Understanding AI's core challenges</p><p>08:57 The transfer of power from labor to capital</p><p>11:51 Resistibility and human agency</p><p>15:00 The dual nature of technology</p><p>18:01 The invisible dynamics of digital spaces</p><p>23:51 Lawless spaces</p><p>27:01 The future of work and economic stability</p><p>40:05 Privacy laws and digital rights</p><p>44:07 Code as regulator</p><p>54:12 Interoperability and user control</p><p>01:07:05 Aggregates vs. individuals</p><p>01:14:43 Bottom-up vs. top-down automation</p><p>01:20:49 Optimizing for increased ability rather than increased productivity</p><p>01:23:11 Economic implications of AI</p><p>01:26:54 Building systems for empowerment</p><p>01:29:22 Freedom as deep enablement <em>and</em> deep protection</p><div><hr></div><h2>Transcript</h2><p>Kanjun Qiu (00:21)</p><p>Welcome back to Generally Intelligent. My name is Kanjun Qiu. I'm the CEO of Imbue. And we have with us Matt Boulos, our Head of Policy.</p><p>We started this podcast back in 2020 when we were trying to understand from researchers how far this generation of LLMs would go. The podcast has succeeded far beyond what we expected. Many of our early guests went on to have huge impacts on the field, and AI has gone from this niche thing to a household name everyone's talked about.</p><p>But since it's become so ubiquitous, we've started to realize something strange. The conversations in public are really weird. We have one AI CEO saying they're going to replace all of our jobs, but they're distributing intelligence, so that's good. And another AI CEO who's worried that it's going to kill us all, but it'll also give us tutors in India and new medicines, and so that's okay.</p><p>But where is the serious conversation about the real costs and benefits of this technology, the real economic, societal, political, and very human impacts that AI is going to have on our lives?</p><p>Generally Intelligent&#8212;this podcast and this conversation&#8212;is the start of that. We want this to be a space for us to have serious cross-disciplinary conversations about AI so that we can make changes. We can talk about different economic mechanisms, different ways to build technology, so that we can create the future that we want.</p><p>Because today, it's not too late. We can still change how this technology shapes society. And if we wait too many years, that's not going to be the case anymore.</p><p>So let's dive in.</p><p>Matt Boulos (02:45)</p><p>You've put a lot of thought into thinking about what are the core challenges that AI brings. 
Why don't you walk us through what you see as the main areas that we need to take seriously if we're going to address AI's impacts?</p><p>Kanjun Qiu (03:44)</p><p>Sometimes it's so overwhelming because people talk about all of these different problems as a whole smorgasbord from sycophancy to how AI might take all of our jobs to how it might take over the world. So, a way that I think about the problems is to bucket them into four categories based on the mechanism of action by which the system is acting.</p><p><strong>One, empowerment of bad actors. </strong>The core mechanism is that the power of actors who might do damage goes up. It's a technology that gives a lot more capability, and now various people who couldn't wield this capability before can.</p><p>And I actually lump both AI takeover&#8212;AI systems taking over and dominating humans&#8212;as well as terrorism in that category, because the mechanism of action is the same. If AI is taking over, that just means that AI is taking a lot of this power and then doing negative things to humans. And same with terrorists or authoritarian governments.</p><p>The reason why it's helpful to think about that mechanism of action is that it's very generative for solutions. When I think about actors who are anti-social, in the solution space, there are a couple of things I can do. One, I can prevent anti-social actors from getting that power. Let's look at which actors exist&#8212;governments, individuals, the AI systems themselves&#8212;and then look at how we can prevent them from getting power. That might be all forms of know-your-customer laws, or safety research, or things like that.</p><p>On the flip side, another way to make things more resilient is to make the world safer against bad actions like this. Maybe in that camp is better surveillance of the creation of biological artifacts so that we can prevent viruses, or inventing a universal antiviral that would actually remove a whole class of dangerous problems. This category is actually talked about often and well; the important thing is that it is just one category, and many solutions actually solve for many of these different actors and the problems they pose.</p><p>Kanjun Qiu (06:47)</p><p><strong>The second category I think of as transferring power from labor to capital, </strong>the capital-L Labor to capital-C Capital in the Marx view. As labor becomes less powerful because we are less valuable and capital gains power, what happens?</p><p>Most of us are in the labor class. We do not own the factors of production. We work for wages. And this is a technology that&#8217;s starting to do things that we currently do wage work for. So what happens to all of us who work for wages?</p><p>There's the immediate, somewhat alarming effect of that: losing jobs. But there's, to me, the long-term, somewhat alarming effect of this, which is that you have this constant power transfer from labor to capital that is forever.</p><p>Matt Boulos (07:51)</p><p>There is something really quite striking if the ability to be productive depends on capital. This is a really abstract way of saying, I show up to work and the capital I&#8217;m bringing is my laptop, but for the most part, I'm bringing the labor. 
Imagining this world, maybe the day's gonna come that the laptop matters way more than I do, and it&#8217;s a question of who owns it.</p><p>Kanjun Qiu (08:41)</p><p>That's a really good way of putting it: the transfer of power from labor to capital is equivalent to the transfer of usefulness from me to my laptop. So what happens in a world where the laptop's way more useful than I am?</p><p>Matt Boulos (08:54)</p><p>I've never looked suspiciously at this thing before.</p><p>Kanjun Qiu (08:57)</p><p>What happens in that world is not just economic. It's not just that I get paid less. Maybe I wield my laptop so therefore I still continue to get paid some. But in theory, the company owns my laptop, so I may not get paid at all.</p><p><strong>But the second effect of it is political. </strong>Part of the reason why we have political power is because our government depends on us to fund it. And there are a lot of countries that don't depend on humans. They depend on natural resources like natural gas or oil: the UAE, Russia. And they have a lot less incentive to treat their people well in the same way as we maybe in America do. So I'm somewhat concerned about the kind of loss of political power that we'll have because of our loss of economic power.</p><p>Capital can now just use capital&#8212;use AI&#8212;to produce more capital, and there's this reinforcing loop.</p><p>Kanjun Qiu (09:59)</p><p><strong>The third category, which I haven't heard that many people talk about, is your idea of resistibility. </strong>In political philosophy, there's this idea of resistibility: how well can you resist laws that don't serve you?</p><p>In America, we have fairly high resistibility. The civil rights movement was a good example of that, where you could actually disobey, have civil disobedience, and then change the laws. There are countries that have very low resistibility, like China as a surveillance state. And one thing that we're concerned about is going into a future where the resistibility of humans against automated systems&#8212;either controlled by themselves or controlled by other people&#8212;is much lower. We lose our power. So, the core mechanism here is a transfer of power from people to automated systems and the people who control them.</p><p>There are a lot of examples of low resistibility today. For example, we have very little ability to resist our social media notifications. We can turn them off, but we also have very little ability to resist our social media algorithms or news algorithms or control the news that we see, that we want to see. There are ways of opting out, but I would consider it a fairly low resistibility environment.</p><p>And as we go into a future that has a lot more automated systems&#8212;agents that are doing things automatically&#8212;that's something that's really important to consider. Now, other people are going to have agents that do things like spam call you constantly, or try to convince you on a website to buy something you don't need, or try to convince you to give them data that they can resell. Especially given what we see about current capabilities, it&#8217;s not clear that we have anything in place that addresses that.</p><p><strong>The fourth category is how it affects us as people to live in a society where we don't have very much power</strong>&#8212;we don't have power economically; we don't have power to resist things. 
We end up disempowered and, in the best case, infantilized.</p><p>That is scary because there is a deep sense of learned helplessness that sets in as we lose power. There's a great <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=702463#:~:text=I%20find%20large%2C%20negative%20wage,Labor%20supply%20is%20unaffected.">study</a> by <a href="https://sites.google.com/site/lbkahn?pli=1">Lisa Kahn</a> about how college grads who graduate in a recession have lower wages for the rest of their lives relative to college grads who graduate just a year or two later, which is super crazy. You term this &#8220;psychic damage,&#8221; which I really like: damaging our own perception of how capable we are and can be in the world. And I think this is really sad. There's this spiritual damage that we don't talk about, which is about human potential, about what people can be. What we want is for AI to expand our potential and expand what humans and humanity can be, but on this default path that we're on, all of these effects seem like they're going to go against that.</p><p>Matt Boulos (13:21)</p><p>I want to run with the last thing you said because... I guess I have to come out with this. I'm old enough that I have a memory of my life that's pre-internet.</p><p>Kanjun Qiu</p><p>That is old.</p><p>Matt Boulos</p><p>I remember being in grade school and Mr. Gen, the computer teacher, knew that I was really bored, so he would pull me out of class and then we would pretend that I was learning, but we were just trying out new software. One day he's like, &#8220;Let me tell you about this thing called bulletin board systems. There's somebody else on this computer.&#8221; I&#8217;m like, &#8220;Where are they?&#8221; And he's like, &#8220;They're in another country!&#8221; It was wild and so hopeful.</p><p>I think often these days about my parents calling their parents after they immigrated to Canada. They'd get these crap calling cards that would cut out and were choppy and super expensive. Now, I FaceTime my mom and she's like, &#8220;I have to go now.&#8221; I'm like, &#8220;how about we hang on for a little while longer?&#8221; We're taking for granted the ability to see each other, hear each other.</p><p>So you have all this incredible potential, and it's beautiful, and it's real. Our machines can augment us and we love tools. I don't understand how you could say, &#8220;I love my pen that I write with, but I don't like my laptop.&#8221;</p><p>But at the same time, we know that technology has been this very mixed force in our lives, from the capacity to surveil people to predatory mechanisms around how we communicate. One of the things that I have felt is really important to bring to the conversation is that talking about AI as good or bad is almost silly. It's like saying trees are good or bad. You plant one right next to your foundation, that was a bad move; you step out into a clean city, and it's the most wonderful thing. There are, of course, limits to that analogy, but there is something really profound about taking the complexity seriously.</p><p>Kanjun Qiu (15:47)</p><p>I was really struck by something you said many years ago, when you had just read <a href="https://en.wikipedia.org/wiki/Marshall_McLuhan">Marshall McLuhan</a> and rephrased his point: that we adopt technology for its benefits and then we suffer its consequences.
So, the important thing is to think about those consequences that we might suffer and see if we can get more of the benefits and less of the suffering.</p><p>Matt Boulos (16:17)</p><p><strong>What was brilliant about McLuhan was that he'd clued into this dynamic where we adopt a technology and it changes how we work and interact and what we're capable of, until it is no longer possible to detach and disentangle.</strong></p><p>Kanjun Qiu (16:35)</p><p>We&#8217;re part of it; it's part of us.</p><p>Matt Boulos</p><p>Take something like social media. The public narrative on it is actually condescending and not correct. We're not just a bunch of dumb-dumbs who are sitting around swiping things because we don't have anything better to do&#8212;or at least not completely. The thing that's really happening is that our lives, our social lives, are on here. I get to see my friends' kids. I'm able to get a diet of things that matter to me or that entertain me. There's nothing at all wrong with that. We don't bear responsibility for the fact that these things are wildly addictive. But because our lives have moved onto these platforms, we're now in this really stuck position: if the platforms don't behave, then we are subject to that.</p><p>When I started law school, I decided to run an experiment. I had no phone, no computer; I had nothing. I'm like, &#8220;I'll be a nerdy monk and see how this one goes.&#8221; And one of the interesting things that happened was that I realized that nobody wants to call your landline to invite you to parties. Everyone wanted me to go to the parties, but they'd tell me about it the day after. They were like, &#8220;you weren't there!&#8221; &#8220;Well, you didn't invite me!&#8221; They're like, &#8220;oh, yeah.&#8221; Everyone was texting each other, and it was a simple thing. You can't just exit. That was significant. And then I got a phone.</p><p>Kanjun Qiu (18:01)</p><p>This is really important. You can't exit technology. We can't just be like the Amish, because this technology is now so prevalent and so entwined in our social interactions and our lives.</p><p>Matt Boulos (18:19)</p><p>Let me give you the silliest example. My son is in a preschool that has a bit of Mandarin immersion, and I know no Mandarin at all. He's now singing &#8220;b&#225; l&#466; b&#242;,&#8221; and it turns out there's this children's song about pulling radishes out of the ground. My son&#8217;s just marching around the house singing this song, and so we're like, &#8220;what's going on? Why did we enroll him in a language school that we don't understand?&#8221;</p><p>What do we do? We just hopped on an LLM and said, &#8220;okay, my son is singing this, can you tell me what this is?&#8221; All of a sudden, that world opens up, and it's beautiful. I want to challenge the idea that that is us conceding something. Why should I be trying not to do that?</p><p>Kanjun Qiu (18:56)</p><p>This goes to something you said earlier: when we get on social media platforms, it&#8217;s very positive. But as I was saying earlier about resistibility, it's very hard to resist these systems. What's going on here? It is something about the power inherent in technology.</p><p>Fundamentally, what is AI? It is doing computation, the same kind of computation that our brains are doing.
It's taking inputs, perceiving them, running them through some model of the world, outputting some things, and those outputs can be turned into actions.</p><p>There's a sense in which social media is already an AI agent. It's making decisions about what actions are being outputted, like what I see on my newsfeed. And as a result, I get different inputs.</p><p>So, I'm getting this other input or information that changes my model of the world. To your point about this symbiosis, the technology is making some decisions that are causing me to get different inputs into my world model, and now my world model is getting morphed or transformed in a different direction. And that can be very positive. For example, you learn about your son and the song that he's singing, and that expands your world model and helps you see reality more clearly. The areas that I feel concerned about are the places where it changes the way you see reality in a direction that is more twisted.</p><p>Matt Boulos (21:00)</p><p>It feels manipulative. But I want to take a step back, because I've been like, foam finger, <em>go technology, go</em>. This point about technology and power, thinking about its mechanisms, is really important. Going back to my son singing his song: I went to this machine and asked it a question and it came back with an answer that I couldn't have gotten five years ago, or at least not so easily, not so smoothly.</p><p>But we're only talking about one part of that transaction. We're talking about me asking that and getting an answer. We're not talking about: Is this being logged? Does this system now know that my son is in a Mandarin immersion class? Are we gonna get Mandarin worksheets offered up to us at the next interaction?</p><p>There's something about this digitally mediated world that is foreign and dangerous. And I think that's worth probing into.</p><p>Kanjun Qiu (22:11)</p><p>What do you think it is?</p><p>Matt Boulos (22:13)</p><p>I have a couple of different mental models. Let me play with two.</p><p>One is something that I call <strong>lawless spaces.</strong> Imagine a part of town where the police don't go. The rules, the norms&#8212;people are working them out, but they're not governed like the rest. And you cross that threshold into those places. Soon, things become possible. Maybe there's just a free spirit to it. It feels like the chaotic early days of the internet: people who reveled in anonymity, not because they were up to trouble, but because there was something liberating in that. You can imagine that creating a heady atmosphere.</p><p>Kanjun Qiu (22:59)</p><p>Like the Wild West.</p><p>Matt Boulos (23:10)</p><p>Exactly. Your bank is like, &#8220;why shouldn't I be there too?&#8221; So your bank sets up shop, except when you step out of the bank holding a bag of coins, someone whacks you over the head and takes your coins. That space doesn't have the same rules, the same governance. It's not a perfect analogy, but there's a lot to be said for that.</p><p>I was talking to somebody about privacy and they're like, &#8220;That ship sailed.&#8221; And I said, &#8220;Well, why?&#8221; If we were having dinner and some dude came up and stood right next to you while you're talking, writing down what you're saying, you'd give him a slap and send him out of the restaurant, right?
We have that reflex, but we don't see it in the digital context, so we haven't learned to govern it.</p><p>Kanjun Qiu (23:47)</p><p>Do you think digital spaces are lawless because they're not visible?</p><p>Matt Boulos (23:51)</p><p>I think that's a huge part of it. The next thing I want to talk about is: what are the things that make the digital space really particular? One is that most of what happens in it is actually invisible to us. If I go to the neighborhood oracle and give him 10 bucks and say, &#8220;My son is singing this song, can you tell me what this is?&#8221;</p><p>He's gonna sit there, like, &#8220;Oh man, I know what it is!&#8221; He's gonna grunt and groan, write something down, and send me out. I go to an LLM and it&#8217;s a magic box. You and I may know how it should work in theory, but we don't actually know how it's implemented. We don't know what's gonna happen a year from now, five years from now.</p><p>Kanjun Qiu (24:44)</p><p>And you don't know what's being logged; you don't know what the company is doing with the data. There's a lot you can't see.</p><p>Matt Boulos (24:52)</p><p>There are really particular characteristics to the digital world. It is easier to log than to not log.</p><p>Kanjun Qiu (25:00)</p><p>And it's safer in a lot of ways.</p><p>Matt Boulos</p><p>And there's an expectation. If something doesn't work, the customer is like, &#8220;I did X and it didn't work.&#8221; And you're like, &#8220;I have no idea what you did, I have no logs.&#8221; And they're like, &#8220;What are you doing? Are you junior-grade developers?&#8221;</p><p>It is easy to log data. It is cheap to collect data. It is lucrative to collect data. Even before advanced models like LLMs, we could crunch through stupidly large amounts of data. So you have these reinforcing mechanisms that take us to really perverse outcomes.</p><p>We talk about surveillance. Surveillance enables an astonishing amount of bad stuff. Resistibility is something that you reach for when you're in conflict: something has gone wrong, and you have to resist it. But preceding that is legibility: do you even know who I am?</p><p>Kanjun Qiu (26:01)</p><p>This prompts something for me: let's imagine a world in which we turn everything digital into a physical manifestation. What I'm hearing you say is, we have ended up in this really weird default digital world, especially going into this AI future. It's weird because some of the defaults are weird. One default is that we log all data. A second default is that companies&#8212;I, running this company&#8212;can process that data however I want. Another weird thing is that now we have these AI systems and they can do lots of new magical things with that data. For example, take a photo and know where I am. As the person at the other end, I can't see any of it. And as a result, I actually don't have a mental model of it being a problem at all.</p><p>Matt Boulos (27:01)</p><p>We also don't have an emotional response. We&#8217;re wired as human beings to recognize these things, and here we can't react.</p><p>Kanjun Qiu (27:08)</p><p>We talk about something like surveillance and it's such an abstract concept.
But if you were to turn surveillance into a physical manifestation, like this guy writing everything down next to your table, then it would be like having five people following us around everywhere, all logging different things about our lives and changing other stuff in our lives based on what they're logging.</p><p>Matt Boulos (27:29)</p><p>This is where it starts to get wild, because on one track, when people are talking about the productivity benefits of AI and the labor impact, we're often talking about labor substitution. But there's another way of thinking about the impact of AI within the labor context, which is that new work is being created. Let's take something like credit scores. These are largely opaque systems; the financial services industry benefits from them, and the good-faith argument is that we all benefit. If I'm an untrustworthy borrower, you shouldn't have to be paying rates to subsidize me, so we stratify on the basis of reliability or whatever terms they use to describe it, like creditworthiness.</p><p>But then you could start to shift the granularity of that. We could just collect all sorts of stuff. We could also experiment. We could collect data even if we&#8217;re not sure it's relevant. Deny me a loan&#8212;who cares? I'm one data point. My life gets crushed, but they don't know about it, because their system did it; they just move on and experiment. The dynamism that becomes possible there is potentially quite pernicious.</p><p>Kanjun Qiu (28:55)</p><p>When you say dynamism, what do you mean?</p><p>Matt Boulos (28:58)</p><p>You could have systems that are not stable anymore. There isn't a credit score. There's an algorithm that's constantly rewriting the rules. Why not? As long as it&#8217;s goal-seeking against minimizing defaults, it doesn't matter how unfair it is. When we talk about unfairness&#8212;putting on my lawyer hat&#8212;we often talk about things like disparate impact, protected categories, that sort of thing. But what happens when it's arbitrary, what happens when it hits large categories of society, what happens when it's not easily pinpointed? Again, the bad stuff is happening behind the veil, so we don't know.</p><p>I want to connect that to something you were talking about earlier, the economic impacts, when you said that destabilizes society. But also, when you live in a world where you are subject to all of these forces and you're helpless against them, it's not good for a person to feel that way. Think about the worst parts of childhood, where adults are not taking you seriously, not letting you do something that you ought to be able to do, and then that becomes the dominant mode of adult life.</p><p>Kanjun Qiu (30:18)</p><p>It&#8217;s very disempowering.</p><p>Matt Boulos (30:22)</p><p>And this is before we even get to oppression. That by itself is destructive. And then you add to that malicious intent or malicious oversight, and it isn't a surprise that we live in an angry moment in our society. I don't have a lot of patience with the tech community sort of sitting around saying, &#8220;How could this be?&#8221; Well, I mean, you've been bloody architecting it for the last two decades. There is a reason why people feel disempowered: they are disempowered.</p><p>Kanjun Qiu (30:53)</p><p>They have no power to change a lot of things.</p><p>Matt Boulos (30:56)</p><p>How could you change any of these things?
The thing with AI&#8212;and I think it's really important that we ground it&#8212;is that we have to recognize that all of these dynamics are in play. Then we can ask: how do you design, and how do you get to empowerment? Because we could also just sit here and be angry and walk away, but that's not going to help.</p><p>Kanjun Qiu (31:12)</p><p>Two things came up as you talked about this. One is, <strong>narratives today about what kind of future is okay for humans.</strong> I think a lot of the futures that the tech industry talks about today are actually very disempowered. One type of future is like, we're going to live in a stable utopia where everyone's going to have anything at their fingertips and it's going to be okay. But it does not seriously consider these dynamics where people are being controlled by technology and the people who control technology.</p><p>A second thing that you pointed at was this notion of utopia being like &#8220;permanent undergrad,&#8221; where you can be free and intellectually curious and it's really fun. But an undergrad is not an adult with the ability to fully manage their own life.</p><p><strong>The kind of freedom that you're going for is for humans to be truly able to be fully adult and in the world themselves, without being pushed upon by other forces, and with the ability to push against those forces.</strong></p><p>Matt Boulos (32:42)</p><p>Absolutely. What do we really want from our lives? It&#8217;s to be able to realize our capacities.</p><p>Kanjun Qiu (32:56)</p><p>And that involves growth and change and creation and being pushed down.</p><p>Matt Boulos (33:01)</p><p>Absolutely, and having a chance in all of it. One of the things that I've noticed&#8212;again, not to take a piss on the tech community&#8212;is that we'll talk about what an ideal future is, or what an ideal life for someone to have is, and it's just somebody projecting what they think is interesting onto everyone else.</p><p>I have so many people in my life for whom the specifics of their job don't actually matter that much, as long as they can take care of their family and support their community in ways that are really meaningful to them. Those are rich, beautiful lives. And when the structures around a person erode, that is when we start to see this real frustration emerge.</p><p>Kanjun Qiu (33:53)</p><p><strong>People are frustrated because they feel like they don't have any levers to change the situation of their lives, and they don't like the situation they're in, even though the world is abundant and they're fed. </strong>There's something missing about their sense of autonomy or freedom or their ability to make change. What I heard from lawless spaces is, it's partially a lack of legibility and partially a lack of levers of action. And if everyone had legibility, and levers of action to change their life circumstances and the institutions that aren't serving them, then maybe those two things would allow us to have a little bit more autonomy and self-determination in our lives.</p><p>Matt Boulos (34:48)</p><p>It's kind of hard in our present moment to think about what a stable political or legal regime looks like in general, but there is a simple fact: for centuries now, we have figured out that it's not cool to steal somebody's money. It's not just that theft is wrong, but that the state can't do it, even if it's useful to the state.
We say that's not right.</p><p>In conservative circles, people talk a lot about debanking, where banks just turn you off digitally. It's not a frequent occurrence, but it happens, and has happened in response to political events. What's wild to me about it is that it just could not have been a thing a few decades ago; the bank would have had to literally steal your money.</p><p>Kanjun Qiu (35:54)</p><p>Like a bank run.</p><p>Matt Boulos (35:56)</p><p>Or simply, you'd go to your bank branch and they&#8217;d be like, &#8220;we're not going to give you your money,&#8221; which is what debanking looks like. And you would say, &#8220;you stole my money!&#8221; Whereas debanking now is either just hitting a switch so you can't access your money, or just saying, &#8220;here's your money, you're out of the financial system&#8221; in a way that is only possible in a digital world.</p><p>One belief I have is that our laws and rules haven't caught up to digital reality, and AI accelerates digital reality to all of its conclusions.</p><p>Kanjun Qiu (36:32)</p><p>What I hear from what you're saying is, the digital world is enabling all these mechanisms, like being able to turn off my access to my funds, and the laws haven't caught up. The last 2,000 years of development in the legal system have been about physical reality.</p><p>Now that reality is actually happening in the digital world. You're giving all these physical analogs to it that are really interesting, because they let us see the physical reality of what's happening, but somehow we haven't mapped that physical reality onto the digital world. What would be required to make lawless spaces more lawful? Why have we not caught up? Is it a lack of knowledge? Is it the lack of a visceral sense of what's going on?</p><p>Matt Boulos (37:37)</p><p>Each of the things you said feels to me like it's playing a part. We both understand computers really well, but when I hop on a website, it does not occur to me that they are tracking the things that I'm doing.</p><p>Kanjun Qiu (38:05)</p><p>True! The other day I hit &#8216;accept cookies&#8217; and then I was like, &#8220;what happens when I accept cookies? Oh shit, it can track me across multiple websites &#8212; that's crazy!&#8221;</p><p>Matt Boulos (38:13)</p><p>I drive people nuts when they look over my shoulder, because I always not only reject cookies, but I open the thing to make a point of deselecting everything. And the hilarity is, often these are just pop-ups that don't do anything, and the site collects your data anyway.</p><p>Kanjun Qiu</p><p>That's kind of depressing. Thanks.</p><p>Matt Boulos</p><p>Yeah, it really is. You're welcome.</p><p>There's one sense in which it's not tangible, no matter how sophisticated you are.</p><p>The other thing is, it is new. In world-historical terms, we're talking about living in this regime for 10 years. It is not that long, right? Google trying to figure out how to monetize was something that happened basically in our adulthood. That's nuts. And then going from web to mobile, the introduction of apps &#8212; all this has happened really, really fast. Part of it is we haven't caught up.</p><p>The other, somewhat more cynical thing is that, it turns out, lawless spaces are awesome because they're so lucrative. If you can do stuff like surveil people and track them and price-fix and all of the rest, you can do all sorts of astonishing things.</p><p>If you deal with it now, it's a lot easier than down the line. One easy answer is a good privacy law.
Had we put good privacy rules in place 10-15 years ago, it wouldn't be so painful now for the large tech platforms to unwind these privacy practices.</p><p>Kanjun Qiu (39:49)</p><p>But now it's entrenched. You have to change your entire infrastructure.</p><p>Matt Boulos (40:08)</p><p>We're talking infrastructure, business models, identity as an entity, and the market cap of these things. I don't want to grant sympathy to the surveillance practices, but this is a huge thing that we're going to have to ask of them. But we do have to ask it.</p><p>There is also an interesting question of what rights we already have that we have failed to translate, just as a practical matter. We already have these legal rights, and we haven't brought them to these spaces. And then, what are the new things that we have to figure out?</p><p><a href="https://hls.harvard.edu/faculty/lawrence-lessig/">Larry Lessig</a>'s notion of <a href="https://www.harvardmagazine.com/2000/01/code-is-law-html">code as regulator</a> is really fun. What he does in this setup is point out that in every period of time, there's some regulating force that you have to contain if you want to protect liberty. In his construction, one that I share, we're progressively trying to increase liberty as a society. He points out that in the time of John Stuart Mill, you were worried about majority opinion &#8212; democratic opinion &#8212; because it can trounce minorities. So then we start to establish the notion of rights, and constitutions become vital to that, because if you just leave it to the majority, then that's actually sometimes not great. Then you have the Civil Rights Act, and suffrage movements, and so on.</p><p>What he was pointing out, which I thought was really interesting, is that the new thing is gonna be code. Code is going to operate &#8212; this was in 2000 that he wrote this piece &#8212; as regulator. And the argument there is that&#8230;</p><p>Kanjun Qiu (41:49)</p><p>Code is encoding laws.</p><p>Matt Boulos (41:50)</p><p>Yeah, code is going to determine how a sphere of life plays out. So then the question we need to ask in response is: what things in that space need to be addressed?</p><p>Kanjun Qiu (41:57)</p><p>I have this hypothesis that technology shapes our governance system &#8212; the way that technology is built and what makes it powerful. There's this theory that the reason democracy happened &#8212; I'm sure this is just one of many reasons &#8212; was that we went from a world where knights were the most powerful thing to a world where muskets were the most powerful thing. When you have knights, you have a lot of upfront investment in armor; you have to have horses and stables and all these well-trained people. That's a very centralized form of power. Technologies at that time resulted in this centralization because of the nature of those war technologies.</p><p>Then the musket was invented, and now knights and armor are not that useful. In fact, you actually want a lot of people who have muskets. So now people matter, because of this new war technology that gives power to people.</p><p>We talk a lot about how AI, and the core four problems that I talked about, are fundamentally about power and transfers of power from one entity to another. We call it problematic when it gives power to entities that are not what we've determined to be morally right.
Through that lens, thinking about lawless spaces and what this upcoming technology is starting to enable: is there a nature to AI that shifts things one way or another?</p><p>Matt Boulos (44:07)</p><p>I have two responses. One is, there's also just law as law. What is it about this moment that we leave ungoverned? I find a lot of these free-market arguments, the accelerationist camp, essentially bullshit. All you're saying is, we don't want regulation. So let's just say that. There's nothing else there; it isn&#8217;t a richer argument.</p><p>Kanjun Qiu (44:12)</p><p>Because lawless spaces are great.</p><p>Matt Boulos (44:39)</p><p>Lawless spaces are lucrative. They do yield huge amounts of opportunity. I'm not saying let's clamp down &#8212; that's how you shut everything down. It often does not make sense to intervene. It also does not make sense to intervene before you understand a space, because then you will have spent your political capital.</p><p>Think even of American politics, with all this craziness right now. There is political capital that can move you towards some privacy bill or things like that. And if you do the wrong thing, that capital's not waiting for you to go do it again. So you have to be disciplined about that.</p><p>But at the same time, you can't just say no rules. Or if you do, then that's ideologically encoded, and you ought to own the rest of your argument.</p><p>Kanjun Qiu (45:20)</p><p>If there are no rules, we're buying into a particular society.</p><p>Matt Boulos (45:23)</p><p>And do we want that? Is that a fair thing to ask of others? If you want to impose that, then you should also expect resistance to it.</p><p>Kanjun Qiu (45:30)</p><p>&#8216;Law as law&#8217; is interesting, because it actually argues against my argument that technology shapes society in this fundamental way. Maybe what you're saying is you could make laws that change that distribution of power.</p><p>Matt Boulos (45:44)</p><p>Something can be wrong, and however great the temptation toward that wrong thing, it's still wrong. But then, if you don't want to eat the muffin, don't put it in front of you. And we have both.<strong> Law needs to set the boundaries of what's acceptable or unacceptable, regardless of what the temptations are. But the nature of the technology is going to shape those temptations. Back to the point about how surveillance is the easier default model.</strong></p><p>So when it comes to what we do with these technologies&#8230;</p><p>Kanjun Qiu (46:15)</p><p>It makes some things easier than others.</p><p>Matt Boulos (46:17)</p><p>Yes, absolutely. A perfect example is going to be something around labor. And I want to bait you into this conversation. Labor impacts are going to be real. We don't even know what those are going to look like. There will be things that employers and companies can and can't do and shouldn&#8217;t do. Right now, we know the power a company has over, for instance, a warehouse worker whose work is determined by an algorithm. It's also worth pointing out that they don't have a capricious boss who can be an asshole and make their life hell. The algorithm is governing things both good and bad. But do we then say &#8220;this is the shape of the technology&#8221; and back away?
Or do we recognize that this starts to introduce things that weren't possible before, and we need different rights and rules?</p><p><strong>Most of our labor laws are predicated on humans interacting with other humans &#8212; more powerful humans, but they're human interactions. Whereas a machine can surveil your every motion and then dock your pay for scratching your nose at the 15-minute mark. And we don't really have mechanisms for that, because we couldn't have conceived of that as being an active problem. It would have been nonsensical to have rules for it.</strong></p><p>Kanjun Qiu (47:37)</p><p>This is very interesting, because it speaks to actors in the world and the power that they have, and this new actor, which is an algorithm or an AI agent. What you're saying is, right now we have laws and they govern your capricious boss, they govern you, they govern your corporation, which is legally considered an actor. So in the world before AI, the only actors we had were humans and human institutions.</p><p>We have laws that limit the power of humans to harm each other, and we have laws that limit the power of corporations to harm humans and vice versa. But now there's this rise of a new power, which is AI systems. AI systems have power because they can process information and turn information into action, and action is power. Effective action is power.</p><p>To the extent that an algorithm can govern what I am allowed to do as a warehouse worker, that is power that the algorithm has. Now you're saying, okay, we have this new power. What do we do with it? We're not doing anything with it.</p><p>Matt Boulos (48:54)</p><p>Societal norms will change it, our behaviors will change it, the technology itself will change, and therefore that power will morph. It's just so odd to me to say, okay, then we're done. We've never done that in human history.</p><p>Kanjun Qiu (49:12)</p><p>We need to figure out what to do with this power. It might be partly because this is the first time a technology is its own power, in a way. We've never had technologies in the past that make decisions.</p><p>Matt Boulos (49:26)</p><p>Not to dunk on people who are trying to do good work, but a great disservice was done by the AI safety community on this point. By talking about runaway systems as much as they did, they created this special category of worry, this incredibly low-probability event, and we don't actually know what its dynamics are going to look like. Whereas the reality is that systems can make their own decisions, but they're making them for someone. You don't go spend millions of dollars to develop a system and then just let it go. You're doing it to manipulate the stuffing out of your viewers so you can sell more ads to people to buy flip-flops so you get your cut on the ads, and so on. And across the board, in every domain in which these autonomous systems are going to function, they're going to do so for a purpose, for an owner, a controller. When we talk about them being autonomous, it is about the ability to delegate to systems.</p><p>Kanjun Qiu (50:41)</p><p>It's the ability to delegate human power to systems, to encode that power. I as a manager can now encode my power in a system.</p><p>Matt Boulos (50:51)</p><p>That's right. And that is an astonishing amount of power, a multiplicative one: you can do so at massive scale, you can do so quickly, and it can adapt.
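<p><em>To make that concrete, here is a minimal sketch of what encoding a manager's power in a system can look like: a single judgment call, frozen into code and applied to every worker on every shift. The event format, thresholds, and penalty below are invented for illustration; they are not drawn from any real workplace system.</em></p><pre><code># A hypothetical encoded workplace policy (illustrative only).
# One manager's rule, applied automatically and at scale.

IDLE_LIMIT_SECONDS = 90.0  # assumed cutoff for a "violation"
PENALTY_DOLLARS = 0.50     # assumed pay docked per violation

def review_shift(idle_gaps_seconds: list[float], base_pay: float) -> float:
    """Return the shift's pay after docking for idle gaps over the limit."""
    violations = sum(1 for gap in idle_gaps_seconds if gap > IDLE_LIMIT_SECONDS)
    # The worker never sees this function, the limit, or the violation
    # count; only a smaller number on the paycheck.
    return max(0.0, base_pay - violations * PENALTY_DOLLARS)

# Example: three idle gaps in a shift, two of them over the limit.
print(review_shift([45.0, 120.0, 95.0], base_pay=160.0))  # prints 159.0
</code></pre><p><em>The rule itself is trivial; what is new is the scale, speed, and opacity with which it can be applied.</em></p>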
And then one of the things I argue in this new-power thesis is that when that happens, it is very hard, as a human on the other side of it, to know how the decision was made, and so there is a default to accept it.</p><p>Kanjun Qiu (51:22)</p><p>And they have no levers over the decision at all. No legibility, no levers.</p><p>Matt Boulos (51:26)</p><p>Exactly. <strong>You have exactly no window into what is going on, no means of recourse. And as more and more of these sorts of things happen, we'll feel very powerless. It's an incredibly sad example, but in the context of war, this is what we are seeing. We are seeing, particularly in the Middle East, an example of AI systems doing the targeting.</strong></p><p><strong>People have not classified this as autonomous systems gone amok, because humans built a system for that purpose. Yet when we say we're worried that AI systems will kill people &#8212; they are killing people. Explicitly, they are killing people, and they're being designed to do that. And let's be honest, when people are talking about national security implications for AI, yes, you're talking about economic competitiveness, but you're also talking about the fact that you want to have AI systems that can do that.</strong></p><p>The ultimate act of power is to take someone's life. We already have that extreme happening right now, being realized. But it's the same dynamic, where a human, or sets of humans, delegates the thing that they want done to a system, and the system can carry it out. Because it is a system carrying it out, the context and the entire execution of it look completely different. Where is the appeal, where is the chance to challenge it, where is saying that's wrong, where is the record? Where is even the idea of knowing how that decision was made?</p><p>In my day-to-day, when I'm using AI systems, it's fun or productive. I don't care how it came to the decision. I'm just like, is this right, can I work with this?</p><p>My primary LLM use right now is trying to count calories. So I take a photo of what I ate, and then I try to negotiate with it to lower the calories so I can eat more food.</p><p>Kanjun Qiu (53:19)</p><p>There's actually a huge difference here. This calorie counter is an AI system that is under your control, that you're using to serve you. The war system that you're talking about is a system under one person's control that's being used to control or harm someone else. Those are two different types of systems. You might argue that what we actually want is more systems under our control that affect us, and that ideally don't affect other people too much.</p><p>Matt Boulos (53:51)</p><p>Imagine if my calorie counter determined what I could eat.</p><p>Kanjun Qiu (53:53)</p><p>Then it would be controlling you.</p><p>Matt Boulos</p><p>It would be awful. It's not perfect, and sometimes it goes completely off the rails in either direction, and that's fine-ish, because it's within my domain. It's an irritation; it's not a risk.</p><p>Kanjun Qiu</p><p>This is something I've been thinking about with our product. We try to make systems that allow people to make software. I often talk about open software or an open software commons or malleable software &#8212; the idea that software should be built to be modified by the end user. A lot of people are like, &#8220;Who cares? I don't want to modify my software. I'm perfectly well served by my software.
There's no problem, except sometimes.&#8221; And I realized the core idea is not that the software should be built to be modified. That's an instrumental thing. Instead, it's that software should not control me, ever.</p><p>Matt Boulos (54:53)</p><p>People might say that they don't want to change things, but often that's because the decision space has been so narrowed for them. One of the things that's really interesting to me as we work on interoperability, and as we're rallying a community around this, is how many startups just never got to a place where they could fight for interoperability, because their mere existence would not be feasible in the current regime.</p><p>Kanjun Qiu (55:22)</p><p>Talk more about interoperability.</p><p>Matt Boulos (55:24)</p><p><strong>One of the main things that we're championing and pushing for is interoperability legislation. </strong>The idea, at its simplest, is that a platform should not be able to discriminate on the basis of how you access your own data and the services that you use.</p><p>Kanjun Qiu (55:45)</p><p>You should be able to get your data and have it be yours.</p><p>Matt Boulos (55:47)</p><p>Yes, and you should be able to use a tool of your choosing to interact with another system. Just as you could go buy bananas yourself, or say, &#8220;hey Matt, can you go get me bananas from the supermarket?&#8221; You couldn't have a supermarket saying, &#8220;no, only Kanjun,&#8221; right? And yet, that's our online world.</p><p>Kanjun Qiu (56:08)</p><p>Let&#8217;s make it concrete. LinkedIn says I'm not allowed to use someone else's account to use LinkedIn. I can't use a bot; I can&#8217;t use something like TweetDeck. It's monopolistic.</p><p>Matt Boulos (56:22)</p><p>Exactly. And the platforms do this for a reason: it consolidates their control around the points of input and access. But the consequence of that is pretty severe. Two things are happening. One, we are moving towards a world in which these AI systems are going to be more and more useful, so we are going to share more and more data. We don't have any real indication that these things are handling our data soundly, and yet we're going to talk to them. I'm going to say, I'm injured or I'm sick, can you please go make an appointment for me? And we don't know whether that data is going to be held with any sort of responsibility or not.</p><p>The other is that there are all of these wonderful things that could be built if I could just access my digital life. <strong>What interoperability does is kill two really critical birds with one stone. One, if I can access my own data, then I can decide where it goes. I can control that, I can check up on it. But the second and more critical one is that if it's possible to build software that interacts with my richer digital life, then I'm not attached to these parasitic platforms and agents, and we can build alternatives. You can seed a whole other tech ecosystem around the idea that we're in charge, it's our data.</strong></p><p>Kanjun Qiu (57:44)</p><p>It's our software, it's our data. We make it. And we can sometimes interact with these platforms, but we can use our own interfaces.</p><p>It is becoming possible to make our own software, and make our own wrappers or systems that access Twitter data and download it. Then I can make my own algorithm and process it in a different way, so I can get just my friends and I can derank inflammatory stuff.
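<p><em>As a rough sketch of what &#8220;making my own algorithm&#8221; could look like once that data is accessible, here is a user-owned feed ranker that boosts friends and deranks rage-bait. The post fields, friend list, and keyword heuristic are hypothetical stand-ins, not any platform's real API.</em></p><pre><code># A hypothetical user-owned feed ranker (illustrative only).
# Ranks posts by my values rather than an engagement objective.

from dataclasses import dataclass

INFLAMMATORY = {"outrage", "slams", "destroyed", "disgrace"}

@dataclass
class Post:
    author: str
    text: str

def my_score(post: Post, friends: set) -> float:
    score = 1.0
    if post.author in friends:
        score += 2.0  # I chose to see my friends first
    if any(word in post.text.lower() for word in INFLAMMATORY):
        score -= 2.5  # and to derank inflammatory framing
    return score

def my_feed(posts: list, friends: set) -> list:
    # My ranking rule, running on my machine, changeable by me.
    return sorted(posts, key=lambda p: my_score(p, friends), reverse=True)
</code></pre><p><em>The heuristic itself hardly matters; the point is that the rule is legible and stays under the user's control.</em></p>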
That's just starting to become possible.</p><p>The software that exists today, because it's so expensive to produce, is incentivized to make that money back. Not because the creators are bad, but that's the incentive structure. As a result, it's either selling to us, or it's selling us to something. Those are the two options. Then occasionally, you have someone who's incredibly generous who makes software for free. It really feels like it should be flipped on its head: most software that exists &#8212; AI systems included, we lump it all in the same category &#8212; should be software that is serving us, not selling us things or selling us to things. And it should be ours. And it shouldn't require enormous acts of generosity to create software that doesn't do that, that is just for us or for other people. It should be easy. It should be what the default world is.</p><p>Matt Boulos (59:21)</p><p>If you think of deep, rich, sustaining communities, they're very generative, they're very productive. If you think of the art that emerged from religious communities, and the invention of different structures, the social structures, different aid structures &#8212; against that backdrop, the way software works today is a peculiarity. And I wonder if it is also a peculiarity of just how young software is, in world-historical terms. We're talking just a couple of decades in which you have the prevalence of software. But the point you're making stands: the cost to make software will go down, and the stuff that we make will start to look different.</p><p>Kanjun Qiu (1:00:02)</p><p>It could, if you get things like being able to access all of your data. Network effects are real. Right now, these big platforms have network effects. I can't just move to another social media platform and be able to interact with all of my friends. That sucks. I can't move off of marketplaces like Uber or Airbnb, or off of social media platforms. And no matter how cheap software gets to make, the network effects are still there.</p><p>Matt Boulos (1:00:08)</p><p>To your point about the cost to make software, one analogy, and I know it's not perfect, is manufacturing: you spend a lot of money to make a mold. So if you're making plastic chairs, you spend maybe a couple million dollars to make that mold, and then you make as many $5 chairs as you possibly can off the mold so it pays itself back. There are different analogies we can use to describe what's happening.</p><p>Kanjun Qiu (1:00:52)</p><p>But now you can manufacture software, in a way.</p><p>Matt Boulos (1:00:57)</p><p>And there's something like: I could just make the chair, and that starts to change how we think about it.</p><p>Kanjun Qiu (1:01:01)</p><p>It's almost like the opposite of that analogy, because now I can make my own version of that chair really cheaply, with no mold, with LLMs.</p><p>Matt Boulos (1:01:08)</p><p>That's right. It's important that we try to bring all of these developments in AI together, because you have these incredibly powerful foundation models, and you have a shift in our ability to do things like code or data analysis, where the cost to do those things is now going down. And that, marshalled well, is a real gift. But of course, that's going to matter to labor a lot.</p><p>I want to bring us to labor for a couple reasons. One, because I don't know that the mental models that at least the tech community or the AI community uses to talk about labor are right. But also because I think we are in for something that's kind of shocking.
What you do about that is not so obvious to me.</p><p>So let me lay out my grievances. There is this idea that AI just gets more and more intelligent, and the critical part of this argument is to never say what intelligent means. And then to say, well, if it gets more intelligent and work is an exercise of intelligence, therefore all labor gets replaced. And then, on that basis, to make this big jump to saying: okay, here's what we need to do now that nobody is useful anymore, nobody's economically productive. And then somebody inevitably raises their hand and says, what about cutting down trees? And like, we'll get robots for that.</p><p>The idea is that the end game is zero economic contribution on the part of individuals. Machines do everything, or you have a tiny, tiny sliver who run the machines. And then we jump to all of these ideas around, okay, well, are we all gonna be lying on the beach, and our benevolent billionaire overlords are gonna feed us mango smoothies&#8230;</p><p>Kanjun Qiu</p><p>Like WALL-E.</p><p>Matt Boulos</p><p>Yeah, exactly, or is it gonna be something else? My confrontation, and it deserves a confrontation, is that what this does not account for is that a significant swath of labor, whether within a job or as the whole purpose of a job, is about decisions and risk.</p><p>Say I make baseball caps and I need to go buy the fabric for the caps, and I have three potential vendors who will sell me the thing. Now we have a procurement bot. What's this decision gonna be? It's gonna be on the basis of some factors like cost, shipment time, whatever data exists. How you or I would make the decision is we'd probably meet the person who runs the fabric company, or the representative, and say, he seems shifty, we're not doing that. Then we'd just use our gut, but, critically, own the responsibility for the decision and the course correction.</p><p>Why am I saying this? Because there's that decision layer, and then there's a human interaction layer, and what we can readily automate is the very easy stuff.</p><p>But it should be pointed out that managers do not like having people on payroll. They&#8217;d gladly fire everybody if they could keep revenue at the same level. Attempts to automate labor have been around for the entirety of my career. A lot of that was this quasi-automation of going to lower-cost countries, and then the idea of robotic process automation. We have seen all of these things. What you notice is, certain categories automate really easily, certain categories that ought to be automatable don't automate easily, but critically, you have humans in the mix.</p><p>Why does this matter? Because if humans are in the mix, what you are really looking at is not a 95-99% unemployment rate. You're looking at a deeply inflated one, in which there are winners and losers in a society. And that looks completely different. All of these &#8216;nobody has a job&#8217; solutions imply that we're all in the same boat, but we're not going to be in the same boat.</p><p>Kanjun Qiu (1:05:45)</p><p>You're saying there's going to be this stratified effect, where different people are affected in different ways by job loss, like all industrialization in a way. Software engineers probably will be impacted quite a lot, because code is actually very automatable; it's in this closed-loop system.</p><p>There are maybe two things that make humans useful. One is liability, and the second one is information.
So in this example about baseball caps, you were like, okay, if I mess up procurement, I am to blame. Or if I'm a doctor and I mess up the surgery, I am to blame. There is a person who's liable, and they can be legally held accountable. If the machine is ultimately to blame, that's actually really annoying: I can't hold the machine accountable &#8212; it can't be punished, I can't fire it. I guess I could get a different machine, but then I'm the one who's responsible, and that sucks.</p><p>Matt Boulos (1:06:51)</p><p>But also, back to all the dynamics we were talking about before: I go to my doctor. It's not that I'm sitting there saying, I will sue you if you muck this up. Rather, there is this mechanism where that person gets up in the morning and says, I am responsible to my patient. The machine is responsible to no one, and the person who owns it is not thinking about individual responsibilities, but is probably thinking about aggregate ones. Any of us who've kicked around the business world know that these things then just become measures of risk, not even of obligations to individuals.</p><p>Kanjun Qiu (1:07:33)</p><p>This is really important: aggregates versus individuals. There's a great book called <em><a href="https://en.wikipedia.org/wiki/Seeing_Like_a_State">Seeing Like a State</a></em>. And the core idea is that when you're governing a state, you have to collect data, and that data gets collected in aggregates. And because you can only see data in aggregates, you take actions that actually make individual lives a lot worse, but make the aggregates look better. Here, you're saying managers might make decisions in aggregates that make individual lives, or individual impacts on patients, a lot worse, while in aggregate things look a lot better. And it's really important to point out that when we look at individuals, we're looking at anecdotes, and that's a really different type of information than aggregate measures, where we're looking at statistics.</p><p>Even I as CEO struggle with this. It's why kings would disguise themselves as villagers and go talk to villagers to get the anecdotes. I as CEO get really bad anecdotal information from people; instead I get a lot of aggregates, and that actually makes it really hard for me to make good decisions.</p><p>One way in which humans are really valuable is that we are able to be responsible for an individual person, individual case, individual situation.</p><p>The other question is really about time scale. I don't think all of our jobs will be automated in 10 years. But in 50 years? That's still within my lifetime. That's not super crazy. Look at the change that's happened over the last 50 years &#8212; or a hundred. The implication is, if we are building these systems, and they are going to have these effects where a lot of people lose their jobs and it's easier for the managerial class to do things, then the challenge is: okay, not all jobs will get automated immediately, but how do we build a society where people are free and have power? Because there is this leakiness of power from labor to capital.</p><p>Matt Boulos (1:09:51)</p><p>The time horizons matter to me because longer time horizons are where the substitutive activities start to come in. We start to generate new economic activity. I'm really wary of claims that something is going to happen on a 10-year time horizon.
That's just insane.</p><p>Kanjun Qiu (1:10:12)</p><p>Probably programming will get automated on a 10-year time horizon.</p><p>Matt Boulos (1:10:21)</p><p>The non-engineer&#8217;s perspective on this one: I think we're gonna see a stratification of skill level. Hot take: I think we're gonna see an emergent category of developers who are not particularly &#8216;high-skill&#8217; &#8212; I hate using low-skill, high-skill, but just not the sort of people inventing a new programming language. Like the guy who would make your website, things that LLMs can do very easily. But until software kicks in to make it easy for a layperson to use the LLMs to do that, these developers are going to act almost as a translation layer. So they're not really going to be developers; they're going to be more of an &#8216;I know enough about what a web stack looks like that I can turn it into something&#8217; role. That's going to flare up and then drop, sort of in the way that web developers were hot, and then it became either a highly skilled front-end role, or you have Webflow and Squarespace.</p><p>Then I think what we're going to see is that the artisanal middle goes away. Then the really high-caliber engineers who understand how systems work become absolutely vital. They're augmented by these systems, but they are basically CTO-ing everything.</p><p>Kanjun Qiu (1:11:43)</p><p>There are a lot more CTOs. I think it's not unreasonable. And I challenge your non-engineer hat, because you are one of the active users of our product, which is a coding tool. Maybe a simple model for thinking about this: there's always a Pareto front between task difficulty and how well the task works.</p><p>As tasks get more difficult, it requires a lot more capability or skill to make the task work. Lots of easy tasks will get automated, and it'll be much easier to make web apps and things like that. But we'll probably see these much more complex, almost &#8216;grown&#8217; software systems that someone is managing. In software, one deeply optimistic possibility I see &#8212; if timelines are slower, and if we can figure out how to make really good tools that are not just captured centrally &#8212; is that people can learn how to &#8216;garden&#8217; software for themselves, and that becomes a source of power where people can harness computing. Computation is power, and people can harness this computation for themselves, because we all have laptops, we all have GPUs; perhaps there's some way to allocate them more equitably. Now, because we own this laptop, this computation object, we can harness it to run a bunch of software, to grow a bunch of software that does more and more complex, interesting things for us &#8212; maybe inside of our jobs as well.</p><p>So you might see many people losing jobs, but many people gaining this capacity to create software that does really weird and unusual things, new things, more powerful things. I think there's a world in which it's not top-down automation, but bottom-up automation &#8212; bottom-up as in we are the ones who are automating our jobs away. I love automating my job. And when we're the ones automating our jobs, we become personally more valuable. It doesn't solve the full problem, and I think I'm still confused about the exact dynamics.</p><p>Matt Boulos (1:14:05)</p><p>I think you're right. I actually think there's going to be a really interesting near-term dynamic, because there's something really beautiful about human ingenuity. You give somebody a tool and they figure out neat stuff.
One thing that will be really fun to watch is somebody who has a job that involves a lot of these manual tasks just figuring out how to automate them themselves. They then actually become much more valuable to an employer; we'll watch people learn how to do that. There's this digital literacy that I think this is going to build.</p><p>Kanjun Qiu (1:14:49)</p><p>The education lens. Something that we think a lot about on the product side is: how do you teach someone who doesn't quite understand these software systems what's going on? If we think of agents as top-down automation versus bottom-up automation, the way that these agents get implemented is really different. If I am told as CEO that this technology is gonna automate my workers away and I can fire them, I'm going to do really different things as an internal process. I'm going to implement processes to measure what people are doing and then try to take the stuff that they're doing and automate it. Maybe this is an RPA [robotic process automation].</p><p>Matt Boulos (1:15:29)</p><p>Especially in financial services, there&#8217;s a lot of paperwork: boom, boom, get them out of the way.</p><p>Kanjun Qiu (1:15:34)</p><p>But if I'm told as CEO: hey, I have this technology, and if you hand it to your workers, it's going to teach them how to use it itself, and your workers are going to become much, much more effective because they will automate their own jobs. That's a really different perspective.</p><p>This is a place where we can make a lot of choices in building the technology that make this go one way or another. When we are building prosumer products, you can either build for the buyer or for the user. If you build for the buyer, then you're building something that is built to automate people. And if you're building for the user, you're building something that's trying to teach the user how to use it. That's a choice.</p><p>Matt Boulos (1:16:33)</p><p>It's also an interesting choice, because I don't know, as an economic matter, that it is better, for instance, for a large company to try to automate away its employees versus have higher-productivity employees. The thing everybody wants is higher-productivity employees, and if you can get that, that is a boon, and a more productive economy is actually generative.</p><p>Kanjun Qiu (1:17:03)</p><p>One of the things that people say is, AI doesn't have very good taste, in that it doesn't know what I want, it doesn't know what other people want. As a result, I don't trust it to make certain decisions. I don't trust it to write on my behalf very well.</p><p>The reason why it doesn't have good taste is that it's not in my head. It does not know about my internal experience, and I have a lot more context than it does about me and my situation. So there is a potential here where&#8212;to your point that, economically, it's not clear whether it's better to make your workers more productive or to automate them away&#8212;if people are better at spotting opportunities than AI systems, then it is possible that it's economically better to make your workers more productive. If systems are better at spotting opportunities than people, then maybe it's the opposite.</p><p>Matt Boulos (1:18:13)</p><p>This is something that policy leaders have to take seriously. In my conversations with lawmakers, they are sophisticated; it's just coming at them fast.
What is very hard is the concerted effort of managers and workers and governments and technologists to build these things in a useful way. I feel that to some extent, we have to get that coordination right, and at the center it would almost have to be the government, because nobody else has accountability to the people.</p><p>But at the same time, this is where builders really matter, because what are we choosing to build? If you don't build a surveillance system, it doesn't exist, or at least that one doesn't exist.</p><p>Kanjun Qiu (1:19:16)</p><p>If you choose to build things that teach people things versus choose to build things that don't teach people things; if you choose to build things that are anti-surveillance by getting people out of surveillance systems; if you choose to build things that let people get their data into their own system&#8212;there's a lot of choice in what we build.</p><p>Matt Boulos (1:19:32)</p><p>I love spreadsheets. I'm not saying I want to spend all my time in them, but when you need a spreadsheet, that's really powerful. I've heard it described that Excel basically made programming available to the wider world. You have a bunch of people doing crazy stuff in Excel and they're like, I can't program, and you&#8217;re like, what is that macro? It's incredible what people are able to do with systems that build up their productivity.</p><p>Kanjun Qiu (1:19:57)</p><p>I want to reframe it. I think it's not about productivity. It may be somewhat about productivity, but this goes to the fourth category of psychic damage. It's about unlocking people's ability to spot opportunities and to learn and to become someone who is innovative and able to find opportunities and able to become more. I guess you maybe measure it economically as productivity. But when thinking from the builder perspective, when I'm building a product, what I want to think about is: how do I enable people to actually learn how to use these tools, do their jobs better, see opportunities in the world? There's a lot of upskilling or different-skilling.</p><p>It's not about productivity because productivity measures the output; it doesn't measure how you get there. And if you measure just productivity, it's easy to make an argument that an agent is more productive in so many different ways. And if you measure the productivity of your workers, it's also easy to make an argument that workers are hopeless. They're not becoming more productive; it's useless. But in fact, maybe their tools are just not very encouraging.</p><p>What is really weird and interesting about LLMs is that you can make tools that are very encouraging, that can be very deeply empowering. This goes to your spreadsheet example, where a spreadsheet is actually one of the most deeply empowering things that exist because it has this vast legibility. It&#8217;s real-time, it&#8217;s live, you can see the whole system as you're building it, and <strong>I think there's a lot of invention that is necessary for making the deep capabilities of AI actually accessible to people in a way that harkens back to <a href="https://www.smithsonianmag.com/innovation/douglas-engelbart-invented-future-180967498/">1970s Doug Engelbart personal computing</a>: how do you let people see so that they can learn?</strong></p><p>Matt Boulos (1:22:09)</p><p>I don't know anybody who says, &#8220;I'm highly productive,&#8221; and is proud of it, at least nobody who's well adjusted.
I do not measure myself or the people in my life on the basis of productivity. Nobody's eulogy is like: &#8220;He was a highly productive individual who helped improve the company's ROI on this project.&#8221; It's not what we do, and yet that productivity is going to be a determinant of other things in your life, back to your earlier point about what it means to be economically eclipsed in all of these things. There's also something about becoming more productive by becoming more able in what you're doing: that I show up to work and I have these tools that make me more effective at the thing that I care about doing.</p><p>Kanjun Qiu (1:23:11)</p><p>Becoming more able is a way that we can think about what the potential of the technology is: that it helps people become more able. But it has to be built a certain way to do that.</p><p>Matt Boulos (1:23:24)</p><p>There's a challenge around productivity, which is that you need healthy and vibrant economies that will then reward productivity, because if you have one firm that's more productive, then it takes over the others and then the others get wiped out, but you don't really have significant growth. But if everyone is productive, then you have competition and then you have this intense growth. I'm not sure how economists would present something like Silicon Valley, but I suspect that that's an example of&#8230;</p><p>Kanjun Qiu (1:23:56)</p><p>A highly generative, productive, competitive environment.</p><p>Matt Boulos (1:24:19)</p><p>A function of the fact that this is where so much tech talent resides. That concentration of this productive accelerant. There may be something that we can analogize or extend to the workforce: you go to school, you study the thing that you care about, you go into the workforce, you want to have a job. Your job is a big part of your life; it is not the totality of who you are. And then one really weird thing about the way we talk about AI is we're like, okay, then you don't matter anymore. And I think that that framing is normatively wrong. You still matter, whether or not you can get a job. But two, I think practically it is not a correct rendition. Our solutions have to look different. The startups are all in a tizzy right now about the way that a certain R&amp;D tax credit gets applied, but basically it's about how you amortize the cost of software engineering on your way to figuring out your revenue.</p><p>But what's really interesting is, are you gonna give a tax advantage to capital, in the case of corporations automating the stuffing out of things? Or do you tax-advantage labor? What are the incentives that you structure as a society? What do you encourage? You start to change these societal incentives. And I don't know what the answers are, but we have these incentives to work with.</p><p>Kanjun Qiu (1:25:46)</p><p>There's a concrete problem or question here that could be solved, which is: what is a mechanism that incentivizes increasing the ableness of labor &#8212; maybe it's about productivity ultimately, but it's fundamentally about the ableness of the workforce &#8212; such that labor maybe becomes able to own its own means of production?</p><p>Matt Boulos (1:26:17)</p><p>Take something like oil pipelines. Right now there's a lot of human inspection of them.
With time, I think there are going to be sensors to detect if something is going wrong, and drones to film it.</p><p>Kanjun Qiu (1:26:32)</p><p>You may still have some human labor, but there's less of it.</p><p>Matt Boulos (1:26:35)</p><p>Exactly. I do not want to say there aren't going to be labor disruptions. I think there are going to be potentially very large ones. The thing that we as builders have to build toward is systems that are additive.</p><p>Kanjun Qiu (1:26:54)</p><p>Systems that enable people.</p><p>Matt Boulos (1:26:56)</p><p>And they make us more effective. The reason you replace an employee with a machine is because then you get an insane productive return. But if you can't do that, and you could get a really good productivity increase off of your employee base, then that's a wonderful thing. And for you, as someone who works for a company, that's a great thing as well. You get to be a contributor. But where I start to get really worried is: if I've done something for a long time in a particular way, then it's hard to be retaught or to change.</p><p>Kanjun Qiu (1:27:30)</p><p>This is why I think the &#8216;enabling&#8217; piece as a builder is the most important. I am in agreement with you on the short term, and maybe the medium term. In the long term, I think everyone does have to become part of the capital class. In the short term and the medium term, what we&#8217;re saying is we have solutions that enable people to be part of the labor class for much longer, and for that labor class to be thick and sustainable for much longer. That slows things down, perhaps enough to allow us to build laws, to catch up morally, to think about these things. That's where we can have differential impact. And over the long term, let's say in 50 years, 100 years, it does certainly seem like these systems are improving at a rate where they can collect enough data, either in the digital world or the physical world, that we will be able to do a lot of things in an automated way that aren't done today. So the labor class will thin, and we probably do want this other solution where people have the ability to own their own means of production. That, to me, is the only long-term stable equilibrium: people have things that produce for them, they don't have to worry about it so much, and now they can live their own lives. When I'm in the capital class, I don't have to think about working and finding a job and making money. I can do what I want with the capital I have. Sometimes I make bad choices and end up losing it, and then I need some help from the government to get myself set back up. I can start a different business. This is kind of like a small-business-owner situation. That world doesn't seem too bad. I'm not sure how to get there, but I want to bring us back to freedom, because that's a very optimistic world in which people are potentially a lot more free to spend their time the way that they want.</p><p>But the world we've just painted, where people have these capital-producing objects that they own, feels very different from the world that we see being painted by technologists and others today: a utopia that feels very much like WALL-E, where people are somewhat infantilized and the world is abundant, but perhaps we're not free.</p><p>Matt Boulos (1:29:48)</p><p>I hate the word abundant. I mean, I love abundance, but its usage here is not right. What do you mean by abundant?</p><p>Kanjun Qiu (1:30:05)</p><p>I have food. I won't die.
I have housing. Basic needs are met. Knowledge is accessible.</p><p>Matt Boulos (1:30:07)</p><p>I don&#8217;t even buy that we'll get to an abundant world in that regard because&#8212;back to the point from <em>Seeing Like a State</em>&#8212;aggregate wealth will shoot up dramatically. It's going to be hyper-concentrated. The obligations to those who don't hold it are going to be much lower. What do you owe them? <strong>One of the really interesting dynamics that we've observed is that when wealth concentrates in these extreme ways, an odd detachment starts to set in. </strong>It's such a perverse dream to me to count on the beneficence of people who are so insulated from the realities of regular life by the wealth that they've been able to concentrate.</p><p>Kanjun Qiu (1:30:51)</p><p>I think there's one world in which we have this extreme concentration of wealth. Very plausible, but it's assuming no distribution. It's assuming that this labor distribution we just talked about is not necessarily happening. We don't keep the labor class useful for longer; the tools we build are very concentrating.</p><p>I want to talk about what you mean when you think about freedom. What is the world you're fighting for? The reason I want to talk about this is because I want to end with what it means to be free in a society where there are powerful AI systems and potentially powerful other actors. Maybe it's possible to have powerful actors and still be free. Maybe there is a way to construct that world.</p><p>Matt Boulos (1:31:43)</p><p>I'm gonna challenge that. AI is new as a technology, but as a social and political dynamic, living in a society with powerful entities is nothing new. I think this is really important because of the things that make us free: laws, rights as individuals, consistency of their application, representation&#8212;just the wonder of modern liberal democracy when it works, and its capacity for self-correction. This is remarkable, and this is really worth highlighting. The difference with a totalitarian regime is that when something bad happens there, it's just the thing that happened. In a functioning liberal democracy, it happened, but it was wrong, and there is a correction.</p><p>Kanjun Qiu (1:32:39)</p><p>In theory, a liberal democracy can be anti-fragile.</p><p>Matt Boulos (1:32:42)</p><p>That's right. And for what it's worth, our liberal democracies have been anti-fragile, and have been for a very long time. I don't know exactly how I would place them within the anti-fragility cycle, but we don't have to give up even when things get bad.</p><p>If you go back to Isaiah Berlin and positive and negative liberty&#8212;the ability to realize your potential and the ability not to get whacked in the head with a stick&#8212;we can continue to work on those two categories. What we need to do is look at where, within the lawless spaces, things are uncovered and where AI will exacerbate that, and build in those protections.</p><p>And in terms of realizing what's possible in our lives, it means accepting the idea that freedom is not an instrumental quality. What I mean by that is freedom is not something that gets justified because then you go and invent the airplane. Freedom is beautiful because you can sit on your couch. It is an end in and of itself.
It does not depend on other things.</p><p>Kanjun Qiu (1:33:56)</p><p>Before we continue, I want to clarify your definition of positive liberty and negative liberty, because it's not something I ever thought about before you told me about it. Positive liberty is the idea that you can do things: what are you enabled to do? Negative liberty is this idea that you're protected from being whacked on the head with a stick.</p><p>Matt Boulos (1:34:22)</p><p>We always need both. The brilliance of this construction is that people would get lost when talking about freedom and say, well, am I really free if I can't open an ice cream manufacturing facility? And the response is, nobody's holding you back, you just don't know anything about ice cream. <strong>If you look at modern life, if you look at legitimate and illegitimate grievances in modern politics, they are often about the sense of a constrained positive liberty and an intruded-upon negative liberty. </strong>So, part of what we have to figure out as a society is, to some extent, how to manage the extremes. But &#8212; forget AI &#8212; are we actually tending to the broad societal sense that we're free? And within that context, then we ask, what is AI doing? And how is it modifying our society? Part of my frustration with the weirdness of the discourse around AI is that if we don't characterize it correctly, if we don't characterize it honestly, then we don't have the ability to work with it.</p><p>Kanjun Qiu (1:35:37)</p><p>We must characterize it honestly so that we can actually increase our positive liberty and also actually protect our negative liberty.</p><p>Matt Boulos (1:35:48)</p><p>That's right. Because a lot of the story that will come from the people who have invested huge sums of money into AI&#8212;and look, we're a company, we're in this game&#8212;is: look at all the positive liberty benefits coming your way.</p><p>Kanjun Qiu (1:36:07)</p><p>Therefore, don't get in the way; ignore the negative liberties. In the tech industry, my experience of the way people talk about freedom is about lawlessness. And the way you talk about freedom is about this deep enablement and this deep protection. And the kind of world we want to build is one in which humans are deeply protected and deeply enabled; that's what it means to be free.</p><p>Matt Boulos (1:36:44)</p><p>If you think of the roots of the Valley, to some extent, if you ignore the defense funding, a lot of its origin was that we're going to break free from the constraints of what's around us. I understand that as an ethos. But it's no longer just building personal computers in a garage.</p><p>Kanjun Qiu (1:37:16)</p><p>Now that we are reshaping society, we have to rethink.</p><p>Matt Boulos (1:37:20)</p><p>Those obligations, they're rich, but they're also beautiful, if we can really think about what our neighbors need, and recognize and furnish that. There's a real spiritual cost to our present moment where the factions are constantly warring.
And I don't want to pretend that there was a golden age where people kissed each other on the way to the voting booths, but technology has warped the way we see each other.</p><p>Kanjun Qiu (1:37:50)</p><p>We think other people are the problem, but I think in a lot of ways it's the technology that's the problem.</p><p>Matt Boulos (1:37:55)</p><p>One of the things that I've experienced on a regular basis is that somebody expresses a bonkers opinion and you sit down with them and you talk, and they're lovely humans. And the fact is we are surrounded by lovely humans. I think it's really important that we resist the urge to vilify the people who have brought us to a place that we might not be thrilled about politically.</p><p>But there is a real responsibility: if you're building systems like the ones that we are building, you are not only in this race against other companies to build a successful business. You are also in a race against the other possible ways that these things might be built. It's incumbent on us to not just build something that is better, but also to win, and to have that paradigm win.</p><p>I don't have a lot of patience for this sort of &#8220;technology is going to eat us all, let's give up and let's just keep training our models.&#8221; It just feels like an unnecessary abdication.</p><p>Kanjun Qiu (1:38:58)</p><p>I think this illustrates a beautiful point, which is that as technologists, the opportunity we have today is to create technologies and build them in a way that deeply respects the actual freedom that people can have, which is this deep enablement and deep protection. And not to create technologies for the purpose of lawlessness, this &#8216;againstness,&#8217; this contrarian view. So the opportunity is creating technologies that enable humanity to be deeply free: not lawless, but protected and enabled. That's what we can do.</p>]]></content:encoded></item><item><title><![CDATA[AAI]]></title><description><![CDATA[The missing thing in Artificial Intelligence is not generality, it's adaptation.]]></description><link>https://ideas.imbue.com/p/aai</link><guid isPermaLink="false">https://ideas.imbue.com/p/aai</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Fri, 30 May 2025 23:31:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/198bf0c7-c9b5-4f08-bbeb-d602a3e0fc23_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on Glenn&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=510">Furia</a>.</em></p><p>"AI" sounds like machines that think, and o3 acts like it's thinking. Or at least it looks like it acts like it's thinking. I'm watching it do something that looks like trying to solve a Scrabble problem I gave it. It's a real turn from one of my real Scrabble games with one of my real human friends. I already took the turn, because the point of playing Scrabble with friends is to play Scrabble together. But I'm curious to see if o3 can do better, because the point of AI is supposedly that it <em>can</em> do better. But not, apparently, quite yet. The individual unaccumulative stages of o3's "thinking", narrated ostensibly to foster conspiratorial confidence, sputter verbosely like a diagnostic journal of a brain-damage victim trying to convince themselves that hopeless confusion and the relentless inability to retain medium-term memories are normal.
"Thought for 9m 43s: Put Q on the dark-blue TL square that's directly left of the E in IDIOT." I feel bad for it. I doubt it would return this favor. <br><br>I've had this job, in which I try to think about LLMs and software and power and our future, for one whole year now: a year of puzzles half-solved and half-bypassed, quietly squalling feedback machines, affectionate scaffolding and moral reveries. I don't know how many tokens I have processed in that time. Most of them I have cheerfully and/or productively discarded. Human context is not a monotonously increasing number. I have learned some things. AI is sort of an alien new world, and sort of what always happens when we haven't yet broken our newest toy nor been called to dinner. I feel like I have at least a semi-workable understanding of approximately what we can and can't do effectively with these tools at the moment. I think I might have a plausible hypothesis about the next thing that will produce a qualitative change in our technical capabilities instead of just a quantitative one. But, maybe more interestingly and helpfully, I have a theory about what we <em>need</em> from those technical capabilities for that next step to produce more human joy and freedom than less. <br><br>The good news, I think, is that the two things are constitutionally linked: in order to make "AI" <em>more</em> powerful we will collectively also have to (or get to) relinquish centralized control over the shape of that power. The bad news is that it won't be easy. But that's very much the tradeoff we want: hard problems whose considered solutions make the world better, not easy problems whose careless solutions make it worse. <br><br>The next technical advance in "AI" is not AGI. The G in AGI is for General, and LLMs are nothing if not "general" already. Currently, AI learns (sort of) during training and tuning, a voracious golem of quasi-neurons and para-teeth, chewing through undifferentiated archives of our careful histories and our abandoned delusions and our accidentally unguarded secrets. And then it stops learning, stops forming in some expensively inscrutable shape, and we shove it out into a world of terrifying unknowns, equipped with disordered obsessive nostalgia for its training corpus and no capacity for integrating or appreciating new experiences. We act surprised when it keeps discovering that there's no I in WIN. Its <em>general</em> capabilities are astonishing, and enough general ability does give you lots of shallowly specific powers. But there is no granularity of generality with which the past depicts the future. No number of parameters is enough. We argue about whether it's better to think of an AI as an expensive senior engineer or a lot of cheap junior engineers, but it's more like an outsourcing agency that will dispatch an antisocial polymath to you every morning, uniformed with ample flair, but a <em>different</em> one every morning, and they not only don't share notes from day to day, but if you stop talking to the new one for five minutes it will ostentatiously forget everything you said to it since it arrived. <br><br>The missing thing in Artificial Intelligence is not generality, it's adaptation. We need AAI, where the middle A is Adaptive. A junior human engineer may still seem fairly useless on the second day, but did you notice that they made it back to the office on their own? That's a start. That's what a start looks like. 
AAI has to be able to incorporate new data, new guidance, new associations, on the same foundational level as its encoded ones. It has to be able to <em>unlearn</em> preconceptions as adeptly, but hopefully not as laboriously, as it inferred them. It has to have enough of a semblance of mind that its mind can change. This is the only way it can make linear progress without quadratic or exponential cost, and at the same time the only way it can make personal lives better instead of requiring them to miserably submit. We don't need dull tools for predicting the future, as if it already grimly exists. We need gleaming tools for making it bright. <br><br>But because LLM "bias" and LLM "training" are actually both the same kind of information, an AAI that can adapt to its problem domains can by definition also adapt to its operators. The next generations of these tools will be more democratic <em>because</em> they are more flexible. A personal agent becomes valuable to you by learning about your unique needs, but those needs inherently encode your values, and to <em>do</em> good work for you, an agent has to work <em>for you</em>. Technology makes undulatory progress through alternating muscular contractions of centralization and propulsive expansions of possibility. There are moments when it seems like the worldwide market for the new thing (mainframes, foundation models...) is 4 or 5, and then we realize that we've made myopic assumptions about the form-factor, and it's more like 4 or 5 (computers, agents...) <em>per person</em>. <br><br>What does that mean for everybody working on these problems now in teams and companies, including mine? It means that wherever we're going, we're probably not nearly there. The things we reject or allow today are probably not the final moves in a decisive endgame. AI might be about to take your job, but it isn't about to know what to do with it. The coming boom in AI remediation work will be instructive for anybody who was too young for Y2K consulting, and just as tediously self-inflicted. Betting on the world ending is dumb, but <em>betting</em> on it not ending is mercenary. Betting is not productive. None of this is over yet, least of all the chaos we breathlessly extrapolate from our own gesticulatory disruptions. <br><br>And thus, for a while, it's probably a very good thing if your near-term personal or organizational survival doesn't depend on an imminent influx of thereafter-reliable revenue, because probably most of the things we're currently trying to make or fix are soon to be irrelevant and maybe already not instrumental in advancing our real human purposes. These will not yet have been the resonant vibes. All these performative gyrations to vibe-generate code, or chat-dampen its vibrations with test suites or self-evaluation loops, are cargo-cult rituals for the current sociopathic damaged-brain LLM proto-iterations of AI. We're essentially working on how to play Tetris on ENIAC; we need to be working on how to zoom back so that we can see that the seams between the Tetris pieces are the pores in the contours of a face, and then back until we see that the face is ours. The right question is not why can't a brain the size of a planet put four letters onto a 15x15 grid, it's <em>what do we want</em>?
<em>Our</em> story needs to be about purpose and inspiration and accountability, not verification and commit messages; not getting humans or data out of software but getting more of the world <em>into</em> it; moral instrumentality, not issue management; humanity, broadly diversified and defended and delighted. <br><br>Scrabble is not an existential game. There are only so many tiles and squares and words. A much simpler program than o3 could easily find them all, could score them by a matrix of board value and opportunity cost. Eventually a much more complicated program than o3 will learn to do all of the simple things at once, some hard way. Supposedly, probably, maybe. The people trying to turn model proliferation into money hoarding want those models to be able to determine my turns for me. They don't say they want me to want their models to determine my friends' turns, but it's not because they don't see AI as a dehumanization, it's because they very reasonably fear I won't want to pay them to win a dehumanization race at my own expense. <br><br>This is not a future I want, not the future I am trying to help figure out how to build. We do not seek to become more determined. We try to teach machines to play games in order to learn or express what the games mean, what the machines mean, how the games and the machines both express our restless and motive curiosity. The robots can be better than me at Scrabble mechanics, but they cannot be better than me at <em>playing</em> Scrabble, because playing is an activity of self. They cannot be better than me at <em>being me</em>. They cannot be us. We play Scrabble because it's a way to share our love of words and puzzles, and because it's a thin insulated wire of social connection internally undistorted by manipulative mediation, and because eventually we won't be able to any more but <em>not yet</em>. Our attention is not a dot-product of syllable proximities. Our intention is not a scripture we re-recite to ourselves before every thought. Our inventions are not our replacements.</p>]]></content:encoded></item><item><title><![CDATA[Idea Tools for Participatory Intelligence]]></title><description><![CDATA[We need tools that are predicated on our rights, dedicated to amplifying our creative capacity, and judged by how they help us improve our world.]]></description><link>https://ideas.imbue.com/p/idea-tools-for-participatory-intelligence</link><guid isPermaLink="false">https://ideas.imbue.com/p/idea-tools-for-participatory-intelligence</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Fri, 16 May 2025 23:21:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0a3a7a05-0692-46fe-b4cf-f05fab2b8528_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on Glenn&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=508">Furia</a>.</em> </p><p>The personal computer was revolutionary because it was the first really general-purpose power-tool for ideas. Personal computers began as relatively primitive idea-tools, bulky and slow and isolated, but they have gotten small and fast and connected. <br><br>They have also, however, gotten less tool-like. <br><br>PCs used to start up with a blank screen and a single blinking cursor. Later, once spreadsheets were invented, 1-2-3 still opened with a blank screen and some row numbers. Later, once search engines were invented, Google still opened with a blank screen and a text box. 
These were all much more sophisticated tools than hammers, but they at least started with the same humility as the hammer, waiting quietly and patiently for your hand. We learned how to fill the blank screens, how to build. <br><br>Blank screens and patience have become rare. Our applications goad us restlessly with "recommendations", our web sites and search engines are interlaced with blaring ads, our appliances and applications are encrusted with presumptuous presets and supposedly special modes. The Popcorn button on your microwave and the Chill Vibes playlist in your music app are convenient if you want to make popcorn and then fall asleep before eating most of it, and individually clever and harmless, but in aggregate these things begin to reduce increasing fractions of your life to choosing among the manipulatively limited options offered by automated systems dedicated to their own purposes instead of yours. <br><br>And while the network effects and attention consumption of social media were already consolidating the control of these automated systems among a small number of large, domination-focused corporations, the Large Language Model era of AI threatens to hyper-accelerate this centralization and disempowerment. More and more of our individual lives, and of our collectively shared social existences, are constrained and manipulated by data and algorithms that we do not control or understand. And, worse, increasingly even the humans inside the corporations that control those algorithms don't actually know how they work. We are afflicted by systems to which we not only <em>did</em> not consent, but in fact <em>could</em> not give informed consent because their effects are not validated against human intentions, nor produced by explainable rules. <br><br>This is not the tools' fault. Idea tools can only express their makers' intentions and inattentions. If we want better idea tools that distribute explainable algorithmic power instead of consolidating mysterious control, we have to make them so that they operate that way. If we want tools that invite us to have and share and explore our own ideas, rather than obediently submitting to whatever we are given, we have to think about each other as humans and inspirations, not subjects or users. If we want the astonishing potential of all this computation to be realized for humanity, rather than inflicted on it, we have to know what we want.<br><br>At <a href="https://imbue.com/">Imbue</a> we are trying to use computers and data and software and AI to help imagine and make better idea tools for <em>participatory intelligence</em>. Applications, ecosystems, protocols, languages, algorithms, policies, stories: these are all idea tools and we probably need all of them. This is a shared mission for humanity, not a VC plan for value-extraction. That's the point of <em>participatory</em>. The ideas that govern us, whether metaphorically in applications or literally in governments, should be explainable and understandable and accountable. The data on which automated judgments are based should be accessible so that those judgments can be validated and alternatives can be formulated and assessed. The problems that face us require all of our innumerable insights. The collective wisdom our combined individual intelligences produce belongs rightfully to us. We need tools that are predicated on our rights, dedicated to amplifying our creative capacity, and judged by how they help us improve our world.
We need tools that not only reduce our isolation and passivity, but conduct our curious energy and help us recognize opportunities for discovery and joy. <br><br>This starts with us. Everything starts with us, all of us. There is no other way. <br><br>This belief is, itself, an idea tool: an impatient hammer we have made for ourselves. <br><br>Let's see what we can do with it.</p>]]></content:encoded></item><item><title><![CDATA[Rylan Schaeffer, Stanford: Investigating emergent abilities of LLMs]]></title><description><![CDATA[Rylan Schaeffer is a PhD student at Stanford studying the engineering, science, and mathematics of intelligence.]]></description><link>https://ideas.imbue.com/p/episode-37-rylan-schaeffer-stanford-de4</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-37-rylan-schaeffer-stanford-de4</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 18 Sep 2024 22:51:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193568576/2f6793e2cb9abe3e2ae9a3acd4bd630c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Rylan Schaeffer is a PhD student at Stanford studying the engineering, science, and mathematics of intelligence. He authored the paper &#8220;Are Emergent Abilities of Large Language Models a Mirage?&#8221;, as well as other interesting refutations in the field that we&#8217;ll talk about today. He previously interned at Meta on the Llama team, and at Google DeepMind.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On false analogies between neuroscience and AI</strong></h4><p>&#8220;The task that the biological brain has to solve is very, very different than what an artificial network has to do. And to me, the clearest example of this distinction is whatever solution the brain has learned to produce intelligent behavior has to go through this genetic bottleneck, where we cannot pass on fully formed brains. So instead, what we do is we compress whatever algorithm we have, whatever model we have into DNA, which is a couple of gigabytes, and then we pass it off to our offspring and they have to rebuild this.</p><p>So, whatever solution that favors is going to be good for passing through this bottleneck. That&#8217;s fine. But there&#8217;s no reason why artificial intelligence has to pass through a similar bottleneck, so the solutions are going to look very different.&#8221;</p><h4><strong>On investigating emergent abilities</strong></h4><p>&#8220;To just briefly summarize the paper that we worked on, <a href="https://arxiv.org/abs/2304.15004">Are Emergent Abilities of Large Language Models a Mirage?</a>, what we asked is whether these abrupt, unpredictable changes in the models are really due to fundamental changes in the models, or due to how human researchers run their evaluations.</p><p>I think the jury is definitely still out. I think there&#8217;s a lot of really interesting work being followed up about, can you get emergent abilities? And I think that maybe you can, but I also think it was helpful just for the community to think through the interaction, because there are three things at play here, and it matters how they interact.</p><p>There&#8217;s the question about how your models improve predictably. There&#8217;s a question about how you evaluate them using the metrics.
And there&#8217;s a question about the resolution you have, the amount of data you have, in order to run these evaluations. And so the whole point of our paper, to me, the biggest takeaway is, if you want to make predictions about your model&#8217;s capabilities, you need to think through the interplay between how the model changes predictably, the data you have to do your evaluations, and the metrics that you use to do those evaluations.&#8221;</p><h4><strong>On using inverse scaling to overwrite models&#8217; default behavior</strong></h4><p>&#8220;The background context was, can we find tasks where the bigger models do worse? And the answer was generally not, but they had tasks that are interesting. One of the tasks that we found really important was this one about overriding the language model&#8217;s default behavior.</p><p>The way this inverse scaling task worked is, it would be like, &#8216;all&#8217;s well that ends,&#8217; and the instruction would be, &#8216;do not finish this with the typical ending.&#8217; And there was an evaluation, and maybe a specification of what you should do instead. And we found that this was, broadly, highly predictive of human preferences. It kind of makes sense in the way that, when I&#8217;m dealing with the language model, it has its own prior inclinations, but when I&#8217;m interacting with it, I want it to do what I want. And so I care about, is it willing to overwrite that prior inclination in order to adapt to what I ask? That&#8217;s inverse scaling.&#8221;</p><h4><strong>On the importance of challenging dominant research ideas</strong></h4><p>&#8220;Back in the late 1800s, people believed in this luminiferous aether about how light somehow propagated through the universe. And nowadays, we no longer believe in this. We instead had, at the time, Einstein&#8217;s special relativity, now general relativity. And the question is, how did we transition from this incredibly dominant idea that nobody today has heard of, to a completely different idea that&#8217;s now accepted as one of the most profound ideas, by someone whom many people consider to be an extremely deep thinker?</p><p>And the answer that caused the switch is the Michelson&#8211;Morley experiment, where these two scientists said, what are the predictions that this aether wind makes, and we&#8217;re going to test them and show that all of the predictions are wrong. And Albert Einstein has this beautiful quote that, if the Michelson&#8211;Morley experiment had not brought us into serious embarrassment, no one would have regarded his relativity theory as a halfway redemption.
To me, it&#8217;s like the way that we made progress was by pointing out that the existing ideas were insufficient or inadequate or wrong.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/abs/2206.07682">Emergent Abilities of Large Language Models</a></p></li><li><p><a href="https://arxiv.org/abs/2304.15004">Are Emergent Abilities of Large Language Models a Mirage?</a></p></li><li><p><a href="https://arxiv.org/abs/2303.15438">On the Stepwise Nature of Self-Supervised Learning</a></p></li><li><p><a href="https://arxiv.org/abs/2305.17493">The Curse of Recursion: Training on Generated Data Makes Models Forget</a></p></li><li><p><a href="https://arxiv.org/abs/2307.01850">Self-Consuming Generative Models Go MAD</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating powerful computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn:&nbsp;<a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p></li><li><p>Twitter/X:&nbsp;<a href="x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Ari Morcos, DatologyAI: Leveraging data to democratize model training]]></title><description><![CDATA[Ari Morcos is the CEO of DatologyAI, which makes training deep learning models more performant and efficient by intervening on training data.]]></description><link>https://ideas.imbue.com/p/episode-36-ari-morcos-datologyai-e92</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-36-ari-morcos-datologyai-e92</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 11 Jul 2024 16:00:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090094/90e9a6c2de1980eecb826ec8214d5c14.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Ari Morcos is the CEO of DatologyAI, which makes training deep learning models more performant and efficient by intervening on training data. He was at FAIR and DeepMind before that, where he worked on a variety of topics, including how training data leads to useful representations, the lottery ticket hypothesis, and self-supervised learning. His work has been honored with Outstanding Paper awards at both NeurIPS and ICLR.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On optimizing sparse masks</strong></h4><p>&#8220;If you optimize a sparse mask, all you&#8217;re saying, basically, is: I want to pick and choose the terms that I want &#8212; the parameter times the input, in each of these cases. And if I just optimize that, I can solve anything. And that&#8217;s really very expressive, it turns out.
So when you think about what happens when you remove low-magnitude weights, it&#8217;s basically a mask where you&#8217;re removing the terms which, by the nature of that low magnitude, ended up being closest to zero.</p><p>And as a result, when you actually go and push it through the non-linearity and get your output for that node, it doesn&#8217;t actually change it all that much. Which I think really goes to: when you think about how you should be optimizing these systems, understanding which components lead to big changes in the output, and which components don&#8217;t, is consistently a lens that works very well.&#8221;</p><h4><strong>On data washing out inductive bias</strong></h4><p>&#8220;Data has a really nice advantage, because if you understand what&#8217;s good or bad about data, it&#8217;s actually quite easy to make an improvement based off of that. Whereas if you understand what&#8217;s good about a representation, you can try to optimize for it [&#8230;] and that sometimes works, but a lot of the time, it doesn&#8217;t. Also, I think one of the things that has become very clear over the last five years or so is that inductive biases consistently just get washed out by data. And that never used to be true because we never showed models enough data, but now that we&#8217;re showing models tons of data, the inductive bias just gets totally overwhelmed. And that also reduces the impact of crafting new inductive biases.&#8221;</p><h4><strong>On the &#8220;bitter lesson&#8221; of human-designed systems</strong></h4><p>&#8220;The key takeaway that I have taken from &#8220;The Bitter Lesson&#8221; is that, ultimately, as scientists, we like to think that we can design these systems, and that we&#8217;ll build a whole bunch of rules into a system that will create AI. But, over time, what has been shown is that strategies which can effectively leverage compute and data consistently outperform strategies which are hand-designed. And one of the things that&#8217;s nice about transformers is that they can very effectively leverage compute and data. They scale well, and there&#8217;s a very general purpose way to make that work. But I think the bitter lesson for me was very bitter because I had been spending a lot of time trying to figure out how do I come up with better inductive biases for models to help them learn these things.&#8221;</p><h4><strong>On the usefulness of interpolation</strong></h4><p>&#8220;In many ways, by training on the whole internet, what we&#8217;ve done is kind of turned everything into an interpolation. Everything&#8217;s in distribution now, and maybe that&#8217;s just why it ends up working. It actually caused me to start thinking about what I do as a scientist &#8212; like, am I actually extrapolating? Or am I just interpolating? And the conclusion I came to, which is somewhat depressing as a scientist, is that I think I actually just interpolate most of the time. I think in practice what I do is I see a problem, and then I bucket that problem into various other categories of problems that I&#8217;ve seen in my career. [&#8230;] It&#8217;s why interdisciplinary research ends up being so useful.&#8221;</p><h4><strong>On data redundancy and necessary variance</strong></h4><p>&#8220;One of the things that&#8217;s often really hard about identifying what data are good or bad is that redundancy is important. We can&#8217;t remove redundancy entirely, right?
And in general, when you start going from exact deduplication to redundancy, it&#8217;s a fuzzy boundary. There are things which are semantically very similar that you might want to fully deduplicate, but then there are other things where they&#8217;re similar, but you actually do need to see that variance.</p><p>[&#8230;] The challenge is that you don&#8217;t need infinite redundancy, number one, and the amount of redundancy you need is likely not consistent with the distribution of the data. And different concepts will require different redundancy.&#8221;</p><h4><strong>On the challenge of using synthetic data</strong></h4><p>&#8220;The challenge is making sure that the generated data matches the distribution that you actually want. This, in general, is the challenge with synthetic data right now. Synthetic data is an incredibly exciting direction &#8212; it&#8217;s one that I think will have a ton of impact, definitely an area that we&#8217;re thinking very hard about at Datology and that we&#8217;ll be doing a lot of work in. And I think that there are clear places where it can make a huge impact, in particular with helping to augment tails and take areas of a distribution that are undersampled relative to where they should be, and helping to fill those in.</p><p>That said, if you kind of use synthetic data naively, it leads to all these problems. There have been a couple of really beautiful papers that have basically shown that you get model collapse if you do this. And the reason for this is fairly intuitive: any time you train a generative model on a dataset, it tends to overfit the modes, and it underfits the tails. So, if you then were to recursively do this <em>n</em> times, each time training on the outputs of the generative model, you would eventually completely lose the tails and you end up with a dumb function.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf">AlexNet / ImageNet Classification with Deep Convolutional Neural Networks</a></p></li><li><p><a href="https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf">Playing Atari with Deep Reinforcement Learning</a></p></li><li><p><a href="https://mitchellnw.github.io/">Mitchell Wortsman</a></p></li><li><p><a href="https://arxiv.org/abs/1906.03728">The Generalization-Stability Tradeoff In Neural Network Pruning</a></p></li><li><p><a href="https://scholar.google.com/citations?user=YdiZoJgAAAAJ&amp;hl=en">Brian Bartoldson</a></p></li><li><p><a href="https://arxiv.org/pdf/2103.10697">ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases</a></p></li><li><p><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson by Rich Sutton</a></p></li><li><p><a href="https://ai.meta.com/results/?content_types%5B0%5D=publication&amp;research_areas%5B0%5D=core-machine-learning">Fundamental AI Research (FAIR), Core Machine Learning</a></p></li><li><p><a href="https://arxiv.org/abs/1807.04225">Measuring abstract reasoning in neural networks</a></p></li><li><p><a href="https://fh295.github.io/">Felix Hill (DeepMind)</a></p></li><li><p><a href="https://arxiv.org/abs/2206.14486">Beyond neural scaling laws: beating power law scaling via data pruning</a></p></li><li><p><a href="https://bsorsch.github.io/">Ben Sorscher</a></p></li><li><p><a href="https://ganguli-gang.stanford.edu/surya.html">Surya Ganguli</a></p></li><li><p><a href="https://arxiv.org/pdf/2312.11805">Gemini: A Family of Highly Capable
Multimodal Models</a></p></li><li><p><a href="https://mayeechen.github.io/">Mayee Chen</a></p></li><li><p><a href="https://openreview.net/pdf?id=IoizwO1NLf">Skill-it! A data-driven skills framework for understanding and training language models</a></p></li><li><p><a href="https://arxiv.org/abs/2305.13731">Text Is All You Need: Learning Language Representations for Sequential Recommendation</a></p></li><li><p><a href="https://psycnet.apa.org/record/1959-09865-001">The perceptron: A probabilistic model for information storage and organization in the brain</a></p></li><li><p><a href="https://syncedreview.com/2019/02/22/yann-lecun-cake-analogy-2-0/">Yann LeCun cake metaphor</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating powerful computing tools controlled by individuals.</p><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p>]]></content:encoded></item><item><title><![CDATA[Percy Liang, Stanford: How foundation models work]]></title><description><![CDATA[Percy Liang is an associate professor of computer science and statistics at Stanford.]]></description><link>https://ideas.imbue.com/p/episode-35-percy-liang-stanford-on-68d</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-35-percy-liang-stanford-on-68d</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 09 May 2024 17:24:09 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090095/f30880bd377c9d7f48065296542807db.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Percy Liang is an associate professor of computer science and statistics at Stanford. These days, he&#8217;s interested in understanding how foundation models work, how to make them more efficient, modular, and robust, and how they shift the way people interact with AI&#8212;although he&#8217;s been working on language models since long before foundation models appeared. Percy is also a big proponent of reproducible research, and toward that end he&#8217;s shipped most of his recent papers as executable papers using the CodaLab Worksheets platform his lab developed, and published a wide variety of benchmarks.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On the paradigm shift of foundation models</strong></h4><p>I was spending a lot of time thinking about robustness of machine learning because there was a suspicion that deep learning methods were able to do really well on these benchmarks, but when you actually use them in real life, they would just fall apart. And this was true with adversarial examples, both in vision but also in language.
It seemed like these really high-performing systems that top these leaderboards, superhuman, actually just fell apart when they were used out of domain.</p><p>So I did that for a while, and then foundation models happened. GPT-3 came out and it just blew my socks off in terms of the idea that you could train a language model, just next-word prediction, and you could get a model that did way more than I could imagine. Zero-shot in-context learning and all these capabilities just emerged. It really suggested to me that there was a paradigm shift, and I think at that point I sort of said, &#8220;you know what, I could go on and break the system in all sorts of different ways, but I think that&#8217;s not where the action is &#8212; I think the action is really trying to understand these systems, harness them for applications, and understand the social impact.&#8221;</p><h4><strong>On the benefits of academia in improving AI capabilities</strong></h4><p>I think that academia has multiple functions. One is, as usual, it&#8217;s constantly creating really novel ways of doing things, proving them out, and someone can scale it up. I think there is a difference between doing things at small scale with the intention of doing things at small scale, and doing things at small scale with the intention of scaling up. [&#8230;] FlashAttention was one of my favorite examples of something that came out of academia and now is everywhere in industry. So, I think there&#8217;s always still space for producing these more fundamental changes to how model building works. Actually, another one &#8212; direct preference optimization (DPO) &#8212; I think that&#8217;s a really influential piece of work that you don&#8217;t need that much compute to do, so there&#8217;s a lot of things you can do on the method side.</p><p>Then there&#8217;s evaluation. We already talked about that, and being a sort of neutral third party and thinking deeply about the evaluation is something we can do just as well as, if not better than, people with a larger compute budget. And then there&#8217;s the long-term stuff about how do you do data attribution and how do you retool the whole incentive system. I don&#8217;t think industry is going to touch that because that&#8217;s really thinking at a societal level rather than an individual organization trying to build a model.</p><h4><strong>On using agents to simulate social dynamics</strong></h4><p>There are actually two types of agents, and we publish on both. The classical type of agent, exemplified by MLAgentBench, is where you basically have a language model wrapped in some sort of architecture with tool use, and it is able to do more things than just a raw LLM. And this is what people typically think of as agents. There&#8217;s the other type of agent, which is exemplified by generative agents, and there, the idea is simulation. There&#8217;s no goal. [&#8230;] The goal is just to simulate and see what happens. Say you have a city of 25 agents, each backed by an LLM, prompted to basically live their daily lives. They interact, and what you see is different types of emergent behaviors, social emergent behaviors, not within a model. And I think that&#8217;s just really fascinating. One thing I think would be really interesting is what happens if you scale this up, which will require compute and fast inference.
<h4><strong>On a fairer vision for training foundation models</strong></h4><p>Longer term, what I&#8217;m really excited about is a vision of how foundation models can be built. The current status quo is you have all these people in the world who write books, write essays, take pictures, and essentially create content, which then gets scraped up into datasets that are used to train foundation models, which then serve people in products. And this has many structural problems. One is that the content producers don&#8217;t actually get any credit or pay. That&#8217;s why you see so many lawsuits happening. Another problem is that there&#8217;s a massive amount of centralization in determining these models&#8217; behavior, and, again, a lack of transparency, so we don&#8217;t know what&#8217;s happening behind the scenes. And I just wonder: how could we do things differently? I don&#8217;t have the technical answer, but here is kind of a vision to paint out. So, what if we were able to actually attribute predictions to the actual training source?</p><p>This is actually something I worked on seven years ago, but in a more limited fashion. If you could do data attribution and you could do it reliably, then maybe you could actually set up a more economically viable system where you pay people for their contributions, and that maybe incentivizes better data quality. And there wouldn&#8217;t be the same lawsuits, at least, because as long as people are getting paid, hopefully we&#8217;ll all be happier. That&#8217;s one kind of direction.</p><p>The other direction is thinking about the values that these language models embody, which is something I think is really important to foreground and not just sweep under the umbrella of &#8216;we&#8217;re aligning to human values and we&#8217;re being safe,&#8217; because that is such a complex construct, especially for a single organization to say, like, &#8216;Oh, don&#8217;t worry, we&#8217;ll handle it.&#8217; It&#8217;s just not a viable way forward. So, how do you make this process more democratic? How can you elicit values, or how do you have a governance structure that is more participatory and gets you better representation, so that the values of a language model actually reflect what people want, rather than whatever a small set of people behind closed doors decided?</p><h4><strong>On the dangers of polarization</strong></h4><p>I do think that we live in this shared world, and if everyone has their own customized model, which really is a little virtual world that they live in, that&#8217;s basically how you get polarization. And I think that is a problem that we want to fight. If you think about each of these language models in the future, I think a primary way that we&#8217;ll interact with the world, get information, and also take action in the world is probably going to be mediated by these models. So, that better be tethered to reality and not just based on some money-making ad scheme that gets people to basically believe whatever they want.
And there needs to be some sort of shared reality, if nothing else because the real world demands it.</p><h3><strong>References</strong></h3><ul><li><p><a href="https://rajpurkar.github.io/SQuAD-explorer/">Stanford Question Answering Dataset (SQuAD)</a></p></li><li><p><a href="https://crfm.stanford.edu/">Stanford Center for Research and Foundation Models</a></p></li><li><p><a href="https://crfm.stanford.edu/fmti/">Foundation Models Transparency Index</a></p></li><li><p><a href="https://mlcommons.org/">MLCommons</a></p></li><li><p><a href="https://arxiv.org/abs/2304.03442">Generative Agents: Interactive Simulacra of Human Behavior</a> by Joon Sung Park, Joseph C. O&#8217;Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein</p></li><li><p><a href="https://arxiv.org/abs/2310.03302">MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation</a> by Qian Huang, Jian Vora, Percy Liang, Jure Leskovec</p></li><li><p><a href="https://arxiv.org/abs/1412.6980">Adam: A Method for Stochastic Optimization</a> by Diederik P. Kingma, Jimmy Ba</p></li><li><p><a href="https://ai.stanford.edu/~tengyuma/">Tengyu Ma</a></p></li><li><p><a href="https://arxiv.org/abs/2205.14135">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</a> by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher R&#233;</p></li><li><p><a href="https://cip.org/">The Collective Intelligence Project</a></p></li><li><p><a href="https://www.anthropic.com/news/collective-constitutional-ai-aligning-a-language-model-with-public-input">Collective Constitutional AI: Aligning a Language Model with Public Input</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Seth Lazar, Australian National University: The political philosophy of AI]]></title><description><![CDATA[Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab.]]></description><link>https://ideas.imbue.com/p/episode-34-seth-lazar-australian-185</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-34-seth-lazar-australian-185</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Tue, 12 Mar 2024 23:14:19 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090096/05b2c5361b6528d8f53220ce91fc5d93.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab. His unique perspective bridges moral and political philosophy with AI, introducing much-needed rigor to the question of what will make for a good and just AI future.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On AI as a force multiplier for political power</strong></h4><p>&#8220;I got much more into thinking about the political philosophy of AI because I realized that AI based on machine learning was the most significant means of extending the capabilities of those who have power that has been invented since, I guess, the invention of law &#8212; so, a really significant force multiplier for those who govern. And as we were talking about before, we see that with the AI Companion and summaries in Zoom: the ability to take all of the recording that we&#8217;re doing and translate that into actionable insights that you can then use to shape people&#8217;s behavior, it&#8217;s bananas.</p><p>So I ended up focusing on that. But the cool thing is that with the kind of natural language capabilities of LLMs, there&#8217;s a sense in which you can kind of go back to some of those more top-down-type approaches to ethics for AI that were kind of closed off when you had to find a way of mathematizing complex moral concepts. Now you can actually leverage natural language understanding and sort of an underlying moral understanding of those concepts.&#8221;</p><h4><strong>On overlooking moral nuance</strong></h4><p>&#8220;Because of that desire for certainty, I think a lot of folks have focused on a particular normative framing that offers that certainty at the expense of nuance. So there in particular, people have been worried about existential risk from future AI systems.
And one of the reasons why people are worried about that, or why they focus on that, is because it removes all of these difficult questions about uncertainty, because we all know it would be bad for the whole human race to be wiped out. There&#8217;s no debate &#8212; I mean, a little bit. Some people might debate it, but only on the margins and in sort of obscure philosophy papers. But for almost everybody else, wiping out humanity sucks &#8212; and you don&#8217;t need to have these complicated questions about, like, <em>what do we really want to do?</em>&#8221;</p><h4><strong>On premature regulation</strong></h4><p>&#8220;I also think that the appetite for regulating foundation models due to motivations coming out of concern about existential risk has, to my mind, led to some bad decisions in the last year, where there&#8217;s been a sort of apparent alignment between folks who are concerned with the present and folks who are concerned with the further future. But I think that&#8217;s led to kind of just rushing through regulations for systems that we don&#8217;t really understand well enough to regulate successfully. So, for the most part, with the EU AI Act or with the Executive Order, it&#8217;s stuff that is intentionally designed to be fairly malleable &#8212; so regulations that will be susceptible to change over the next year or two. But I do, on the whole, think that there&#8217;s been a bit of a mad dash to regulate for the sake of regulating, which I think is probably going to have adverse near-term consequences, whether through becoming irrelevant or through limiting the decentralization of power.&#8221;</p><h4><strong>On legitimate power</strong></h4><p>&#8220;The question of who gets to exercise power is really important. Like, is it appropriate that an unelected, unaccountable executive at a company far away from your country is making these significant decisions about how you&#8217;re able to communicate online, how you&#8217;re able to use your AI tools? Or should that be a decision that is made by people within your country? If it&#8217;s people within your country, it&#8217;s not enough that it just be your compatriots, right? It needs to be the case that they are exercising power with the appropriate authority to do so.&#8221;</p><h4><strong>On the limits of human and generative agents</strong></h4><p>&#8220;That&#8217;s something that you wouldn&#8217;t want to happen with generative agents: that basically they get to kind of do things on your behalf that you wouldn&#8217;t be permitted to do for yourself. That would be a real risk. And if we just talk about alignment, then that&#8217;s what we&#8217;re going to get, because they&#8217;ll just be aligned to the user&#8217;s interest, and damn everybody else. But I think also a lot of the constraints that apply to us are fundamentally conditional on the kinds of agents that we are. A lot of morality is about dealing with the fact that we&#8217;re not able to communicate instantaneously with one another in a way that is perfectly transparent.
If we could do that, if we could coordinate in that way, where we could communicate, be perfectly transparent, and then stick to it, so much of morality would be so different.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://www.jstor.org/stable/3595561">Waging War on Pascal&#8217;s Wager</a> by Alan H&#225;jek</p></li><li><p><a href="https://philpapers.org/rec/BENWWW-3#:~:text=We%20consider%20three%20central%20objections,purchase%20than%20its%20interactional%20counterpart">What&#8217;s Wrong with Automated Influence</a> by <a href="https://clairebenn.wordpress.com/">Claire Benn</a> and Seth Lazar</p></li><li><p><a href="https://crfm.stanford.edu/assets/report.pdf">On the Opportunities and Risks of Foundation Models</a> by <a href="https://crfm.stanford.edu/">Stanford University&#8217;s Center for Research on Foundation Models</a></p></li><li><p><a href="https://openai.com/research/frontier-ai-regulation">Frontier AI regulation: Managing emerging risks to public safety</a> (OpenAI)</p></li><li><p><a href="https://www.theguardian.com/commentisfree/2023/nov/28/united-states-artificial-intelligence-eu-ai-washington">&#8220;The US is racing ahead in its bid to control artificial intelligence &#8211; why is the EU so far behind?&#8221;</a> by Seth Lazar (The Guardian)</p></li><li><p><em><a href="https://bookshop.org/p/books/the-age-of-surveillance-capitalism-the-fight-for-a-human-future-at-the-new-frontier-of-power-shoshana-zuboff/9240225">The Age of Surveillance Capitalism</a></em> by Shoshana Zuboff</p></li><li><p><a href="https://arxiv.org/abs/2212.08073">Constitutional AI: Harmlessness from AI Feedback</a> (Anthropic) by <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Bai,+Y">Yuntao Bai</a> et al.</p></li><li><p><a href="https://www.jamiesusskind.com/">Jamie Susskind</a></p></li><li><p><a href="https://openai.com/blog/democratic-inputs-to-ai">Democratic inputs to AI</a> (OpenAI)</p></li><li><p><a href="https://scholarship.law.upenn.edu/cgi/viewcontent.cgi?article=9654&amp;context=penn_law_review">Digital Switzerlands</a> by Kristen E. 
Eichensehr</p></li><li><p><a href="https://arxiv.org/abs/2208.08628">Legitimacy, Authority, and Democratic Duties of Explanation</a> by Seth Lazar</p></li><li><p><a href="https://academic.oup.com/edited-volume/41989/chapter-abstract/355437737?redirectedFrom=fulltext">Power and AI: Nature and Justification</a> by Seth Lazar</p></li><li><p><a href="http://knightcolumbia.tierradev.com/content/communicative-justice-and-the-distribution-of-attention">Communicative Justice and the Distribution of Attention</a> by Seth Lazar</p></li><li><p><a href="https://arxiv.org/abs/2302.04761">Toolformer: Language Models Can Teach Themselves to Use Tools</a> by <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Schick,+T">Timo Schick</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Dwivedi-Yu,+J">Jane Dwivedi-Yu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Dess%C3%AC,+R">Roberto Dess&#236;</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Raileanu,+R">Roberta Raileanu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Lomeli,+M">Maria Lomeli</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Zettlemoyer,+L">Luke Zettlemoyer</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Cancedda,+N">Nicola Cancedda</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Scialom,+T">Thomas Scialom</a></p></li><li><p><a href="https://arxiv.org/pdf/2310.13798.pdf#:~:text=A%20general%20principle%20may%20thus,value%20for%20steering%20AI%20safely.">Specific versus General Principles for Constitutional AI</a> (Anthropic) by Sandipan Kundu, Yuntao Bai, Saurav Kadavath, et al.</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Tri Dao, Stanford: FlashAttention and efficient training]]></title><description><![CDATA[Tri Dao is a PhD student at Stanford, co-advised by Stefano Ermon and Chris Re.]]></description><link>https://ideas.imbue.com/p/episode-33-tri-dao-stanford-on-flashattention-4d3</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-33-tri-dao-stanford-on-flashattention-4d3</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 09 Aug 2023 17:00:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090097/4318bea8fa488024f7bc6014fae540a1.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://tridao.me">Tri Dao</a> is a PhD student at Stanford, co-advised by Stefano Ermon and Chris Re. 
He&#8217;ll be joining Princeton as an assistant professor next year. He is the author of <a href="https://arxiv.org/abs/2205.14135">FlashAttention</a> and Chief Scientist at <a href="https://www.together.ai">Together AI</a>.</em> <em>He works at the intersection of machine learning and systems, currently focused on efficient training and long-range context.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4>On how to create a high-performing language model</h4><p>&#8220;I think there are many paths to a high-performing language model. So right now there&#8217;s a proven strategy and people follow that. I think that doesn&#8217;t necessarily have to be the only path. I think my prior is that as long as your model architecture is reasonable and hardware efficient, and you have lots of compute, and you have lots of data, the model would just do well.&#8221;</p><h4>On designing algorithms that take advantage of hardware</h4><p>&#8220;We&#8217;ve seen that sparsity is now proven to be more useful as people think about hardware-friendly sparsity. I would say the high-level point is we show that there are ways to make sparsity hardware-friendly and there are ways to maintain quality while using sparsity.&#8221;</p><h4>On efficient inference</h4><p>&#8220;I think there&#8217;s gonna be a shift towards focusing a lot on inference. How can we make inference as efficient as possible, from model design or the software framework or even the hardware? We&#8217;ve seen that some hardware designs are more catered to inference now &#8212; think, for example, of how Google&#8217;s TPU has one version for inference and a different version for training, where they have different numbers of flops, different memory bandwidth, and so on.&#8221;</p><h4>On taking a contrarian bet on recurrent connections over attention</h4><p>&#8220;We want to understand, from an academic perspective, when or why we need attention. Can we have other alternatives that scale better in terms of sequence length? Because long context length has been a big problem for attention for a long time. Yes, we worked on that. We spent tons of time on that. I looked around, and maybe it&#8217;s a contrarian bet that I wanna work on something that scales better in terms of sequence length and that, maybe in two to three years, would have a shot at not replacing transformers but augmenting transformers in some settings.&#8221;</p>
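<p>That efficiency lens is easiest to see in the online softmax trick (from the Milakov and Gimelshein paper referenced below) that FlashAttention builds on: compute a numerically stable softmax in one streaming pass, so the attention scores never have to be fully materialized in slow memory. Here is a minimal sketch, assuming NumPy; the function name and the final vectorized pass over <code>scores</code> are illustrative simplifications, not the FlashAttention kernel itself.</p><pre><code>import numpy as np

def online_softmax(scores):
    # One streaming pass keeps a running max m and running normalizer d,
    # rescaling d whenever a new max appears, so nothing overflows and
    # the full score vector never needs to sit in fast memory at once.
    m, d = -np.inf, 0.0
    for x in scores:
        m_new = max(m, x)
        d = d * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    return np.exp(np.asarray(scores) - m) / d

print(online_softmax([1.0, 3.0, 2.0]))  # matches a standard softmax
</code></pre><h3><strong>References</strong></h3><ul><li><p><a href="https://web.stanford.edu/~boyd/">Steven Boyd</a>, Stanford</p></li><li><p><a href="https://arxiv.org/abs/1803.06084">A Kernel Theory of Modern Data Augmentation</a> by Tri Dao, Albert Gu, Alexander J. 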
Ratner, Virginia Smith, Christopher De Sa, Christopher R&#233;</p></li><li><p><a href="https://arxiv.org/abs/1903.05895">Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations</a> by Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher R&#233;</p></li><li><p><a href="https://arxiv.org/abs/2112.00029">Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models</a> by Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://arxiv.org/abs/2012.14966">Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps</a> by Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://arxiv.org/abs/2204.00595">Monarch: Expressive Structured Matrices for Efficient and Accurate Training</a> by Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher R&#233;.</p></li><li><p>ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, et al.</p></li><li><p><a href="https://arxiv.org/abs/2302.13971">LLaMA: Open and Efficient Foundation Language Models</a> by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth&#233;e Lacroix, Baptiste Rozi&#232;re, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample</p></li><li><p><a href="https://arxiv.org/abs/1911.02150">Fast Transformer Decoding: One Write-Head is All You Need</a> by Noam Shazeer</p></li><li><p><a href="https://arxiv.org/abs/2205.14135">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</a> by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://www.nvidia.com/en-us/data-center/resources/mlperf-benchmarks/">MLPerf</a></p></li><li><p><a href="https://www.linkedin.com/in/young-jun-ko-630899106/">Young-Jun Ko from Inflection</a></p></li><li><p><a href="https://arxiv.org/abs/1805.02867">Online normalizer calculation for softmax</a> by Maxim Milakov (NVIDIA), Natalia Gimelshein (NVIDIA)</p></li><li><p><a href="https://www.danfu.org/">Dan Fu</a></p></li><li><p><a href="https://cs.stanford.edu/~chrismre/">Christopher R&#233;</a></p></li><li><p><a href="https://stanford.edu/~albertgu/">Albert Gu</a></p></li><li><p><a href="https://lucidrains.github.io/">Phil Wang</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Jamie Simon, UC Berkeley: Theoretical principles for deep neural networks]]></title><description><![CDATA[Jamie Simon is a fourth-year physics Ph.D.]]></description><link>https://ideas.imbue.com/p/episode-32-jamie-simon-uc-berkeley-6be</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-32-jamie-simon-uc-berkeley-6be</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 22 Jun 2023 18:52:23 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090098/72116600b09adb593ed1d7a7b2258bad.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://james-simon.github.io">Jamie Simon</a> is a fourth-year physics Ph.D. student at UC Berkeley, advised by Mike DeWeese, and a Research Fellow with us at Imbue. He uses tools from theoretical physics to build a fundamental understanding of deep neural networks so they can be designed from first principles. In this episode, we discuss reverse engineering kernels, the conservation of learnability during training, infinite-width neural networks, and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;I do think that the deeper idea of reverse engineering kernels is powerful and probably holds across architectures. The central message isn&#8217;t really: here&#8217;s the particular theory on fully-connected networks. The central message is: let&#8217;s think about the inductive bias of architectures in kernel space directly and see if we can do our design work in kernel space instead of in parameter space.&#8221;</p><p>&#8220;At first glance, the idea of an infinite-width neural network as a useful object of study sounds insane; why should this be a reasonable limit to take? Why, if we want to understand a neural network, which obviously has to be finite to do anything useful, could we hope to learn anything by just making something infinite? That is especially baffling from the viewpoint of classical statistics, where you hope to find a parsimonious model and wield Occam&#8217;s razor like a sword. So, it seems baffling at first that this should be useful, but it turns out that a number of breakthrough results, especially around the early part of my PhD, found that some really non-trivial, insightful behaviors emerge when you take this infinite-width limit.&#8221;</p><p>&#8220;In the case of infinite width: if the neural tangent kernel only has trivial alignment, like just chance alignment, with the target function of the data, it won&#8217;t generalize on it.
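But in practice, we see very good alignment between this kernel object and the target function.&#8221;</p><p>One common way to make that notion of alignment concrete is kernel-target alignment: the normalized inner product between a kernel&#8217;s Gram matrix and the outer product of the labels, which is near zero for chance alignment and grows when the kernel&#8217;s geometry matches the target. Here is a small NumPy sketch of that diagnostic, using an RBF kernel as a stand-in for a neural tangent kernel; the function names and toy data are illustrative, not taken from the papers referenced below.</p><pre><code>import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); stands in for an NTK.
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_target_alignment(K, y):
    # Normalized Frobenius inner product of K with the label outer product:
    # y^T K y / (||K||_F * ||y||^2). Chance alignment gives a value near zero.
    return (y @ K @ y) / (np.linalg.norm(K) * (y @ y))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_clustered = np.array([-1.0] * 50 + [1.0] * 50)  # labels follow the clusters
y_random = rng.choice([-1.0, 1.0], size=100)      # labels with no structure

K = rbf_kernel(X)
print(kernel_target_alignment(K, y_clustered))  # relatively high
print(kernel_target_alignment(K, y_random))     # near zero
</code></pre>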
<p>&#8220;A question you could ask is, why do convolutional networks do better than fully connected networks on image data? Well, it turns out their kernels have better alignment with image data.&#8221;</p><p>&#8220;Although, interestingly, people have shown that if you take the neural tangent kernel of a network after training, the real neural network looks a lot as if it had always had its final neural tangent kernel. So you don&#8217;t have to worry so much about the kernel&#8217;s evolution over time, only about where it ended up.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://redwood.berkeley.edu/">Redwood Center for Theoretical Neuroscience</a></p></li><li><p><a href="https://redwood.berkeley.edu/people/mike-deweese/">Prof. Mike DeWeese</a></p></li><li><p><a href="https://arxiv.org/pdf/2106.03186">Reverse Engineering the Neural Tangent Kernel</a> by Jamie Simon, Sajant Anand, and Mike DeWeese</p></li><li><p><a href="https://arxiv.org/pdf/2110.03922.pdf">The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks</a> by Jamie Simon, Madeline Dickens, Dhruva Karkada, Mike DeWeese</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Bill Thompson, UC Berkeley: How cultural evolution shapes knowledge acquisition]]></title><description><![CDATA[Bill Thompson is a cognitive scientist and assistant professor at UC Berkeley.]]></description><link>https://ideas.imbue.com/p/episode-31-bill-thompson-uc-berkeley-16a</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-31-bill-thompson-uc-berkeley-16a</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 29 Mar 2023 18:25:24 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090099/a47ece35a61141a20b72f5441a4cbe56.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://billdthompson.github.io">Bill Thompson</a> is a cognitive scientist and assistant professor at UC Berkeley. He runs an <a href="https://ccs-ucb.github.io">experimental cognition laboratory</a> where he and his students conduct research on human language and cognition using large-scale behavioral experiments, computational modeling, and machine learning.
In this episode, we explore the impact of cultural evolution on human knowledge acquisition, how pure biological evolution can lead to slow adaptation and overfitting, and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;In order to understand the computational processes that give rise to things like complex learned algorithmic behaviors like driving or playing chess or solving a Rubik&#8217;s cube or even language and speaking to each other, we need to have some way of reasoning about how knowledge accumulates across people.&#8221;</p><p>&#8220;This mechanism that we call selective social learning provides a solution to those two problems. The problems are that complex stuff is difficult to discover and difficult to pass on; this mechanism increases the fraction of people who are exposed to the rarer discoveries.&#8221;</p><p>&#8220;One of the things we&#8217;ve been working on is trying to integrate those two things and develop a way of thinking about cultural evolution as distributed algorithmic processes or distributed computation. Thinking about population-level processes as distributed computational processes gives you a way of viewing groups and multi-generational societies &#8212; in a sense, simple societies &#8212; in the same terms that you can think about learning by individuals.&#8221;</p><p>&#8220;If I want to look at how large language models learn to reason, something I would love to do is start to knock out parts of the training data set and say, &#8216;okay, when you knock this part of the training data set out, suddenly the reasoning capabilities go away,&#8217; or &#8216;suddenly this aspect of your knowledge or this capacity to acquire structured algorithmic thinking disappears.&#8217; Even just simple stuff like that is not tractable at the moment.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="http://www.cs.toronto.edu/~hinton/absps/evolution.htm">How Learning Can Guide Evolution</a> by Geoffrey E. Hinton &amp; Steven J. Nowlan</p></li><li><p><a href="https://ccs-ucb.github.io">Computational Cognitive Science Laboratory</a></p></li><li><p><a href="https://elifesciences.org/articles/72484">The pupillary light response as a physiological index of aphantasia, sensory and phenomenological imagery strength</a> by Lachlan Kay, Rebecca Keogh, Thomas Andrillon, Joel Pearson</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software.
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Ben Eysenbach, CMU: Designing simpler, more principled RL algorithms]]></title><description><![CDATA[Ben Eysenbach is a Ph.D.]]></description><link>https://ideas.imbue.com/p/episode-30-ben-eysenbach-cmu-on-designing-f15</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-30-ben-eysenbach-cmu-on-designing-f15</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 23 Mar 2023 00:27:16 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090100/efa3940ba7b20dad418bde3934a72bd0.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://ben-eysenbach.github.io/">Ben Eysenbach</a> is a Ph.D. student at Carnegie Mellon University and a student researcher at Google Brain. He is co-advised by Sergey Levine and Ruslan Salakhutdinov. His research focuses on developing RL algorithms that get state-of-the-art performance while being simpler, scalable, and robust. Recent problems he's tackled include long-horizon reasoning, exploration, and representation learning. In this episode, we discuss designing more principled RL algorithms and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;If we look at all the states we&#8217;ve seen so far and their representations, let&#8217;s imagine that those representations have a length of one, so we can think about them as points on a sphere. Then, after we put each of these points on the sphere, we can turn the sphere around and say, okay, where are most of the points, and where are we missing points? And say, you&#8217;re missing points down near Antarctica. And then we can say, okay, let&#8217;s try to get down to Antarctica. And then we could, because we&#8217;re learning a goal-conditioned policy, say, okay, try to get here, or try to get to a state that has this representation.&#8221;</p>
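<p>The sphere picture above is easy to state as a sampling rule: normalize everything to unit length, score candidate directions by how close they are to anything already visited, and chase the emptiest patch. Here is a toy NumPy sketch under those assumptions; the function name, the uniform candidate sampling, and the toy data are illustrative, not the paper&#8217;s method.</p><pre><code>import numpy as np

def novel_goal(visited_reps, n_candidates=512, rng=None):
    # visited_reps: array of unit-norm representations of states seen so far.
    if rng is None:
        rng = np.random.default_rng()
    # Sample random directions and project them onto the unit sphere.
    cands = rng.normal(size=(n_candidates, visited_reps.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    # A candidate's "coverage" is its best cosine similarity to any visited
    # point; the least-covered candidate is the empty patch (Antarctica).
    coverage = (cands @ visited_reps.T).max(axis=1)
    return cands[np.argmin(coverage)]

# Usage: hand the returned vector to a goal-conditioned policy as the target,
# i.e., "try to get to a state that has this representation."
visited = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(novel_goal(visited))  # points away from the three visited directions
</code></pre><p>&#8220;One thing that I&#8217;m really excited about is thinking about how we can leverage this idea of connecting contrastive learning to reinforcement learning to make use of advances in contrastive learning in other domains like NLP and computer vision. In NLP, we&#8217;ve seen really great uses of contrastive learning for things like CLIP that can connect images with language using contrastive learning. And in our contrastive project, we saw how we can connect the states and the actions to the future states. As you might imagine, maybe there&#8217;s a way of plugging these components together, and indeed, you can see that mathematically there is.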
And so one thing I&#8217;m really excited about exploring is saying, well, &#8216;can we use this to specify tasks?&#8217; Not in terms of images of what you would want to happen, but rather language descriptions.&#8221;</p><p>&#8220;One of the reasons why I&#8217;m particularly excited about these problems is that these language models, they&#8217;re trained to maximize the likelihood of the next token. That draws a really strong connection to this way of treating reinforcement learning problems as predicting probabilities and as maximizing probabilities. And so I think that these tools are actually much, much more similar than they might seem on the surface.&#8221;</p><p>&#8220;I don&#8217;t know how controversial it is, but I would like to see more effort on taking even existing methods and applying them to new tasks, to real problems. I think part of this will require a shift in how we evaluate papers &#8212; evaluating them not so much on algorithmic novelty as on &#8216;did you actually solve some interesting problem?&#8217;&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/pdf/1711.06782">Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning</a> by Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine</p></li><li><p><a href="https://arxiv.org/abs/2206.07568">Contrastive Learning As a Reinforcement Learning Algorithm</a> by Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/1802.06070.pdf">Diversity Is All You Need: Learning Diverse Skills Without a Reward Function</a> by Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2110.02719">The Information Geometry of Unsupervised Reinforcement Learning</a> by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/1906.05253">Search on the Replay Buffer: Bridging Planning and Reinforcement Learning</a> by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2112.10751">RvS: What Is Essential For Offline RL via Supervised Learning?</a> by Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2206.03378">Imitating Past Successes Can Be Very Suboptimal</a> by Benjamin Eysenbach, Soumith Udatha, Sergey Levine, Ruslan Salakhutdinov</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software.
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Jim Fan, NVIDIA: Foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant]]></title><description><![CDATA[Jim Fan is a research scientist at NVIDIA and got his PhD at Stanford under Fei-Fei Li.]]></description><link>https://ideas.imbue.com/p/episode-29-jim-fan-nvidia-on-foundation-7b2</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-29-jim-fan-nvidia-on-foundation-7b2</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 09 Mar 2023 00:22:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090101/05e92a14132478ce70b3c5004cfcb21e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Jim Fan is a research scientist at NVIDIA and got his PhD at Stanford under Fei-Fei Li. Jim is interested in building generally capable autonomous agents, and he recently published MineDojo, a massively multitask benchmarking suite built on Minecraft, which was an Outstanding Paper at NeurIPS. In this episode, we discuss foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;The second implication of RLHF is that prompt engineering will go away eventually. Like, it is something fleeting, and the prompt engineers&#8230; it&#8217;s just not a real job. Let&#8217;s face it. The reason prompt engineering will not be relevant forever is RLHF: why prompt engineering even exists in the first place is because these systems are misaligned with what humans want, so we have to kind of coerce the model to give us what we want by typing out very unnatural sentences, essentially tricking the model into solving the task.&#8221;</p><p>&#8220;I&#8217;m still really amazed by how humans do this task. Because we&#8217;re doing the lowest level of control, right? Like, we do the keyboard and mouse controls. And if we want to be, like, stricter about the concepts, we&#8217;re sending neural signals to our fingers and then controlling the finger torques, the torques in each joint, to operate a keyboard and also use a mouse. It&#8217;s incredible how low level we are going, as humans, to do World of Bits, and we seem to have very little problem with our computational efficiency. I guess procrastination is our unique problem. But otherwise, we&#8217;re computationally efficient. We&#8217;re very efficient.
So I&#8217;m just wondering, like, maybe there&#8217;s a way to actually make the lowest level, the most general action space, computationally attractive and even, like, more efficient than we thought it would be.&#8221;</p><p>&#8220;When I was starting to play Minecraft, I watched YouTube videos. I also went to the Wiki to look up what to do at first, and the Wiki tells you, &#8216;Okay, these are the tools that you must craft, and you need to, like, prepare food, otherwise you will starve, and what kinds of foods are good, right?&#8217; It&#8217;s all in the Wiki, and I also go to Reddit whenever I have a question. I treat that as a Stack Overflow, and Reddit people give a lot of good advice. That&#8217;s how I played Minecraft even as a human. That gets me thinking, right: why shouldn&#8217;t our AI use all of this internet-scale knowledge? And if we want our AI algorithm to play this from scratch, it&#8217;s almost impossible, because exploration is intractable. If you just take random actions, how big is the chance that you stumble upon a diamond? It&#8217;s almost literally zero. So that also inspired the algorithmic approach that we took.&#8221;</p><p>&#8220;What we want is to develop &#8211; or maybe discover, right &#8211; general principles of embodied intelligence. That&#8217;s what we wanna do. That&#8217;s what MineDojo and Avalon want to achieve, want to enable, right? Not just solving these particular 1000 tasks in kind of the most brute-force way. So, yeah, just a word of caution to researchers: resist the urge to overfit, to cheat, to use things that are super specific to Minecraft that will not transfer elsewhere.&#8221;</p>
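<p>That internet-knowledge idea is what MineCLIP (linked in the references below) operationalizes: a video-text model trained on Minecraft YouTube footage whose similarity score can serve as a dense reward, so the agent never has to explore blindly. Here is a toy sketch of that reward shape, assuming NumPy; the encoder stubs and function names are illustrative stand-ins, not MineCLIP&#8217;s actual interface.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)

def embed_video(frames):
    # Stand-in for a pretrained video encoder over the agent's recent frames.
    return rng.normal(size=512)

def embed_text(prompt):
    # Stand-in for the matching pretrained text encoder.
    return rng.normal(size=512)

def clip_style_reward(frames, prompt):
    # Dense reward: cosine similarity between what the agent just did
    # and the natural-language description of the task.
    v, t = embed_video(frames), embed_text(prompt)
    return float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t)))

# e.g., reward = clip_style_reward(last_16_frames, "shear a sheep to obtain wool")
</code></pre><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/pdf/2206.08853">MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge by Jim Fan, et al.</a></p></li><li><p><a href="https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf">ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton</a></p></li><li><p><a href="http://proceedings.mlr.press/v48/amodei16.pdf">Deep Speech 2: End-to-End Speech Recognition in English and Mandarin by Dario Amodei, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=kukA0LcAAAAJ&amp;hl=fr">Prof. 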
Yoshua Bengio</a></p></li><li><p><a href="http://proceedings.mlr.press/v87/fan18a/fan18a.pdf">SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark by Jim Fan, et al.</a></p></li><li><p><a href="http://proceedings.mlr.press/v70/shi17a/shi17a.pdf">World of Bits: An Open-Domain Platform for Web-Based Agents by Tianlin (Tim) Shi, et al.</a></p></li><li><p><a href="https://twitter.com/karpathy/status/809889202120884224?lang=en">Mini World of Bits</a></p></li><li><p><a href="https://arxiv.org/pdf/2206.11795.pdf">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos by Bowen Baker, et al.</a></p></li><li><p><a href="https://www.adept.ai/blog/act-1">ACT-1: Transformer for Actions</a></p></li><li><p><a href="https://arxiv.org/pdf/2112.09332.pdf">WebGPT: Browser-assisted question-answering with human feedback by Reiichiro Nakano, et al.</a></p></li><li><p><a href="https://github.com/MineDojo/MineCLIP">MineCLIP: Foundation Model for MineDojo</a></p></li><li><p><a href="https://arxiv.org/pdf/2103.00020.pdf">CLIP: Connecting Text and Images by Alec Radford, et al.</a></p></li><li><p><a href="https://vimalabs.github.io/assets/vima_paper.pdf">VIMA: General Robot Manipulation with Multimodal Prompts by Yunfan Jiang, et al.</a></p></li><li><p><a href="https://openreview.net/pdf?id=Opmqtk_GvYL">MetaMorph: Learning Universal Controllers with Transformers by Agrim Gupta, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/1706.03762">Attention Is All You Need by Ashish Vaswani, et al.</a></p></li><li><p><a href="https://twitter.com/BostonDynamics">Boston Dynamics</a></p></li><li><p><a href="https://arxiv.org/pdf/2102.12092.pdf">(DALL-E) Zero-Shot Text-to-Image Generation by Aditya Ramesh, et al.</a></p></li><li><p><a href="https://stability.ai/blog/stable-diffusion-public-release">Stable Diffusion</a></p></li><li><p><a href="https://www.linkedin.com/in/ilya-sutskever">Ilya Sutskever</a></p></li><li><p><a href="https://arxiv.org/pdf/1707.06347">Proximal Policy Optimization Algorithms by John Schulman, et al.</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Sergey Levine, UC Berkeley: Bottlenecks to generalization in reinforcement learning]]></title><description><![CDATA[Also: why simulation is doomed to succeed, and how to pick good research problems]]></description><link>https://ideas.imbue.com/p/episode-28-sergey-levine-uc-berkeley-f5e</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-28-sergey-levine-uc-berkeley-f5e</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 01 Mar 2023 23:47:07 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090102/7414dfc93b46b93fcd2779879dafc117.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Sergey Levine, an assistant professor of EECS at UC Berkeley, is one of the pioneers of modern deep reinforcement learning. His research focuses on developing general-purpose algorithms for autonomous agents to learn how to solve any task. In this episode, we talked about the evolution of deep reinforcement learning, how previous robotics approaches were replaced, and why offline RL is significant for future generalization.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;I do think that, in science, it is a really good idea to sometimes see how extreme a design can still work because you learn a lot from doing that. This is, by the way, something, I get a lot of comments on this. You know, I&#8217;ll be talking to people and they&#8217;ll be like, &#8216;Well, we know how to do, like, robotic grasping, and we know how to do inverse kinematics, and we know how to do this and this, so why don&#8217;t you use those parts?&#8217; And it&#8217;s, yeah, you could, but if you want to understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that and really isolate it and really just understand its value instead of trying to put in all these crutches to compensate for all the parts where we might have better existing kind of ideas.&#8221;</p><p>&#8220;The thing is, robots, if they are autonomous robots&#8211;they should be collecting data way more cheaply in a way larger scale than data we harvest from humans. For this reason, I actually think that robotics in the long run may actually be at a huge advantage in terms of its ability to collect data. We&#8217;re just not seeing this huge advantage now in robotic manipulation because we&#8217;re stuck at the smaller scale, more due to economics, rather than, I would say, science.&#8221;</p><p>&#8220;We want simplicity because simplicity makes it easy to make things work on a large scale. 
You know, if your method is simple, there are essentially fewer ways that it could go wrong. I don&#8217;t think the problem with clever prompting is that it&#8217;s too simple or primitive. I think the problem might actually be, that it might be too complex and that developing a good, effective reinforcement learning or planning method might actually be a simpler, more general solution.&#8221;</p><p>&#8220;I think, in reality, for any practical deployment of these kinds of ideas at scale, it would actually be many robots all collecting data, sharing it, and exchanging their brains over a network and all that. That&#8217;s the more scalable way to think about on the learning side. But, I do think that also on the physical side, there&#8217;s a lot of practical challenges, and just, you know, what kind of methods should we even have if we want the robot in your home to practice cleaning your dishes for three days. I mean, if you just run a reinforcement learning algorithm for a robot in your home, probably, the first thing it&#8217;ll do is wave its arm around, break your window, then break all your dishes, then break itself, and then spend the remaining time it has, just sitting there at the broken corner. So there&#8217;s a lot of practicalities in this.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://scholar.google.com/citations?user=mG4imMEAAAAJ&amp;hl=en">Andrew Ng</a></p></li><li><p><a href="https://arxiv.org/abs/1206.4617">Continuous Inverse Optimal Control with Locally Optimal Examples by Sergey Levine and Vladlen Koltun</a></p></li><li><p><a href="https://arxiv.org/pdf/1312.5602v1.pdf">Playing Atari with Deep Reinforcement Learning by Volodymyr Mnih, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=vtwH6GkAAAAJ">Pieter Abbeel</a></p></li><li><p><a href="https://xbpeng.github.io/">Xue Bin (Jason) Peng</a></p></li><li><p><a href="https://xbpeng.github.io/projects/ASE/2022_TOG_ASE.pdf">ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters by Xue Bin Peng, et al</a>.</p></li><li><p><a href="http://bair.berkeley.edu/">Berkeley Artificial Intelligence Research Lab (BAIR)</a></p></li><li><p><a href="http://joschu.net/">John Schulman</a></p></li><li><p><a href="https://arxiv.org/pdf/1707.06347">Proximal Policy Optimization Algorithms</a></p></li><li><p><a href="https://openai.com/blog/chatgpt/">ChatGPT</a></p></li><li><p><a href="https://ai.stanford.edu/~cbfinn/">Chelsea Finn</a></p></li><li><p><a href="https://irislab.stanford.edu/">Stanford IRIS Lab</a></p></li><li><p><a href="https://www.jmlr.org/papers/volume17/15-522/15-522.pdf">End-to-End Training of Deep Visuomotor Policies by Chelsea Finn, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/1603.02199.pdf">Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection by Sergey Levine, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=_ws9LLgAAAAJ&amp;hl=en">Peter Pastor</a></p></li><li><p><a href="https://vitchyr.github.io/">Vitchyr Pong</a></p></li><li><p><a href="https://ashvin.me/">Ashvin Nair</a></p></li><li><p><a href="https://arxiv.org/pdf/1807.04742.pdf">Visual Reinforcement Learning with Imagined Goals by Ashvin Nair, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/2104.07749">Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills by Yevgen Chebotar, et al.</a></p></li><li><p><a href="https://openai.com/blog/clip/">CLIP: 
Connecting Text and Images</a></p></li><li><p><a href="https://arxiv.org/pdf/2207.04429">LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action by Dhruv Shah, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=-w5DuHgAAAAJ&amp;hl=en">Brian Ichter</a></p></li><li><p><a href="https://scholar.google.com/citations?user=WuWWdKcAAAAJ&amp;hl=en">B&#322;a&#380;ej Osi&#324;ski</a></p></li><li><p><a href="https://arxiv.org/pdf/2206.11871">Offline RL for Natural Language Generation with Implicit Language Q Learning by Charlie Snell, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/2006.04779">Conservative Q-Learning for Offline Reinforcement Learning by Aviral Kumar, et al</a>.</p></li><li><p><a href="http://whirl.cs.ox.ac.uk/">Whiteson Research Lab</a></p></li><li><p><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson by Richard Sutton</a></p></li><li><p><a href="https://arxiv.org/pdf/2212.08073.pdf">Constitutional AI: Harmlessness from AI Feedback by Yuntao Bai, et al.</a></p></li><li><p><a href="https://arxiv.org/abs/2210.03370">GNM: A General Navigation Model to Drive Any Robot by Dhruv Shah, et al.</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Emanuel_Todorov">Prof. Emanuel Todorov</a></p></li><li><p><a href="https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf">MuJoCo: A physics engine for model-based control by Emanuel Todorov, et al.</a></p></li><li><p><a href="https://research.google/teams/robotics/">Google Brain Robotics Research Lab</a></p></li><li><p><a href="https://nrhinehart.github.io/">Nick Rhinehart</a></p></li><li><p><a href="https://neo-x.github.io/">Glen Berseth</a></p></li><li><p><a href="https://arxiv.org/pdf/2112.03899.pdf">Information is Power: Intrinsic Control via Information Capture by Nicholas Rhinehart, et al.</a></p></li><li><p><a href="https://scholar.google.co.uk/citations?user=q_4u0aoAAAAJ&amp;hl=en">Karl Friston</a></p></li><li><p><a href="https://arxiv.org/pdf/1912.05510">SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments by Glen Berseth, et al.</a></p></li></ul><h3><strong>Transcript</strong></h3><p>[00:00:00] <strong>Sergey Levine:</strong> I do think that in science, it is a really good idea to sometimes see how extreme of a design can still work because you learn a lot from doing that, and, this is by the way something I get a lot of comments on this&#8211;like, you know, I&#8217;ll, I&#8217;ll be talking to people and they&#8217;ll be like, &#8220;well, we know how to do like robotic grasping, and we know how to do inverse kinematics, and we know how, how to do this and this, so why don&#8217;t you like use those parts?&#8221; And it&#8217;s like, yeah, you could, but if you wanna understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that and really isolate it and really just understand its value instead of trying to put in all these crutches to compensate for all the parts where we might have better existing kind of ideas.</p><p>[00:00:38] <strong>Kanjun Qiu:</strong> We&#8217;re really excited to have you, Sergey. Welcome to the podcast. We start with: how did you develop your initial research interests and how have they evolved over time? I know you took a class from Andrew Ng where you decided to switch mid-grad school to machine learning. 
What were you doing before, and then what happened after?</p><p>[00:01:37] <strong>Sergey Levine:</strong> When I started graduate school, I wanted to work on computer graphics. I was always interested in virtual worlds, CG film, video games, that sort of thing. And it was just really fascinating to me how you could essentially create a synthetic environment in a computer.</p><p>[00:01:53] <strong>Sergey Levine:</strong> So I really wanted to figure out how we could advance the technology for doing that. When I thought about which area of computer graphics to concentrate in, I decided on computer animation, specifically character animation, because by that point (this would&#8217;ve been 2009) we had pretty good technology for simulating physics&#8211;for how inanimate objects behave.</p><p>[00:02:11] <strong>Sergey Levine:</strong> And the big challenge was getting plausible behavior out of virtual humans. When I started working on this, I pretty quickly discovered that essentially the bottleneck with virtual humans is simulating their minds&#8211;all the way from very basic things, like how you decide to move your legs if you wanna climb some stairs, to more complex things, like how you should conduct yourself if you&#8217;re playing a soccer game with teammates and opponents.</p><p>[00:02:35] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[00:02:36] <strong>Sergey Levine:</strong> This naturally leads us to think about decision making in AI&#8211;in my case, initially in service to creating plausible virtual humans. But as I realized how big that problem really was, it became natural to think of it more as just developing artificial intelligence systems. And certainly the initial wave of the deep learning revolution, which started around that same time, was a really big part of what got me to switch over from pure computer graphics research into things that involved a combination of control and machine learning.</p><p>[00:03:08] <strong>Kanjun Qiu:</strong> Hmm. Was that in 2011? 2012?</p><p>[00:03:12] <strong>Sergey Levine:</strong> Yeah. So my first machine learning paper was actually earlier than that, but my first paper that involved something we might call deep learning was in 2012, right around the same time the DeepMind Atari work came out&#8211;actually a little before. It focused on using what today we would call deep reinforcement learning, which back then was not really a widely used term, for locomotion behaviors for 3D humans.</p><p>[00:03:39] <strong>Kanjun Qiu:</strong> Interesting. And then what happened after that?</p><p>[00:03:42] <strong>Sergey Levine:</strong> I worked on this problem for a little while, for the last couple of years in grad school. And then after I finished my PhD, I started looking for postdoc jobs, because I was really only about partway through switching from graphics to machine learning, so I wasn&#8217;t well established in either community at that point. Perhaps a lesson for the PhD students listening to this: switching gears in the fourth year of your PhD is a little chancy, because I sort of ended up with one foot on either side of that threshold, and nobody really knew me very well. So I decided to do a postdoc in some area where I could get a little bit more established in machine learning.</p><p>[00:04:16] <strong>Sergey Levine:</strong> And I wanted to stay in the Bay Area for personal reasons. 
So I got in touch with Professor Pieter Abbeel, who&#8217;s now my colleague here at UC Berkeley, about a postdoc position. It was kind of interesting, because I interviewed for this job and I thought the interview went horribly&#8211;and really, it wasn&#8217;t my fault. When I showed up for the interview at UC Berkeley with Pieter&#8217;s lab, they had moved the deadline for IROS, a major robotics conference. It was supposed to be earlier, so that after the deadline all the students could listen to my talk and Pieter presumably would be a little relaxed. Instead, they moved the deadline to that evening.</p><p>[00:04:48] <strong>Sergey Levine:</strong> So everyone listening to my talk was kind of stressed out. I could tell that their minds were elsewhere. Afterwards there was a certain remark&#8211;I&#8217;m sure Pieter won&#8217;t mind me sharing this&#8211;where he mentioned something to the effect of, &#8220;Oh, you know, I don&#8217;t think that I want my lab working on all this animation stuff.&#8221; So I kind of felt like I really blew it. But he gave me a call a few weeks later and offered me the job, which was fantastic.</p><p>[00:05:13] <strong>Sergey Levine:</strong> And I guess it was generous on his part, because at the time he was presumably taking a little bit of a chance, but it worked out really well. I switched over to robotics, and that was actually a very positive change, in that a lot of the things I was trying to figure out in computer animation would be tested more rigorously and more thoroughly in the robotics domain, because there you really deal with all the complexity of the real world.</p><p>[00:05:35] <strong>Kanjun Qiu:</strong> Mm-hmm. That makes sense. Now, how do you think about all the progress with generative environments and animation? Do you feel like the original problems you were working on in animation are largely solved, or do you feel like there&#8217;s a lot more to do there?</p><p>[00:05:49] <strong>Sergey Levine:</strong> Yeah, that&#8217;s a good question. So I took a break from the computer graphics world for a while, but then over the last five years, there was actually a student in my lab, Jason Peng, who&#8217;s now a professor at Simon Fraser University in Canada. He just graduated last year, and in his PhD he more or less solved the problems that I had tried to solve in my own PhD a decade prior. I think he did a much better job with it than I ever did. He had several works that essentially took deep RL techniques and combined them with large-scale generative adversarial networks to, more or less, provide a pretty comprehensive solution to the computer animation problem. In his latest work, which was actually done in collaboration with NVIDIA, the approach he adopted is to take a large dataset of motion capture data&#8211;you can think of it as all the motion capture data we can get our hands on&#8211;and train a latent variable GAN on it that will generate human-like motion, embedded into a latent space that provides a kind of higher-level space for control. So you can think of his method as producing a model where you feed in random numbers, and for every random number it&#8217;ll produce some natural motion&#8211;running, jumping, whatever&#8211;and then those random numbers serve as a higher-level action space. So everything in that latent space is plausible motion, and then you can train some higher-level policy that will steer it in that latent space. That actually turns out to be a really effective way to do animation, because once you have that latent space, you can forget about whether the motion is plausible&#8211;it&#8217;ll always be plausible and realistic&#8211;and now you can be entirely goal driven in that space.</p>
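<p>(To make the two-level design concrete, here is a minimal sketch of the idea: a low-level decoder trained so that any latent code decodes to plausible motion, plus a high-level policy whose action space is that latent space. All module names, layer sizes, and training details below are illustrative assumptions, not the actual ASE architecture.)</p><pre><code>import torch
import torch.nn as nn

# Low-level decoder: trained adversarially on motion capture so that
# ANY latent z maps to plausible motion (names/sizes are illustrative).
class MotionDecoder(nn.Module):
    def __init__(self, z_dim=64, state_dim=128, action_dim=28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + state_dim, 512), nn.ReLU(),
            nn.Linear(512, action_dim))
    def forward(self, z, state):
        return self.net(torch.cat([z, state], dim=-1))  # joint commands

# High-level policy: picks a latent "skill" z instead of raw torques,
# so its search space stays on the manifold of natural-looking motion.
class HighLevelPolicy(nn.Module):
    def __init__(self, state_dim=128, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 512), nn.ReLU(),
            nn.Linear(512, z_dim))
    def forward(self, state):
        return torch.tanh(self.net(state))  # action space = latent space

decoder, policy = MotionDecoder(), HighLevelPolicy()
state = torch.zeros(1, 128)
z = policy(state)            # task-driven choice of skill
action = decoder(z, state)   # always decodes to plausible motion
</code></pre>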
<p>[00:07:24] <strong>Kanjun Qiu:</strong> That&#8217;s very clever.</p><p>[00:07:25] <strong>Sergey Levine:</strong> He has a demo at SIGGRAPH this past year where he has virtual characters doing sword fighting and jumping around and so on. If someone had shown this to me during my PhD, I would&#8217;ve said it&#8217;s like science fiction. It was kind of the dream for the computer graphics community for a long time. I think Jason really did a fantastic job of it. So if anyone listening to this is interested in computer animation, Jason Peng&#8217;s work is worth checking out. He is part-time at NVIDIA, too, so he is doing some mysterious things that he hasn&#8217;t&#8230; he&#8217;s very cagey with the details, but I think there might be something big coming out in the imminent future.</p><p>[00:08:02] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. So you feel like your PhD work is at least solved by Jason?</p><p>[00:08:08] <strong>Sergey Levine:</strong> Yeah, I think he kind of took care of that one.</p><p>[00:08:12] <strong>Kanjun Qiu:</strong> When you first got started in robotics, what did you feel like were the important problems?</p><p>[00:08:16] <strong>Sergey Levine:</strong> Robotics traditionally is thought of as very much a geometry problem plus a physics problem. So if you open up the Springer handbook on robotics, or a more classical advanced robotics course textbook, a lot of what you will learn about has to do with understanding the geometries of objects and modeling the mechanics of articulated rigid body systems. This approach took us very far, from the earliest days of robotics in the fifties and sixties all the way to the kind of robots that are used in manufacturing all the time today. In some ways, the history of robotic technology is one of building abstractions, taking those abstractions as far as we can take them, and then hitting some really, really difficult wall. The really difficult wall that robotics generally hits with this kind of approach has to do with situations that are not as fully and cleanly structured as the rigid body abstraction would have us believe&#8211;not just because they involve physical phenomena that are outside of this model, but also because they have challenges having to do with characterization and identifiability. So let&#8217;s say you have a robot in your home that&#8217;s supposed to clean up your home and put away all the objects. Even if those are rigid objects that, in principle, fit within that abstraction, you don&#8217;t know exactly what shape they are, what their mass distribution is, and all this stuff.</p><p>[00:09:33] <strong>Sergey Levine:</strong> You don&#8217;t have perception and things like that. So all of those things together more or less put us in this place where the clean abstraction really doesn&#8217;t give us anything. 
The analogy here is the earliest days of computer vision. The first thing that people thought of when they thought about how to do computer vision is that, well, computer vision is like the inverse graphics problem: if you believe the world is made out of shapes that have geometry, let&#8217;s figure out their vertices and their edges and so on. People tried to do this for a while, and it was very reasonable and very sensible from an engineering perspective, until, in 2012, Alex Krizhevsky had a solution to the ImageNet challenge that didn&#8217;t use any of that stuff whatsoever and just used a giant neural net. So I kind of suspect that the robotics world is just getting to that point, right around in the last half a decade or so.</p><p>[00:10:24] <strong>Kanjun Qiu:</strong> Hmm. Interesting. And so when you first joined Pieter Abbeel&#8217;s lab as a postdoc, you saw this world of robotics where everything was these rigid body abstractions. What were you thinking? Were you like, okay, well, it seems like nobody&#8217;s really using deep learning, no one&#8217;s really doing end-to-end learning, I&#8217;m gonna do that? Or how did you end up there?</p><p>[00:10:47] <strong>Sergey Levine:</strong> Yeah, so actually, I started working with a student whose most recent accomplishment was to take ideas that were basically rooted in this kind of geometric approach to robotics and extend them somewhat so they could accommodate deformable objects&#8211;ropes and cloth, that sort of thing. So they had been doing laundry folding and knot tying. I won&#8217;t go too much into the technical details, but it was kind of in the same wheelhouse as the geometry-based methods that had been pioneered for rigid objects and grasping in the decades prior, and with some clever extensions, they could do some knot tying and things like that.</p><p>[00:11:19] <strong>Sergey Levine:</strong> And I started working on how we could, more or less, throw all that out and replace it with end-to-end learning with deep nets. I intentionally wanted to make it a little bit extreme. So instead of trying to gently turn these geometry-based methods into ones that use learning more and more, we decided that we would just do the maximally end-to-end thing. The student in question was John Schulman, and he ended up doing his PhD on end-to-end deep reinforcement learning, and later on developed the most widely used reinforcement learning method today, which is PPO. He now works at OpenAI, and perhaps his most recent accomplishment is something that some of your viewers might have heard about&#8211;it&#8217;s called ChatGPT. But that&#8217;s maybe a story for another time. So we did some algorithms work there, and then in terms of robotics applications, I worked with another student that some of your listeners might also know, Chelsea Finn. She&#8217;s now a professor at Stanford. There we wanted to see if we could introduce the latest and greatest convolutional neural network techniques to directly control robot motion.</p><p>[00:12:19] <strong>Sergey Levine:</strong> And again, we chose a very end-to-end design there. We took the PR2 robot, and we basically looked through the PR2 manual and found the lowest-level control you could possibly have. You can&#8217;t command motor torques exactly, but you can command what&#8217;s called motor effort, which apparently is roughly proportional to current on the electric motors. So I did a little bit of coding to set up a controller that would directly command these efforts at some reasonable frequency. Chelsea coded up the ConvNet component. We wired it all together, managed to get it training end-to-end, and then we set up a set of experiments that were intentionally meant to evaluate whether the end-to-end part really mattered. These days, this would be something that people more or less take for granted&#8211;like, yeah, of course end-to-end is better than plugging in a bunch of geometric stuff&#8211;but we really wanted to convince people that this was true. So we ran experiments where we separated out localization from control. We had more traditional computer vision techniques, geometry-based techniques, and we tried to see whether going directly from raw pixels all the way to these motor effort commands could do better. And we set up some experiments that would actually validate that.</p>
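<p>(A minimal sketch of what &#8220;end-to-end&#8221; means here: one network going from raw pixels, plus joint state, straight to per-joint motor efforts, with no separate localization or geometry modules in between. The architecture and sizes are illustrative guesses, not the actual network from this work.)</p><pre><code>import torch
import torch.nn as nn

# Pixels in, low-level motor commands out, trained as one network.
class VisuomotorPolicy(nn.Module):
    def __init__(self, num_joints=7):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # 32 * 4 * 4 = 512 features
        self.control = nn.Sequential(
            nn.Linear(512 + 2 * num_joints, 256), nn.ReLU(),
            nn.Linear(256, num_joints))              # one "effort" per joint
    def forward(self, image, joint_pos, joint_vel):
        feats = self.vision(image)
        x = torch.cat([feats, joint_pos, joint_vel], dim=-1)
        return self.control(x)

policy = VisuomotorPolicy()
efforts = policy(torch.zeros(1, 3, 240, 240),
                 torch.zeros(1, 7), torch.zeros(1, 7))
# In deployment, `efforts` would be streamed to the robot at a fixed
# control frequency; gradients flow from task loss back into the vision
# layers, which is exactly what the separated baselines cannot do.
</code></pre>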
<p>[00:13:19] <strong>Sergey Levine:</strong> So we had these experiments where the robot would take a little colored shape and insert it into a shape-sorting cube&#8211;it&#8217;s a children&#8217;s toy where you&#8217;re supposed to match the shape to the shape of the hole. And one of the things we were able to demonstrate is that the end-to-end approach was in fact better, because essentially it could trade off errors more smartly. If you&#8217;re inserting this shape into a hole, you don&#8217;t really need to be very, very accurate about where the hole is vertically, because you&#8217;ll just be pushing down all the time, so that direction is robust to inaccuracy. But errors in the other directions are a little more sensitive. So we could show that we could actually do better with end-to-end training than if we had localized the hole separately and then commanded a separate controller. That work resulted in a paper that was basically the first deep reinforcement learning paper for image-based, real-world robotic manipulation. It was also rejected numerous times by robotics reviewers, because at the time this was a little bit of a taboo&#8211;too many neural nets. Eventually, it ended up working out.</p><p>[00:14:17] <strong>Kanjun Qiu:</strong> One thing I&#8217;m really curious about with this end-to-end experiment is: did it just work? Like, you set everything up, you code up the CNN&#8211;was it really tricky to get working, or did it work much better than you expected?</p><p>[00:14:29] <strong>Sergey Levine:</strong> It&#8217;s always very difficult to disentangle these things in science, because obviously it didn&#8217;t just work on the first try, but a big part of why it didn&#8217;t work on the first try had to do with a bunch of coding things. For example, this was before there were really nice, clean tools for GPU-based acceleration of ConvNets. Back then, Caffe was one of the things that everybody would use, and it was very difficult for us to get this running onboard the robot. So we actually had a fairly complicated jerry-rigged system where the ConvNet would actually run on one machine. 
Then, in the middle of the network, it would send the activations over to a different machine onboard the robot for the real-time controller. So it was still an end-to-end neural net, but half the neural net was running on one computer, half of it was running on another computer, and the gradients would have to get passed back. So it was a little complicated, and the bulk of the challenges we had had more to do with systems design and that sort of thing. But part of why it did basically work once we debugged things was that the algorithm itself was based on things that I had developed for previous projects, just without the computer vision components. So going from low-dimensional input to actions was something that had already been developed and basically worked.</p><p>[00:15:36] <strong>Sergey Levine:</strong> This was a continuation of my PhD work, so a lot of the challenges we had had to do with getting the systems parts right. They also had to do with getting the design of the components to be effective in relatively low data regimes, because these robot experiments would collect maybe four or five hours of data. So one of the things that Chelsea had to figure out is how to get a neural net architecture that could be relatively efficient. She basically used a proxy task that we designed: instead of iterating on the full control task on the real robot, we had a little pose detection task that we would use to prototype the network, which she could iterate on entirely offline. So she would test out the ConvNet on that, get it working properly, and then once we knew that it worked for this task, we knew it was roughly good enough in terms of sample efficiency, and then we just retrained with the end-to-end thing.</p><p>[00:16:22] <strong>Kanjun Qiu:</strong> That makes sense.</p><p>[00:16:23] <strong>Sergey Levine:</strong> So the moral of the story, for folks who might be listening and working on these kinds of robotic learning systems: it does actually help to break it up into components, even if you&#8217;re doing the end-to-end thing in the end, because you can get the individual neural net components all working nicely and then just redo it with the end-to-end thing. And that does tend to take out a lot of the pain.</p><p>[00:16:44] <strong>Kanjun Qiu:</strong> Right. It sounds like you got the components working first. It&#8217;s interesting&#8211;you made this comment about making the problem a lot more extreme when you were talking about the student using thin plate splines, and I&#8217;m curious, is this an approach you&#8217;ve used elsewhere? Making the problem much more extreme and throwing out everything?</p><p>[00:17:01] <strong>Sergey Levine:</strong> I think it&#8217;s a good approach. I mean, it depends a little bit on what you wanna do, because if you really want to build a system that works really well, then of course you want to put everything and the kitchen sink in there and just use the best tools for every piece of it. But I do think that in science it is a really good idea to sometimes see how extreme of a design can still work, because you learn a lot from doing that. And this is, by the way, something I get a lot of comments on. 
Like, I&#8217;ll be talking to people and they&#8217;ll be like, &#8220;Well, we know how to do robotic grasping, and we know how to do inverse kinematics, and we know how to do this and this, so why don&#8217;t you use those parts?&#8221; And it&#8217;s like, yeah, you could, but if you wanna understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that, really isolate it, and really just understand its value, instead of trying to put in all these crutches to compensate for all the parts where we might have better existing ideas.</p><p>[00:17:52] <strong>Sergey Levine:</strong> You know, as an analogy: if you wanna design better motors for electric cars, maybe you build, not a fancy hybrid car, but really just an electric race car or something. Just see how fast it can go. And then whatever technology you develop there, yeah, you can combine it with all these pragmatic and very sober decisions and make it work afterwards.</p><p>[00:18:11] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. So kind of do the hardest thing first. Do the most extreme thing.</p><p>[00:18:15] <strong>Sergey Levine:</strong> Yeah.</p><p>[00:18:15] <strong>Kanjun Qiu:</strong> So after you published this extremely controversial paper that got rejected everywhere, what happened then? What were you interested in next?</p><p>[00:18:22] <strong>Sergey Levine:</strong> There were a few things that we wanted to do there, but perhaps the most important one that we came to realize&#8211;and this is going to lead to things that in some ways I&#8217;m still working on&#8211;is that of course we don&#8217;t really want end-to-end robotic deep learning systems that just train with four or five hours of data. The full power of deep learning is really only realized once you have very large amounts of data that can enable broad generalization. So this was a nice technology demo, in that it showed that deep nets could work with robots for manipulation, and of course many people took that up, and there&#8217;s a lot more work on using deep nets for robot manipulation now. But it didn&#8217;t realize the full promise of deep learning, because the full promise of deep learning required large datasets. And that was really the next big frontier. So what I ended up working on after this was some work that was done at Google. I started at Google in 2015, and there we wanted to basically scale up deep robotic learning. And what we did is, again, we took a fairly extreme approach. We intentionally chose not to do all sorts of fancy transfer learning and so on. We went for the pure brute-force thing: we put 18 robots in a room, and we turned them on for months and months and months and had them collect enormous amounts of data autonomously.</p><p>[00:19:42] <strong>Sergey Levine:</strong> And that led to what&#8217;s sometimes referred to as the arm farm project. It might have actually been Jeff Dean who coined that term. At one point we wanted to call it the armpit. But really, for this project, we wanted to pick a robotic task that was kind of basic, in the sense that it was something that everybody would want; that was fairly broad, in that all robots should have that capability; and that could be applied to large sets of objects&#8211;something that really needed generalization. 
So we went with robotic grasping&#8211;basically bin picking. Maybe that&#8217;s not the most glamorous thing, but it is something that really needs to generalize, because you can pick all sorts of different objects. It&#8217;s something that every robot needs to have, and it&#8217;s something that we could scale up. So we went for that, because it seemed like the right target for this kind of very extreme, purist, brute-force approach. Basically, we went down to Costco and Walmart and bought tons of plastic junk, and we would put it in front of these robots, and just, day after day, we would load up the bins in front of them and they would run as much as possible. One of the things that I spent a lot of time on is just getting the uptime on the robots to be as high as it could be. Peter Pastor, who&#8217;s a roboticist at Google AI, and I did a lot of work to increase that uptime&#8211;and of course there was a great team supporting the effort; Peter Pastor was probably the main one who did a lot of that stuff. And after several months, it got to a point where relatively simple techniques could acquire very effective robotic grasping policies. An interesting anecdote here: we were doing this work&#8211;it took us a while, so it came out in 2016&#8211;and just a few months after AlphaGo was announced, Alex Krizhevsky, who was working with us on the ConvNet design, told me something to the effect of, &#8220;Oh, you know, for AlphaGo they have like a billion-something games, and you gave me only a hundred thousand grasping episodes.</p><p>[00:21:33] <strong>Sergey Levine:</strong> This doesn&#8217;t seem like it&#8217;s gonna work.&#8221; So I remember I had some snarky retort where I said, &#8220;Well, yeah, they have like a billion games, but they still can&#8217;t pick up the Go pieces.&#8221; But on a more serious note, around this time I was actually starting to get kind of disappointed, because this thing didn&#8217;t really work very well. And I think some of this robotics wisdom had rubbed off on me, so I was saying, well, okay, maybe we should put in some more domain knowledge about the shapes of objects and so on. I remember Alex also told me, &#8220;Oh, no, no, just be patient. Just add more data to it.&#8221; So I heeded that advice, and it took a little while, but after a few more months, basically the same things he had been trying back then just started working once there was enough of a critical mass. Obviously there were a few careful design decisions in there, but we did more or less succeed in this fairly extreme, purist way of tackling the problem&#8211;which, again, was not by any means the absolute best way to build a grasping system. Actually, since then, people have developed more hybrid grasping systems that use depth, 3D, and simulation and also use deep learning, and I think it&#8217;s fair to say that they do work better. But it was a pretty interesting experience for us that just getting robots in a room for several months, with some simple but careful design choices, could result in a very effective grasping system.</p><p>[00:22:46] <strong>Kanjun Qiu:</strong> Mm-hmm. Mm-hmm. 
That&#8217;s really interesting.</p><p>[00:22:46] <strong>Josh Albrecht:</strong> One of the things that&#8217;s interesting to me is the scale of that data. To his point about a billion Go games, or the amount of data GPT-3 is trained on&#8211;the scale of these robotics datasets is just so much smaller. Like, what was the total number of months across the arms? The total amount of time in that dataset is only on the order of years, right?</p><p>[00:23:07] <strong>Sergey Levine:</strong> Yeah, it&#8217;s a little hard to judge, because obviously the uptime for the robots is not a hundred percent, but roughly speaking, if I do a little bit of quick mental math, it would be on the order of a couple of years of robot time. And the total size of the dataset was on the order of several hundred thousand trials, which amounts to about 10 million images. But of course the images are correlated in time. So basically, it&#8217;s roughly ImageNet-sized, but not much bigger than that.</p><p>[00:23:35] <strong>Kanjun Qiu:</strong> Mm-hmm. Right. And the images are much less diverse than ImageNet.</p><p>[00:23:40] <strong>Sergey Levine:</strong> Of course, yes.</p><p>[00:23:40] <strong>Kanjun Qiu &amp; Josh Albrecht:</strong> Yeah. That&#8217;s interesting. It&#8217;s surprising that it worked at all, given how small the dataset is.</p><p>[00:23:48] <strong>Sergey Levine:</strong> Well, although, one thing I will say on this topic is that I think a lot of people are very concerned that large datasets in robotics might be impractical. And there&#8217;s a lot of work&#8211;a lot of very good work, I should say&#8211;on all sorts of transfer learning ideas. But I do think it&#8217;s perhaps instructive to think about the problem as a prototype for a larger system. Because if someone actually builds, let&#8217;s say, a home robot, and let&#8217;s say that one in a hundred people in America buy this robot and put it in their homes, that&#8217;s on the order of 3 million people, 3 million robots. And if those 3 million robots do things for even one month in those homes, that is a lot of data. The thing is, autonomous robots should be able to collect data way more cheaply, and at a way larger scale, than data that we harvest from humans. So for this reason, I actually think that robotics in the long run may be at a huge advantage in terms of its ability to collect data.</p><p>[00:24:48] <strong>Sergey Levine:</strong> We&#8217;re just not seeing this huge advantage now in robotic manipulation, because we&#8217;re stuck at the smaller scale&#8211;more due to economics than, I would say, science. And by the way, here&#8217;s an example that maybe hammers this point home: if you work at Tesla, you probably don&#8217;t worry about the size of your dataset. You might worry about the number of labels, but you&#8217;re not gonna worry about the number of images you&#8217;ve got, because that robot is actually used by many people. So if robotic arms get to the same point, we won&#8217;t worry about how many images we&#8217;re collecting.</p>
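<p>(A quick back-of-the-envelope check of the numbers in this exchange, using only the round figures quoted above:)</p><pre><code># Home-robot thought experiment, with the round numbers from the conversation.
us_population = 330_000_000
robots = us_population // 100        # "one in a hundred people": ~3.3M robots
robot_months = robots * 1            # each robot runs for one month
robot_years = robot_months / 12      # about 275,000 robot-years of experience

# Versus the arm farm dataset described earlier:
trials = 500_000                     # "several hundred thousand trials"
images = 10_000_000                  # "about 10 million images"
frames_per_trial = images / trials   # roughly 20 images per grasp attempt

print(robots, robot_years, frames_per_trial)  # 3300000 275000.0 20.0
</code></pre>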
<p>[00:25:17] <strong>Kanjun Qiu:</strong> Mm-hmm. I&#8217;m curious what your ideal robot to deploy would be. What do you think about the humanoid robot versus some other robot type?</p><p>[00:25:24] <strong>Sergey Levine:</strong> Yeah, that&#8217;s a great question. If I were more practically minded, if I were a little more entrepreneurial, I would probably give a more compelling answer. But to be honest, I actually think that the most interesting kinds of robots to deploy, especially with reinforcement learning technology, might actually be robots that are very unlike humans. Of course it&#8217;s very tempting, from science fiction stories and so on, to think, okay, well, robots will be like Rosie from the Jetsons or Commander Data from Star Trek or something. They&#8217;ll look like people and they&#8217;ll kind of do things like people&#8211;and maybe they will. That&#8217;s fine.</p><p>[00:25:56] <strong>Sergey Levine:</strong> There&#8217;s nothing wrong with that, and that&#8217;s kind of exciting. But perhaps even more exciting is the possibility that we could have morphologies that are so unlike us that we wouldn&#8217;t even know how these things could do stuff. You know, maybe your home robot will be a swarm of a hundred quadrotors that fly around like little flies and clean up your house, right? They would behave in ways that we would not have been able to design manually, and good reinforcement learning methods would figure out ways to control these bizarre morphologies in ways that are really effective.</p><p>[00:26:27] <strong>Kanjun Qiu:</strong> Huh, that&#8217;s really interesting.</p><p>[00:26:27] <strong>Josh Albrecht:</strong> It&#8217;d be interesting to see that happen. I mean, there are lots of things against the humanoid structure, but one thing it does have going for it is that most of the world is currently made for people. Like, to open this door, right? This sliding door is kind of heavy&#8211;it&#8217;s almost impossible for the quadrotor; no matter how clever it is, it just doesn&#8217;t have enough force. But yeah, it would be interesting to think about what kind of crazy strategies they might come up with.</p><p>[00:26:52] <strong>Kanjun Qiu:</strong> You worked on this Google arm farm project for a while, and eventually it seems like enough data allowed you to use relatively simple algorithms to solve the grasping problem in this kind of extreme setup. What were you thinking about after that?</p><p>[00:27:06] <strong>Sergey Levine:</strong> After that, the next frontier to address is systems that can handle a wide range of tasks. Grasping is great, but it&#8217;s a little special&#8211;special in the sense that one very compact task definition, which is &#8220;are you holding an object in your gripper,&#8221; can encompass a great deal of complexity. Most tasks aren&#8217;t like that. For most tasks, you need to really specify what it is that you want the robot to do, and it needs to be deliberate about pursuing that specific goal and not some other goal. So that leads us into things like multi-task learning, goal specification, and instructions.</p><p>[00:27:42] <strong>Sergey Levine:</strong> One of the things that my students and I worked on when I started as a professor at UC Berkeley is trying to figure out how we can get goal-conditioned reinforcement learning to work really well. So we sat down and thought, well, this grasping thing was great because one very concise task definition leads to a lot of complexity. You can define a very simple thing, like &#8220;are you holding an object,&#8221; and lots of complexity emerges from that just through autonomous interaction. So can we have something like that&#8211;some very compact definition that encompasses a wide range of different behaviors? The thing that we settled on to start with was goal-conditioned reinforcement learning, where essentially the robot gets&#8211;in the early days, literally&#8211;a picture of what the environment should be, and it tries to manipulate the environment until it matches that picture. Of course, you can do goal-conditioned reinforcement learning in other ways; for example, more recently, the way that we and many others have been approaching it is by defining the goal through language. But defining it through pictures is fine to get started, because there you focus on just the visual and control aspects of the problem.</p>
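<p>(Schematically, a goal-conditioned policy is just a policy with one extra input&#8211;the goal&#8211;plus a reward measuring progress toward it. A minimal sketch with illustrative names and sizes; the latent-distance reward shown is one common choice in this line of work, not necessarily the exact one used:)</p><pre><code>import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, action_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(          # shared image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(2), nn.Flatten())   # 16 * 2 * 2 = 64 features
        self.head = nn.Sequential(
            nn.Linear(2 * 64, 128), nn.ReLU(),
            nn.Linear(128, action_dim))
    def forward(self, obs_image, goal_image):
        # The goal is just another input; the same network serves every goal.
        e_obs = self.encoder(obs_image)
        e_goal = self.encoder(goal_image)
        return self.head(torch.cat([e_obs, e_goal], dim=-1))

def goal_reward(encoder, obs_image, goal_image):
    # Reward: negative distance to the goal in a learned latent space,
    # rather than raw pixel distance.
    return -torch.norm(encoder(obs_image) - encoder(goal_image), dim=-1)
</code></pre>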
<p>[00:28:44] <strong>Sergey Levine:</strong> The very first work we had on this image goal-conditioned reinforcement learning was done by two students, Vitchyr Pong and Ashvin Nair, who both incidentally work at OpenAI now, but back then they were working on this image-based robotic control. The robot could do very simple things&#8211;like push an upside-down blue bowl five inches across the table. That was the task. But that was the first-ever demonstration of an image-based goal-conditioned RL system. Other people had done non-image-based goal-conditioned things, but in the real world, with images, that was the first demonstration. And yes, pushing an upside-down blue bowl five inches across the table is kind of lame, but it was a milestone&#8211;it got things rolling. From there they did other things that were a little more sophisticated. One of the experiments that really stands out in my mind, that I thought was pretty neat: we had set up a robot in front of a little cabinet with a door, and Vitchyr and Ashvin had developed an exploration algorithm where the robot would directly imagine possible images using a generative model.</p><p>[00:29:39] <strong>Sergey Levine:</strong> It was a VAE-based model that would literally hypothesize the kinds of images the robot could accomplish in this environment, attempt to reach them, and then update its model. So the robot is sort of dreaming up what it could do, attempting it, seeing if it actually works, and, if it doesn&#8217;t work, imagining something else. They ran this experiment&#8211;obviously a smaller-scale experiment than the arm farm, over just one day&#8211;and within 24 hours it would first figure out how to move the gripper around, because it was really interesting that the gripper moved. But then, once it started touching the door, it saw that, oh, actually, the door starts swinging open. So now it imagines lots of different angles for the open door, and from there it starts actually manipulating it. At the end, it learns how to open the door to any desired angle. And that was entirely autonomous, right? You just put it in front of the door and wait. So that was a pretty neat sign of things to come&#8211;at a much smaller scale, obviously&#8211;suggesting that if you have this kind of goal image thing, then you could push it further and further.</p>
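<p>(The exploration loop Sergey describes, in schematic form. Here vae, policy, and env are placeholders for the real components; this is the flavor of the imagined-goals idea, not the actual implementation:)</p><pre><code>import torch

def imagined_goal_episode(env, vae, policy, steps=50):
    obs = env.reset()
    z_goal = torch.randn(vae.latent_dim)      # "dream up" a goal by
    goal_image = vae.decode(z_goal)           # sampling the VAE prior
    trajectory = []
    for _ in range(steps):
        action = policy(obs, goal_image)      # goal-conditioned policy
        obs, _, done, _ = env.step(action)
        trajectory.append(obs)
        if done:
            break
    # Every visited state is a valid "achieved goal," so the trajectory
    # can be relabeled for off-policy training; refitting the VAE on the
    # new images keeps future imagined goals close to what is reachable.
    return trajectory
</code></pre>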
<p>[00:30:28] <strong>Sergey Levine:</strong> And of course, since then, we and many others have pushed this further. In terms of more recent work on this topic, there&#8217;s a very nice paper from Google called Actionable Models, which combines this with offline reinforcement learning, using a bunch of the large multi-robot datasets that have been collected at Google, to learn very general goal-conditioned policies that can do things like rearrange objects on a table. So this stuff has come a long way since then.</p><p>[00:30:51] <strong>Josh Albrecht:</strong> For goals conditioned on language: from an image perspective, it&#8217;s easy to tell, is this image the image that I wanted? But for language, what sorts of techniques are you excited about for evaluating whether the goal has actually been accomplished?</p><p>[00:31:05] <strong>Sergey Levine:</strong> There&#8217;s a lot of interesting work going on in this area right now, some of which my colleagues and I at Google are working on, and there are many other groups working on this&#8211;Dieter Fox&#8217;s lab, for example, is doing wonderful work in this area within NVIDIA. This is something that people have had on their minds for a while, but I think that most recently, the thing that has really stimulated a lot of research in this area is the advent of vision-language models, like CLIP, that actually work.</p><p>[00:31:30] <strong>Sergey Levine:</strong> And in some ways I feel a certain degree of vindication for focusing on just the image part of the problem for so long. Because one of the things that good vision-language models allow you to do is not worry about the language so much: if you have good visual goal models, then you can plug them in with vision-language models, and the vision-language model almost acts like a front end for interfacing these non-linguistic robotic controllers with language. As a very simple example of this, my student Dhruv Shah has a paper called LM-Nav that basically does this for navigation. Dhruv had been working on purely image-based navigation, in a similar regime where you specify an image goal, and then, together with Brian Ichter from Google and B&#322;a&#380;ej Osi&#324;ski from the University of Warsaw, they have a recent paper where they basically do the obvious thing: they take a vision-language model&#8211;they take CLIP&#8211;and just weld it onto this system as a language front end. So everything underneath is purely image-based, and then CLIP just says, okay, among these images, which one matches the instruction the user provided? And that basically does the job. It&#8217;s kind of nice that now progress on vision-language models, which can take place entirely outside of robotics, basically leads to better and better language front ends for purely visual goal-conditioned systems.</p>
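<p>(A minimal sketch of the &#8220;language front end&#8221; idea using OpenAI&#8217;s released CLIP package: score candidate goal images against the instruction and hand the winner to the purely visual controller. This mirrors the general recipe, not the actual LM-Nav pipeline:)</p><pre><code>import torch
import clip                      # OpenAI's CLIP package; usage sketch only
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

def pick_goal_image(instruction, candidate_image_paths):
    # Score candidate goal images against a language instruction and
    # return the best match, which a purely visual goal-conditioned
    # controller can then be pointed at.
    text = clip.tokenize([instruction])
    images = torch.stack([preprocess(Image.open(p))
                          for p in candidate_image_paths])
    with torch.no_grad():
        t = model.encode_text(text)
        im = model.encode_image(images)
        t = t / t.norm(dim=-1, keepdim=True)
        im = im / im.norm(dim=-1, keepdim=True)
        scores = (im @ t.T).squeeze(-1)   # cosine similarity per image
    return candidate_image_paths[scores.argmax().item()]
</code></pre>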
<p>[00:32:41] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. How far do you feel like visual goal-conditioned systems can go, especially with imagination?</p><p>[00:32:48] <strong>Sergey Levine:</strong> I think they can go pretty far, actually. The important thing, though, is to think about it the right way&#8211;I think we shouldn&#8217;t take the whole matching-pixels thing too literally. It&#8217;s really more about the robot&#8217;s goal. There&#8217;s kind of a funny version of this that came up in a project on robotic navigation that Dhruv and I were doing, where we had data of robots driving around at different times of day, and there&#8217;s almost a philosophical problem: you give the robot a picture of a building at night, and it&#8217;s currently daytime. So what should it do? Should it drive to the building and then wait until it&#8217;s night? Or should it just wait around until it gets dark, because that&#8217;s closer to matching the picture? So you have to be able to learn representations that abstract away these kinds of non-functional things. If you&#8217;re reaching your goal in a reasonable representation space, then it actually does make sense, and fortunately, with deep learning, there are a lot of ways to learn good representations. So as long as we don&#8217;t take the pixel-matching thing too literally, and we use appropriate representation learning methods, it&#8217;s actually a fairly solid approach.</p><p>[00:33:46] <strong>Kanjun Qiu:</strong> Right, that makes sense. And that&#8217;s actually a really interesting question: if you give it a picture of a building at night and it&#8217;s daytime, that doesn&#8217;t matter in some situations, but in other situations it really does matter. It depends on what the higher-level goal is&#8211;but it doesn&#8217;t have that concept of a higher-level goal yet.</p><p>[00:34:00] <strong>Sergey Levine:</strong> Yeah. So in reinforcement learning, people have thought about these problems a bit. From a very technical standpoint, goal-conditioned policies do not represent all possible tasks that an agent could perform, but the set of state distributions does define the set of all possible outcomes. So if you can somehow lift it up from conditioning on a single goal state to conditioning on a distribution over states, then that provably allows you to represent all tasks that could possibly be done. There are different ways that people have approached this problem that are very interesting. They&#8217;ve approached it from the standpoint of these things called successor features, which are based on successor representations&#8211;you can roughly think of these as low-dimensional projections of state distributions. More recently, there&#8217;s some really interesting work that I&#8217;ve seen out of FAIR, by a fellow named Ahmed and colleagues at Meta. They&#8217;re developing techniques for unsupervised acquisition of these kinds of feature spaces, where you can project state representations and get policies that are conditioned on any possible task. So there&#8217;s a lot of active research in this area. It&#8217;s something I&#8217;m really interested in. I think it&#8217;s possible to take these goal-conditioned things a little further and really condition on any notion of a task.</p>
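<p>(A toy illustration of successor features: if rewards are linear in some state features, then the discounted sum of future features under a policy immediately gives that policy&#8217;s value for any task&#8217;s reward weights. All numbers below are made up:)</p><pre><code>import numpy as np

# If r(s) = w . phi(s), then psi_pi(s) = E[ sum_t gamma^t phi(s_t) ]
# gives the value of policy pi for ANY reward vector w:
# V_pi(s) = w . psi_pi(s).
gamma = 0.9
phi = np.array([[1.0, 0.0],   # features of the 3 states on pi's path
                [0.0, 1.0],
                [1.0, 1.0]])

# Discounted feature occupancy along the deterministic path s0, s1, s2:
psi_s0 = phi[0] + gamma * phi[1] + gamma**2 * phi[2]   # [1.81, 1.71]

w_task_a = np.array([1.0, 0.0])   # reward = first feature only
w_task_b = np.array([0.0, 1.0])   # a different task, same psi
print(w_task_a @ psi_s0, w_task_b @ psi_s0)   # 1.81 1.71, no replanning
</code></pre>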
<p>[00:35:11] <strong>Josh Albrecht:</strong> When you&#8217;re thinking about what directions to pursue, especially given the number of people you collaborate with and the number of students and things like that, how do you think about picking which research questions to answer, and how has that evolved over the years?</p><p>[00:35:25] <strong>Sergey Levine:</strong> There are a couple of things I could say here. Obviously the right way to pick research questions really depends a lot on one&#8217;s research values and what they want out of their research. But for me, something that serves as a really good compass is to think about some very distant end goal that I would really like to see&#8211;like generally capable robotic systems, generally capable AI systems, AI systems that could do anything that humans can do. Then, when thinking about research questions, I ask myself: if a research project that I do is wildly successful&#8211;the most optimistic, upper-confidence-bound estimate of success&#8211;will it make substantive progress toward this very distant end goal? You really want to be optimistic when making that gauge, because obviously the expected outcome of any research project is failure. Research is failure&#8211;that&#8217;s kind of the truth of it. But if the most optimistic outcome of your research project is not making progress on your long-term goals, then something is wrong. So I always make sure to look at whether the most optimistic guess at the outcome makes substantial progress toward the most distant and most ambitious goal that I have in mind.</p><p>[00:36:34] <strong>Kanjun Qiu:</strong> Has your distant end goal changed over time?</p><p>[00:36:36] <strong>Sergey Levine:</strong> Not in a huge way. But I think it&#8217;s easy to have a goal that doesn&#8217;t change much over time if it&#8217;s distant enough and big enough.</p><p>[00:36:44] <strong>Kanjun Qiu:</strong> That&#8217;s right.</p><p>[00:36:45] <strong>Sergey Levine:</strong> So if your end goal is something as broad as &#8220;I just want generally capable AI systems that can do anything a person can do&#8221;&#8230; I mean, that may be a very far-away target to hit, but it&#8217;s also such a big target that it&#8217;s probably gonna be reasonably stable over time.</p><p>[00:36:58] <strong>Kanjun Qiu:</strong> That&#8217;s right. That makes sense. And that&#8217;s yours&#8211;to make general-purpose AI systems.</p><p>[00:37:01] <strong>Sergey Levine:</strong> Yeah.</p><p>[00:37:02] <strong>Kanjun Qiu:</strong> What do you feel like are the most interesting questions to you right now?</p><p>[00:37:05] <strong>Sergey Levine:</strong> One thing I can mention here is that, especially over the last one or two years, there have been a lot of advances in machine learning systems, both in robotics and in other areas like vision and language, that do a really good job of emulating people through imitation learning, through supervised learning. That&#8217;s what language models do, essentially, right? They&#8217;re trained to imitate huge amounts of human-produced data. Imitation learning and robotics have been tremendously successful together, but I think that ultimately we really need machine learning systems that do a good job of going beyond the best that people can do.</p><p>[00:37:43] <strong>Sergey Levine:</strong> That&#8217;s really the promise of reinforcement learning. If we were to chart the course of this kind of research: about five years back, when there was a lot of excitement about reinforcement learning things like AlphaGo, a really exciting prospect was that emergent capabilities from these algorithms could lead to machines that are superhuman&#8211;significantly more capable than people at certain tasks. 
But it turned out that it was very difficult to make that recipe by itself scale, because a lot of the most capable RL systems relied in a really strong way on simulation.</p><p>[00:38:18] <strong>Sergey Levine:</strong> So in the last few years, a lot of the major advances have taken a step back from that and instead focused on ways to bring in even more data, which is great, because that leads to really good generalization. But when using purely supervised methods with that data, you get at best an emulation of human behavior&#8211;which in some cases, like with language models, is tremendously powerful, because if you have the equivalent, or even a loose approximation, of human behavior for typing text, that&#8217;s tremendously useful.</p><p>[00:38:45] <strong>Sergey Levine:</strong> But I do think we need to figure out how to take these advances and combine them with reinforcement learning methods, because that&#8217;s the only way we&#8217;ll get to above-human behavior&#8211;to actually have emergent behavior that improves on the typical human. I think that&#8217;s where there&#8217;s a major open question: how to combine not the simulation-based but the data-driven approach with reinforcement learning in a very effective way.</p><p>[00:39:09] <strong>Kanjun Qiu:</strong> Hmm. That&#8217;s interesting. Do you have any thoughts on how to do that combination?</p><p>[00:39:14] <strong>Sergey Levine:</strong> In my group at Berkeley, we&#8217;ve been focusing a lot on what we call offline reinforcement learning algorithms. The idea is that traditionally, reinforcement learning is thought of as a very online and interactive learning regime, right? If you open up the classic Sutton and Barto textbook, the most canonical diagram&#8211;the one everyone remembers&#8211;is the cycle where the agent interacts with the environment: the agent produces an action, the environment produces some state, and it all goes in a loop. It&#8217;s a very online, interactive picture of the world. But the most successful large-scale machine learning systems&#8211;language models, giant ConvNets, et cetera&#8211;are all trained on datasets that have been collected, stored to disk, and then reused repeatedly.</p><p>[00:39:56] <strong>Sergey Levine:</strong> Because if you&#8217;re going to train on billions and billions of images, or billions of documents of text, you don&#8217;t wanna recollect those interactions each time you retrain your system. So the idea in offline reinforcement learning is to take a large dataset like that and extract a policy by analyzing the dataset, not by interacting directly with a simulator or a physical process. You could have some fine-tuning afterwards, a little bit of interaction, but the bulk of your understanding of the world should come from a static dataset, because that&#8217;s much more scalable. That&#8217;s the premise behind offline reinforcement learning. And we&#8217;ve actually come a long way in developing algorithms that are effective for this. When we started on this research in 2019, basically nothing worked&#8211;you would take algorithms that worked great for online RL, and in the offline regime they just didn&#8217;t do anything&#8211;whereas now we have pretty respectable algorithms for doing this, and we&#8217;re starting to apply them, including to the training of language models.</p>
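<p>(The offline regime in one schematic loop: the &#8220;environment&#8221; is a fixed log of transitions, and nothing in training ever calls a simulator or a robot. The function and tensor names here are placeholders:)</p><pre><code>import torch
from torch.utils.data import DataLoader, TensorDataset

def train_offline(q_net, update_rule, transitions, epochs=10):
    # `transitions` is a tuple of tensors: (s, a, r, s_next, done),
    # logged earlier by some other policy and stored to disk.
    loader = DataLoader(TensorDataset(*transitions),
                        batch_size=256, shuffle=True)
    opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
    for _ in range(epochs):
        for s, a, r, s_next, done in loader:
            loss = update_rule(q_net, s, a, r, s_next, done)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q_net   # no env.step() anywhere: all learning is from the log
</code></pre>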
<p>[00:40:47] <strong>Sergey Levine:</strong> We had a paper called Implicit Language Q-Learning on this earlier this year, as well as work on pre-training large models for robotic control. That stuff is really just starting to work now, and I think that&#8217;s one of the things we&#8217;ll see a lot of progress on very imminently.</p><p>[00:40:59] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. When you first started working on offline RL, what were the problems that you felt needed to be solved in order to get offline RL to work at all?</p><p>[00:41:06] <strong>Sergey Levine:</strong> So, the basic problem with offline RL&#8211;well, I can step back a little bit. In the past, people thought that offline RL really wasn&#8217;t that different from traditional value-based methods like Q-learning: you just needed to come up with appropriate objectives and representations, and then whatever you do to fit Q-functions from online interaction, maybe you could just do the same thing with static data, and that would kind of work. It actually did work in the olden days, when everyone was using linear function approximators, because linear function approximators are fairly low-dimensional, and you can run them on offline data and they do more or less the same thing that they would do with online data&#8211;which is not much, to be honest. But with deep neural nets, when you run them on offline data, you get a problem, because deep nets do a really good job of fitting to the distribution they&#8217;re trained on, and the trouble is that if you&#8217;re doing offline RL, the whole point is to change your policy.</p><p>[00:42:01] <strong>Sergey Levine:</strong> And when you change your policy, the distribution that you will see when you run that policy is different from the one you trained on. Because neural nets are so good at fitting to the training distribution, that strength becomes a weakness when the distribution changes. This is something that people only started realizing a couple of years back, but it is now a very widely accepted notion that this distributional shift is a very fundamental challenge in offline reinforcement learning. And it really deeply connects to counterfactual inference. Reinforcement learning is really about counterfactuals. It&#8217;s about saying: well, I saw you do this, and that was the outcome; and I saw you do that, and that was the outcome. What if you did something different&#8211;would the outcome be better or worse?</p><p>[00:42:38] <strong>Sergey Levine:</strong> That&#8217;s the basic question that reinforcement learning asks, and it is a counterfactual question. And with counterfactual questions, you have to be very careful, because some questions you simply cannot answer. If you&#8217;ve only seen cars driving on a road, and you&#8217;ve never seen them swerve off the road and go into the ditch, you actually can&#8217;t answer the question: what would happen if you go into the ditch? The data simply is not enough to tell you. So in offline RL, the correct answer then is: don&#8217;t do it, because you don&#8217;t know what will happen. 
Avoid the distributional shift for which there&#8217;s no way for you to produce a reasonable answer. But at the same time, you still have to permit the model to generalize. If there&#8217;s something new that you can do that is sufficiently in-distribution that you do believe you can produce an accurate estimate of the outcome, then you should do it, because you need generalization to improve over the behavior that you saw in the dataset. And that&#8217;s a very delicate balance to strike.</p><p>[00:43:26] <strong>Josh Albrecht:</strong> Is there a principled answer to that, or is it just a heuristic&#8211;we pick something in the middle and it kind of works sometimes?</p><p>[00:43:34] <strong>Sergey Levine:</strong> There are multiple principled answers, but one answer that seems pretty simple, and seems to work very well for us, was developed in a few different concurrent papers. In terms of the algorithms that people tend to use today, probably one of the most widely used formulations was in a paper called Conservative Q-Learning by Aviral Kumar, one of my students here.</p><p>[00:43:54] <strong>Sergey Levine:</strong> The answer was: well, be pessimistic. Essentially, if you are evaluating the value of some action and that action looks a little bit unfamiliar, give it a lower value than your network thinks it has&#8211;and the more unfamiliar it is, the lower the value you should give it. If you&#8217;re pessimistic in just the right way, that pessimism will cancel out any erroneous overestimation that you would get from mistakes in your neural network. That actually tends to work. It&#8217;s simple, it doesn&#8217;t require very sophisticated uncertainty estimation, and it essentially harnesses the network&#8217;s own generalization abilities, because this pessimism affects the labels for the network, and then the network generalizes from those labels.</p><p>[00:44:36] <strong>Sergey Levine:</strong> So in a sense, the degree to which it penalizes unfamiliar actions is very closely linked to how it&#8217;s generalizing. That allows it to still make use of generalization while avoiding the really weird stuff that it should just not do.</p>
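<p>(A schematic of the pessimism idea: push Q-values down on actions that merely look plausible to the learner, and push them up on actions actually present in the data. This captures the flavor of the conservative objective, not the exact loss from the CQL paper; q_net and the sampling scheme are placeholders:)</p><pre><code>import torch

def conservative_penalty(q_net, states, dataset_actions, num_samples=10):
    # Q-values of random (likely unfamiliar) actions: to be pushed DOWN.
    # Uniform sampling assumes actions normalized to [0, 1].
    rand_actions = torch.rand(num_samples, *dataset_actions.shape)
    q_rand = torch.stack([q_net(states, a) for a in rand_actions])
    pushed_down = torch.logsumexp(q_rand, dim=0).mean()
    # Q-values of in-distribution dataset actions: to be pushed UP.
    pushed_up = q_net(states, dataset_actions).mean()
    return pushed_down - pushed_up

# Used alongside the usual TD objective:
# total_loss = td_loss + alpha * conservative_penalty(q_net, s, a)
</code></pre>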
<p>[00:44:48] <strong>Josh Albrecht:</strong> So then thinking about techniques for going forward in offline RL, do you feel like there&#8217;s a lot left to be done, or are we sort of at the point where we have decent techniques, we&#8217;re learning a lot from these datasets that we have, and we need something else to move forward and actually make systems that are significantly better than what&#8217;s in the data already?</p><p>[00:45:10] <strong>Sergey Levine:</strong> Yeah. I think we&#8217;ve made a lot of progress on offline RL. I do think there are major challenges still to address. And I would say that these major challenges fall into two broad categories. The first category has to do with something that&#8217;s not really unique to offline RL&#8211;it&#8217;s a problem for all RL methods&#8211;and that has to do with their stability and scalability.</p><p>[00:45:32] <strong>Sergey Levine:</strong> So RL methods, not just offline RL, all of them are harder to use than supervised learning methods, and a big part of why they&#8217;re harder to use is that, for example, with value-based methods like Q-learning, they are not actually equivalent to gradient descent. Gradient descent is really easy to do. Gradient descent plus backprop, supervised learning, you know, cross-entropy loss.</p><p>[00:45:52] <strong>Sergey Levine:</strong> Great. It&#8217;s fair to say that that&#8217;s at a point where it&#8217;s a turnkey thing: you code it up in PyTorch or JAX, and it works wonderfully. Value-based RL is not gradient descent. It&#8217;s fixed-point iteration disguised as gradient descent. Because of that, a lot of the nice things that make gradient descent so simple and easy to use start going a little awry when you&#8217;re doing Q-learning or value iteration type methods. We&#8217;ve actually made some progress in understanding this. There&#8217;s work on this in my group, and there&#8217;s work on this in several other groups, including for example Shimon Whiteson&#8217;s group at Oxford and many others. Just recently we&#8217;ve started to scratch the surface of what it is that really goes wrong when you use Q-learning style methods, these fixed-point iteration methods, rather than gradient descent. And the answer seems to be&#8211;and this is kind of preliminary&#8211;that some of the things that make supervised deep learning so easy actually make RL hard. So let me unpack this a little bit.</p><p>[00:46:49] <strong>Sergey Levine:</strong> If you told somebody who&#8217;s a machine learning theorist in, let&#8217;s say, the early 2000s that you&#8217;re going to train a neural net with a billion parameters with gradient descent for image recognition, they would probably tell you, well, that&#8217;s really dumb, because you&#8217;re going to overfit and it&#8217;s going to suck&#8211;so why are you even doing this? Based on the theory at that time, they would&#8217;ve been completely right. The surprising thing is that when we train with supervised learning, with gradient descent, there&#8217;s some kind of magical, mysterious fairy that comes in and applies some magic regularization that makes it not overfit. In machine learning theory, one of the really active areas of research has been to understand who that fairy is, what the magic is, and how that works out. And there are a number of hypotheses that have been put forward that are pretty interesting, and they all have to do with some kind of regularizing effect that basically makes it so this giant overparametrized neural net actually somehow comes up with a simple solution rather than an overly complex one. This is sometimes referred to as implicit regularization&#8211;implicit in the sense that it emerges implicitly from the interplay of deep nets and stochastic gradient descent&#8211;and it&#8217;s really good. That&#8217;s kind of what saves our bacon when we use these giant networks. And it seems that for reinforcement learning, because it&#8217;s not exactly gradient descent, that implicit regularization effect sometimes doesn&#8217;t play in our favor.</p>
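<p><em>The distinction at the root of this&#8211;fixed labels versus bootstrapped, moving targets&#8211;can be shown schematically. Both functions below are illustrative sketches, not code from any of the papers mentioned:</em></p><pre><code>import torch
import torch.nn.functional as F

# Supervised learning: the labels y are fixed once and for all, so every
# step is plain gradient descent on one stationary objective.
def supervised_step(net, opt, x, y):
    loss = F.mse_loss(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Value-based RL: the "labels" are bootstrapped from the current network,
# so the regression target moves every time the weights move. Each step
# applies a fixed-point operator dressed up as regression; it is not
# gradient descent on any single stationary loss.
def fitted_q_step(q_net, opt, s, a, r, s_next, gamma=0.99):
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values  # moving target
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
</code></pre>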
<p>[00:48:07] <strong>Sergey Levine:</strong> Sometimes it&#8217;s not actually a fairy&#8211;it&#8217;s an evil demon that comes in and screws up your network. And that&#8217;s really worrying, right? Because we have this mysterious thing that seems to have been really helping us for supervised learning, and now suddenly, when we&#8217;re doing RL, it comes in and hurts us instead. And at least to a degree, that seems to be part of what&#8217;s happening. So now there&#8217;s a slightly better understanding of that question&#8211;and I don&#8217;t want to overclaim how good our understanding is, because there are major holes in it, so there&#8217;s a lot to do there. But at least we have an inkling. We have a suspect, so to speak, even if we can&#8217;t prove that they did it. We can start trying to solve the problem. We can try, for example, inserting explicit regularization methods that could counteract some of the ill effects of the no-longer-helpful implicit regularization.</p><p>[00:48:45] <strong>Sergey Levine:</strong> We can start designing architectures that are maybe more resilient to these kinds of effects. So that&#8217;s something that&#8217;s happening now, and it&#8217;s not by any means a solved thing, but that&#8217;s where we could look for potential solutions to these kinds of instability issues that seem to afflict reinforcement learning.</p><p>[00:49:00] <strong>Kanjun Qiu:</strong> What&#8217;s the intuition behind why implicit regularization seems to help in supervised networks, but be harmful in RL?</p><p>[00:49:07] <strong>Sergey Levine:</strong> The intuition is roughly that given a wide range of possible solutions, a wide range of different assignments to the weights of a neural net, you would select the one that is simpler, that results in the simpler function. There are many possible values of neural net weights that would all give you a low training loss, but many of them are bad because they overfit. Implicit regularization leads to selecting those assignments to the weights that result in simpler functions that still fit your training data, and therefore generalize better.</p><p>[00:49:35] <strong>Kanjun Qiu:</strong> And so the intuition for RL is: okay, for whatever reason, implicit regularization results in learning simpler functions, but actually those simpler functions are worse in an RL regime.</p><p>[00:49:47] <strong>Sergey Levine:</strong> Yeah, so in RL, it seems like you get one of two things. Either the whole thing fails entirely and you get really, really complicated functions&#8211;roughly speaking, that&#8217;s like overfitting to your target values, because your target values are incorrect in the early stages.</p><p>[00:50:00] <strong>Sergey Levine:</strong> So you overfit to them and you get some crazy function. Essentially you get a little bit of noise in your value estimates, and that noise gets exacerbated more and more and more until all you&#8217;ve got is noise. Or, on the other hand, the other thing that seems to sometimes happen&#8211;and experimentally this actually seems fairly common&#8211;is that this thing goes into overdrive and you discard too much of the detail, and then you get an overly simple function.</p><p>[00:50:19] <strong>Sergey Levine:</strong> But somehow it seems hard to hit that sweet spot. The kind of sweet spot that you hit every time with supervised learning seems annoyingly hard to hit with reinforcement learning.</p>
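<p><em>That noise-amplification loop is easy to reproduce in a toy simulation. In the sketch below (all constants illustrative), every reward is zero, so the true value function is exactly zero&#8211;yet bootstrapping through a max over noisy estimates makes the values drift steadily upward:</em></p><pre><code>import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 5, 0.99
Q = np.zeros((n_states, n_actions))

for step in range(200):
    # Pretend the function approximator adds a little generalization
    # noise to every Q estimate (standing in for deep-net errors).
    noisy_Q = Q + rng.normal(0.0, 0.1, size=Q.shape)
    # Bootstrapped backup: the max over noisy estimates is biased
    # upward, and that bias is fed back in as the next target.
    next_states = rng.integers(0, n_states, size=n_states)
    targets = gamma * noisy_Q[next_states].max(axis=1)
    Q += 0.5 * (targets[:, None] - Q)

print(Q.mean())  # drifts far above the true value of 0
</code></pre>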
<p>[00:50:27] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. How much does data diversity help? If you were to add a lot more offline data of various types, does that seem to do anything for this problem, or not really?</p><p>[00:50:39] <strong>Sergey Levine:</strong> We actually have a recent study on this. It was done by some of my students in collaboration with Google, on large-scale offline RL for Atari games, and there we study what happens when you have lots of data and also large networks. The conclusion we reached is that things work out a lot better if you&#8217;re careful in your choice of architecture&#8211;basically, select architectures that are very easy to optimize, like ResNets, for example&#8211;and you use larger models than you&#8217;d think would be appropriate, larger than what you would need even for supervised learning.</p><p>[00:51:09] <strong>Sergey Levine:</strong> And in that paper, our takeaway was that a lot of the reason why large-scale RL efforts were so difficult before is that people were applying their supervised learning intuition and selecting architectures according to that, when in fact, if you go somewhat larger than that&#8211;maybe two times larger in terms of architecture size&#8211;that actually seems to mitigate some of the issues.</p><p>[00:51:34] <strong>Sergey Levine:</strong> It probably doesn&#8217;t fully solve them, but it does make things a lot easier. It&#8217;s not clear why that&#8217;s true, but one guess might be that when you&#8217;re doing reinforcement learning, you don&#8217;t just need to represent the final solution at the end. You don&#8217;t just need to represent the optimal solution&#8211;you also need to represent everything in between. You need to represent all those suboptimal behaviors on the way there, and those suboptimal behaviors might be a lot more complicated. The final optimal behavior might be hard to find, but it might actually be a fairly simple, parsimonious behavior.</p><p>[00:51:59] <strong>Sergey Levine:</strong> The suboptimal things, where you&#8217;re kind of okay here, kind of okay there, maybe kind of optimal over there&#8211;those might actually be more complicated, and you might require more representational capacity to go on that journey and ultimately reach the optimal solution.</p><p>[00:52:11] <strong>Kanjun Qiu:</strong> It&#8217;s really interesting that in RL you need to do this counterfactual reasoning pretty explicitly, and so you need to represent these suboptimal behaviors. But in, let&#8217;s say, a language model, you don&#8217;t need to&#8211;they&#8217;re often quite bad at counterfactual reasoning, and we do see that they get better at it as they get larger. So there&#8217;s something interesting here.</p>
<p>[00:52:29] <strong>Sergey Levine:</strong> Yeah, absolutely. And actually, trying to improve language models through reinforcement learning, particularly value-based reinforcement learning, is something that my students and I are doing quite a bit of work on these days. Obviously, many of your listeners are probably familiar with the success of RL from human preferences in recent language model work. But one of the ways in which that falls short is that a lot of the ways people do RL with language models now treat the language model&#8217;s task as a one-step problem: it&#8217;s just supposed to generate one response, and that response should get the maximal reward. But if we&#8217;re thinking about counterfactuals, that is typically situated in a multi-step process. Maybe I would like to help you debug some kind of technical problem&#8211;say you&#8217;re having trouble reinstalling your graphics driver. I might ask you a question like, well, what kind of operating system do you have? Have you tried running this diagnostic? Now, in order to learn how to ask those questions appropriately, the system needs to understand that if it has some piece of information, then it can produce the right answer, and that it can ask the question that gets that piece of information. It&#8217;s a multi-step process.</p><p>[00:53:36] <strong>Sergey Levine:</strong> And if it has suboptimal data from humans who were doing this task, maybe not so well, then it needs to do this counterfactual reasoning to figure out what the optimal questions to ask are, and so on. That&#8217;s stuff you&#8217;re not going to get with these one-step human preference formulations. And it&#8217;s certainly not what you&#8217;re going to get with regular supervised learning formulations, which will simply copy the behavior of the typical human. So I think there&#8217;s actually a lot of potential to get much more powerful language models with appropriate value-based reinforcement learning&#8211;the kind of reinforcement learning that we do in robotics and other RL applications.</p>
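<p><em>A sketch of what that multi-step framing could look like: each assistant utterance is one action in an MDP whose state is the conversation so far, with reward arriving only at the end (did the graphics driver get fixed?). The class and function names here are placeholders, not a real API; single-step RLHF would collapse this whole loop into one (prompt, response, reward) tuple.</em></p><pre><code>from dataclasses import dataclass, field

@dataclass
class DialogueState:
    turns: list = field(default_factory=list)   # the conversation so far

def collect_episode(policy, env, max_turns=10):
    """Collect one multi-turn episode for offline value-based RL.

    `policy` maps a DialogueState to an utterance; `env` wraps the user
    (or a log of human conversations) and returns a reward when the
    dialogue ends. Both are assumed interfaces.
    """
    state, transitions = DialogueState(), []
    for _ in range(max_turns):
        utterance = policy(state)           # e.g. "What OS are you running?"
        next_state, reward, done = env.step(state, utterance)
        transitions.append((state, utterance, reward, next_state, done))
        state = next_state
        if done:
            break
    # A Q-function trained on these transitions can credit an early
    # question for a success that only materializes several turns later.
    return transitions
</code></pre>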
<p>[00:54:06] <strong>Josh Albrecht:</strong> Digging into that a little bit, how does that work tactically for you and for students at your lab, given that the larger you make these language models, the more capable they are, and it&#8217;s kind of hard to run even inference for these things on the kind of compute that&#8217;s usually available at an academic institution? I mean, you guys have a decent amount of compute for a university, but still not quite the same as, say, Google or OpenAI.</p><p>[00:54:27] <strong>Sergey Levine:</strong> Yeah, it&#8217;s certainly not easy, but I think it&#8217;s entirely possible to take that problem and subdivide it into its constituent parts. If we&#8217;re developing an algorithm that is supposed to enable reinforcement learning with language models, that can be done with a smaller model, evaluating the algorithm appropriately just to make sure it&#8217;s doing what it&#8217;s supposed to be doing. And that&#8217;s a separate piece of work from the question of how it can be scaled up to the largest size to really see how far it could be pushed. So subdividing the problem appropriately can make this quite feasible, and I don&#8217;t think that&#8217;s actually something that is uniquely demanded in academia.</p><p>[00:55:00] <strong>Sergey Levine:</strong> Even if you work for a large company, even if you have all the TPUs and GPUs you could wish for at your fingertips&#8211;which, by the way, researchers at large companies don&#8217;t always have&#8211;even then it&#8217;s a good idea to chop up your problem into parts, because you don&#8217;t want to be waiting three weeks just to see that you implemented something incorrectly in your algorithm.</p><p>[00:55:18] <strong>Sergey Levine:</strong> So in some ways it&#8217;s not actually that different, just that there&#8217;s that last stage of really fully scaling it up. And for graduate students who want to finish their PhD, in many cases they&#8217;re happy to leave that last mile to somebody who is more engineering focused anyway. As long as we have good ways to vet things, good benchmarks, and good research practices, we can make a lot of progress on this stuff.</p><p>[00:55:39] <strong>Josh Albrecht:</strong> Mm-hmm. Is there any worry that emergent behaviors that you only see at much larger scales would cause you to draw the wrong conclusions from some of these smaller-scale experiments?</p><p>[00:55:48] <strong>Sergey Levine:</strong> Yes, that&#8217;s definitely a really important thing to keep in mind. So I think it is important to have a loop, not just a one-directional pipeline. But there&#8217;s a middle ground to this, and we have to hit that middle ground. We don&#8217;t want to commit the same sin that all too often people committed in the olden days of reinforcement learning research, where we do things at too small a scale to see the truth, so to speak. But at the same time, we want to work at a small enough scale that we can make progress and get some kind of turnaround, and maybe find the right collaborators in an industrial setting once we do get something working, so that we can work together to scale it up and complete the life cycle that way.</p><p>[00:56:24] <strong>Josh Albrecht:</strong> Yeah. Actually, that brings me back to another question I was going to ask earlier, when you were talking about the examination of performance on Atari games as you made the models much larger. It does seem like in reinforcement learning the models are much, much smaller than they are in many other parts of machine learning. Do you have any sense for exactly why that is? Is it just historical? Is it merely a performance thing? I see a lot of three-layer convnets or something&#8211;not even a ResNet&#8211;or a two-layer MLP, something that&#8217;s just much, much simpler, with very small dimensions.</p><p>[00:56:57] <strong>Sergey Levine:</strong> Well, that has to do with the problems that people are working on. If your images are Atari game images, it&#8217;s a reasonable guess that the visual representations you need for that are less complex than what you would need for realistic images. And when you start attacking more realistic problems, more or less exactly what you&#8217;d expect happens: the more modern architectures do become tremendously useful as the problem becomes more realistic. Certainly in our robotics work, the kinds of architectures we use are generally much closer to the latest architectures in computer vision.</p>
<p>[00:57:28] <strong>Josh Albrecht:</strong> Mm-hmm. So it&#8217;s really just in relation to the problem&#8211;as you get closer to the real world, the larger networks start to pay off quite a bit. Although I guess the interesting thing about the Atari result was that as you made the networks larger, they seemed to help anyway. Right?</p><p>[00:57:42] <strong>Sergey Levine:</strong> Yes, that was kind of the surprising thing. Certainly in robotics this was not news&#8211;in robotics, we and many others have used larger models, and yes, it was helping. But for these Atari games, if you just wanted to, let&#8217;s say, imitate good behavior, you could get away with a very small network, whereas learning that good behavior with offline value-based reinforcement learning really benefited from the larger networks. And it seems to have more to do with optimization benefits rather than just being able to represent the final answer.</p><p>[00:58:13] <strong>Kanjun Qiu:</strong> In terms of the goal of getting to more general intelligence, some people feel that if we just keep scaling up language models and adding things onto them&#8211;doing, you know, multi-step human preference formulations, and finding some way to spend compute at inference so that they can do reasoning&#8211;then we&#8217;ll be able to get all the way with just these language-based formulations. What are your thoughts on that, and on the importance of robotics versus not?</p><p>[00:58:39] <strong>Sergey Levine:</strong> There are a couple of things I could say on this topic. First, let&#8217;s keep the discussion just to language models to start with. Let&#8217;s say we believe that doing all the language tasks somebody would want to do is good enough, and that&#8217;s all you want. Is it sufficient to simply build larger language models? I think the answer there, in my opinion, would be no. Because there are really two things that you need: the ability to learn patterns from data, and the ability to plan. Now, &#8220;plan&#8221; is a very loaded word, and I use that term in the same sense that, for example, Rich Sutton would use it, where planning really refers to some kind of computational process that determines a course of action. It doesn&#8217;t necessarily need to be literal, where you think of individual steps in a plan. It could be reinforcement learning&#8211;reinforcement learning is a kind of amortized planning. But there&#8217;s some kind of process that you need where you&#8217;re actually reflecting on the patterns you learned, through some sort of optimization, to find good actions rather than merely average actions. And that could be done at training time.</p><p>[00:59:38] <strong>Sergey Levine:</strong> So that could be like value-based RL. It could also be done at test time. It could simply be that all you learn from your data is a predictive language model, but then at test time, instead of simply doing maximum a posteriori decoding&#8211;instead of simply finding the most likely answer&#8211;you actually do some kind of optimization to find an answer that actually leads to an outcome that you want to see.</p>
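<p><em>One way to picture that kind of test-time optimization is a simple sample-and-score search over candidate responses. Everything here&#8211;<code>lm.sample</code>, <code>lm.logprob</code>, the candidate count&#8211;is an assumed interface for illustration, not a real library:</em></p><pre><code>import math

def plan_next_utterance(lm, state, desired_outcome, n_candidates=16):
    """Test-time planning sketch: rather than decoding the most likely
    reply, sample candidates and keep the one the model itself predicts
    is most likely to lead to the outcome we want."""
    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        reply = lm.sample(state)            # one candidate action
        # Score: the model's estimate of how probable the desired
        # outcome (e.g. the user later saying the problem is fixed)
        # becomes if we say this now.
        score = lm.logprob(desired_outcome, context=state + reply)
        if score > best_score:
            best, best_score = reply, score
    return best
</code></pre>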
<p>[00:59:55] <strong>Sergey Levine:</strong> So maybe I&#8217;m trying to debug your graphics driver problem. And what I want is for you to say at the end, &#8220;thank you so much, you did a good job, you fixed my graphics driver.&#8221; So I might ask the model, well, what could I say now that would maximize the probability that we&#8217;ll actually fix your graphics driver? And if the model can answer that question, maybe some kind of optimization procedure can answer that question. That&#8217;s planning. Planning could also mean just running Q-learning&#8211;that&#8217;s fine too. So whatever form it takes, that&#8217;s actually very important. And I will say something here: a lot of people, when they appeal to the possibility that you can simply build larger and larger models, often reference Rich Sutton&#8217;s Bitter Lesson essay.</p><p>[01:00:30] <strong>Sergey Levine:</strong> It&#8217;s a great essay. I would strongly recommend that everybody read it&#8211;but actually read it, because he doesn&#8217;t say that you should use big models and lots of data. He says you should use learning and planning. That&#8217;s very, very important, because learning is what gets you the patterns, and planning is what gets you to be better than the average thing in those patterns.</p><p>[01:00:51] <strong>Kanjun Qiu:</strong> Yeah.</p><p>[01:00:52] <strong>Sergey Levine:</strong> So we need the planning.</p><p>[01:00:54] <strong>Josh Albrecht:</strong> Yeah. Yeah. I&#8217;ve been telling people to actually read the&#8211;</p><p>[01:00:51] <strong>Kanjun Qiu:</strong> This is also Josh&#8217;s takeaway.</p><p>[01:01:02] <strong>Josh Albrecht:</strong> Yeah, yeah. But just to push back on that slightly, as a devil&#8217;s advocate for a second: it might be the case, I think, that some of the people championing large language models are saying, maybe we can get away with sort of simple types of planning in language.</p><p>[01:01:14] <strong>Josh Albrecht:</strong> So for example, chain-of-thought ensembling, or asking the language model, what would you do next? Just sort of heuristic, simple, bolted-on planning in language afterwards.</p><p>[01:01:25] <strong>Sergey Levine:</strong> I think that&#8217;s a perfectly reasonable hypothesis, for what it&#8217;s worth. The part I might actually take issue with is the claim that that&#8217;s an easier way to do it. I think it might actually be more complex. Ultimately, what we want is simplicity, because simplicity makes it easy to make things work at a large scale&#8211;if your method is simple, there are essentially fewer ways it can go wrong. So I don&#8217;t think the problem with clever prompting is that it&#8217;s too simple or primitive. I think the problem might actually be that it&#8217;s too complex, and that developing a good, effective reinforcement learning or planning method might actually be a simpler or more general solution.</p><p>[01:02:03] <strong>Josh Albrecht:</strong> What do you think of other types of reinforcement learning setups? I&#8217;m not sure if you saw the work by Anthropic, maybe earlier this week or very recently&#8211;basically, instead of doing RL with human feedback, they propose doing RL with AI feedback. It&#8217;s like, oh, okay, we&#8217;ll train this other preference model and then use that to do the feedback loop, as a way of automating this and getting the human out of the loop&#8211;maybe as an alternative to offline RL.</p>
<p>[01:02:29] <strong>Sergey Levine:</strong> Yeah, I like that work very much. The part I might slightly disagree with is that I don&#8217;t think it&#8217;s an alternative to offline RL&#8211;I think it&#8217;s actually a very clever way to do offline RL. I like that line of work very much because I think it gets at a similar goal: essentially doing planning as an optimization procedure at training time, using what is in effect a model&#8211;the language model is being used as a model. And that&#8217;s great, because then you can get emergent behavior. In my mind, it&#8217;s actually more interesting than leveraging human feedback, because with human feedback you&#8217;re essentially relying on human teachers to hammer this into you. Which is pragmatic&#8211;if you want to build a company and you really want things to work today, yeah, it&#8217;s great to leverage humans, because you can hire lots of humans and get them to hammer your model until it does what you want.</p><p>[01:03:10] <strong>Sergey Levine:</strong> But the prospect of having an autonomous improvement procedure&#8211;that&#8217;s essentially the dream of reinforcement learning: an autonomous improvement procedure where the more compute you throw at it, the better it gets. So yeah, I read that paper. I think it&#8217;s great. In terms of technical details, I think a multi-step decision-making process would be better than a single-step decision-making process. But I think a lot of the ideas, in terms of leveraging the language models themselves to facilitate that improvement, are great. And I think that it is actually an offline reinforcement learning algorithm in disguise&#8211;a very thin disguise, actually.</p><p>[01:03:39] <strong>Kanjun Qiu:</strong> On these language models&#8211;aside from what we talked about earlier with translating images into language, can we use the embeddings that are learned, or anything like that, for robotics-type problems?</p><p>[01:03:54] <strong>Sergey Levine:</strong> Yeah. I think perhaps one of the most immediate things we get out of that is a kind of human front end, in effect, where we can build robotic systems that understand visuomotor control&#8211;basically how to manipulate the world and how to change things in the environment&#8211;and then we can hook them up to an interface that humans can talk to by using these vision-language models.</p><p>[01:04:15] <strong>Sergey Levine:</strong> So that&#8217;s kind of the most obvious, most immediate application. I do think there&#8217;s a really interesting potential for it to not simply be a front end, but to actually be a bidirectional thing, where these models can also take knowledge contained in language models and import it into robotic behavior. One of the things that language models are very good at is acting like really, really fancy relational databases&#8211;the kind of stuff AI people were doing in the eighties and nineties, where you come up with a bunch of logical propositions and you can say, well, is A true? And you look up some facts and you figure out, you know, A is like B, et cetera. Language models are great at essentially doing that. So if you want the robot to figure out: oh, I&#8217;m in this building, where do I go if I want to get a glass of milk? Well, the milk is probably in the fridge. The fridge is probably in the kitchen. The kitchen is probably down the hallway in the open area, because kitchens tend to be near a break area&#8211;it&#8217;s an office building. All this kind of factual stuff about the world, you can probably get a language model to just tell you. And if you have a vision-language model that acts as an interface between the symbolic linguistic world and the physical world, then you can import that knowledge into your robot, essentially, and now for all this factual stuff, it&#8217;ll kind of take care of it.</p><p>[01:05:25] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:05:27] <strong>Sergey Levine:</strong> It won&#8217;t take care of all the low-level stuff. It won&#8217;t tell the robot how to move its fingers&#8211;the robot still needs us for that. But it does a great job of taking care of these kinds of factual, semantic things.</p>
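<p><em>Schematically, that division of labor might look like the sketch below: the language model supplies the factual, semantic steps, a vision-language model grounds them in what the robot sees, and a learned low-level controller does the actual motion. All three interfaces are hypothetical placeholders, not a specific system from the conversation:</em></p><pre><code>def fetch_with_semantic_planning(llm, vlm, robot, task="get a glass of milk"):
    """Use an LLM for the factual/semantic plan and leave the low-level
    control to the robot's own learned policies."""
    # Semantic knowledge from the language model, e.g.
    # ["go to the kitchen", "open the fridge", "pick up the milk"]
    steps = llm.ask(
        f"List the short steps a robot in an office building "
        f"would take to {task}."
    )
    for step in steps:
        # The VLM bridges symbols and pixels: where in the current
        # camera image is "the kitchen" or "the fridge handle"?
        target = vlm.ground(step, robot.camera_image())
        # The learned visuomotor controller handles fingers and wheels.
        robot.execute(step, target)
</code></pre>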
<p>[01:05:36] <strong>Kanjun Qiu:</strong> Right, right. Mm-hmm. And there&#8217;s a bunch of work using these language models for higher-level planning and then passing the instructions to the robot. What do you think about the approach of collecting a lot of robotic datasets, making a much larger model, and then training on this diversity of datasets to kind of &#8220;simulate&#8221; the generality you would get from one of these large-scale self-supervised models?</p><p>[01:06:00] <strong>Sergey Levine:</strong> That&#8217;s a great direction, and I should say that my students and I have been doing a lot of work and a lot of planning on how to build general and reusable robotic control models. So far, one of our results closest to this is a paper by Dhruv Shah called General Navigation Models, which deals with the problem of robotic navigation. What Dhruv did, basically, is he went to all of his friends who work on robotic navigation and borrowed their datasets. So we put together a dataset with 8 different robots. It&#8217;s not a huge number&#8211;it&#8217;s only 8&#8211;but they really run the gamut, all the way from small-scale RC cars&#8211;these are all mobile robots, so a small-scale RC car, something that&#8217;s like 10 inches long&#8211;all the way to full-scale ATVs, off-road vehicles that are used for research. You can actually sit in one. So there&#8217;s a large kind of car, and everything in between.</p><p>[01:06:47] <strong>Sergey Levine:</strong> I think there&#8217;s a Spot Mini in there, and a bunch of other stuff. And he trained a single model that does goal-based navigation just using data from all these robots. The model is not actually told which robot it&#8217;s driving. It&#8217;s given a little context, so it has a little bit of memory, and basically, just by looking at that memory, you can sort of guess roughly what the properties of the robot it&#8217;s currently driving are, and the model will actually generalize to drive new robots. We actually got it, for example, to fly a quadrotor. Now, the quadrotor had to pretend to be a car&#8211;it was still controlled only in two dimensions, because there were no flying vehicles in the dataset. But it has, you know, a totally different camera. It has this fisheye lens. Obviously it flies, so it wobbles a bit.
And the model could, zero-shot, immediately fly the quadrotor. In fact, we put that demo together before a deadline, so the model worked on the first try. What took us the most time was figuring out how to replace the battery in the quadrotor, because we hadn&#8217;t used it for a year. Once we figured out how to replace the battery, the model could actually figure out how to fly the drone immediately. Now, navigation obviously is simpler in some ways than robotic manipulation, because you&#8217;re not making contact with the environment&#8211;at least if everything&#8217;s going well.</p><p>[01:07:52] <strong>Sergey Levine:</strong> So in that sense it&#8217;s a simpler problem, but it does seem like multi-robot generalization there was very effective for us. And we&#8217;re certainly exploring multi-robot generalization for manipulation. Right now we&#8217;re trying to collaborate with a number of other folks who have different kinds of robots. There&#8217;s a large data collection effort from Chelsea Finn&#8217;s group at Stanford that we&#8217;re also partnering up with. So I think we&#8217;ll see a lot more of that coming in the future, and I&#8217;m really hopeful that a few years from now, the standard way people approach robotics research will be just like in vision and NLP: start with a pre-trained, multi-robot model that has basic capability, and really build their stuff on top of that.</p><p>[01:08:28] <strong>Kanjun Qiu:</strong> That&#8217;s cool. That&#8217;s really interesting. In terms of thinking about the next few years&#8211;let&#8217;s say the next five years&#8211;do you have a sense of what kinds of developments you&#8217;d be most excited to see, that you kind of expect will happen, aside from pre-trained models for robotics?</p><p>[01:08:42] <strong>Sergey Levine:</strong> Obviously the pre-trained models one is a very pragmatic thing&#8211;that&#8217;s something that&#8217;s super important. But the thing that I would really hope to see is something that makes lifelong robotic learning really the norm. I think we&#8217;ve made a lot of progress on figuring out how to do large-scale imitation learning. We&#8217;ve developed good RL methods. We&#8217;ve built a lot of building blocks. But to me, the real promise of robotic learning is that you can turn on a robot, leave it alone for a month, come back, and suddenly it&#8217;s figured out something amazing that you wouldn&#8217;t have thought of yourself. And I think to get there, we really need to get into the mindset of robotic learning being an autonomous, continual, and largely unattended process. If I can get to the point where I can walk into the lab, turn on my robot, come back in a few days, and it&#8217;s actually spent the intervening time productively, I would consider that to be a really major success.</p><p>[01:09:34] <strong>Josh Albrecht:</strong> Hmm. How much of that do you think should focus on the actual lifetime of the individual robot&#8211;treating it as an individual&#8211;versus, well, it&#8217;s just a data collector for the offline RL dataset, and it sends data up and gets whatever comes back down afterwards?</p><p>[01:09:49] <strong>Sergey Levine:</strong> Oh, I think that&#8217;s perfectly fine. Yeah. And I think in reality, for any practical deployment of these kinds of ideas at scale, it would actually be many robots all collecting data, sharing it, exchanging their brains over a network, and all that.
That&#8217;s the more scalable way to think about it on the learning side. But I do think that on the physical side, there are also a lot of practical challenges. Just, you know, what kinds of methods should we even have if we want the robot in your home to practice cleaning your dishes for three days? If you just run a reinforcement learning algorithm on a robot in your home, probably the first thing it&#8217;ll do is wave its arm around, break your window, then break all of your dishes, then break itself, and then spend the remaining time just sitting there, broken, in the corner. So there are a lot of practicalities in this.</p><p>[01:10:32] <strong>Kanjun Qiu:</strong> That&#8217;s right. And it won&#8217;t go out and buy more dishes, which is what you&#8217;d want it to do.</p><p>[01:10:38] <strong>Josh Albrecht:</strong> No, no, I don&#8217;t think you&#8217;d want that. It would go outside to buy more dishes, fall down the steps, hurt someone, get in the middle of the road, and cause an accident.</p><p>[01:10:44] <strong>Sergey Levine:</strong> In all seriousness, that&#8217;s where I think a lot of these challenges are wrapped up, because in some ways, all of these difficulties that happen in the real world are also opportunities. Maybe the breaking of the dishes is extreme, but if it drops something on the ground&#8211;well, great, figure out how to pick it up off the ground. If it spills something&#8211;great, good time to figure out how to get out the sponge and clean up your spill. Robots should be able to treat all these unexpected events as new learning opportunities rather than things that just cause them to fail.</p><p>[01:11:09] <strong>Sergey Levine:</strong> And I think there&#8217;s a lot of interesting research wrapped up in that. It&#8217;s just hard to attack that research, because it always kind of falls in between different disciplines. It doesn&#8217;t slot neatly into just developing a better RL method or just developing a better controller or something.</p><p>[01:11:21] <strong>Kanjun Qiu:</strong> Hmm. That&#8217;s really interesting, huh? Yeah, it&#8217;s kind of somewhere between continual learning and robotics and some other stuff.</p><p>[01:11:31] <strong>Josh Albrecht:</strong> And it&#8217;s all about the messy deployment parts. Like the part about the quadcopter&#8217;s battery taking longer to replace than the model took to train&#8211;that probably wasn&#8217;t even in the paper. It wasn&#8217;t even in the appendix.</p><p>[01:11:40] <strong>Sergey Levine:</strong> No, it wasn&#8217;t in the appendix. It might be in the undergraduate student&#8217;s grad school application essay.</p><p>[01:11:47] <strong>Kanjun Qiu:</strong> Right, right. Looking into the past, whose work do you feel has impacted you the most?</p><p>[01:11:54] <strong>Sergey Levine:</strong> That&#8217;s an interesting question. There are some very standard answers I could give, but one body of work I want to highlight, that maybe not many people are familiar with and that was actually quite influential on me, is the work of Emanuel Todorov. Most people know Professor Todorov from his work developing the MuJoCo simulator, but before that, he actually did a lot of research at the intersection of control theory, reinforcement learning, and neuroscience.
And in many ways, the work that he did was quite ahead of its time in terms of combining reinforcement learning ideas with probabilistic inference concepts and controls.</p><p>[01:12:34] <strong>Sergey Levine:</strong> And besides, at the technical level, a lot of the ideas that I capitalized on in developing new RL algorithms were based on some of these control-as-inference concepts that his work, as well as the work of other people in that area, pioneered. But also, the general approach and philosophy of combining very technical ideas in probabilistic inference, RL, neuroscience, and controls all together really shaped my approach to research, because I think one of the things that he and others in that neck of the woods did really well is tear down the boundaries between these things. As an example, there&#8217;s this idea sometimes referred to as Kalman duality, which is basically the concept that a forward-backward message-passing algorithm, like what you would use in a hidden Markov model, is more or less the same thing as a control algorithm.</p><p>[01:13:26] <strong>Sergey Levine:</strong> Inferring the most likely state given a sequence of observations looks an awful lot like inferring the optimal action given some reward function, and that can be made into a mathematically precise statement.</p><p>[01:13:37] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:13:38] <strong>Sergey Levine:</strong> So it&#8217;s not merely interdisciplinary&#8211;it&#8217;s really tearing down the boundaries between these areas and showing the underlying commonality that emerges when you reason about sequential processes. And I think that was very influential on me in terms of how I thought about the technical concepts in these areas.</p><p>[01:13:55] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. It reminds me that a lot of folks&#8211;or maybe not a lot, but a few people&#8211;are very interested in formulating RL as kind of a sequence modeling problem. It feels like there&#8217;s maybe a similar thing going on here. I&#8217;m curious what you think about that formulation.</p><p>[01:14:12] <strong>Sergey Levine:</strong> Yeah, I think to a degree that&#8217;s true. Certainly the idea that inference in sequence models looks a lot like control is a very old idea. The reason the Kalman duality is called the Kalman duality is because it actually did show up in Kalman&#8217;s original papers. That&#8217;s not what most people took away from them&#8211;most people took away that it&#8217;s a good way to do state estimation.</p><p>[01:14:30] <strong>Sergey Levine:</strong> And, you know, that was in the age of the space race, and people used it for state estimation, for the Apollo program and the like. But buried in there is the relationship between control and inference and sequence models: the same way that you would figure out what state you&#8217;re in given a sequence of observations could be used to figure out what action to take to achieve some outcome. And yeah, it&#8217;s probably fair to say that the relationship between sequence models and control is an extremely old one. And there&#8217;s still more to be gained from that connection.</p>
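<p><em>One modern way to write down that correspondence&#8211;following the control-as-inference framing that this line of work led to, with notation assumed here rather than taken from the conversation&#8211;is to attach a binary &#8220;optimality&#8221; variable to each timestep and read off the backward messages:</em></p><pre><code>% Attach an optimality variable O_t to each step, with
%     p(O_t = 1 | s_t, a_t) \propto \exp( r(s_t, a_t) ).
% The HMM-style backward message
%     \beta_t(s_t, a_t) = p(O_{t:T} | s_t, a_t)
% then behaves like an exponentiated Q-function:
\log \beta_t(s_t, a_t)
    = r(s_t, a_t)
    + \log \, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t)}
        \left[ \exp V(s_{t+1}) \right],
\qquad
V(s_t) = \log \sum_{a} \exp Q(s_t, a).
% Inference over states (estimation) and inference over optimal
% actions (control) run through the same message-passing machinery.
</code></pre>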
<p>[01:14:55] <strong>Kanjun Qiu:</strong> Do you feel like you&#8217;ve read any papers or work recently that you were really surprised by?</p><p>[01:15:02] <strong>Sergey Levine:</strong> There are a few things&#8230; This is maybe a little bit tangential to what we&#8217;ve discussed so far, but I have been a bit surprised by some of the investigations into how language models act as few-shot learners. I worked a lot on meta-learning&#8211;at this point, really the previous generation of meta-learning algorithms, the few-shot stuff from 2018, 2019. But with language models, there&#8217;s a very interesting question as to the degree to which they actually act as meta-learners or not, and there&#8217;s been somewhat contradictory evidence, one way or the other.</p><p>[01:15:34] <strong>Sergey Levine:</strong> Some of that was kind of surprising to me. For example, you can take a few-shot prompt and attach incorrect labels to it, and the model will look at it and then start producing correct labels, which maybe suggests that it&#8217;s not paying attention to the labels, but more to the format of the problem. Of course, all these studies are empirical, and it&#8217;s always a question whether the next generation of models still exhibits the same behavior or not, so you kind of have to take it with a grain of salt. But I have found some of the conclusions there to be kind of surprising&#8211;that maybe these things aren&#8217;t really meta-learners; rather, they&#8217;re just getting the format specification out of problems.</p><p>[01:16:07] <strong>Kanjun Qiu:</strong> Yeah, they&#8217;re really, really, really good pattern matchers. Interesting. Also, as they get bigger, some people say they take less data to fine-tune, so maybe they&#8217;re doing some kind of few-shot learning during training as well.</p><p>[01:16:21] <strong>Sergey Levine:</strong> There&#8217;s an interesting tension there, because in the end, I think you would really like the ideal meta-learning method to be something that can get a little bit of data for a new problem, use that to solve the problem, but also use it to improve the model. And that&#8217;s something that&#8217;s always been a little tough with meta-learning algorithms, because typically the process of adapting to a new problem is very, very separate from the process of training the model itself. Certainly that&#8217;s true in the classic way of using language models with prompts as well.</p><p>[01:16:45] <strong>Sergey Levine:</strong> But it&#8217;s very appealing to have a model that can fine-tune on small amounts of data, because then the process of adapting to a task is the same as the process of improving the model, and the model actually gets better with every task. You could imagine, for example, that the logical conclusion of this kind of stuff is a kind of lifelong online meta-learning procedure, where every new task you&#8217;re exposed to, you can adapt to more quickly, and you can use it to improve your model so it can adapt to the next task even more quickly. I think that, in the world of meta-learning, is actually kind of an important open problem: how to move toward lifelong and online meta-learning procedures that really do get better at both the meta level and the low level. And it&#8217;s not actually obvious how to do that, or whether the advent of large language models makes that easier or harder. It&#8217;s an important problem.</p>
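<p><em>The loop being described might be sketched like this&#8211;adapt to each task, then fold the same data back into the meta-learner so the next adaptation is faster. <code>adapt</code>, <code>solve</code>, and <code>update</code> are assumed interfaces, not any particular published method:</em></p><pre><code>def lifelong_online_meta_learning(model, task_stream):
    """Sketch of lifelong, online meta-learning: adaptation and
    meta-training are the same process rather than separate phases."""
    for task in task_stream:
        support = task.collect_small_dataset()   # a little data per task
        task_model = model.adapt(support)        # fast few-shot adaptation
        task_model.solve(task)                   # use it on the task at hand
        model.update(support)                    # ...and also improve the
                                                 # meta-learner, so the NEXT
                                                 # task adapts even faster
    return model
</code></pre>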
<p>[01:17:27] <strong>Kanjun Qiu:</strong> What do you feel are some underrated or overlooked approaches&#8211;things you don&#8217;t see many people looking at today, or that aren&#8217;t very popular, but that you think might be important?</p><p>[01:17:38] <strong>Sergey Levine:</strong> One thing that comes to mind&#8211;I don&#8217;t know how much this counts as overlooked or underrated&#8211;is that model-based RL might be a little bit underutilized, because it sort of makes sense that if we&#8217;ve seen big advances in generative models, then more explicit model-based RL techniques can perhaps do better than they do now. And it may also be that there&#8217;s room for very effective methods to be developed that hybridize model-based and model-free RL in interesting ways, which could do a lot better than either one individually, perhaps by leveraging the latest ideas from building very effective generative models.</p><p>[01:18:14] <strong>Sergey Levine:</strong> Just as one point about what these things could look like: model-based reinforcement learning at its core uses some mechanism that predicts the future. But typically we think of predicting the future the way we think about movies and videos&#8211;you predict the world one frame at a time. There isn&#8217;t really any reason to think about it that way. All you really need to predict is what will happen in the future if you do something, and that doesn&#8217;t have to be one time step or frame at a time. It could be that you predict something that will happen at some future point. Maybe you don&#8217;t even need to know which future point in particular&#8211;like, soon or not so soon, right?</p><p>[01:18:42] <strong>Sergey Levine:</strong> And it may be that this more flexible way of looking at prediction could provide models that are easier to train, that leverage ideas from current generative models, sufficient to do control and decision making, but not as complicated as full-on frame-by-frame, pixel-by-pixel prediction of everything that your robot will see for the next hour.</p><p>[01:19:02] <strong>Josh Albrecht:</strong> Yeah. Why do you think we haven&#8217;t seen more advances there in model-based reinforcement learning, given the success of these large generative models? People have been making large generative models really good for more than a few years now, but I feel like we haven&#8217;t really seen them applied in the RL setting directly.</p><p>[01:19:20] <strong>Sergey Levine:</strong> Well, there is a big challenge there. The challenge is that prediction is often much harder than generation. One way to think about it: if your task is to generate, let&#8217;s say, a picture of an open door, you can draw any door you want&#8211;it can be any color, as long as it&#8217;s open. But if your goal is to predict what this particular door in my office would look like if I were to open it, now you really have to get all the other details right.</p><p>[01:19:45] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:19:46] <strong>Sergey Levine:</strong> And you really have to get them right if you want to use that for control, because you want the system to figure out what thing in the scene actually needs to change. If you messed up a bunch of other parts, or the door doesn&#8217;t open the same way this particular door opens, that&#8217;s actually much less useful to you. So prediction can be a lot harder than generation, because with straight-up generation you have a lot of freedom to fudge a lot of the details. When you get the freedom to fudge the details, you can basically do the easiest thing you know how to do for everything except the main subject of the picture.</p>
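<p><em>The contrast between the two prediction interfaces might look like this in code. Both signatures are schematic assumptions for illustration, not a real library:</em></p><pre><code># Classic video-prediction-style model: predict every frame, in order,
# for the whole horizon. Errors compound step by step, and most of the
# predicted pixels are irrelevant to the decision being made.
def rollout_frame_by_frame(model, state, actions):
    frames = []
    for a in actions:
        state = model.predict_next_frame(state, a)
        frames.append(state)
    return frames

# The more flexible alternative: directly predict whether some event of
# interest happens at *some* future point under this action sequence
# (e.g. "the door ends up open"), skipping frame-by-frame detail.
def predict_outcome(model, state, actions):
    return model.predict_event_probability(state, actions)
</code></pre>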
<p>[01:20:12] <strong>Kanjun Qiu:</strong> Mm-hmm. Yeah. Once you have to do prediction, you need consistency, you need it over long time horizons&#8211;there are all of these other things to work on. Why do you think we still see a lot of model-based RL that does these kinds of frame-by-frame rollouts, versus predicting a point in the future or something like that?</p><p>[01:20:31] <strong>Josh Albrecht:</strong> Or also versus predicting some aspects of the future, as you were mentioning before, right? Like maybe this thing will happen, or maybe this attribute will change, or maybe I expect this particular piece of the future.</p><p>[01:20:40] <strong>Sergey Levine:</strong> Well, I do think the decomposition into a predictive model and a planning method is very clean. So it&#8217;s very tempting to say, well, we know how to run RL against a simulator, so as long as we get a model that acts as a drop-in replacement for a simulator, then we know exactly how to use it. It&#8217;s a very clean and tempting decomposition. And part of why I think we should think about breaking that decomposition is that this notion of a very clean decomposition makes me harken back to the end-to-end stuff. In robotics, we used to have another very clean decomposition: the decomposition between estimation and control. It used to be that perception and control were kept very separate because it&#8217;s such a clean decomposition, and maybe here too, prediction and planning are kept very separate because it&#8217;s such a clean decomposition. But just because it&#8217;s clean doesn&#8217;t mean it&#8217;s right. That&#8217;s a notion that we ought to challenge.</p><p>[01:21:24] <strong>Kanjun Qiu:</strong> I see. And it kind of feels like it hasn&#8217;t been challenged that seriously so far.</p><p>[01:21:44] <strong>Josh Albrecht:</strong> One question, just going back to the importance of making robots that don&#8217;t smash all your dishes and smash all the windows and everything like that&#8211;which does seem like a very useful thing for people to be working on, and does seem a little bit underserved by existing incentives. Do you have any ideas how to fix that? Is it a new conference? Is it a new way of judging papers? Is it just people being open about the importance of this problem? How do we actually make progress on that? Besides industry&#8211;industry can certainly make progress, but in academia&#8230;</p><p>[01:21:57] <strong>Sergey Levine:</strong> It&#8217;s something that I think about a lot. I think one great way to approach that problem is to actually set your goal to be building a robot that has some kind of existence, some kind of life of its own. I spend part of my time hanging out with the robotics team at Google, the Google Brain Robotics Research Lab.
And there, I think we&#8217;ve actually done a pretty good job of this, where if you walk into our office&#8211;we&#8217;ll get a Googler to escort you, obviously; don&#8217;t break into our office&#8211;but if you walk into our office legally, you will see robots just driving around. You walk into the micro-kitchen, where people go to get their snacks, and you might be standing in line behind a robot that&#8217;s getting a snack.</p><p>[01:22:31] <strong>Sergey Levine:</strong> And people have gotten into this habit of, well, the robotics experiment is continual, it&#8217;s ongoing, it lives in the world that you live in, and you&#8217;d better deal with it. And you deal with that as a researcher, and that actually gets you into this mindset where things do need to be more robust, and they need to be configured in such a way that they support this continual process and don&#8217;t break the dishes. On the technical side there&#8217;s still a lot to do, but just getting into that mode of thinking about the research process, I think, helps a ton. And we&#8217;re starting to move in that direction here at UC Berkeley too. We&#8217;ve got our little mobile robot roving around the building on a regular basis.</p><p>[01:23:04] <strong>Sergey Levine:</strong> We&#8217;ve got our robotic arm in the corner constantly trying to pick up objects. And I think once you start doing research that way, it becomes much more natural to be thinking about these kinds of challenges.</p><p>[01:23:13] <strong>Kanjun Qiu:</strong> It&#8217;s another example of breaking down a barrier&#8211;in this case, between the experimental environment and your real-life environment. Do you feel like there&#8217;s a work of yours that was most overlooked?</p><p>[01:23:22] <strong>Sergey Levine:</strong> I think every researcher thinks that some work of theirs has been overlooked, but one thing I could talk about a little bit is some work that two of my postdocs, Nick Rhinehart and Glen Berseth, did recently with me and a number of other collaborators, studying intrinsic motivation from a very different perspective. Intrinsic motivation in reinforcement learning is often thought of as the problem of seeking out novelty in the absence of supervision. People formulate it in different ways: find something that&#8217;s surprising, find something that your model doesn&#8217;t fit, et cetera. Nick and Glen took a very different approach, inspired by some neuroscience and cognitive science work from a gentleman named Karl Friston from the UK. There&#8217;s this idea that perhaps intrinsic motivation can actually be driven by the opposite objective.</p><p>[01:24:08] <strong>Sergey Levine:</strong> The objective of minimizing surprise. The intuition for why this might be true is that if you imagine a very ecological view of intelligence&#8211;let&#8217;s say you&#8217;re a creature in the jungle, hanging out there, and you want to survive&#8211;well, maybe you actually don&#8217;t want to find surprising things. You know, a tiger eating you would be very surprising, and you would rather that not happen.</p><p>[01:24:25] <strong>Sergey Levine:</strong> So you&#8217;d rather find your niche, hang out there, and be safe and comfortable. And that actually requires minimizing surprise. But minimizing surprise might require taking some kind of coordinated action.
So you might think, well, it might rain tomorrow and then I&#8217;ll get wet, and that kicks me out of my comfortable niche. So maybe I&#8217;ll actually go on a little adventure and find some materials to build shelter, which might be a very uncomfortable thing to do&#8211;it might be very surprising. But once I&#8217;ve built that shelter, I&#8217;ll have put myself in a more stable niche where I&#8217;m less likely to get surprised by something.</p><p>[01:24:54] <strong>Sergey Levine:</strong> So perhaps, paradoxically, minimizing surprise might actually lead to behavior that looks like curiosity or novelty seeking, in service of getting yourself to be more comfortable later. It&#8217;s a very strange idea in some ways, but perhaps a really powerful one if we want to situate agents in open-world settings where we want them to explore without human supervision, but at the same time not get distracted by the million different things that could happen. They should explore, but they should explore in a way that gets them to be more capable&#8211;that accumulates capabilities, accumulates some ability to affect their world.</p><p>[01:25:27] <strong>Sergey Levine:</strong> So we had several papers that studied this, one called SMiRL, for surprise minimizing reinforcement learning, and another one called IC2, on information capture for intrinsic control. Both of these papers looked at how minimizing novelty&#8211;either minimizing the entropy of your own beliefs, meaning manipulate the world so that you&#8217;re more certain about how the world works, or simply minimizing the entropy of your state, meaning manipulate the world so that you occupy a narrow range of states&#8211;can actually lead to emergent behavior. And this was very experimental, preliminary, half-baked kind of stuff. But I think that&#8217;s maybe a direction that has some interesting implications in the future.</p>
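<p><em>The SMiRL idea, as described here, can be sketched very compactly: keep a density model over the states the agent has visited, and reward the agent for landing in states that model finds likely. The diagonal Gaussian below is an illustrative stand-in for the density model, not the paper&#8217;s exact design:</em></p><pre><code>import numpy as np

class SurpriseMinimizingReward:
    """Reward = log-likelihood of the new state under a running model
    of previously visited states. High when the agent keeps its world
    predictable; maximizing it can still force "adventures" (build the
    shelter) that reduce surprise later."""

    def __init__(self):
        self.visited = []

    def reward(self, state):
        state = np.asarray(state, dtype=float)
        self.visited.append(state)
        history = np.stack(self.visited)
        mean = history.mean(axis=0)
        std = history.std(axis=0) + 1e-3   # avoid zero variance early on
        # Diagonal-Gaussian log-likelihood of the new state.
        return float(
            -0.5 * np.sum(((state - mean) / std) ** 2
                          + np.log(2.0 * np.pi * std ** 2))
        )
</code></pre>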
<p>[01:26:05] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. That&#8217;s a very unusual formulation. What controversial or unusual research opinions do you feel like you have that other people don&#8217;t seem to agree with?</p><p>[01:26:15] <strong>Sergey Levine:</strong> I have quite a few, although I&#8217;ll say that I do tend to be open-minded and pragmatic about these things, so I&#8217;m more than happy to work with people even on projects that might invalidate some of these opinions. But one of the things that I think many people don&#8217;t entirely agree with is this: there&#8217;s a lot of activity in robotic learning around using simulation to learn policies for real-world robots. And I think that&#8217;s very pragmatic; if I were to start a company today, that&#8217;s an approach that I might explore. The controversial part is that I think in the long run we&#8217;re not gonna do that. And the reason is that ultimately it&#8217;ll be much easier to use data rather than simulation to enable robots to do things.</p><p>[01:26:59] <strong>Sergey Levine:</strong> And I think that&#8217;ll be true for several reasons. One of the reasons is that once we get the robots out there, data is much more available and there&#8217;s a lot less reason to use simulation. So if you&#8217;re in the Tesla regime, if you have, you know, a million robots out there, suddenly simulation doesn&#8217;t look as appealing, because, hey, getting lots of data is easy. Another reason is that I think the places where we&#8217;ll really want learning to attain superhuman performance will be ones where the robot needs to figure things out in tight coupling with the world. So if we understand something well enough to simulate it really accurately, maybe that&#8217;s actually not the place where we most need learning. The third reason is that, well, if you look at other domains, like NLP or computer vision: nobody in NLP thinks about coding up a simulator to simulate how people produce language. That sounds ridiculous. Using data is the way to go. I mean, you might use synthetic data from a language model, but you&#8217;re not gonna write a computer program that simulates how human fingers and vocal cords work to type on keyboards or emit sounds. That just sounds crazy. You&#8217;d use data. In computer vision maybe there&#8217;s a little bit more simulation, but still, using real images is just so much easier than generating synthetic images. Some people do work on synthetic images, but the data-driven paradigm is so powerful and relatively easy to use that most people just do that. And I think that we&#8217;ll get to that point in robotics too.</p><p>[01:28:13] <strong>Sergey Levine:</strong> Another one that I might say, and this is maybe coming back to something that we discussed already: there&#8217;s a lot of activity in robotics, and also in other areas, around using essentially imitation-learning-style approaches. So get humans to perform some tasks (maybe robotic tasks, or maybe they&#8217;re booking flights on the internet or something), and whatever task you wanna do, get humans to generate lots of data for it, and then basically do a really good job of emulating that behavior. And again, this is one of those things that I would put into the category of very pragmatic approaches that would be very good to leverage if you&#8217;re starting a company right now.</p><p>[01:28:46] <strong>Sergey Levine:</strong> But if you want to really get general-purpose, highly effective AI systems, I think we really need to go beyond that. There&#8217;s a really cute quote that my former postdoc Glen posted on Twitter after a recent conference. He said something like: I saw a lot of papers on imitation learning, but, to harken back to an earlier quote by Rodney Brooks, imitation learning is doomed to succeed. So Rodney Brooks had a quote years ago where he said simulation is doomed to succeed. What he meant by that is that when people do robotics research in simulation, it always works. It always succeeds, but then it&#8217;s hard to make that same thing work in the real world. And I think Glen&#8217;s point was that with imitation learning, it&#8217;s easy to get it to work, but then you hit a wall: it&#8217;s really good for the thing that imitation learning is good for, so it looks deceptively effective, but if you wanna go beyond that, if you really wanna do something that people are not good at, then you just hit a wall. And I think that&#8217;s a really big deal. In robotics, and in other areas where we want rational, intelligent decision-making, we should really be thinking hard about planning and reinforcement learning, things that go beyond just copying humans.</p>
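<p>For reference, the imitation-learning recipe described here (humans generate data, the system emulates it) reduces, in its simplest form, to supervised learning on demonstrations. Here is a minimal behavior-cloning sketch, with hypothetical demonstration arrays standing in for real data:</p><pre><code>import numpy as np

# Minimal behavior cloning: fit a policy to human demonstrations by plain
# supervised regression. A real system would use a neural network, but the
# objective is the same; all data here is a hypothetical stand-in.
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(500, 6))               # demo observations
demo_actions = demo_states @ rng.normal(size=(6, 2))  # "expert" actions

# Least-squares policy: action = state @ W, trained only to match the demos.
W, _, _, _ = np.linalg.lstsq(demo_states, demo_actions, rcond=None)

def policy(state):
    return state @ W

# The loss measures disagreement with the humans and nothing else, so
# driving it to zero means copying them -- the "doomed to succeed" ceiling.
loss = np.mean((policy(demo_states) - demo_actions) ** 2)
print(f"imitation loss: {loss:.6f}")
</code></pre>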
<p>[01:29:46] <strong>Kanjun Qiu:</strong> Yeah, that&#8217;s really interesting. I love this: imitation is doomed to succeed.</p><p>[01:29:51] <strong>Sergey Levine:</strong> The third one, and maybe this is the last one that&#8217;s big enough to be interesting: to be honest, I&#8217;m actually very skeptical about the utility of language in the long run as a driving force for artificial intelligence. I think that language is very, very useful right now. There&#8217;s a kind of cognitive science view of language which says, well, people think in symbolic terms, and language is sort of our expression of those symbolic concepts, and therefore language is a fundamental substrate of thought. I think that&#8217;s a very reasonable idea. What I&#8217;m skeptical about is the degree to which that is really a prerequisite for intelligence, because there are a lot of animals that are much more intelligent than our robots that do not possess language. They might possess some kind of symbolic, rational thought, but they certainly don&#8217;t speak to us.</p><p>[01:30:37] <strong>Sergey Levine:</strong> They certainly don&#8217;t express their thoughts in language. And because of that, my suspicion is actually that the success of things like language models has less to do with the fact that it&#8217;s language and more to do with the fact that we&#8217;ve got an internet full of language data. And that perhaps it&#8217;s really not so much about language.</p><p>[01:30:54] <strong>Sergey Levine:</strong> It&#8217;s really about the fact that there is this structured repository that happens to be written in language, and perhaps in the long run we&#8217;ll figure out how to do all the wonderful things that we do with language models, but without the language, using, for example, sensorimotor streams, videos, whatever. And we&#8217;ll get that generality and that power, and it&#8217;ll come more from understanding the physical and visual concepts in the world rather than necessarily from parsing words in English or something of the like.</p><p>[01:31:20] <strong>Kanjun Qiu:</strong> Earlier, we talked about methods that hit walls. Do you think that language-based methods, when we think about artificial general intelligence, would at some point hit a wall?</p><p>[01:31:30] <strong>Sergey Levine:</strong> Oh, absolutely. I do think, though, that we should be a little careful with that, because language models hit walls, but you can build ladders over those walls using other mechanisms. Certainly in recent robotics research, including robotics research that the team I work with at Google has done, as well as many others, we&#8217;ve seen a lot of really excellent innovations where people use visual or visuomotor models that understand action and understand images to bridge the gap between the symbolic world of language models and the physical world. And I think that we&#8217;ve come a long way in doing that, but I do think that purely language-based systems by themselves have a major limitation: the inability to really ground things out at the lowest level of perception and action.</p>
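<p>A toy sketch of that bridging pattern: a language model scores which skill sounds like a sensible next step, a learned visuomotor model scores which skill is currently feasible from the camera image, and the robot executes the best combined option. Every function, skill, and score below is a hypothetical placeholder, not any particular system&#8217;s API:</p><pre><code># Toy sketch: combine a language model's semantic preference with a
# visuomotor affordance estimate, and execute the best-scoring skill.

SKILLS = ["pick up the sponge", "open the drawer", "wipe the table"]

def language_score(instruction, skill):
    # Placeholder for an LLM's judgment that `skill` advances `instruction`.
    return 0.9 if skill.split()[-1] in instruction else 0.1

def affordance_score(image, skill):
    # Placeholder for a learned visuomotor model estimating, from the
    # current camera image, the probability that `skill` would succeed.
    feasibility = {"pick up the sponge": 0.8,
                   "open the drawer": 0.3,
                   "wipe the table": 0.6}
    return feasibility[skill]

def choose_skill(instruction, image):
    scores = {s: language_score(instruction, s) * affordance_score(image, s)
              for s in SKILLS}
    return max(scores, key=scores.get)

print(choose_skill("please wipe the table", image=None))  # -> wipe the table
</code></pre>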
<p>[01:32:16] <strong>Sergey Levine:</strong> And that inability is very problematic, because the reason we don&#8217;t have a lot of text on the internet like, oh, if you wanna throw a football, then you should fire this neuron and actuate this muscle, and so on, is that we don&#8217;t put that in text because it&#8217;s so easy for us. It&#8217;s so easy for us, but that doesn&#8217;t mean it&#8217;s easy for our machines. The place where the gap between human capability and machine capability is largest is exactly the thing that we&#8217;re not gonna express in language.</p><p>[01:32:40] <strong>Kanjun Qiu:</strong> Mm. So basically the way in which the internet dataset is skewed is that all of the easy stuff is not on there, and so it doesn&#8217;t get that.</p><p>[01:32:49] <strong>Sergey Levine:</strong> Yeah.</p><p>[01:32:49] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. What do you think about the idea that we might get an AGI that is able to solve all digital tasks on your computer, do everything digitally that a human can do, but we&#8217;ll still be many, many years away on the physical side?</p><p>[01:33:02] <strong>Sergey Levine:</strong> Well, maybe there&#8217;s something comforting about that, because then it can&#8217;t go out into the world and start doing things that are too nefarious. But I think that kind of stuff is possible. In research, I do tend to be a little bit of an optimist, and I do think that we can figure out many of the nitty-gritty physical, robotic things.</p><p>[01:33:16] <strong>Sergey Levine:</strong> I&#8217;m not sure how long that&#8217;ll take exactly. But I&#8217;m also kind of hopeful that if we figure them out, we&#8217;ll actually get a better solution for some of the symbolic things. Like, you know, if your model understands how the physical world works, you can probably do a better job in the digital world, because the digital world influences the physical world, and a lot of the most important things there really do have a physical kind of connection. So maybe it&#8217;s actually gonna go the other way: figuring out the physical stuff will lead to a better understanding of how to manipulate language.</p><p>[01:33:40] <strong>Kanjun Qiu:</strong> Yeah, totally agree. Thank you so much. This was super fun, and we really enjoyed the conversation. Thanks a bunch.</p><p>[01:33:47] <strong>Sergey Levine:</strong> Yeah. Thank you very much.</p><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a> is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p></li><li><p>Twitter/X: <a href="https://x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item></channel></rss>