<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Imbue]]></title><description><![CDATA[Ideas on the future we want, and how to build the technologies and systems that get us there.]]></description><link>https://ideas.imbue.com</link><image><url>https://substackcdn.com/image/fetch/$s_!GuGx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F539d4a69-f426-48c8-bf7a-cce94fd735bc_800x800.png</url><title>Imbue</title><link>https://ideas.imbue.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 13:22:20 GMT</lastBuildDate><atom:link href="https://ideas.imbue.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Imbue]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[imbueai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[imbueai@substack.com]]></itunes:email><itunes:name><![CDATA[Imbue]]></itunes:name></itunes:owner><itunes:author><![CDATA[Imbue]]></itunes:author><googleplay:owner><![CDATA[imbueai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[imbueai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Imbue]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Attention as an art form]]></title><description><![CDATA[Watch now | Art of Being Human #2 with Adam Robbert]]></description><link>https://ideas.imbue.com/p/attention-as-an-art-form</link><guid isPermaLink="false">https://ideas.imbue.com/p/attention-as-an-art-form</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Sat, 11 Apr 2026 14:31:07 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193568576/2f6793e2cb9abe3e2ae9a3acd4bd630c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>This is the recording of our second Art of Being Human event with philosopher Adam Robbert. <a href="https://www.aerobbert.com/?referrer=luma">Adam Robbert</a> is a philosopher by training, and a writer, editor, and researcher by vocation. His focus is on the relationship between practice and perception in the fields of philosophy, religion, and contemplation. He writes regularly on his newsletter,<a href="https://thebasecamp.substack.com/?referrer=luma"> The Base Camp.</a></p><p>The Art of Being Human is an event series by Imbue for the exploration of the shared human questions in our technological age. Sign up to receive invites <a href="https://luma.com/imbue_ai">here</a>.</p><div><hr></div><p><strong>KANJUN (OPENING REMARKS)</strong></p><p>Hello, friends. If you&#8217;re standing, please get your food and come sit down. All right. Well, hello, everyone. Welcome to Imbue. My name is Kanjun. I&#8217;m the CEO and I&#8217;m really excited to have you all here for the second event in our Art of Being Human series.</p><p>I&#8217;ll talk a little bit about Imbue first. Imbue is a fairly radical AI company. Our mission is to make tech serve humans. By tech, we mean tech like the software we use and also the technology industry. And by serve, we mean that we want technology that serves us and is not exploiting us. Today, we live in a world where we are kind of exploited. 
I don&#8217;t like to put my phone in my room at night because I might end up scrolling Instagram till two a.m. Not my fault &#8212; somebody else is trying to hijack my attention for their own profit. We have this world where we have devices, and on those devices is a lot of stuff whose incentives are not fully aligned with ours. And we believe that as AI agents become more powerful, this exacerbates the problem. Agents are getting to a point &#8212; we think by the end of the year &#8212; where they&#8217;re starting to make decisions on our behalf and starting to represent us. And in that world, we want those agents to fully serve our interests, not the interests of others.</p><p>So that&#8217;s a lot of what Imbue does. We are basically trying to build toward an open agent ecosystem to combat monopoly power in tech, because we think monopoly power and centralization are what allow for this kind of exploitation. Our goal really is to increase the percentage of people in the world who can use agents that are fully aligned to them, where your agents are fully expressing your own values and you own them &#8212; they&#8217;re yours. Maybe your data is local and yours. And over the next few months, we&#8217;ll be shipping a lot of tools toward that.</p><p>That&#8217;s one way we build toward this kind of future. But a second way we do that is by figuring out what it means to serve humans &#8212; what it means to be human in this world of very powerful technology, technology that can think. We thought that was our advantage, right? So the conversation today is about attention. Attention is one of our most precious resources as humans. I want to introduce Ashley, and she&#8217;ll introduce Adam. Ashley is our Storyteller at Imbue &#8212; she really does a lot of investigation into what it means to be human and figures out what we should stand for. She has the unique property that almost every piece she writes makes me cry. So if you want to be crying, you can subscribe to her Substack. It&#8217;s called Soft Power. Without further ado, Ashley and Adam, welcome.</p><p><strong>ASHLEY (INTRODUCTION)</strong></p><p>It&#8217;s so wonderful to see new and familiar faces. This is the second of the Art of Being Human series, which I like to think of as a shared collective investigation into these human questions in our technological age. And I&#8217;m so excited to be here with Adam today, who is a proper philosopher and a writer and editor. He&#8217;s written this wonderful book, Practice in Still Life, a collection of fragments, essays, and lectures on the topic of attention and perception, introducing various traditions of thought.</p><p>Our first event was last month with my friend Nicholas Paul on the topic of thinking, which I thought flowed well into this topic. Someone actually asked a question about attention at the end. I think a lot of what we talked about was: how do we create the spaciousness and conditions to allow for expansive thinking? Thinking that is beyond the calculation of machines, thinking that is truly novel and expansive and builds upon this great project of the humanities and being human.</p><p>I&#8217;m excited to chat about attention today, which I think is one core aspect of thinking and also just being here in the world and finding your way through it.
Adam has a very particular way of conceiving of attention that goes beyond treating it as a resource to be used, captured, and exploited &#8212; something that actually puts the power back in our hands as a practice, or as an art form, that we can cultivate in our everyday lives.</p><p><strong>Conversation</strong></p><p><em><strong>ASHLEY: How did you get interested in attention as a topic? What is your perspective on it?</strong></em></p><p><strong>ADAM: </strong>Thank you all so much for being here. And thank you for inviting me. It&#8217;s a beautiful space, and I&#8217;m so glad we can all come together here to have important conversations like this one, in the heart of the city where a lot of the technologies we&#8217;ll talk about are being created and shared throughout the world.</p><p><strong>ADAM: </strong>My background is in philosophy, fairly broadly construed. I have a particular view on the philosophical tradition that opens out into areas you might call spiritual exercise, contemplative practice, various religious traditions, spiritual traditions, arts and religion &#8212; basically taking a full sweep of the humanities, but anchored in philosophy.</p><p><strong>ADAM: </strong>I started to look at philosophy first in the same way that most people do &#8212; as a tradition of texts and arguments and concepts, propositions, logical statements, things that you would debate over and argue over, the realm of reason and rationality. And I think that&#8217;s a very important layer of philosophy. I just don&#8217;t think it&#8217;s the whole of what philosophy is and does. As I started to deepen my inquiry, learning more about the deep tradition of the history of philosophy, I started to become aware of the reality that these statements, these arguments, these worldviews, these big philosophical systems were actually primarily grounded in a set of practices &#8212; practices that if we described them today, we would think of as spiritual exercises, meditative exercises, contemplative practices, more so than what you would get in a philosophy 101 survey course. But it became apparent to me that these practices were actually the medium or the vehicle by which philosophers came to the insights that they shared with the rest of us.</p><p><strong>ADAM: </strong>As I was looking over all of these practices, I realized that one set of practices felt more central than the others &#8212; or a lot of the other practices were actually based on the idea that they would support this central practice. And that practice is a practice of attention. I started to realize that one of the key philosophical moves or attitudes is this question of attention: what happens when we learn how to cultivate attention? What happens when we start to think about how our attention is shaped? There&#8217;s this idea that attention is just a flat, singular thing, but it&#8217;s not. My view is that attention is a very unique thing, a very particular thing, and it&#8217;s very rooted in your habits and practices. That&#8217;s how I came to this phrase: attention is an art form. It&#8217;s something that you can shape deliberately, on purpose.
And for me, that became kind of the center of the rest of philosophy.</p><p><em><strong>ASHLEY: What are these practices that you&#8217;ve come to understand as crucial to cultivating attention?</strong></em></p><p><strong>ADAM: </strong>There&#8217;s a famous story in Plato&#8217;s Symposium where the description we get of Socrates is that he&#8217;s going to this symposium, this party &#8212; it&#8217;s a drinking party, they&#8217;re going to have a good time, they&#8217;re going to discuss philosophy. And Socrates kind of lags behind. The description we get of him is that he&#8217;s lost in thought, or in a sort of meditative trance. We hear from other people who knew Socrates that he would do this from time to time, sometimes for hours on end, just uninterrupted, silent meditative states. But when you look at the Greek, the most literal translation is that he turned his attention to his intellect &#8212; he turned his attention on himself, he turned his attention to his thinking. So this is kind of at the core of the Western philosophical tradition as we learn it from Plato: this whole idea of knowing yourself isn&#8217;t just about knowledge, not just understanding yourself as a biographical being, but this practice of turning your attention onto yourself in this meditative state.</p><p><strong>ADAM: </strong>If you look at some of the other practices &#8212; and there are many we could discuss &#8212; some tend to be more physical. There are practices of fasting. Fasting is something that shows up again and again in these traditions. There&#8217;s something about our relationship with food and how we take care of our physical body that&#8217;s very important. There are other connections with physical training. If you think about the physical setup of these ancient spaces, they were built in gymnasia &#8212; literal places where you would train your body physically. This was thought of as happening right alongside the philosophical inquiry you would find in the dialogues. So there&#8217;s this relationship between the fasting and the training of the body and the philosophical practice, all coming together in this sense of: how do you perform these contemplative maneuvers, this turning of your attention onto attention or onto yourself? And what are the supporting practices that help you do that?</p><p><strong>ADAM: </strong>If you move ahead a little bit into the late Roman and early Christian traditions, you get similar ideas: you live in a world full of distractions, a world that&#8217;s calling for your attention in different ways. Some of those things are good and virtuous to pursue, but a lot of them are leading you down the wrong path or encouraging the wrong thought patterns. So in those contexts we find practices of withdrawal. Think of monks &#8212; they leave the city and go into a monastery. If you think of the gymnasium as designed for a certain kind of physical training, a monastery is designed for a certain kind of spiritual training. The whole space is designed to free you up from some of those distractions so you can focus on contemplation.</p><p><strong>ADAM: </strong>I&#8217;ll add a note: in a lot of the traditions, we also find that leaving is important, but the returning is just as important. Socrates is constantly talking about leaving the city and coming back to it, leaving the cave and going back into the cave.
A lot of the practices are goods in themselves &#8212; the fasting, the physical training &#8212; but the language in some of these traditions is that they guard the stillness you need for contemplation. It&#8217;s in those moments of contemplation, of insight, that we then get some of these texts, some of the written words that come out of these traditions. But they&#8217;re impossible without the practices.</p><p><em><strong>ASHLEY: I&#8217;m glad you brought up the question of returning, because we often have this sense that living a more virtuous life requires a kind of Thoreauvian retreat into the woods. But Plato and Socrates didn&#8217;t live in the time of TikTok and Twitter. Today, even beyond our devices, we walk outside and there are cars, billboards, so many things calling for our attention. Do you think we have to settle for this kind of compromised state &#8212; that we can guard our attention somewhat, but really, if we&#8217;re living in urban modernity, we have to accept some degree of constant distraction?</strong></em></p><p><strong>ADAM: </strong>You were hoping I&#8217;d say I have a solution &#8212; but I don&#8217;t. I would think about it this way: our unique version of those distractions is particular to us, particular to our technologies, particular to our moment in time. But that struggle to maintain the attention, to maintain the practice, is present throughout human history. It is a perennial problem. We have texts of medieval monks &#8212; eleventh century monks living in their monasteries, stitching together their manuscripts &#8212; and we have commentary from these fellows to the effect of: I think these illuminated manuscripts are getting out of hand. The lettering is distracting from the text. It&#8217;s getting too visual. They&#8217;re having these debates about how this is robbing them of their attention.</p><p><strong>ADAM: </strong>Or think of Socrates in the Phaedrus, where he&#8217;s practically pulling his hair out about people starting to write. He&#8217;s like: this is going to end philosophy. People are going to just write things down instead of remembering them on their own. He&#8217;s looking at writing and going: this is putting us at real risk. We&#8217;re outsourcing something really important &#8212; our memory, our capacity for attention, our capacity to keep this sort of living set of insights within us and accessible to us without the use of technology. And now we all just accept that writing is part of the intellectual life.</p><p><strong>ADAM: </strong>You can see again and again that the shape and scope of the problems change, but the underlying dynamic is always there. And it comes back to this question of: what is the human being? And why would somebody like Socrates be concerned about the relationship between technology and what the human is? My view is basically that philosophy is not new to these problems. It&#8217;s the specific shape of the technology that is new. Maybe the level and scope is new, but these are problems that we&#8217;ve looked at before, things we&#8217;ve thought about before. We have a deep history of thinking about the human being in relation to technology. There are resources in the tradition that I think can help us today with our particular questions.</p><p><em><strong>ASHLEY: The anecdote about writing touches on something beautiful and powerful about humanity &#8212; our ability to adapt.
How do you think the act of philosophy, or thinking, or the practice of training our attention, will evolve as we incorporate new technologies and mediums and environments?</strong></em></p><p><strong>ADAM: </strong>If somebody tells you they know the answer to that question, that&#8217;s a little bit of a red flag. There are so many moving pieces in the technology itself. I don&#8217;t know if we understand what television has done to us &#8212; and that was a long time ago. We&#8217;re still trying to figure out the printing press, and there are new waves of technology coming after that.</p><p><strong>ADAM: </strong>Rather than trying to come up with a forecast or a prediction about some certain set of circumstances, the move instead would be to leverage this fact: humans are transformable through practice, and the practices give us new abilities. They give us a kind of facility. If you think of this athletic metaphor again &#8212; you&#8217;re training different kinds of agility, but instead of physical agility, you&#8217;re training a mental, spiritual, contemplative, and emotional agility. And I think that&#8217;s going to be a better approach, because you don&#8217;t know what the world&#8217;s going to look like in ten years. You don&#8217;t know what it&#8217;s going to look like in six months. But you can train your agility. You can train your lucidity. You can keep hold of your attention. You can return to the practice of attention: what&#8217;s happening, what&#8217;s going on, what&#8217;s important, am I being led down the wrong path here? Can I pull myself back?</p><p><strong>ADAM: </strong>That kind of withdrawal we were talking about &#8212; monks going to a monastery &#8212; can take all kinds of small shapes, all kinds of small maneuvers. You can have a little space in your home, a little space in your office. It can be a kind of sacred space, a contemplative space. The word contemplation &#8212; the root there, templum, is the same root we have in the word temple. It&#8217;s literally marking out a space, a clearing, for that practice. And what do you do? You wait, you practice. You wait for insights. You don&#8217;t necessarily sit there thinking you have all the answers. You give yourself some time to breathe, some space to do it.</p><p><strong>ADAM: </strong>Just in terms of this question of the human being: one universal fact that I think is true of all humans is that we are, in some important sense, open-ended. That&#8217;s why we have education, why we have these different cultural traditions passed down from generation to generation. If you look at a horse that has just given birth, the foal has been out for literally a couple of hours and it&#8217;s already running, already galloping &#8212; it kind of knows what to do. Humans aren&#8217;t like that. We are open-ended. We come out and we need a lot from the outside. We need a lot from the culture, from the tradition, from history, from our communities. But we also have this curious quality of being able to shape ourselves through practices. And that open-endedness is precisely what allows us to develop into the kinds of people we want to be &#8212; but it&#8217;s also the open-endedness that leaves us vulnerable to this kind of capture that we started with. There&#8217;s something up for grabs. We&#8217;re not set. We can be directed in different directions by algorithms, by news media, by politics, by propaganda, by all of these kinds of things.
But it&#8217;s because we&#8217;re open-ended, and the practices are what give us the agility to make sure we&#8217;re going in the direction we think we should be going.</p><p><em><strong>ASHLEY: I love this concept of open-endedness, because I think &#8212; especially in San Francisco &#8212; there&#8217;s a collective anxiety about becoming obsolete. There&#8217;s this sense of resignation every time a new technical capability appears or a benchmark we set for ourselves gets crossed. But our lives are so expansive, and it feels like such a disservice to think of life as a series of benchmarks to hit. There is that anxiety, though &#8212; when there are so many possibilities open to you, you can train your attention anywhere. How do you know what is worth bestowing your attention on?</strong></em></p><p><strong>ADAM: </strong>In some sense, that is the question. My response would be something like: I have a belief in human beings that we have a sort of innateness to us, an innate calling towards something. If you look at the word philosophy, it means love of wisdom &#8212; not knowledge of wisdom. It&#8217;s philosophy. And the reason for that is that in the love, there&#8217;s a kind of desire, an impulse, a longing of being drawn toward something that&#8217;s guiding you. You don&#8217;t know what it is. There&#8217;s a mystery there. But you have some kind of an intuition, a kind of a conscience. Maybe if I follow it, something will happen. I think that&#8217;s a good place to start.</p><p><strong>ADAM: </strong>The other thing I would say is that we are not the first people who have asked: what is actually good to do? What is the good? What is goodness? What is virtue? The world traditions have many answers to these questions, and they tend to collect around a series of ideas and concepts and practices that can clarify why these teachers said what they said. So I think part of it is having some faith that you as a human being have a conscience that can tell you something about what to do, and that you actually have a community &#8212; both here and in history &#8212; that can help you navigate that.</p><p><strong>ADAM: </strong>I don&#8217;t think those two things are actually that circumstantial. If you look at history, we&#8217;ve gone through many phases of what today we would call existential threats or crises, real turns in civilization. And people have thought about it. They have answers to these questions, or at least they have attitudes or stances that will help you navigate them. I feel like we&#8217;re in one of those moments right now, with this rapidly expanding AI technology. But having to deal with new and novel things isn&#8217;t new. We&#8217;re actually quite good at that. And as long as we&#8217;re thinking about it, we tend to be quite good at the things we can see coming. We think about them a lot. We navigate, we adapt. Practice is what&#8217;s going to make you more adaptable and also more able to transform the thing that&#8217;s coming into a shape that might be more livable.</p><p><em><strong>ASHLEY: You&#8217;ve touched on a few practices &#8212; walking, writing, meditation. Can you go deeper into that?
And also both individual practices and more relational or communal practices?</strong></em></p><p><strong>ADAM: </strong>One thing I&#8217;ll say about practices is that I think we&#8217;ve gotten into a mode &#8212; and I&#8217;ll speak for myself, this is very biographical &#8212; where we&#8217;ve lost the cohesive community context in which these practices used to live. In history, when you think about religious practices, philosophical practices, spiritual exercises, these were things done in a fairly cohesive community. If you were born in fourteenth-century England, you were probably going to be born a Catholic into a family of Catholics, and the whole social system was organized around these sets of practices. Time took on a different shape &#8212; there was a liturgy to the way the year proceeded. There were different things you did at different times of the year, and everybody around you was following the same calendar. You were kind of synced up.</p><p><strong>ADAM: </strong>We don&#8217;t really have that experience &#8212; a lot of people in the Bay Area don&#8217;t have that experience &#8212; because for many good reasons, we decided that politics and governance should be rooted in the primacy of the individual. There are all these gains from that. But I think what we lose is that sense of collectivity. And so one of the things I think we&#8217;re groping toward right now, dealing with these larger existential crises, is that our individual, idiosyncratic practices aren&#8217;t quite enough to get at the problem. The question of how AI technology is going to change society is such a big one. You might go crazy if you just try to address it as an individual subject doing your own inwardly facing practice. So rather than focusing on specific exercises, I would think more about: what&#8217;s the collectivity here? Does something change when we perform these practices together, when we have these discussions together, when we do this kind of group, collective activity of thinking about these problems together? I think the equation changes quite a bit.</p><p><em><strong>ASHLEY: You spoke earlier about the temple and creating a sacred space where contemplation can be practiced. What are the conditions that allow for this kind of collective practice to emerge?</strong></em></p><p><strong>ADAM: </strong>There&#8217;s an institutional component, and a component of lineage. There&#8217;s a physical, architectural component. We&#8217;re lucky to have spaces like this one where we can come together and talk. This affords us a different kind of interaction, a different kind of grip on the problem, by coming together. So I think architecture and design &#8212; including the design of technological artifacts, apps, and the way we engage with media &#8212; there&#8217;s a lot of opportunity for designers and architects and people thinking about arts and aesthetics to help us re-envision new forms of collective practice and collective exercise. And we don&#8217;t have to reinvent the wheel. There are good examples in history for how to do this.</p><p><strong>ADAM: </strong>The one I think about the most, coming from an academic background, is the university system. I&#8217;m very much looking at: what has the university become? Is it really fulfilling its mission of giving us a unified understanding of both what the universe is and what the human being is &#8212; how the two relate, how we should act, what we should do, what we should care about? Is it really fulfilling that mission?
Especially in the humanities, we see a lot of decline &#8212; a lot of decline in enrollment, departments closing. There&#8217;s a whole generation of tenured professors retiring who I don&#8217;t think are going to be replaced.</p><p><strong>ADAM: </strong>But if you look at history, the roots of philosophy in the West have only an incidental relationship to the university. There&#8217;s about fifteen hundred years of some of the most influential philosophical activity happening before the university even comes on the scene. So there are different ways of thinking collectively, different institutional ways of arranging these things where we can do this as a group, with some rigor, with some historical influence, but also with some novelty appropriate to our times. The technology is also making new things possible. There are things we can do now that we couldn&#8217;t have done without it, in terms of connecting the right people and sharing and disseminating ideas. So there&#8217;s a lot of optimism, honestly, from my perspective, alongside all these dangers.</p><p><em><strong>ASHLEY: Can there be a collective organizing or coordination of attention that isn&#8217;t top-down? Whenever I think about this, my sense is there&#8217;s always some authority figure who&#8217;s like, this is what&#8217;s good, this is what&#8217;s important, this is what we should be valuing and thinking about. But can there be a more bottom-up, democratic sense of collectively figuring out what is worthy of our attention? How do we create the conditions that foster that?</strong></em></p><p><strong>ADAM: </strong>Bottom-up is the approach, and even bottom-up and regional. San Francisco probably needs something particular to San Francisco. New York needs something particular to New York. There&#8217;s this sense that where you are in space matters &#8212; just as time has a qualitative characteristic in some of these traditions, place has the same. So it makes more sense to think about what a bottom-up, ecological approach to institution-building would look like &#8212; one that would do some of the things you&#8217;re describing. What we&#8217;ve been trying is a top-down, monoculture, universalistic approach to institutions, especially if you think of universities. There&#8217;s a lot of top-down energy to them, and I don&#8217;t think it&#8217;s serving their mission very well.</p><p><em><strong>ASHLEY: I&#8217;m curious for your take on the attention economy as a concept and attention as a resource. Do you think that&#8217;s a helpful or accurate framework, especially as we&#8217;re trying to wrangle our attention from so many distractions, devices, and things calling for it?</strong></em></p><p><strong>ADAM: </strong>It&#8217;s really important to think about the kinds of things you&#8217;re paying attention to, because the kinds of things you pay attention to become part of who you are. There&#8217;s an important piece here about memory. The reason that practice works is that you have this capacity for memory &#8212; not necessarily just memorization of facts. Human memory doesn&#8217;t work like a file system on a computer. You can upload any kind of file to a computer, and the hardware doesn&#8217;t reorganize around the content. But you do. Your memories actually reorganize your sense of who you are as a person, which is why we sound like we come from a certain place, why we pick up the language.
Your memory is reorganizing your day-to-day perception of the common-sense world &#8212; syncing with it down to the level of your physical sensations and physical perceptions of things, up to your more abstract intellectual thinking. So your memory is very important. It&#8217;s really important to tend to your memory and to think of it as a kind of living ecosystem, a living part of your perception, a living part of your acting.</p><p><strong>ADAM: </strong>There&#8217;s a woman named Eleanor Robins &#8212; I don&#8217;t know if you&#8217;ve read her, she has a great Substack and writes a lot about memory &#8212; and she created this metaphor that really stuck with me: if you accept this view of memory as a kind of living thing that is reorganizing you at a really fundamental level, and what you&#8217;re paying attention to and what gets lodged in your memory is this kind of industrialized, flat, repetitive landscape of inputs, then you&#8217;re kind of creating &#8212; she uses the metaphor of a desert &#8212; your inner life is becoming a desert. Your memory is in this process of desertification. But you can be the gardener of that. You&#8217;re in charge. You can read different books, listen to different music, have different conversations, connect more meaningfully with other kinds of people, create these other kinds of memories and re-enliven that inner garden. And that&#8217;s going to transform your perception.</p><p><strong>ADAM: </strong>If you think of it that way, and then you think about what you&#8217;re doing in the attention economy &#8212; you&#8217;re kind of monocropping your inner life with a certain set of content on social media. And I think basically every human being, even spiritually advanced people, struggles with their phones. There&#8217;s something about it that is really addictive. It&#8217;s really designed for us to use over and over again. And there are uses there &#8212; I find a lot of interesting things on the internet, I find interesting essays, I solve a lot of problems out there. But it comes back to the intentionality: is this serving me, or is it serving the company who&#8217;s trying to sell me ads? You have to guard that. You have to protect that. If you feel susceptible to it, engage in some of these practices: try to make yourself more lucid and agile in those moments, and then do simple things. Put your phone in the other room. Basic withdrawal activities. And don&#8217;t underestimate just the amount of money and resources that are after your attention specifically. All of those algorithms are so homed in on you as a person. You need to be careful. You need to guard it. You can change, and you can change your relationship to it.</p><p><em><strong>ASHLEY: Simone Weil, the French mystic and philosopher, had this saying: &#8216;We have to try to cure our faults by attention and not by will.&#8217; When I first read it, I thought it was nice and then I chewed on it. I think willpower is closely tied with attention &#8212; at least in the beginning, when you&#8217;re trying to train or attune your attention, especially if you&#8217;re resisting these external forces. It feels like the will is attention in some way. What do you think of the relation between the two?</strong></em></p><p><strong>ADAM: </strong>Simone Weil is a fantastic person to think with about these questions.
If you don&#8217;t know her work and you&#8217;re interested in anything related to what we&#8217;re talking about, she&#8217;s a great writer and thinker to get familiar with. The other person who also writes about Simone Weil is another Simone &#8212; Simone Kotva &#8212; and she writes exactly about this paradox or contradiction. She has a book called Effort and Grace, and it&#8217;s exactly about this relationship: the practices have something to do with effort, with willpower, with your desire, your force, the repetitious nature of your disciplined activity. But on some level, that&#8217;s not really enough. That&#8217;s not really the deepest part of the practice.</p><p><strong>ADAM: </strong>The deepest part &#8212; and I think contemplation is different from attention in this way &#8212; is that attention is this kind of concentrated, one-pointed fixedness where you&#8217;re really focused on something, while contemplation has more of this character of letting go and letting be. Giving up the project of willful change and just sitting. Sitting in what? Sitting in silence, sitting in receptivity, giving up the project of seeking and just seeing what happens when you give space to not seeking &#8212; seeing if in that space of grace, something else doesn&#8217;t emerge, if things don&#8217;t show up to you differently.</p><p><strong>ADAM: </strong>A lot of these practices, especially thinking of attention as an art form, are really about how you get things to show up for you in your first-person experience. We don&#8217;t all share the same physical experience even of the physical objects in the room. Everything has to do with our training, our knowledge, our education, our experience. Things are showing up differently to each of us. Some people have great expertise at, say, how to design a space &#8212; physical space is showing up to them in a particular way because of their attention practices. So there are all these ways in which you&#8217;re trying to train your attention to see things from another angle, to see another level of detail that other people aren&#8217;t getting, or to understand or interpret a kind of meaning that other people might be missing.</p><p><strong>ADAM: </strong>But then there&#8217;s this other move, this contemplative move, which is more like: I&#8217;m going to stop trying to figure out and understand. I&#8217;m going to let go. And actually in that space &#8212; that&#8217;s typically, for me at least, where the thing I was looking for kind of shows up. But there isn&#8217;t really a program for that. There isn&#8217;t a way to make that happen on command, but you can give it space to happen. I think this is what Simone Weil is talking about. She talks a lot about waiting &#8212; waiting for years, sometimes. And how central that is to a deeper kind of philosophical insight. That might be one of the essential moves in a time where there&#8217;s never not something to pay attention to. Withdrawing from that and just sitting. Sitting in the silence. Sitting in the emptiness. Sitting in the darkness. There&#8217;s no goal. There&#8217;s nothing else on the other side of it. And then, as Weil describes, some interesting things might happen.</p><p><strong>&#8212; Audience Q&amp;A &#8212;</strong></p><p><em><strong>AUDIENCE (Michelle): Thank you for the educational talk. If someone were to ask you, beyond name and form, who are you &#8212; how would you answer that question?
And secondly, knowing who you are, what would you define as surrender?</strong></em></p><p><strong>ADAM: </strong>Beyond name and form. That&#8217;s a good example of a question that you could just sit and wait with. The philosopher in me wants to say something like: I am a space in which meaning emerges. A certain kind of understanding takes shape through the activity that is me, and that shape and that understanding has a responsibility and a uniqueness to it &#8212; as an expression of the rest of reality, as an expression of the rest of the universe. As human beings, we are a unique kind of opening where we can even ask and reflect on that question. And I think we are in some sense responsible for how that makes us act and how that affects other people and the earth as a whole.</p><p><strong>ASHLEY: </strong>I would say something similar, but perhaps less poetic: just a collection of experiences, but also of histories and stories along the lineage long before you. And to the question of surrender &#8212; I believe that we have free will to some degree. We have our willpower, our agency, and that&#8217;s very valuable and something I care about. But part of surrender is accepting that whoever you are is shaped by forces beyond your control.</p><p><em><strong>AUDIENCE: We&#8217;re at an AI company, and I wonder: when we use terms like attention, perception, contemplation, consciousness, thinking, judgment &#8212; is that something we should even consider attributing to large language models or digital computation? Because when I think back on prior media technologies &#8212; whether it&#8217;s Socrates worrying about writing or the printing press &#8212; no one was worried that books could think or perceive or were conscious on their own. And now we&#8217;re tempted to imagine that these digital technologies actually have attention, make decisions, are agents. How would you think through those questions?</strong></em></p><p><strong>ADAM: </strong>Those are open-ended questions, and I think the answer you would give would probably change maybe month to month, year to year. That whole packet of terms &#8212; judgment, thinking, decision, attention &#8212; I think we&#8217;re anthropomorphizing the system to a large degree. There&#8217;s a great essay by Yuki on LLMs &#8212; maybe two years ago now &#8212; basically still thinking about them along the lines of a human prosthetic. They&#8217;re an extension of us, and will likely remain an extension of us. So we&#8217;re still doing the judging, the directing, and the attending, and they are executing on that. I don&#8217;t have one hundred percent confidence that that&#8217;s true, but I do think that in order for something to be a judgment in the same sense that a living human being makes a judgment, it has to have a point of view, and it has to have a kind of self-awareness. And I don&#8217;t see that in the systems we have today. And I don&#8217;t see how scaling up these systems will by itself create that.</p><p><strong>ASHLEY: </strong>I don&#8217;t have a robust answer to whether it&#8217;s accurate to apply these terms to these systems. But what I can say is: I&#8217;m glad that this new technological wave has raised these questions and inspired us to examine these terms more closely.
There&#8217;s this sense that before, we kind of took things like thinking and agency for granted &#8212; we all participate in these activities without, for the most part, really inquiring into what their purpose is, or why we engage in them. And I think now, in this anxiety to try to differentiate ourselves from machines, we&#8217;re being pushed to understand them better. And also to understand ourselves &#8212; what we do, in what ways it&#8217;s different from the ways machines work.</p><p><strong>KANJUN: </strong>One way I think about agents is that they are actually taking on a lot of the things that humans do today, and they do make judgments. They have a point of view, quote-unquote. But that point of view is not necessarily determined by them &#8212; it&#8217;s determined by someone or something else. You could say that maybe we humans don&#8217;t have points of view that are fully determined by ourselves either, but we have a kind of open-ended process where, from our perceptual experience, our own point of view will evolve. And that&#8217;s something agents are missing today. That doesn&#8217;t mean that in three years they&#8217;re still going to be missing that. Continual learning and open-endedness &#8212; it&#8217;s totally possible that they will have some of these properties that humans have. They don&#8217;t seem to have this thing we have: this awareness, attention, workspace where we&#8217;re recombining things. But maybe that&#8217;s just an experience we have and they have a different experience. It&#8217;s hard to tell.</p><p><em><strong>AUDIENCE: Do you think our experience with LLMs and ML models would be a sort of attempt at self-recognition &#8212; the same way that we do when we meet someone else and try to make them see ourselves? But those models are actually only mirrors, and we are confusing them with subjects. And do you actually think that we can create semantic mathematical models to understand how attention is harnessed, to reverse the effects of it by creating some sort of semantic model of subjectivity, of the way we make sense of reality through meaning? Do you think this can be formalized and implemented?</strong></em></p><p><strong>ADAM: </strong>I&#8217;m skeptical. A living organism, I think, is organized in a particular way on purpose, such that it actually does reflect a kind of underlying order in the universe that gives rise to it and maybe even beyond it. And the way that, as I understand it, the technology is being organized &#8212; if it does create something like that, it&#8217;ll be along a completely different line. I don&#8217;t think it&#8217;ll be like us. But I do think that as it stands right now, when you&#8217;re interacting with these platforms, that is what you&#8217;re doing: you&#8217;re getting this kind of reflection back to you based on your inputs, based on your inquiries. And in some sense, the question of the algorithm on social media is transferring over into the way some of these LLMs are being designed &#8212; in order to capture your attention. They have increasingly good memory about your previous inquiries and where to guide you. But I think that&#8217;s all a reflection of you in a certain sense, in the same way that your social media feed is a reflection of some part of you. It&#8217;s not giving you the same connection you will have with another human being.
I do think engagement with other humans is going to be the key thing, and I don&#8217;t think that&#8217;s replaceable.</p><p><strong>ASHLEY: </strong>I do think broadly, technologies reflect the values of their creators and also the incentives that govern the values of their creators. That&#8217;s my high-level answer, but I&#8217;m very curious about this. I think there are some researchers in the room &#8212; if people have takes, find us afterward and we can chat about this.</p><p><em><strong>AUDIENCE: These practices &#8212; whether individually or communally &#8212; are they sustainable or even real without a common object of love that the individual or the community possesses? And are we even capable of identifying what that is on our own in a way that will be sustaining and orienting enough to free us from the distractions around us?</strong></em></p><p><strong>ADAM: </strong>I&#8217;m glad you asked that. I do think in some sense, love is at the center of what pulls these practices forward. Attention is itself a question of what we love and what we care about and what we&#8217;re concerned about. Intellectual activity is often thought of as being centered in the mind, in the brain, in concepts and language and reason. But a lot of these practices, some of the words used to describe them in the scholarship, are cardiocentric. If you think again about the Christian monastic practices of contemplative prayer &#8212; The Cloud of Unknowing is a great anonymously authored fourteenth-century text &#8212; it&#8217;s basically about sitting in the love of God. The reason for that is that there are these transcendental questions of ultimacy that knowledge actually can&#8217;t attain to. There&#8217;s something about knowledge that is perspectival, circumstantial, tied to a more empirical sense of reality. What goes beyond that &#8212; in the Cloud author&#8217;s language &#8212; is love. This kind of loving devotion is both what gets you to care about the practice and drives the practice, and is, if done right, what the practice is actually developing. The loving devotion is absolutely central. And that gets lost in our very overly intellectualized environment.</p><p><strong>ASHLEY: </strong>That question reminded me again of a Simone Weil quote &#8212; attention being the rarest and purest form of generosity. In that framing, it feels very abundant. And I think the words we use really shape the way we conceive of things. Right now the language we use around attention feels really aggressive. I often find that something like generosity &#8212; bestowing attention &#8212; feels quite beautiful in that way. There&#8217;s also the poet Mary Oliver, who wrote a lot about attention, and she has this one essay where she writes that attention without feeling is just a report. There&#8217;s something about what we&#8217;re called toward &#8212; what speaks to us &#8212; that&#8217;s a movement of our soul or emotions, and that&#8217;s often overlooked in these conversations. Love and attention. I think they&#8217;re so closely intertwined.</p><p><em><strong>AUDIENCE: You said that memory drives a lot of where our attention goes, and naturally I would say that what you value &#8212; your virtues, your interests &#8212; is going to drive what you consume and what you store as memory. Does attention follow value, virtue, and interest?
Or is it the inverse &#8212; that interests, values, and virtue are actually following attention?</strong></em></p><p><strong>ADAM: </strong>I think they have a complex relationship. What you value tends to be what you attend to. But what makes any of this philosophical is this act of trying to give words to these processes: how do my values guide my attention? How does attention transform my values? You can go through life without examining much of any of this &#8212; without a firm sense of why you care about the things you care about, or whether you should care about other things. All of this stuff takes on a new tone when you start to ask questions like this.</p><p><strong>ADAM: </strong>The way to think about it: values are driving what you pay attention to. But as you pay attention and follow those values, your sense of how to judge and enact those values changes. This is Aristotle&#8217;s point &#8212; ethics is a practice, a habit. You practice by imitating virtuous people and that starts to transform who you are as a person. But as that happens, your sense of judgment increases, your sense of being able to pick things out sharpens. Just as there is perception in a physical sense &#8212; of the lights and everything in this physical space &#8212; there is also moral perception. What is moral character? What is moral behavior? What is right in this particularly complex, concrete situation? That&#8217;s the kind of stuff that attention starts to bring to the surface if you&#8217;re engaged in these self-reflective practices. So there&#8217;s a reciprocality there: the values are transforming your attention and the attention is deepening your understanding of the values.</p><p><strong>ASHLEY: </strong>This reminded me of the writer Jenny Odell, who wrote How to Do Nothing. She has this anecdote about how, over Covid, she would go to a local park and watch birds. Over time, she became curious about what types of birds they were and would look them up. This sense of attuning her attention to these birds and understanding them more deeply helped her see the world in a deeper, richer way. The more you pay attention, the more things come into your purview, the more details you&#8217;re able to see. There&#8217;s a cycle &#8212; the world, kind of unfolding itself to you the more attention you pay to it. There&#8217;s some reciprocality to it.</p><p><em><strong>AUDIENCE: When we look at things that draw our attention today &#8212; like social media or AI &#8212; I always viewed it as the outcome of economic incentives. But when you mentioned that back then they were talking about block printing being too gaudy and grabbing attention, do you think we have an innate desire to create things that grab attention, or is it an accidental outcome of creating better technologies?</strong></em></p><p><strong>ADAM: </strong>I think a lot of our relationship to technology is inevitable, and so the question of how to steer it is actually the question. Let me share a very quick story &#8212; it&#8217;s a myth, Plato again. How many of you know Prometheus? Most people. How many of you know about Epimetheus? A couple. In the Protagoras, Protagoras tells this story of Epimetheus and Prometheus &#8212; a story that the French philosopher Bernard Stiegler has made a lot of, and if you&#8217;re interested in the relationship between humans and technology, Stiegler is a great resource for this.</p><p><strong>ADAM: </strong>Basically: Epimetheus and Prometheus are Titans, and they&#8217;re given this task by the gods.
Epimetheus is given the task of assigning essential skills or characteristics to all of the creatures on Earth, including human beings. He gives turtles a shell. He gives tigers claws and teeth. He gives birds the ability to fly. All the creatures get these unique skills &#8212; and he forgets the human beings. He&#8217;s a little absent-minded. Then Prometheus comes over and says, how did it go? Epimetheus says, yeah, mostly fine. And Prometheus looks around, seeing all these creatures with real abilities, and he looks at the humans and says, what about these guys? Epimetheus: I forgot.</p><p><strong>ADAM: </strong>So we&#8217;re running around &#8212; and this goes to our open-endedness again &#8212; we don&#8217;t get one of those innate skills. Prometheus then takes the fire. But it&#8217;s actually two things: the fire and techn&#234; &#8212; technology. He goes and gives us that knowledge because we don&#8217;t have an innate skill. And so instead of an innate characteristic, we have this ability to externalize ourselves through technology. This isn&#8217;t something that we have separate from who we are. This is part of who we are. Regardless of where we are in history, whatever part of the world, whatever time &#8212; past or future &#8212; if there are human beings there, they&#8217;re going to be technological beings, because we don&#8217;t have any other means to get around. The question of technology is part of the essence of who we are. Our essence is, in some sense, outside of us.</p><p><strong>ADAM: </strong>Stiegler also says: this is why we have to tell each other stories. This is why we have to create schools and universities and cultures where we transmit knowledge &#8212; orally or through the written word &#8212; to the next generation, again and again. We need some means of transmitting that across generations, and technology is that means. The reason this comes up, and I find this a compelling story about the human being, is that this is part of who we are. There&#8217;s no back-to-the-land, let&#8217;s-abandon-technology option. It&#8217;s in some sense part of what being human is. So the question of practice in relation to the techn&#234;, in relation to technology, is where philosophy comes in, where thinking comes in, where transformation comes in. We can transform ourselves through practice and culture, and we can transform ourselves through technology. And ideally, we have some kind of wisdom guiding the practice, guiding the transformation, that can feed into how we build the technologies. But the technologies are always going to be there.</p><p><strong>ASHLEY (CLOSING)</strong></p><p>That was beautiful. I think that was the perfect note to conclude this. First of all, thank you to Adam for your generous conversation and wisdom and your time. Thank you to everyone at Imbue for the amazing food and drink setup and for transforming our office into this event space. Thank you to you all for coming on a Wednesday. We&#8217;ll have the space open for another hour or so to mingle. Everyone here is wonderful. We&#8217;ll have this again next month, so hope to see you again.
Thank you.</p>]]></content:encoded></item><item><title><![CDATA[10 theses on attention]]></title><description><![CDATA[Reflections from Art of Being Human #2 with philosopher Adam Robbert]]></description><link>https://ideas.imbue.com/p/10-theses-on-attention</link><guid isPermaLink="false">https://ideas.imbue.com/p/10-theses-on-attention</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Sat, 14 Mar 2026 14:30:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XgVw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XgVw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XgVw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XgVw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XgVw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1045610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ideas.imbue.com/i/189701169?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XgVw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7229d74c-8577-4cc0-94db-4878ef97b0f5_1857x1045.jpeg 424w, 
<p><em>A few weeks ago, we hosted our second Art of Being Human <a href="https://luma.com/6qce49z3">event</a> with philosopher Adam Robbert on attention as an art form. We&#8217;ll be posting the full video and transcript soon. If you&#8217;d like to receive it, along with invites to future events, subscribe here.</em></p><p><a class="button primary" href="https://ideas.imbue.com/subscribe?"><span>Subscribe now</span></a></p><p><strong>1. Attention is the practice beneath all philosophical practices.</strong> Nearly every contemplative tradition&#8212;Stoic, Platonic, Christian monastic&#8212;was built on attention as the foundational practice.
Practices like fasting, physical training, and meditation exist to guard the stillness attention requires. Philosophy, in this sense, is a discipline of perception, not mere argumentation.</p><p><strong>2. Distraction is a perennial human problem, not a modern pathology.</strong> Medieval monks complained that illuminated manuscripts were so decorative that they distracted from the text. Socrates bemoaned writing as a technology that would erode our memory. But now, we accept writing as part of the intellectual life. The shape of the distraction changes, but the underlying dynamic doesn&#8217;t.</p><p><strong>3. Human memory is an ecosystem that must be tended.</strong> Human memory doesn&#8217;t store files like a computer; it reorganizes perception at the level of physical sensation. What you attend to shapes your memory, and how reality shows up to you. Writer Eleanor Robins offers a <a href="https://open.substack.com/pub/eleanorrobins/p/memorising-poems-and-stories-is-magic?utm_campaign=post-expanded-share&amp;utm_medium=post%20viewer">metaphor</a>: your memory is like a garden that you must fertilize with rich material; overconsumption of flat, repetitive media leads to the &#8220;desertification&#8221; of the inner life.</p><p><strong>4. Open-endedness is simultaneously our greatest vulnerability and our greatest strength.</strong> A foal can gallop hours after birth, but humans arrive almost unformed. This open-endedness is what makes us susceptible to algorithmic capture and propaganda, but it is also what makes self-cultivation through practice possible. We are constitutively shapeable in both directions.</p><p><strong>5. Attention and contemplation go hand in hand.</strong> Attention is focused, directed, and one-pointed. Contemplation is the opposite: letting go of the seeking, sitting in receptivity, waiting. Often, insights don&#8217;t come from willful concentration but from the space that opens when you sit in the silence, darkness, and emptiness without a goal. (For more, read <a href="https://plato.stanford.edu/entries/simone-weil/">Simone Weil</a> and Simone Kotva&#8217;s <em><a href="https://www.bloomsbury.com/us/effort-and-grace-9781350113640/">Effort and Grace</a></em>.)</p><p><strong>6. Technology is part of the essence of who we are. </strong>In Plato&#8217;s <em><a href="https://www.gutenberg.org/files/1591/1591-h/1591-h.htm">Protagoras</a></em>, Protagoras tells the story of Epimetheus and Prometheus. Epimetheus is given the task of assigning essential characteristics to all of the creatures on Earth, but forgets the humans.
Then, Prometheus brings humans fire and the technology to create it. So instead of an innate characteristic, humans have the ability to externalize themselves through technology. Our essence is, in some sense, outside of us. (For more, read <a href="https://monoskop.org/images/6/6f/Stiegler_Bernard_Technics_and_Time_1_The_Fault_of_Epimetheus.pdf">Bernard Stiegler</a>.)</p><p><strong>7. Individual practice is insufficient for civilizational-scale problems.</strong> The cohesive, liturgically structured communities of 14th-century Catholic England provided synchronized collective practice: a shared calendar that organized time qualitatively. We&#8217;ve traded that collectivity for individual freedom. But atomized inward practice can&#8217;t adequately respond to how technologies are shaping society. This demands new forms of collective institutions, built bottom-up and ecologically, not top-down and monolithically.</p><p><strong>8. Values and attention form a virtuous cycle. </strong>Aristotle argued that ethics is a habit: you practice by imitating virtuous people, and that starts to transform who you are as a person. Practice, therefore, refines our capacity for moral perception. The more carefully you attend to something, the more it reveals itself&#8212;which in turn shapes what you care about.</p><p><strong>9. Spaces, both physical and digital, can be designed to afford deeper modes of connection and attention.</strong> The word &#8220;contemplation&#8221; shares its root, the Latin <em>templum</em>, with the word &#8220;temple.&#8221; It marks out a space, a clearing, for a certain practice. In architecture and design&#8212;including the design of technology and ways of engaging with media&#8212;there&#8217;s an opportunity to envision new forms of collective practice and exercise, drawing inspiration from the past.</p><p><strong>10. Love is central to contemplation and attention.</strong> The deepest traditions of contemplative practice are not mind-centered but cardiocentric (heart-centered). The 14th-century <em>Cloud of Unknowing</em> frames contemplation as &#8220;sitting in the love of God&#8221; because love is what transcends the limits of perspectival knowledge. There are transcendental questions that knowledge ultimately cannot reach, but that loving devotion can help us access.</p><div><hr></div><p><em>Our next Art of Being Human event will be on March 25, on cultivating audacity with Courtney Hohne. Courtney is the founder of<a href="https://unowned.team/"> un/owned</a>, a new kind of innovation lab tackling &#8220;ownerless problems,&#8221; and spent 10+ years building Google X, the world&#8217;s first moonshot factory, as Chief Storyteller.
<a href="https://luma.com/mdr8c7f3">RSVP here</a>.</em> </p><p><em>As always, we welcome your thoughts!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ideas.imbue.com/p/10-theses-on-attention/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ideas.imbue.com/p/10-theses-on-attention/comments"><span>Leave a comment</span></a></p><p></p><p>   </p>]]></content:encoded></item><item><title><![CDATA[Introducing Imbue's Substack]]></title><description><![CDATA[A space to think together about the technological future we want and how to create it]]></description><link>https://ideas.imbue.com/p/introducing-imbues-substack</link><guid isPermaLink="false">https://ideas.imbue.com/p/introducing-imbues-substack</guid><dc:creator><![CDATA[Ashley Zhang]]></dc:creator><pubDate>Wed, 11 Mar 2026 14:30:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vTJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vTJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vTJL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 424w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 848w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1272w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic" width="1456" height="965" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:965,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1688449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://ideas.imbue.com/i/189728090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vTJL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 424w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 848w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1272w, https://substackcdn.com/image/fetch/$s_!vTJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1480f63-704c-44d7-abf0-0d0ccd28cea8_2344x1554.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Kanjun, Ashley, Glenn, and Matt wrestling with ideas at our recent offsite.</figcaption></figure></div><p>The AI scene is changing at a head-spinning rate. What once took months and twenty engineers can now be built in a weekend by one. 
A year ago, our company was organized around <a href="http://imbue.com/sculptor">a single product</a>; now, we&#8217;re working on <a href="https://ideas.imbue.com/p/a-more-radical-imbue">a dozen projects in parallel</a> with a team of the same size. And this is just the beginning.</p><p>The speed can feel exhilarating, but ultimately, it&#8217;s not the technological capabilities that matter. What matters is what these technologies are built <em>for</em>, and what future they engender. Is it one in which every person can use these tools to pursue what matters to them and develop their human capacities? Or is it one in which the technologies embedded in our lives become increasingly exploitative, and our ability to shape them constrained?</p><p>The decisions being made now&#8212;about ownership, openness, and who these systems answer to&#8212;will define our technological future. We believe the outcome depends on whether open agents win out over closed platforms. This means more control over your data, and a greater ability to create and modify the software and agents you rely on so they serve your intentions and interests, not the company that built them. That&#8217;s the future we&#8217;re building toward.</p><p>Of course, we&#8217;re not alone in these efforts. We&#8217;re inspired by conversations we&#8217;ve shared with builders like <a href="https://ideas.imbue.com/p/geoffrey-litt">Geoffrey Litt</a> and <a href="https://ideas.imbue.com/p/episode-04-joel-lehman-openai-on-8e0">Joel Lehman</a>; Cory Doctorow and <a href="https://www.law.columbia.edu/faculty/timothy-wu">Tim Wu</a>&#8217;s writing; collectives like <a href="https://resonantcomputing.org/">Resonant Computing</a> and Cosmos Institute; philosophers like Seth Lazar and <a href="https://www.shannonvallor.net/">Shannon Vallor</a>; and computing pioneers like <a href="https://en.wikipedia.org/wiki/Alan_Kay">Alan Kay</a> and <a href="https://en.wikipedia.org/wiki/Tim_Berners-Lee">Tim Berners-Lee</a>. All of them advocate for conditions that foster greater human freedom, creativity, and virtue.
</p><p><strong>This newsletter is a space to think together about the kind of future we want, and how to design the technologies and systems that help us get there.</strong> We hope this can be a way to stay anchored to the fundamental questions in a rapidly evolving AI landscape: What do human liberty and flourishing require, and how can we build technologies to serve them?</p><p>We&#8217;ll be sharing, ~weekly:</p><ul><li><p><a href="https://imbueai.substack.com/t/essays">essays</a> on how software, data, and AI can be built to serve the human good, informed by what we&#8217;re building here at Imbue</p></li><li><p><a href="https://imbueai.substack.com/t/podcast">conversations</a> with people building and thinking at the frontier</p></li><li><p>reflections on (and invites to!) the <a href="https://luma.com/imbue_ai?period=past">events</a> we host in our SF office</p></li><li><p>the best things we&#8217;re reading that inform our thinking</p></li></ul><p>If these questions interest you too, we&#8217;d love for you to join the conversation. What&#8217;s on your mind around our technological future? What worries you, and what brings you hope?</p><p><a class="button primary" href="https://ideas.imbue.com/p/introducing-imbues-substack/comments"><span>Leave a comment</span></a></p><p>The future should be determined not by a handful of powerful companies or opaque systems, but by all of us humans, out in the open, together. This project, we hope, is a step in that direction.</p><p>&#8212; <a href="https://substack.com/@kanjun">Kanjun</a>, <a href="https://substack.com/@joshalbrecht">Josh</a>, <a href="https://substack.com/@ashleydzhang">Ashley</a>, and the Imbue team</p><div><hr></div><h3><strong>Upcoming events at Imbue</strong></h3><ul><li><p><a href="https://luma.com/6ujmjf8f?tk=ge4CJL">AI Philosophy Nights: The Post-Productive Human</a> (sponsored by Imbue): A conversation with philosopher of technology Tony Kashani on the transformation of value creation and its implications for power, agency, and meaning.</p><ul><li><p>Friday, March 13, 6:30-9:30 PM</p></li></ul></li><li><p><a href="https://luma.com/mdr8c7f3">Cultivating audacity with Courtney Hohne</a> (<em>The Art of Being Human</em>): A conversation with former Google Moonshot Factory Chief Storyteller <a href="https://www.linkedin.com/in/courtney-hohne-0424264/">Courtney Hohne</a> on cultivating the imagination and courage to tackle the most urgent, ambitious problems we face today.</p><ul><li><p>Wednesday, March 25, 5:30-8:30 PM</p></li></ul></li></ul><p>Subscribe to receive new essays, podcast episodes, and event invites in your inbox.</p>
class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A more radical Imbue]]></title><description><![CDATA[Our experiment in working with radical agency, control, and choice]]></description><link>https://ideas.imbue.com/p/a-more-radical-imbue</link><guid isPermaLink="false">https://ideas.imbue.com/p/a-more-radical-imbue</guid><dc:creator><![CDATA[Josh Albrecht]]></dc:creator><pubDate>Fri, 30 Jan 2026 23:34:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8ef7310a-77d2-4417-bac8-39956afa63cc_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pght!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pght!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pght!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pght!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pght!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pght!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://imbueai.substack.com/i/187790765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pght!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a70164c-2db3-4908-8de7-2c238c759d95_1920x1080.jpeg 424w, 
<p>At the start of 2026, I gathered the Imbue team and spoke about how both our CEO Kanjun and I became disillusioned with Imbue&#8212;and how we found our way back, more excited than ever about who we are and what we&#8217;re building.</p><p>I want to share that story with you.</p><div><hr></div><p>Let me set the stage: It&#8217;s early 2025, and the Singularity is in full swing. It seems like every day a new model comes out, or a new product, that totally changes the landscape. Ideas from even a few months ago seem antiquated or obviated, and we can all feel that pressure bearing down on us.</p><p><em>Will we ever ship a product? Will anyone use that product? Will anyone like the product? Are we, as a research company, capable of making something that has users?</em></p><p>These questions worried all of us, and Kanjun and I were no exception.</p><p>So we did what many people naturally do when stressed: we entered fight mode. We doubled down on decisions. We tried to be &#8220;efficient,&#8221; to &#8220;focus&#8221; people on the &#8220;right&#8221; things, to &#8220;optimize&#8221; our &#8220;product development process&#8221; so that we could &#8220;win.&#8221;</p><p>We sprinted&#8212;we all sprinted&#8212;toward shipping Sculptor. We put in a lot of late nights. We worked hard.
We added lots of features (and bugs), and, ultimately, we <em>did</em> ship something.</p><p>And not only did we ship something, we actually got more attention than we expected. People downloaded the product and tried it, despite its complexity and bugs. And many people even continued using the product week after week!</p><p>To be clear: this was a huge win! Most startups never ship anything. Of the startups that ship, the vast majority do not end up with daily active users. This is a 95th+ percentile outcome, <em>especially</em> for the <em>first</em> product we&#8217;ve shipped into the world.</p><p>But shipping Sculptor in early October was just the beginning. There was still so much more work to do: metrics to create, user interviews to conduct, bugs to fix, performance issues to address, regressions to add tests for, Discord messages to respond to, Sentry alerts to silence. It was stressful.</p><p>So again, we doubled down. We had &#8220;stability&#8221; weeks and spent time fixing bugs and refactoring, all while trying to ship a backlog of new features just to keep up with the ever-changing landscape.</p><p>And we <em>did</em> make things better. Sculptor today is far better than the Sculptor we shipped in October. We have dedicated users, even in spite of those bugs. I&#8217;m really proud of what we built, and excited for where it&#8217;s going.</p><p>But this story actually isn&#8217;t about the product. It&#8217;s about us, as a company.</p><p>In all of this stress and sprinting and focusing, we lost sight of the forest for the trees. We forgot <em>why</em> we were here. And we got a bit burned out.</p><p>But most importantly, we lost something along the way. Fun Fridays just weren&#8217;t as fun anymore. There was something missing from team dinners and lunch conversations. There was something that felt off about our meeting structure and our processes. Things just felt off.</p><p>During our holiday break, and leading up to it, it was hard to put our finger on exactly what was missing.</p><p>Was it that we didn&#8217;t know what the mission was? Was it that we needed to &#8220;transition to being a product company&#8221;? Was it just that the office felt empty when people were out and sick? Or was it simply a manifestation of the uncertainty about the future, given how much was changing in the outside world?</p><p>Funny enough, Kanjun and I both independently came to the same conclusion.</p><p>We realized that we&#8217;d lost our way. The Imbue we&#8217;d been building had lost something core and special and important about it. Imbue is about <em>so much more</em> than just making a product and making money. We are not a normal startup, and we never have been.</p><p>We&#8217;d been falling into the default startup patterns and startup behaviors: teams of engineers with managers working toward feature roadmaps, processes for triaging customer support tickets, defining metrics and KPIs, even doing user interviews.</p><p>These are all fine things, but they&#8217;re not what matters.</p><p>When we saw things moving fast in the world, it made us stressed and more narrowly focused on what was right in front of us, on the familiar patterns. When new competitors launched, or existing competitors launched new features, we worried: do we have enough features? How do we compare?</p><p>But what we should have done is zoom out, take a step back, and remind ourselves: why are we actually here?
What is the point?</p><p>We&#8217;d forgotten that Imbue is not just about <em>what</em> we&#8217;re building&#8212;it&#8217;s also about <em>how</em> we&#8217;re building. We want Imbue to be an example of a different way to be: as individuals, as a group, as a company.</p><p>The whole reason we started Imbue is because we want to increase human agency in the world, to give each person more ability to author their lives. And that begins with how we work here, in everything we do.</p><p>We believe that it is better to grow together, to be kind, to think well, and to have fun. To build a world with more humanity, more openness, more agency, more liberty, and more play.</p><p>We believe that this way is <strong>better</strong>. We believe that, over time, it will win.</p><p>Every day I see you all build cool things, whether they&#8217;re side projects or features or bug fixes. I believe that together we can <strong>redefine what it means to do meaningful work in the era of AI</strong>, both for ourselves and as an example for the broader world.</p><p>I want to make Imbue radical again. And the way we&#8217;ve been working hasn&#8217;t delivered on this.</p><p>If we&#8217;re going to show the world that we can empower people, it starts with each one of us. We want to show that it&#8217;s not just <em>possible</em> to have everyone joyfully working and growing together and deciding what to work on themselves and being empowered; we want to show the world that this is the <em>superior way of being</em>.</p><p>Imbue is an experiment. It is a question: what happens if we give people agency and control and choice? And support them in working on the things that they are most passionate about, and in the way that they are most suited to work on those things? What does that world look like?</p><p>Imbue exists to answer that question&#8212;so we should all start living it.</p><p>It&#8217;s now possible to ship projects <em>much</em> more quickly than ever before. I experienced this first-hand over the break hacking on things, and I think we&#8217;ve all seen this in the world and broader landscape as well.</p><p>There&#8217;s a massive opportunity for us here. If we really lean into using the tools we&#8217;ve built, like Sculptor, I think there&#8217;s a whole new way of engineering that can let us ship more, smaller open software projects, rather than focusing on one top-down product.</p><p>We want you to be able to propose projects that you would be the DRI for. These should be projects that you think would advance our mission of making tech serve humans, and we want there to be a way for some of those projects to become real. We want project DRIs to be <em>fully</em> empowered to make (or delegate) every decision related to their project. We want all decisions, all responsibility, to be owned by <strong>you</strong>.</p><p>We&#8217;re doing this because we believe smaller teams working on projects built and shared in the open will do a better job of accomplishing our mission.
We want to build projects that are used by as many people as possible, and that help promote this vision of what we want to see in the world:</p><ul><li><p>Empowering people</p></li><li><p>Catalyzing agency</p></li><li><p>Democratizing power</p></li><li><p>Promoting an ecosystem of open software, open agents, and open data</p></li></ul><p>We want to build an ultra-high agency culture and organization, together.</p><p>We want to do this for lots of different reasons: just to learn if it is possible, and what it looks like, and because we think this is a more effective&#8212;and more fun&#8212;way of being in the world.</p><p>But the core reason we want to make Imbue ultra-high agency is because <strong>we cannot run it any other way</strong>. It&#8217;s just not authentically who we are as leaders, and we refuse to do that anymore.</p><p>Fundamentally, what Kanjun and I both realized over the break is that we already <strong>know</strong> what kind of culture and company and products and projects we want to build and see in the world. No one is stopping any of us from being the company that we want to see. We can just do that today.</p><p>It&#8217;s a new year, and it&#8217;s a new dawn for Imbue, and for all of us.</p><p>What does that mean in practice? That&#8217;s for us to figure out, together.</p><div><hr></div><p><em>If you&#8217;re interested in joining our team to create a company and world that stands for radical human agency, <a href="https://imbue.com/careers/">we&#8217;re hiring</a>!</em></p>]]></content:encoded></item><item><title><![CDATA[Empowering humans in the age of AI]]></title><description><![CDATA[We founded Imbue in 2021 to build an AGI future where humans remain at the helm, shaping powerful AI systems rather than being subordinated to them.]]></description><link>https://ideas.imbue.com/p/empowering-humans-in-the-age-of-ai</link><guid isPermaLink="false">https://ideas.imbue.com/p/empowering-humans-in-the-age-of-ai</guid><dc:creator><![CDATA[Kanjun Qiu]]></dc:creator><pubDate>Sun, 18 Jan 2026 06:04:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uJAg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F275d8acb-1ed0-4860-8c9e-fa6126e6e290_1200x675.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<figure><img src="https://substackcdn.com/image/fetch/$s_!uJAg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F275d8acb-1ed0-4860-8c9e-fa6126e6e290_1200x675.jpeg" width="728" height="409.5" alt=""></figure>
x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We founded Imbue in 2021 to build an AGI future where <em>humans</em> remain at the helm, shaping powerful AI systems rather than being subordinated to them.</p><p>We believed that AI could multiply productivity and help everyone prosper. But as capabilities began to accelerate, a nagging worry set in: we felt forces dragging us all toward a future where powerful AI capabilities become concentrated in the hands of a few, giving them outsized control over what our chatbots say, what our AI agents do, and ultimately what institutions, societies, and lives we build. In this future, most people quietly become less free.</p><p>We&#8217;ve now come to understand something crucial: <strong>the core challenge of AI lies in managing how it shifts </strong><em><strong>power</strong></em><strong>.</strong></p><p>AI makes software systems dramatically more powerful. That power flows, by default, to those who can build and own those systems. This concentrates power, which leads to exploitation and disempowerment. We&#8217;re starting to see it play out today: tech platforms use basic AI agents in the form of recommendation algorithms to hijack our attention, creating addictive experiences we simultaneously crave and resent. We have little control over how these algorithms operate &#8212; and they often don&#8217;t work in ways that benefit us.</p><p>At Imbue, we initially thought creating AI agents that helped people automate computer tasks would naturally distribute power. But over time, we began to see how this risked undermining people in a way similar to tech platforms: AI agent builders control the algorithms that make decisions about their users&#8217; lives, and their incentives may not be aligned. Agents optimized for profit or engagement might nudge us toward buying sponsored products that advertisers want, gain access to trusted data because it&#8217;s lucrative to monetize, or manipulate our emotions to keep us engaged.</p><p>Instead of locking people into centrally-controlled agents, we had to rethink our approach: to genuinely put power back into people&#8217;s hands, we had to equip everyone to create, customize, and truly control their own AI tools.</p><p><strong>Imbue&#8217;s mission is to empower </strong><em><strong>humans</strong></em><strong> in the age of AI by creating powerful computing tools controlled by individuals.</strong> We believe this requires a shift in the philosophy of building AI &#8212; not selling AI software designed to serve the interests of its creators, but instead helping us all make AI software that&#8217;s tailored to our own goals and values. For example, I&#8217;d love an agent that helps me track and participate in local ballot measures. Or a personalized feed that curates only the most important news to protect my attention from the latest flashy headline. Or an app that protects my grandmother from scam calls in Chinese.</p><p>Critically, this kind of software remains accountable to individuals and communities. Like the right to vote in a democracy, we believe the ability to create, modify, and control AI software and agents gives people a voice in this era of powerful AI &#8212; so that humans can be actors, not acted upon. 
In an era racing to make machines that replace humans, we want to reveal the builder in every human.</p><p>Today, we see glimpses of this possibility: AI coding tools seem tantalizingly close to letting anyone build software simply by describing it. But that initial spark of creativity burns out quickly when we try to really use these apps and we discover how flimsy, difficult-to-extend, and unmaintainable they are. We hit a ceiling because AI coding tools struggle to fix bugs without creating more of them, or to add new features without breaking old ones.</p><p>But <em>humans</em> know how to build complex software &#8212; we&#8217;ve spent decades developing best practices for architecting reliable systems, testing, and managing changes. Today&#8217;s LLM workflows rarely incorporate these practices. But what if they could? We&#8217;re trying to create a better way to build software with AI &#8212; one that embeds engineering best practices directly into AI-assisted software development, so more people can easily create robust, dependable software for themselves and others.</p><p>Our initial product is a coding agent environment that helps engineers write healthier code faster with LLMs by making it easy to encode best practices, identify and fix issues, and test and run LLM-generated code safely.</p><p>You can see more details and <a href="https://imbue.com/sculptor">try it out here</a>.</p><p>Ultimately, our goal isn&#8217;t just to help engineers, but to embed our collective knowledge of software craft into an open environment of coding agents that invites much broader participation. When we imbue engineering best practices into software creation tools, it becomes much easier for <em>anyone</em> to build sophisticated software, opening the door for many more people to participate in the AI future. Instead of waiting for companies to build for us, we&#8217;ll be able to make our own idiosyncratic tools for ourselves and our communities.</p><p>When we can create and control AI software and agents, power shifts. At the most basic level, being able to build our own interface to services lets us resist algorithms and platforms we currently cannot opt out of; I&#8217;d make my own Twitter feed optimized for thoughtful topics and friends, rather than one built to provoke me. When we can pivot to our own solutions, platforms are forced to better serve our interests to keep us engaged. The same dynamic applies when faced with other entities&#8217; AI agents that try to influence us &#8212; for example, we can build our own filter agents that block spam or unwanted messages to safeguard our interests.</p><p>To help level the playing field, we also need laws and societal structures that let the agents we build be as powerful as those controlled by companies with lots of data (for example, by letting us get our own data out of corporate silos), and that protect us from other entities&#8217; agents when they impinge on our freedoms (for example, by trying to addict us, warp what we believe to be true, or subtly shape our decision to buy something). This is the core of our policy work at Imbue: to safeguard individual rights in an increasingly automated world and uphold democratic principles against power concentration.</p><p>Technology&#8217;s highest purpose is not to replace human capability, but to amplify what is already inside us. We believe creative potential lies within every person, waiting to be unlocked. 
The world&#8217;s most meaningful software is still trapped in the human imagination, locked behind the barrier between what billions of people can imagine, and what they can actually create.</p><p>And if we free it, we can create something better than our current trajectory: a world more democratic, more open, more free. A world where we imbue machines with our will to shape our lives and institutions, where we collectively direct our agents toward solving what matters most to us.</p><p>Instead of AI replacing humans, we can use it to nurture what is best within each of us: the capacity for creation, connection, joy, beauty, awe.</p><p>This is the human future we can fight for. This is what AI ought to be for.</p>]]></content:encoded></item><item><title><![CDATA[Malleable software and human agency]]></title><description><![CDATA[A conversation with Geoffrey Litt, design engineer at Notion, on shaping software like clay]]></description><link>https://ideas.imbue.com/p/geoffrey-litt</link><guid isPermaLink="false">https://ideas.imbue.com/p/geoffrey-litt</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Fri, 14 Nov 2025 23:24:09 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/178820393/2a8c10a609748eddd1499f6b8c56900e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://www.geoffreylitt.com/">Geoffrey Litt</a> is a design engineer at Notion working on malleable software: computing environments where anyone can adapt their software to meet their needs and their lives. Before joining Notion, he was a researcher at the independent lab <a href="https://www.inkandswitch.com/">Ink &amp; Switch</a>, where he explored the future of computing. He did his PhD at MIT on programming interfaces. Most of his work circles around a very simple but powerful question: how can everyday people shape the software they use like clay so that humans can have more power and agency in the world?</em></p><p>Subscribe for conversations with builders and thinkers on the future we want, and how to build the technologies and systems to get us there.</p><p>In this conversation, Geoffrey and Kanjun discuss:</p><ul><li><p>Barriers to malleable software</p></li><li><p>Inventing new UI components for the AI age</p></li><li><p>Principles for agent-human collaboration</p></li><li><p>How AI affects the creative process</p></li></ul><p>&#8230;and more!</p><div><hr></div><h2>Timestamps</h2><p>05:59 Barriers to software malleability: technical, economic, and infrastructural</p><p>08:57 Real-time collaboration and version control</p><p>15:01 Common Source: between open and closed source</p><p>20:54 Navigating divergence in software development</p><p>34:04 Data structure and universal formats</p><p>39:10 Local developers and collaborative software</p><p>42:57 Learning curves and tailorability in end user
programming</p><p>50:55 How AI shapes creative work</p><p>52:07 Making agent-human collaboration like human-human collaboration</p><p>01:03:44 Mental bandwidth and parallel agents</p><p>01:08:50 Exploring design spaces through generated options</p><p>01:11:11 Visualizing code quality and malleability</p><p>01:13:45 Review as part of the creative medium</p><p>01:17:59 Infrastructure needs for malleable personal software</p><p>01:30:47 Rekindling the vision of personal computing</p><div><hr></div><h2>Transcript</h2><p><strong>Kanjun Qiu (00:30)</strong></p><p>Welcome back to Generally Intelligent, a podcast by Imbue on the economic, societal, political, and very human impacts of AI. Today, I&#8217;m joined by Geoffrey Litt. Geoffrey is a researcher working on malleable software, computing environments where anyone can adapt their software to meet their needs and their lives. Geoffrey&#8217;s just joined Notion, and recently he was a researcher at the independent lab, Ink &amp; Switch, where he explored the future of computing.</p><p>He did his PhD at MIT on programming interfaces, and most of his work circles around a very simple but powerful question, which is: how can everyday people shape the software they use like clay so that humans can have more power and agency in the world? And that&#8217;s a lot of what we&#8217;ll be exploring in our conversation today.</p><p>Welcome, Geoffrey. It&#8217;s really good to have you here. So I&#8217;m really curious, we always start with, tell us a bit about how you developed your initial research interests. You went to MIT, you did your PhD in human-computer interfaces. What sparked your interests? What happened and how did your thinking evolve over time?</p><p><strong>Geoffrey Litt (01:30)</strong></p><p>Thanks, it&#8217;s great to be here.</p><p>Way before I got into research, actually, I was just working on a startup shipping product. I worked at an edtech startup out of college. And that was where this all kind of started. We were a team in Boston shipping software to thousands of schools across the country. And every school is different, right? We would try our best to make the one best report for data that works for every school, whether it&#8217;s a rural elementary school or an urban high school or whatever. And then we would get on these calls and some teacher or principal would be like, you know, actually, I don&#8217;t use your product. I just hit export to CSV and then I use Excel. And it was really sad for me as a designer. But then you look at what they did and it&#8217;s like, oh man, this is ugly, it&#8217;s buggy, but it does exactly what you wanted. And sometimes what they would change would be the tiniest thing. Like, I didn&#8217;t like that color, that color made our kids feel bad. Or like, this word in your product touches a political nerve. It could be tiny, tiny details, but having the people on the ground in the classrooms having the agency to change that stuff was really interesting. This aspect of spreadsheets just kind of captured me.</p><p>And so I started thinking, why doesn&#8217;t more software feel that way? Why is it that, you know, there are some things you can do in Excel, but so much of software feels like it&#8217;s decided thousands of miles away and you&#8217;re stuck with however it was decided, you know?</p><p>That sent me down a really, really deep rabbit hole trying to figure that out.
And that&#8217;s how I got into this question.</p><p><strong>Kanjun Qiu (03:19)</strong></p><p>That&#8217;s really cool. Yeah, I&#8217;ve heard from someone that Excel is the first and maybe only successful end-user programming tool. And you have all of this interesting use of Excel, but that doesn&#8217;t work for everything else. When you were exploring this question, why isn&#8217;t all software that way? Where did that lead you? Why isn&#8217;t all software that way?</p><p><strong>Geoffrey Litt (03:41)</strong></p><p>Oh man, yeah. There&#8217;s a lot of reasons, some of them technical. I think historically, a lot of people have tried to make programming easier and more accessible, but ultimately there have been these barriers of needing to think in really abstract ways that are not natural to most people. So that&#8217;s been one chunk of the challenge. And I think AI is changing that state of play a lot, and we can talk about that. </p><p>But there&#8217;s also a lot of other barriers that are bigger and in some ways harder to tackle. There&#8217;s economic barriers, like, you know, how do people get paid to make stuff? There&#8217;s infrastructural ones: a lot of our computing environment and ecosystem has kind of calcified around the assumption that people aren&#8217;t editing their software. </p><p>If you think about it, when I send you an Excel spreadsheet and you open it, you&#8217;re opening it in the editor for the spreadsheet. It&#8217;s not just a spreadsheet viewer, right? You actually have the editor, and you can do whatever you want to it, because it&#8217;s a file that you control.</p><p>And when we look at a lot of how software is shipped through app stores, the assumption is that, no, what are you talking about? The user would never edit the code. In fact, we do a lot of things so that they can&#8217;t edit the code. And so I think there&#8217;s a lot of factors that interlock around this core assumption. And that&#8217;s part of what makes this problem challenging to make progress on: you have to address a lot of these together.</p><p><strong>Kanjun Qiu (05:03)</strong></p><p>That&#8217;s a really interesting observation, that Excel is the editor and most software isn&#8217;t. You open the software, it&#8217;s view only, it&#8217;s not the editor. You maybe can edit the data, but not the UI elements.</p><p><strong>Geoffrey Litt (05:15)</strong></p><p>Yeah, I think that&#8217;s one of the biggest principles around malleability: we want to remove friction and barriers between being a quote-unquote user who&#8217;s passively using something and getting deeper and deeper into actively modding it. A really important point is that I don&#8217;t think that everyone should be modding software all the time. I&#8217;m a nerd, and I don&#8217;t want to mod most of my software most of the time, right? But it&#8217;s just about having the ability to go there if you want.</p><p>That&#8217;s where, you know, in an environment like Excel or spreadsheets, having the editor at least always available to you is a key principle.</p><p><strong>Kanjun Qiu (05:53)</strong></p><p>Right. Because sometimes you only want to view the Excel spreadsheet and not change the super complex financial model.</p><p><strong>Geoffrey Litt (05:59)</strong></p><p>Exactly. And there might be cells that say, don&#8217;t touch this unless you&#8217;re really sure you know what you&#8217;re doing. I think that&#8217;s something that people often miss too: sometimes having more explicit guardrails can actually free people up to feel safer and more creative editing stuff. If we go back in the history of malleable environments, <a href="https://hypercard.org/">HyperCard</a> is a system that, I think, started in the 80s and shipped on Macs. And basically, it was kind of like a precursor to PowerPoint in a way. You could make these kinds of slideshows out of index cards, basically. But what was really neat about HyperCard is that you could start out by just drawing pictures or writing text. They had these different levels or modes, and level one or level two, I think, was just editing text and drawing stuff. You weren&#8217;t even able to code at that level. And then when you wanted to, you could go to level five, let&#8217;s say, which was the deepest level where you can do anything, but you&#8217;re only going there if you know you&#8217;re ready and you know you want to.</p><p>And I think in a lot of spreadsheets, you actually have folk practices around this stuff, like you just mentioned, where maybe you&#8217;re walling off part of it that&#8217;s dangerous to touch. I think sometimes, paradoxically, boundaries like that can create freedom for people.</p>
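<p><em>[A rough sketch of that leveled-guardrail idea in TypeScript. The level names and capabilities here are invented for illustration; they are not HyperCard&#8217;s actual levels or API.]</em></p><pre><code>// Sketch: HyperCard-style "user levels" as editing guardrails.
// Level names and capabilities are invented, not HyperCard's real ones.
type Capability = "browse" | "type" | "paint" | "layout" | "script";

const LEVELS: { [level: number]: Capability[] } = {
  1: ["browse"],
  2: ["browse", "type"],                              // edit text only
  3: ["browse", "type", "paint"],                     // draw on cards
  4: ["browse", "type", "paint", "layout"],           // rearrange the UI
  5: ["browse", "type", "paint", "layout", "script"], // full programming
};

function can(userLevel: number, action: Capability): boolean {
  return (LEVELS[userLevel] ?? []).includes(action);
}

console.log(can(2, "paint"));  // false: the guardrail holds by default
console.log(can(5, "script")); // true: you opted into the deepest level
</code></pre><p><em>[The boundary works like the protected spreadsheet cell: you only reach the dangerous level on purpose.]</em></p>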
<p><strong>Kanjun Qiu (07:15)</strong></p><p>That&#8217;s super interesting. Diving into that a little bit, as part of the infrastructural barriers you were talking about: our computing ecosystem, maybe the infrastructure we have, the UI elements that we have, they are all calcified around this assumption that people are not editing their software. What kind of infrastructure, constraints, different UI components, guardrails, et cetera, do you think could&#8230; let&#8217;s say we rewound back to the 80s, or, you know, we&#8217;re here today and we ended up calcifying around a different ecosystem. What elements of that ecosystem might exist such that you end up getting malleable software?</p><p><strong>Geoffrey Litt (07:57)</strong></p><p>Yeah, so maybe it&#8217;s best to talk about this concretely. I can tell you about some experiments we&#8217;ve done at Ink &amp; Switch, where I used to work and do research, and some of the environments we developed there that we used heavily internally to do our own work, which enabled the sorts of malleability we were seeking. One system that we developed really deeply was a system called <a href="https://www.inkandswitch.com/patchwork/notebook/">Patchwork</a>. And the core idea of Patchwork was basically: it starts out as just a document editor. The flagship feature is just a markdown editor that&#8217;s collaborative. But then you can go deeper and deeper into modifying it. And what it ends up being is actually an environment where you can make your own tools, share them with people, and edit whatever tools you&#8217;re using on the fly. It kind of achieves some of the goals we wanted. </p><p>So how do we get that? Well, a few things. You need the ability to live edit your tools as you use them. This is a really important thing we realized: most of the time when I realize I want to change something, I don&#8217;t have an hour to go do it. I might have five minutes, though. And so we found that there&#8217;s this kind of magical combination: use AI to do the coding, so that solves how you get the new code. But then you also have this question of, how do you ship that to yourself and to your colleagues?
And the starting point there is really to treat the code of your app just like the documents you&#8217;re editing together. What I mean by that is: when we open a Google Doc together, we&#8217;re just editing it live, right? Just make your software like that.</p><p>This is not how people typically think. Typically you think: you have to push to GitHub, and there&#8217;s some CI pipeline that runs, and it deploys, and it&#8217;s this industrial process that&#8217;s arranged around preventing screw-ups and shipping to millions and millions of people. But no, just make it like Google Docs, okay? Once you do that, you instantly run into a bunch of other problems. So one is: if we&#8217;re using a piece of software and you&#8217;re editing it live, it&#8217;s not going to be fun for me, because you&#8217;re going to be breaking it all the time, right? So then we realized: actually, this is why programmers have Git. We need Git for normal people. So we invested a lot in Patchwork in ideas around versioning, what we call <a href="https://www.inkandswitch.com/universal-version-control/">universal version control</a>. The idea is systems that achieve the goals of programming version control, but for any kind of data and for any kind of user, even someone who&#8217;s not super, super nerdy. Basically, what we found was that when you combine code as just documents that you&#8217;re sharing with people and can edit, you have AI helping you, and you have powerful version control that lets you create copies of things and merge things back together in good ways, then you start getting a really interesting set of ingredients where you can start remixing and mashing up and feeling more playful with your tools and your software. So I think that&#8217;s one starting point.</p><p><strong>Kanjun Qiu (10:47)</strong></p><p>That&#8217;s super interesting. Treating code as a live shared document, like a text file, between two people, and then having a way to version it. Text and data, I imagine, are kind of the main things that you&#8217;re versioning. </p><p><strong>Geoffrey Litt (11:03)</strong></p><p>Those are two things. </p><p><strong>Kanjun Qiu (11:05)</strong></p><p>What else do you need to version?</p><p><strong>Geoffrey Litt (11:19)</strong></p><p>Patchwork has whiteboards. Patchwork has spreadsheets. And in fact, because you can add new tools to the system too, the system needs the ability to store and share arbitrary data. This is another thing I&#8217;ll get into, which is: when we think about, OK, what are the barriers to shipping software? A lot of the barriers are that the main ways we deploy software for people to use assume industrial scale. So you need a back end. You need a database. You need load balancing, blah, blah, blah, nonsense, Kubernetes stuff.</p><p>There&#8217;s a lot of stuff, and the gap between &#8220;I have a working prototype that I can run on my computer&#8221; and &#8220;I can send you a link and we can collaborate in my new piece of software&#8221; tends to be a lot of work. And so one of our goals in Patchwork was: how much of that infrastructure could be offloaded to the operating system or the environment, so to speak, so that if you have an idea and you vibe-code a UI, how can you then share that with me and we&#8217;re instantly working together in that tool you just made?</p><p>And there&#8217;s a lot of layers to figuring out data persistence and sync and all that stuff to make that a reality. I think you&#8217;re seeing this to some extent: a lot of platform-as-a-service startups out there are trying to figure out how to become the best backend for vibe-coded apps, in a sense. I think that&#8217;s part of it, but I think we can push even further than most startups are going.</p>
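<p><em>[A rough sketch of the code-as-shared-document idea in TypeScript, using Automerge, the JSON-synchronizing library Geoffrey mentions later in this conversation. The workspace shape and tool names are invented for illustration.]</em></p><pre><code>import * as Automerge from "@automerge/automerge";

// A workspace whose app source code lives in a synced document,
// not behind a deploy pipeline. The shape is invented for illustration.
const initial: { tools: { [name: string]: string } } = {
  tools: { "todo-list": "export function render() { /* v1 */ }" },
};
let mine = Automerge.from(initial);

// You open the same workspace, like opening a shared Google Doc.
let yours = Automerge.clone(mine);

// We edit concurrently: I tweak the to-do tool, you add a whole new tool.
mine = Automerge.change(mine, (d) => {
  d.tools["todo-list"] = "export function render() { /* v2, bigger buttons */ }";
});
yours = Automerge.change(yours, (d) => {
  d.tools["whiteboard"] = "export function render() { /* a brand new tool */ }";
});

// Merging keeps both edits: no GitHub push, no CI, no deploy step.
mine = Automerge.merge(mine, yours);
console.log(Object.keys(mine.tools)); // both tools survive the merge
</code></pre><p><em>[Universal version control then layers the rest on top: named branches of a workspace, review, and merging back, for documents and tools alike.]</em></p>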
<p><strong>Kanjun Qiu (12:40)</strong></p><p>Yeah, I think this is one of the really interesting things here. So Imbue recently shipped Sculptor, which is a tool for running parallel agents to write code. And one thing that we&#8217;ve been thinking about is sync and collaboration, real-time collaboration. And something that we made is this thing called Pairing Mode. So all of the agents run in containers, which means they don&#8217;t have your code. They&#8217;re not running locally. They run in containers because you don&#8217;t want them to delete your files accidentally, or things like that.</p><p>I actually really resonate with what you said about versioning. We&#8217;ve had to think about the Git workflow of the developer and the agent as two separate things. And how do the agent&#8217;s version and the developer&#8217;s version mesh together? We&#8217;re just using Git right now, but Pairing Mode basically copies the agent&#8217;s files and rsyncs them to your local environment. And now you&#8217;re real-time editing with the agent. What you said is really interesting, because we&#8217;ve been thinking about: I&#8217;m real-time editing with the agent, but actually sometimes I want to real-time edit with someone else. In today&#8217;s normal software engineering industrial process, nobody wants to real-time edit with anyone else. That&#8217;s actually really rare. So it&#8217;s been an open question for us. I resonate with you a lot on things built for scale, Kubernetes. Actually, wouldn&#8217;t it be better if everything were local, just running on the compute that you have in your hands, so you don&#8217;t have to handle all these scale problems? </p><p>I&#8217;m really curious: when you were working on Patchwork together, when did you want to collaborate real-time when coding, versus when did you want the more industrial, independent, two-people-merging-branches-into-main workflow? Did you ever want one versus the other? Or did you always want to live edit?</p><p><strong>Geoffrey Litt (14:33)</strong></p><p>Yeah, I&#8217;m really glad you asked. I&#8217;m fascinated by this area. I have this opinion that collaboration between humans and AI is essentially a version control problem. What I mean by that is: think about the problems that a version control system like Git is meant to solve. You have a bunch of people working together. They might be working concurrently on different stuff. And you need ways to go off and try stuff and be experimental. You need ways to review work that other people are coming to you with, and talk about it: I want to do this, what do you think? Let&#8217;s go back and forth and discuss. And then you want to track, okay, we decided it&#8217;s good, let&#8217;s do it. And you want to see that in your history. And when you think about working with AI, actually, and you look at the needs, a lot of these things map really directly. So I have an unreliable alien intelligence out there doing stuff for me. How do I know if I like it? I need some way to review what it did. I need some way to talk with it, and with other people, about what it&#8217;s proposing.
And then when we like it, you know, we can accept it.</p><p><strong>Kanjun Qiu (15:35)</strong></p><p>Like accept its changes and discard the changes that don&#8217;t matter.</p><p><strong>Geoffrey Litt (15:38)</strong></p><p>Exactly. And I think one of the underrated reasons that coding has taken off as a use case for AI is actually the prior existence of mature tooling, like pull requests, for doing this workflow. I think in a lot of other domains, if you don&#8217;t have this stuff built up yet, you can&#8217;t just let an AI agent go do stuff to a really important shared workspace without any ability to see what it did, or talk with it about what it&#8217;s proposing.</p><p>There are ceilings on what you can do there, right? And I think the more version control you have, the more you can just let the agent go do stuff. So I think it&#8217;s a fascinating area. Now, to get to your question: I would say when working on Patchwork, we mostly weren&#8217;t live editing together while coding. We were probably mostly working async. But definitely we were leaning heavily on branches. And what you were talking about reminded me of something: I think a lot of products are struggling right now to reconcile the old way of Git with the new requirements. Parallel agents, more real-time stuff. And I think it&#8217;s going to be interesting to see what that looks like. Do we reinvent version control from scratch for the new requirements? Do we layer on top of Git, as a lot of products are doing?</p><p><strong>Kanjun Qiu (16:59)</strong></p><p>One thing that I&#8217;ve been sitting with is this idea of version control. This may not be obvious from our website, but at Imbue, we really care about making software modifiable by the end user, because we think it&#8217;s basically a question of control as we go into this AI future. Like, today, we&#8217;re kind of controlled by our software, actually. Our attention is controlled. Our actions are controlled. We&#8217;re controlled by other people who are building these systems. Sometimes inadvertently, while they&#8217;re trying their best. Sometimes very explicitly, when they&#8217;re trying to maximize profit or engagement. AI makes this problem worse. But it also gives us opportunity, because AI can write code. So a question I&#8217;ve been sitting with, to your point: you mostly weren&#8217;t live editing with Patchwork, you were mostly working async, but you also want to be able to change things. And maybe sometimes those changes make it back in, and maybe sometimes they&#8217;re just for your local system. Systems like this are really rare; except for open source projects, not many exist. In your lived experience, why did you build live editing if you mostly weren&#8217;t using it? What was it? I feel like there&#8217;s something interesting in live editing, and I don&#8217;t fully understand what it is, and I&#8217;m really curious for your thoughts.</p><p><strong>Geoffrey Litt (18:30)</strong></p><p>Oh man, I think there are like three separate topics to unpack there. I&#8217;ll start with the last one. So why live editing? I think it&#8217;s just what people expect. In some sense, it&#8217;s the most straightforward model. We get on a link, we&#8217;re looking at the same thing. Every kid expects that now in all of their software. They don&#8217;t know what files are, they don&#8217;t know about emailing. It&#8217;s just, everything&#8217;s live.
And I actually think that&#8217;s a really lovely starting point for remote collaboration. When we get on a whiteboard, we can just draw. It feels really fluid and nice, you know? My view, and I think what we explored largely at Ink &amp; Switch, is that it&#8217;s a yes-and: you want that, and you want the ability to go off in a corner and think about something privately without having your manager come in and stare at you, right? We call this creative privacy. I did a bunch of user interviews with writers talking about how they feel observed in Google Docs, basically, right? And so I think that&#8217;s the simple answer: live editing is how the world works now. And so we&#8217;ve got to meet people where they are. </p><p>I want to get back to something else you said, though, which is about this question of values and what software is trying to do to us, essentially. And I think that is a deeper undercurrent of malleability that we haven&#8217;t really addressed yet. </p><p>Cory Doctorow has this phrase, adversarial interoperability, which I love. He talks about things like ad blockers that are browser extensions, right? What&#8217;s happening there is that there&#8217;s this adversarial relationship where a website&#8217;s trying to push ads on you and you&#8217;re pushing back, using this technological capability to set up an environment that&#8217;s more in keeping with the way you want it to be, or your own values. I think ideas like Bluesky algorithms being less centralized are also in this vein. And I think that is a very important part of the equation to consider when we think about barriers.</p><p>There are incentives that big corporations have to not let us change stuff, because that&#8217;s how their business works. One analogy I sometimes like to use is: it&#8217;s more of a food court than a kitchen. There are these big companies that have their own agendas pushing a menu of choices at you. And in your kitchen, you have a lot more control: what am I trying to do with my food? What cuisine style, what health criteria am I trying to meet? And you have more of an ability to mold it to be in keeping with your values. So I think of the software app stores as kind of these food courts. I think that&#8217;s another big piece we have to solve.</p><p><strong>Kanjun Qiu (21:09)</strong></p><p>I agree. Yeah, it&#8217;s really resonant, because Glenn on our team, he&#8217;s a prototype engineer, and he wrote about how it feels like we are in a world of vending machines right now. We get all these vended products, but in a truly open kitchen, we can change the kitchen layout itself and cook the food that we want. Earlier you talked about three barriers: technical, economic, infrastructural. We started out talking about infrastructure, but the economic barriers are ones that we think about a lot.</p><p>I&#8217;m happy to talk more about how we think about the economic barriers, but at Ink &amp; Switch, did you think at all about the economic barriers? What&#8217;s your perspective on that?</p><p><strong>Geoffrey Litt (21:53)</strong></p><p>Frankly, we mostly didn&#8217;t yet, I would say. We were focused on how we could make a really awesome, malleable system that we wanted to work in. I think in some ways, the economic barriers are some of the hardest ones to work on in a research context, because I think ultimately companies with commercial incentives have to solve the business model piece.
And I think my view of the world is that the technical and infrastructural barriers are big enough that they still really matter, and researchers can make progress on that piece somewhat separately. I don&#8217;t know; the thing that comes to mind for me is that I once did a deep dive into this system called OpenDoc, which Apple had in the early-to-mid 90s, which is a cousin of a related system from Microsoft called OLE. And the idea was very malleable-software-esque. </p><p>You could have these mix-and-match widgets in your documents instead of monolithic applications. And you could buy these smaller widgets from companies and combine them with your existing software. And apparently one of the challenges they hit was: when something breaks, who do you call? A really nice thing about applications is there&#8217;s a box on your screen. If something&#8217;s wrong with that box and you have an enterprise support contract, you call them and they&#8217;re on the hook. And the more you break things down into small units, you know, there&#8217;s basic questions like: are you willing to pay for a tiny, tiny feature on its own and have a separate procurement for that? But also, who&#8217;s on the hook for integration work, you know? A lot of users value things just working, and will pay for that. So I think those are some of the big challenges.</p><p><strong>Kanjun Qiu (23:41)</strong></p><p>Mm-hmm. Yeah, one of the things that we&#8217;ve been thinking about is: what LLMs do is make code easy to write and replicate. In theory, in theory. At some point they will. And so to your question of, you know, how do we get malleability but also software that people support? I think there&#8217;s actually some interesting space between closed source software that people pay for and open source software that is fully volunteer supported. Because the point behind malleable software, one of the requirements, is that you need to be able to modify the source code, probably. And so in that sense, malleable software has to be open source by default, or source available by default. But today&#8217;s open source environment is, like, free. There&#8217;s free software. And so who&#8217;s going to support it? It&#8217;s a team of really overworked developers, and they&#8217;re the maintainers of this project, and they&#8217;re all volunteers, and that sucks.</p><p>And so we&#8217;ve been playing with this idea we&#8217;re calling Common Source, which is between open source and closed source: this idea that probably most of the important software we run should be run from a public commons of code, of common source code. And in Common Source, what we&#8217;re toying with is this idea of a license where you can get the source code, but you still have to pay the creator, or the group of people who are creating the code. And so then that starts to answer some of these questions, potentially: OK, well, you&#8217;re paying for maintenance, really. You may be paying a SaaS fee for maintenance and getting the things you want. Stuff breaks? You can stop paying them. So the incentives are aligned in this way. But at the same time, you still get the source code, so you can make your own changes. And if you diverge too far from the original project, well, then maybe they can&#8217;t help you anymore.
I think we have to make some changes to our assumptions around open source, and the philosophy behind open source, to get to valuable software.</p><p><strong>Geoffrey Litt (25:51)</strong></p><p>That&#8217;s a really fascinating idea. I love that. I totally agree that open source seems to be a natural prereq, and that it raises these questions. I think it&#8217;s tricky, because I love that perspective you bring; at the same time, the history of open source business models has been fraught with a lot of failures. And when we think about, okay, code is now much easier to copy: I mean, probably if you have the code, you can easily make a copy that is legally different. So I don&#8217;t know, it seems tricky.</p><p>I also think, to your last point around divergence, this is a huge, huge challenge to figure out. If there is an ongoing software project that is shipping updates, and I have my own version of it where I did my own thing: software developers know this as the fork maintenance problem, and it can be a huge pain in the ass depending on what you&#8217;re doing.</p><p>There are companies that maintain forks that have teams of engineers just keeping up with what&#8217;s happening upstream, so to speak. And this is something that I&#8217;ve thought a lot about in malleability. I think the root problem is: if you treat divergence as arbitrarily editing the code in any way, the problem of fork maintenance is really hard. Whereas there are other ways to factor it. Like, if you have plug-in APIs, you can say, okay, anyone can make a plugin, and we&#8217;re gonna try to keep this plugin boundary stable; that&#8217;s one way to do things. Now, there are trade-offs: often there are things you can&#8217;t do through the plugin API, so you want to dig deeper. I think it does get tricky, but new ways of organizing software to be modular and compositional in different ways can lead to different abilities of people to mod it. </p><p>Something I&#8217;m really curious about is: if we progress towards a world where you have a lot of AI coding happening, and you have people wanting to maintain forks with heavy divergence, maybe we just start structuring our code bases differently, to treat that as the number one goal, basically. There have been some wacky research systems that have thought about programming with this as the number one goal, and they get to very different structures than we&#8217;re used to. There&#8217;s a great idea called behavioral programming from David Harel, where basically his idea is: what if a program is like a rule book, just a list of rules? You just add rules, and rules can create exceptions to previous rules. So I might say, the red square can always move to that square. But then you could come along and add a rule that says: unless that square says five. And then you see how we just keep adding and adding more and more rules to this ball. And we never have to reach into existing rules and modify them. Maybe there are ideas like that that could change the game.</p><p><strong>Kanjun Qiu (28:47)</strong></p><p>That&#8217;s really interesting: so append-only as the solution to divergence. So you actually don&#8217;t diverge. Yeah, that&#8217;s interesting. What other ideas are there around divergence?</p><p><strong>Geoffrey Litt (28:57)</strong></p><p>Another inspiration: there&#8217;s a common pattern in software of middlewares, where you basically stack up these layers and you can always add more. I think maybe it&#8217;s the same principle in the end: the more you can have additive modification without reaching in to touch the existing stuff, the better. I also frankly would just throw the AI hammer at it and say, to some extent, when you reach in and intrusively modify something, it&#8217;s gonna get messy. But probably 80% of the fixes that happen in fork maintenance are routine and don&#8217;t require anyone to think that much. They&#8217;re just icky. And so I&#8217;m very optimistic that AI will get to the point where it can mostly automate the easy stuff and then only leave the tricky hard stuff.</p>
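<p><em>[A toy sketch of the append-only rule book in TypeScript. The rule shape and the allow/veto mechanics are invented for illustration and are much simpler than Harel&#8217;s actual behavioral programming formalism.]</em></p><pre><code>// Behavior changes by appending rules, never by editing existing ones.
type Move = { piece: string; to: string };
type Rule = {
  name: string;
  allows?: (m: Move) => boolean; // grants permission for a move
  forbids?: (m: Move) => boolean; // a later exception can veto earlier rules
};

const rules: Rule[] = [];

// Original behavior: the red square may move anywhere.
rules.push({ name: "red-moves", allows: (m) => m.piece === "red" });

// Later, an exception is appended without touching the rule above.
rules.push({ name: "five-is-blocked", forbids: (m) => m.to === "5" });

function isAllowed(move: Move): boolean {
  const granted = rules.some((r) => (r.allows ? r.allows(move) : false));
  const vetoed = rules.some((r) => (r.forbids ? r.forbids(move) : false));
  return granted ? !vetoed : false;
}

console.log(isAllowed({ piece: "red", to: "3" })); // true
console.log(isAllowed({ piece: "red", to: "5" })); // false: the exception wins
</code></pre><p><em>[Under this factoring, a fork is just extra appended rules, so pulling upstream updates is less likely to collide with your local modifications.]</em></p>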
<p><strong>Kanjun Qiu (29:44)</strong></p><p>Mm-hmm. Yeah, that&#8217;s really interesting. Principally, when it comes to handling divergence, we really have two options, I think. Okay, conjecture, made this up on the spot. So principally we have two options. One is: maintain the internals of the system and then add more stuff such that the end behavior changes. So rules create exceptions to previous rules, and the internals don&#8217;t change. Or, like middleware, more layers of abstraction also kind of do the same thing. You don&#8217;t really change the underlying stuff, but you&#8217;re adding more abstractions on top. Now you can do different things and more things. Plugins are another similar thing: you&#8217;re not changing the center, but you&#8217;ve got this API and you can add stuff. So one paradigm is: don&#8217;t change the middle to deal with divergence; just have modular pieces on top. And the second paradigm is: actually change the middle, and then use AI to solve it somehow. Something that we do a lot internally with Sculptor is test-driven development, where we write a bunch of tests. And it&#8217;s not magical like this yet, because when we try to write the tests, the tests actually don&#8217;t capture the full behavior of the system. But in theory, you would have tests that capture much of the behavior of the system, do a full refactor rewrite of the middle, and then have it abide by those rules. And then that does let you modify the center.</p><p><strong>Geoffrey Litt (31:17)</strong></p><p>I like those options. I&#8217;ll throw in one more complication, maybe, which is that once you&#8217;re talking about collaborating on shared software, I think things get more essentially complicated. Single-player software, whatever: I can have my own weird version and you don&#8217;t care. As long as I have a smart enough AI to keep up with updates, it&#8217;s not your problem. But now imagine we have team software.</p><p>So now it&#8217;s fundamentally a different problem, where there are compromises that have to be made. People have to have shared practices around working. I&#8217;m fascinated by the question of how far different people&#8217;s tooling setups can diverge while still retaining the ability to collaborate, and what kind of layering promotes that. Concrete example: many software engineers have a preferred development environment. And when you join a software team, you get to bring your favorite editor, typically.</p><p>And that works because code is stored in this very universal plain text file format. There&#8217;s a universal version control layer; most people use Git or whatever. You pick your system. And that&#8217;s just a file-based thing. So then whatever tools you want to use to edit your files, whether it&#8217;s Sculptor or Vim or whatever, not my problem, right? And so there&#8217;s this really nice abstraction boundary, and we can still work together. That, first of all, is not the case for most SaaS software.</p><p>There&#8217;s a deep, deep coupling between the data that you&#8217;re sharing with your teammates and the one editor that is allowed to edit that data. And secondly, I think it&#8217;s often tricky to even tell where we could draw that boundary. Could you use Asana and I use Trello? Would that work? Could we sync them? I don&#8217;t know. There&#8217;s probably stuff that doesn&#8217;t fit, right? At Ink &amp; Switch, we did this project called Cambria where we took on this challenge at the data layer. We thought about: if you were synchronizing data across really different apps, which want to store their data in different shapes, could you make some sort of glue that shuffles the data back and forth live as people collaborate? So you&#8217;re always seeing as much as possible on both sides, even if it&#8217;s not 100%. I think there&#8217;s a lot to consider there.</p>
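<p><em>[A toy TypeScript sketch of that kind of glue between two task apps that store assignees differently, borrowing the to-do example that comes up a little later in this conversation. The record shapes and lens functions are invented; Cambria&#8217;s real lenses are declarative, bidirectional specifications, not hand-written converters.]</em></p><pre><code>// App A allows many assignees per task; App B allows exactly one.
type TaskA = { title: string; assignees: string[] };
type TaskB = { title: string; assignee: string };

function aToB(task: TaskA): TaskB {
  // Lossy direction: App B simply has nowhere to show extra assignees.
  return { title: task.title, assignee: task.assignees[0] ?? "" };
}

function bToA(task: TaskB, previous: TaskA): TaskA {
  // Coming back, restore what App B could not show, so edits made in
  // App B do not silently delete the other assignees.
  const others = previous.assignees.slice(1);
  return { title: task.title, assignees: [task.assignee].concat(others) };
}

const shared: TaskA = { title: "Ship it", assignees: ["kanjun", "geoffrey"] };
const seenInB = aToB(shared);            // { title: "Ship it", assignee: "kanjun" }
const roundTrip = bToA(seenInB, shared); // both assignees survive the round trip
</code></pre>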
<p><strong>Kanjun Qiu (33:36)</strong></p><p>That&#8217;s super interesting, because on Sculptor, we&#8217;ve been thinking about apps as being separate from data. Like, code is not data. And actually, data has to be treated fundamentally differently. And with Cambria, you&#8217;re kind of synchronizing data across these really different apps. And one thing I&#8217;m curious about, with this question of &#8220;can I use Asana and you use Trello&#8221;, is: what is universal about data? Is a Postgres database that is structured with infinite columns somewhat universal? Is it documents, with plain text, that are universal? In all of your experimentation, what have you learned about data?</p><p><strong>Geoffrey Litt (34:29)</strong></p><p>This is a fantastic and very difficult question. What is the elemental material such that, if we just stored everything in X shape, then everything would work? I don&#8217;t think there&#8217;s a silver bullet, unfortunately. I do think, though, the essential quality to think about is how structured and specific the data representation is. The idea of files generally is a pretty low-level abstraction. It&#8217;s really just a sequence of bytes. That&#8217;s all you know. But you can layer ideas on top of that, like file formats, which have their own constraints, right? You can store, for example, JSON as a file, which then adds more constraints, but JSON is also pretty general. And then you could say, I have this JSON schema, which allows only JSON of this shape. And I think you can have these progressively more and more specific layers. You can&#8217;t get everyone to agree on really specific schemas; it&#8217;s never gonna happen. At the same time, really, really low-level abstractions, like &#8220;it&#8217;s just a sequence of bytes, good luck,&#8221; are very open-ended, and I think they allow people to do so much different stuff that it&#8217;s hard to work together. So I think we&#8217;re aiming for something in the middle. The Ink &amp; Switch systems all run on this library called Automerge, which synchronizes JSON documents. That was: JSON is the universal shape. There are different options; I think a Postgres database is a perfectly reasonable other option. But I think that&#8217;s roughly the challenge.</p>
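<p><em>[A small TypeScript sketch of those progressively more specific layers. The task schema and function names are invented for illustration.]</em></p><pre><code>// Layer 1: a file is just a sequence of bytes; anything goes.
function readBytes(input: Uint8Array): Uint8Array {
  return input;
}

// Layer 2: more constrained; the bytes must decode to valid JSON.
function readJson(input: Uint8Array): unknown {
  return JSON.parse(new TextDecoder().decode(input));
}

// Layer 3: most constrained; only JSON of one agreed shape is accepted.
type Task = { title: string; done: boolean };

function readTask(input: Uint8Array): Task {
  const value = readJson(input) as { title?: unknown; done?: unknown };
  if (typeof value.title !== "string" || typeof value.done !== "boolean") {
    throw new Error("schema layer rejects this document");
  }
  return { title: value.title, done: value.done };
}
</code></pre><p><em>[Each layer up trades open-endedness for shared meaning: layer 1 lets collaborators store anything and agree on nothing, while layer 3 is easy to build tools against but hard to get the whole world to adopt.]</em></p>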
<p><strong>Kanjun Qiu (36:04)</strong></p><p>Mm-hmm. That&#8217;s really interesting. I have a bunch of thoughts here. One, it depends on the structure of your data a little bit. A Postgres database is good for data that is stored with identifiers and attributes of those identifiers. And JSON blobs are good for a slightly different type of data. What you said makes me wonder, though: if everything were plain text, people could actually diverge quite a lot. We now have these universal data processors, which are LLMs.</p><p>And so can we turn that underlying file into almost any abstraction along this spectrum, from sequence of bytes, to JSON key-values, to JSON with a schema, to database, to something else?</p><p><strong>Geoffrey Litt (36:56)</strong></p><p>I&#8217;m super optimistic about that direction of thinking. Many of the data interop problems in the world are just the same information being represented slightly differently. And for those, LLMs: slam dunk. That said, you know, there are also essential differences. Like, in Cambria, the example we gave is: if one to-do list app can assign multiple people to a task, but another one can only assign one person, there&#8217;s nowhere to show it. It doesn&#8217;t work. And maybe then you don&#8217;t realize that I&#8217;m also working on your task. So I think there are these tricky, essential things to keep in mind when we&#8217;re working on shared information together with divergent tooling.</p><p><strong>Kanjun Qiu (37:41)</strong></p><p>One thing that&#8217;s interesting about that example is that it really separates the concerns at the data layer versus the UI layer. In theory, you could just store whatever data you want. You can store this task having many, many people assigned to it. And then at the UI layer, or at the app layer, you do some kind of post-processing to figure out what you want it to be for your app.</p><p><strong>Geoffrey Litt (38:12)</strong></p><p>Yeah, but if the app can&#8217;t show multiple people, you still don&#8217;t, you know&#8230;</p><p><strong>Kanjun Qiu (38:17)</strong></p><p>You still can&#8217;t see the underlying data. </p><p><strong>Geoffrey Litt (38:20)</strong></p><p>Yeah. And I think maybe you show more of the underlying data. Or maybe, I don&#8217;t know, an AI comes in and modifies the other to-do app and makes it show multiple people, because you just need that. And if the other person doesn&#8217;t mind, you roll with that. I think the essence of it is: when we&#8217;re collaborating together, we actually have to make compromises about how we&#8217;re going to do stuff. And there&#8217;s always going to be a collective element there; software can&#8217;t let us be infinitely individualistic, you know? And I think this actually gets to a broader malleability theme. We talk about this in the <a href="https://www.inkandswitch.com/malleable-software/">essay</a> we published this summer at Ink &amp; Switch around malleable software. The goal is not that everyone develops the full skillset needed to do anything to their software. There&#8217;s a long history of people working together with others.</p><p>With spreadsheets, for example, there&#8217;s been some really nice ethnographic research by Bonnie Nardi, who&#8217;s kind of a legend in the end user programming research community, looking at how people use spreadsheets in offices. And it turns out usually there&#8217;s someone in the office who&#8217;s really good at Excel. And when you don&#8217;t know how to do a complicated formula, you go ask them, right? But you can still do a lot of stuff yourself.
And maybe you pick up a bit on what that person did, and watch them work, and you level up gradually.</p><p>And crucially, that person doesn&#8217;t work at Microsoft. They are in your context. They can sit with you. They know your problems. And so they&#8217;re much, much closer to the site of use than the site of original platform production. They call this pattern local developers. I think this is a really, really, really important pattern to think about and build around for these kinds of systems. I mean, we see it at Notion. There&#8217;s often someone in a company who is really good at Notion and sets stuff up for people, right?</p><p>That layer always exists. And it&#8217;s not a bad thing that it exists. AI might be able to help fill that role sometimes for some people, but I think assuming that people are working together to create shared software environments should be the goal.</p><p><strong>Kanjun Qiu (40:26)</strong></p><p>Yeah, that&#8217;s really interesting. In software development, there&#8217;s the same thing. Our CTO, my co-founder Josh, is the expert in a bunch of different ways and helps people figure out how to build on top of the system that we have. Regardless of how good LLMs are, you probably want some kind of expert. Maybe what you&#8217;re saying is: there&#8217;s still this idea of levels of expertise with a tool, even if it&#8217;s an end user programming tool like Excel, or programming, or Notion. There&#8217;s levels of expertise, and someone who has a lot of that expertise can actually do a lot of the quote-unquote programming and set up the system for other people in their context to modify. To our data point earlier: it&#8217;s not just a blob of data and then everyone does their own entire full-stack thing on top of this blob of data. It&#8217;s: okay, blob of data, and then different people take it and mold it into what&#8217;s useful for their own context.</p><p><strong>Geoffrey Litt (41:29)</strong></p><p>The mental model that I really like there is this idea of a smooth slope from user to creator. It&#8217;s not that deep modifications aren&#8217;t hard. It&#8217;s that whatever you want to do, you should have to do the least amount of work possible to do that thing. And you can get slowly pulled into deeper stuff if and only if you want to. And you stop where you want to. I think this is very distinct from our existing computing ecosystem. You can basically use the thing, tweak some settings.</p><p>And then, if it&#8217;s an open source project, I guess you could download the entire code base, compile it for five hours, learn to code. It&#8217;s this insurmountable cliff. No one&#8217;s doing that, right? It&#8217;s just some approximation. And so, how do we smooth out that cliff is one way to think about it. And I&#8217;m curious for your thoughts on this: I think AI can help pull people up that cliff if the system is designed correctly. I also think it might actively prevent people from going up the cliff if it&#8217;s arranged a certain way. And what I mean by that is: if I ask my coworker who&#8217;s the Excel wizard to teach me formulas, and they sit with me for an hour and we do it together, and I see them doing stuff and we talk about it, maybe that&#8217;s a learning moment for me.
Whereas if I ask the whatever Excel-formula wizard to do it, and it spits out something in five seconds, and that&#8217;s wrong but I don&#8217;t notice, or even if it&#8217;s right, you know, if it does the thing for me, what did I learn?</p><p>I actually lost the learning there, you know? I think a lot about how we can set things up to be closer to that former mode, but I&#8217;m curious how you think about that.</p><p><strong>Kanjun Qiu (43:05)</strong></p><p>I think this is really one of the key insights about end user programming: that there is a skill curve. There&#8217;s kind of this learning curve. And you had the gentle slope to tailorability, is what you called it. And with LLMs, something that we think about is that there are kind of two pieces to tailorability. One is how much the user understands. And the other is how tailorable the system is. And you can modify both. Modifying how much the user understands is about education in a lot of ways. It&#8217;s about how we make it easy for the user to understand how to get closer to what they&#8217;re trying to do. </p><p>A concrete example of us experimenting with this in Sculptor is a beta feature called Suggestions. It&#8217;s still very early, but it basically looks at your code base and suggests fixes, improvements, and refactors: directions you can go based on what it looks like you&#8217;re trying to do. And in theory, the suggestions are proactive. So they&#8217;re telling you things about your code base that you might not know about. And they&#8217;re telling you things that you might end up learning. We&#8217;ve had some users who are like: I didn&#8217;t realize that I shouldn&#8217;t expose my API key in plain text. Cool, didn&#8217;t know that was a security best practice. Or: I didn&#8217;t realize that I had five copies of this function that were slightly different from each other, and actually there was a better way of doing things that&#8217;s the default standard. So that kind of proactive teaching, I think, could be part of a system that is an end user programming environment. </p><p>The ambitious way I think about Sculptor is: if we could make this into an end user programming environment, that would be awesome. On the system side, outside of user education, how do you make the system actually more tailorable? I&#8217;m curious for your thoughts here, but I was thinking about interfaces, and how some interfaces feel like they might be more amenable to tailorability than others. </p><p>This might be a terrible example, and might not actually satisfy this requirement, but I&#8217;m going to try it anyway, and I&#8217;m curious what you think about more tailorable interfaces. The other day I was using MailChimp, and I was trying to send a plain text email. And I could not figure out how to send a plain text email in MailChimp. This is extremely difficult. And I was like, man, it would be really nice if I had a retrieval UI, where I could send some messages in chat and it finds the API endpoint that is the plain-text-email function and then gives me a UI for sending the plain text email. That would be really nice. Then I could learn, first: OK, do you have a plain text email endpoint? And second, if you don&#8217;t, then maybe that would be an entry point for me to build one, something like that. So what if an app had no navigation, none of these other dependencies? It&#8217;s retrieval only: you only retrieve API endpoints that take actions. Maybe that lets me build more actions on top of the system. I don&#8217;t know. What do you think?</p>
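<p><em>[A small TypeScript sketch of the retrieval-only interface Kanjun describes. The endpoint registry, its names, and the word-overlap scoring are all invented for illustration; this is not MailChimp&#8217;s actual API, and a real system would more likely embed the descriptions or ask an LLM to pick the endpoint.]</em></p><pre><code>type ActionEndpoint = {
  name: string;
  description: string;
  run: (params: { [key: string]: string }) => void;
};

// The whole "app" is just a registry of actions; there is no navigation.
const registry: ActionEndpoint[] = [
  {
    name: "send_plain_text_email",
    description: "send an email as plain text with no html template",
    run: (p) => console.log("sending plain text email to", p.to),
  },
  {
    name: "send_campaign",
    description: "send a designed html campaign to an audience",
    run: (p) => console.log("sending campaign", p.campaignId),
  },
];

// Crude retrieval: score each endpoint by word overlap with the request.
function findAction(request: string): ActionEndpoint | undefined {
  const words = request.toLowerCase().split(/\s+/);
  let best: ActionEndpoint | undefined;
  let bestScore = 0;
  for (const endpoint of registry) {
    const score = words.filter((w) => endpoint.description.includes(w)).length;
    if (score > bestScore) {
      bestScore = score;
      best = endpoint;
    }
  }
  return best;
}

const hit = findAction("how do I send a plain text email");
if (hit) hit.run({ to: "reader@example.com" });
// If nothing is found, that gap is exactly the entry point for building
// the missing endpoint.
</code></pre>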
<p><strong>Geoffrey Litt (46:37)</strong></p><p>I think you&#8217;re getting at a really big question, which is how UIs are going to evolve in this new age. I think we might have talked a bit on Twitter about this too, like navigation-free apps or whatever. Yeah. So let me get at your question indirectly; I promise I&#8217;ll come back to it. So I think command lines are really interesting. We&#8217;ve left them behind for good reasons. GUIs are better in a lot of ways. But there was a really interesting quality that command lines had, which was that when you do stuff manually one time, it&#8217;s the same way you do it if you want to automate it or build on top of it. While you&#8217;re in the course of normal use, you&#8217;re picking up this underlying structure that ends up being really useful if you ever want to build on top of the thing. Someone had this great phrase: a CLI is like a mediocre GUI and a mediocre API both. And that&#8217;s what makes it great, which I think is really lovely. To what you&#8217;re talking about: I think a big problem with GUIs is that they lack a lot of hooks and compositionality for building on top of and going further with. They tend to not really expose you to what the underlying things you can actually do in the system are, and how you could recompose those in different ways. And so I think that&#8217;s a big question and challenge for me: can we retain the benefits of graphical interfaces (things like discoverability, things like data visualization, which I think is really underused in a lot of LLM interfaces), but also figure out how to make it obvious that you can go further than what this one GUI lets you do, and let you in on the internal structure? One concrete starting point could be: in a lot of power user apps like Photoshop, when you do stuff, there&#8217;s an undo stack that shows you everything you&#8217;ve done in a list. So it&#8217;s reifying the actions you&#8217;re taking as steps. And then that&#8217;s the building-off point for macro recordings and automations. And I wonder: could we have more computing environments where, as you do stuff, you see the things you did as things? And then you go from there.</p><p><strong>Kanjun Qiu (49:02)</strong></p><p>Mm-hmm. That&#8217;s really interesting. Building on that a little bit: something we think about a lot, since we work on AI agents and agents take actions, is that there&#8217;s a difference between displaying information and taking actions. And actually, what you described about a CLI as a mediocre GUI and a mediocre API is really interesting, because CLI tools are primarily for taking actions. They&#8217;re not very good for displaying information.</p><p>GUIs are really good for displaying information, and they can be good for discovering actions. Maybe for taking actions sometimes: if you&#8217;re trying to figure out what action to take, then maybe you can play around with the information until you figure it out. But the taking of the action in the GUI is not great. It&#8217;s not very composable. It&#8217;s, you know, not very automatable.</p><p>And so if we think about displaying information and taking actions as two separate things, then it makes me wonder. OK, your point about the undo stack is interesting, because that&#8217;s a sequence of actions which could be turned into a CLI tool, in theory. And the question really is: OK, what&#8217;s the input into the CLI tool? Unfortunately, sometimes the input involves looking at a bunch of data and analyzing it and visualizing it in a GUI form or something like that.</p><p>But there&#8217;s some processing that goes into the input, and the action itself can be a CLI tool.</p>
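<p><em>[A TypeScript sketch of actions reified as things: every GUI operation lands on an undo stack, and the same records replay as a macro. The action vocabulary is invented for illustration.]</em></p><pre><code>type Action =
  | { kind: "fill"; cell: string; value: string }
  | { kind: "resize"; column: string; width: number };

const undoStack: Action[] = [];

function apply(a: Action): void {
  if (a.kind === "fill") console.log("set", a.cell, "=", a.value);
  else console.log("resize column", a.column, "to", a.width, "px");
}

// Every GUI operation is recorded as data, like Photoshop's history panel.
function perform(a: Action): void {
  undoStack.push(a);
  apply(a);
}

// Normal GUI use...
perform({ kind: "fill", cell: "A1", value: "42" });
perform({ kind: "resize", column: "A", width: 120 });

// ...and because each step is a thing, "record a macro" is just replay,
// which is also roughly what a CLI command wrapping the same actions does.
function replay(macro: Action[]): void {
  macro.forEach(apply);
}
replay(undoStack);
</code></pre>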
<p><strong>Geoffrey Litt (50:27)</strong></p><p>Yeah, totally. You know, now I write most of my CLI commands by just telling an AI what I want to do. And then it writes some really long command that I don&#8217;t fully understand, and I hit enter, right? Which I should pay more attention to. But I think you&#8217;re really getting at something. It usually works, right? </p><p><strong>Kanjun Qiu (50:46)</strong></p><p>Yeah, exactly. </p><p><strong>Geoffrey Litt (50:55)</strong></p><p>I think we are all figuring out what interaction models make sense right now. And I think you&#8217;re getting at a couple of important things. For commands and actions, I think language is actually really good for saying what to do, for the most part. And then for the return path from the agent, I think for some things, maybe two-way voice conversation feels good, but for a lot of things, having visual aids helps. So: deploying the full field of graphic design and data vis to show things.</p><p>When you ask Siri for the weather and it shows you a weather card, that&#8217;s a version of this loop. So I think that&#8217;s a really powerful basic loop. And then the one thing I want beyond that, for some use cases, is a shared locus of attention, like a desk we can both point at and work on. So that might be as simple as telling the agent: edit this code in Sculptor. Or, conversely, the agent saying: did you notice that this line is weird? You&#8217;re kind of sharing this thing.</p><p><strong>Kanjun Qiu (51:59)</strong></p><p>Yeah, you have a shared space you&#8217;re both looking at.</p><p><strong>Geoffrey Litt (52:02)</strong></p><p>And you can point at it. That combination, I think, ends up being pretty good.</p><p><strong>Kanjun Qiu (52:07)</strong></p><p>Hmm, interesting. An idea we&#8217;ve been toying with is that agent-human collaboration and human-human collaboration might not be such different things. Perhaps you can design for both of them. One of the design principles in Sculptor is: everything you can see, the agent should also be able to see. It should understand how it works. It should see your entire UI. If you tell it something and you&#8217;re referencing a part of the UI that&#8217;s not the chat, it should know what you&#8217;re talking about.</p><p>And same with human-human collaboration. To your point earlier that real time is just what people expect: I think two humans want to both be looking at the same surface and the same information. Otherwise it&#8217;s actually quite hard to communicate.</p><p><strong>Geoffrey Litt (52:52)</strong></p><p>I really like that principle you just brought up. I think, to a large extent, aiming for human-human collaboration as a gold standard for a lot of stuff is actually a great goal.
I think there are other patterns that can make sense sometimes, but even just looking at human-human: if you and I are sitting next to each other pair programming, there&#8217;s a lot going on. You know, very simple stuff, like you can point at things and see my screen, and I know that you can see my screen, and there&#8217;s not any weird question of what you can see. So there&#8217;s a lot of good theory of mind going on.</p><p>But also I think there&#8217;s much deeper stuff. Something I&#8217;ve been thinking about lately, and I&#8217;m curious for your thoughts on, is: you can tell if I&#8217;m really busy and stressed because we have a launch tomorrow and I just want this fricking button to work. And I&#8217;m like, hey, Kanjun, can you fix this button for me, please? You&#8217;re not going to launch into an hour-long lecture about the philosophy of how we think about buttons, right? You&#8217;re just going to help me out, because I&#8217;m in a bind. And, you know, it might be a totally different situation. It might be my first day at a new job, and I&#8217;m like, man, I&#8217;ve never used this programming language before, can you show me around a bit? And I&#8217;ve always felt that computers, by defaulting to having so little context about us and our environments compared to human interactions, are at a real disadvantage: they can&#8217;t sense these things. And so they rely on us to give them that context through our prompting, in the AI era, but we&#8217;re not very good at giving them all the context that they need. And so we end up in these weird mismatches. Particularly along that dimension I just brought up: how do you know how much to bring the person along and help them learn themselves, versus just doing it for them? When should they be brought along? When does it matter? </p><p>I don&#8217;t even know myself, and as a programmer, I&#8217;m often very unsure how much I should be getting into the details of the thing. Even if the AI can do it perfectly, there&#8217;s some intangible benefit to me being in the details. When I&#8217;m UI prototyping, for example, I might have new, different ideas from knowing how it works. And so I don&#8217;t even trust myself to know how much I should be in the details.</p><p>How do we do this?</p><p><strong>Kanjun Qiu (55:20)</strong></p><p>That&#8217;s a really interesting question and direction. When people talk about AI slop, AI slop is this lack of taste, in a way. What you&#8217;re pointing at that&#8217;s really interesting is: the more you understand how something works, how your system works, how the UI you&#8217;re trying to build works, the more taste you have for where it can go. This taste comes from depth, a depth of understanding. And it&#8217;s so weird, because AI systems have no taste, and yet they know everything. So it&#8217;s not about knowing the thing. It&#8217;s something about preferential attention based on the details that we&#8217;re seeing, the ones that serve what we&#8217;re trying to get at, or something. I&#8217;m quite confused about this topic.</p><p><strong>Geoffrey Litt (56:13)</strong></p><p>I think, yeah, I think a lot of people have a very incorrect mental model of how creative work happens, which is something like: there&#8217;s an idea in your head, and you just have to somehow get it into the world as it is in your head. And if you could just do that, then it&#8217;s done.
And so in that model, all you quote-unquote need to do is describe the idea perfectly. And then someone else, something else, can just go do it. Right? That&#8217;s not how it works.</p><p><strong>Kanjun Qiu (56:38)</strong></p><p>That&#8217;s not how it works. Not at all.</p><p><strong>Geoffrey Litt (56:43)</strong></p><p>Creative work, open-ended work: anyone who&#8217;s really deep in it knows that there&#8217;s this conversation happening between you and some medium that you work in, where the idea is being shaped as you go. Working with the medium is changing your conception of what you want. There might even be accidents that happen that are cool, you know, that spark new ideas. And, you know, to some extent, maybe some of it&#8217;s even muscle memory, right? So it&#8217;s possible, for example, that a guitarist composing a new song might not know what chords they&#8217;re about to play. Their fingers just do something, and then they hear it and they&#8217;re like, that&#8217;s cool, right? So when you start digging into that, I think it raises a lot of questions about the role of AI in that process. And I think a lot about this in my own creative practice, which professionally is mostly UI prototyping. Like, I use AI coding a lot. And I think, at its best, it can really speed up feedback loops that weren&#8217;t essential for me to be in. And that lets me make progress faster in this exploration. At worst, it cuts off a whole process of creative exploration that I would have been in myself, because I say what I want and it makes one bad thing. And I&#8217;m like, oh man, that&#8217;s not good, but you made the whole thing and now I just can&#8217;t unsee it, you know?</p><p><strong>Kanjun Qiu (58:12)</strong></p><p>Do you really feel that? That&#8217;s really interesting.</p><p><strong>Geoffrey Litt (58:18)</strong></p><p>Yeah, totally. That&#8217;s happened to me. I&#8217;m like, oh, well, it&#8217;s done and it&#8217;s terrible. Like, whatever. And I think it&#8217;s a very sensitive thing. I&#8217;m an AI coding optimist in the sense that I think it can be a huge accelerant. I use it a lot, but I think we have to be very clear that we&#8217;re all changing our creative media, and that&#8217;s going to do something to our creative practice. And I think the people who are worried, especially about AI art, are totally on to something there.</p><p><strong>Kanjun Qiu (58:41)</strong></p><p>Mm-hmm. This is really interesting. The thing that I&#8217;m afraid to say when I talk about Imbue or AI agents is: I think of us as trying to upend the economic system, in a way, and something about the way that things are working. Because fundamentally, AI systems and AI agents are a source of power. And they become that source of power by basically being the way thinking happens. And so what you just said is: the system thought for me, and now I have this thought, but I didn&#8217;t have the intermediary thoughts before I got to this thought. And because I didn&#8217;t have those intermediary thoughts, I couldn&#8217;t get to a different thought. I only got to the end thought. So it&#8217;s quite concerning. </p><p><strong>Geoffrey Litt (59:30)</strong></p><p>It&#8217;s very concerning to me.</p><p><strong>Kanjun Qiu (59:39)</strong></p><p>I&#8217;m curious, after you got the end thought, whether you went back to the intermediary thoughts. Like, okay, you got this bad UI from your LLM.
Can you go back to the intermediary things and try to understand it better, or can you truly not unsee it? Is there some property? I&#8217;m curious about your personal experience here. </p><p><strong>Geoffrey Litt (59:55)</strong></p><p>Yeah. I would say that for me, it&#8217;s a very emotional process. So it&#8217;s not just a logical thing. There&#8217;s an excitement factor, a momentum factor, you know, an oh-yeah-we&#8217;re-getting-somewhere factor. And again, the tricky thing is that AI often really helps with this. It preserves momentum and avoids roadblocks that would have killed the vibe, you know? So it&#8217;s great when it works. </p><p>But when it doesn&#8217;t, yeah, it&#8217;s not really about literally being able to unsee it. It&#8217;s that it changes my emotional relationship to the process in a way that makes me not as excited about doing it anymore.</p><p><strong>Kanjun Qiu (1:00:42)</strong></p><p>Yeah, okay, so maybe what happened is it came out with a bad idea and that killed your momentum. And you&#8217;re like, I thought that there was something interesting here, but I guess not. Maybe I&#8217;ll move on to something else.</p><p><strong>Geoffrey Litt (1:00:52)</strong></p><p>Exactly. Yeah. And I&#8217;ll never know: if I had done it myself, would I have come up with something? And in fact, in some ways, when it comes up with something good, it&#8217;s even worse, because sometimes I&#8217;m like, this is pretty good. I have a few little tweaks, but good job. And then I&#8217;m like, wait, what would I have done? Would I have done better? I don&#8217;t know, and I&#8217;m not going to spend the time to figure it out anymore. So, you know, one mental model I try to use is that there are things I care more and less about, things I see as more and less core to who I am or what I work on. The less core something is to me, like some disposable secondary tool or something that I wouldn&#8217;t have built without AI in the first place, the more I&#8217;m okay just being very free with it. But the closer it gets to my core practice, the more urgency I feel to be really critically reflecting on what&#8217;s going on, and to not go too far, yeah.</p><p><strong>Kanjun Qiu (1:01:57)</strong></p><p>Yeah, I resonate with this a lot. I tried using GPT 4.5 for writing, and 4.5 was the first model that was actually quite good at writing. And for a month I was really happy. I was like, oh my God, my writing process is amazing. I&#8217;m getting so many more ideas through; I&#8217;m all in flow. There&#8217;s no writer&#8217;s block. And then I zoomed out for a week, stopped working on the piece I was working on. I came back and I was like, wow, this is not me at all. And I rewrote the whole thing, no LLMs. And the really interesting reflection: I feel like there are almost thought Schelling points. And because these systems are distributions, they actually produce the thought Schelling points that are highest likelihood. And because they&#8217;re high likelihood and they&#8217;re Schelling points, you kind of just end up there, and it&#8217;s really hard to get out of them. They&#8217;re really tempting. Yeah, they pull you in exactly. 
And so now you end up in this weird groove, and it&#8217;s actually really hard to get out of it; being creative takes stepping away.</p><p><strong>Geoffrey Litt (1:03:14)</strong></p><p>Yeah. Another manifestation of this that I&#8217;m curious about with Sculptor is this: I&#8217;ve been playing with parallel coding agents a bit and finding them really interesting. I&#8217;m still learning how to use them. Something I found recently: I went overboard for a day. I was like, oh my God, this is amazing. I had two projects that I was working on, and I had to pick one to work on for the day. And I was like, you know what? I can do both. And so all day I was just kind of flip-flopping between these two projects.</p><p>You know, it kind of worked, but I felt really off at the end of the day. And what I realized was, man, I&#8217;m not sure that I did great work on either, because actually, even with perfect implementers doing stuff on both projects, it&#8217;s not an AI coding quality challenge. It&#8217;s a mental bandwidth challenge for me. If I&#8217;m really creatively leading these things, I can&#8217;t multitask, actually. So there&#8217;s a different bottleneck, which is me and my brain. And I&#8217;ve been trying to reflect on what to do with that. Maybe I only parallelize within the same project or on the same area. Maybe I have one main thing I&#8217;m thinking about, and then I have armies of bots doing all the maintenance and bug fixing and stuff I don&#8217;t have time for. I don&#8217;t know, but I&#8217;m curious for your thoughts.</p><p><strong>Kanjun Qiu (1:04:35)</strong></p><p>That&#8217;s super interesting, because, yeah, I always recommend never working on two projects at the same time if you&#8217;re trying to do something creative with these parallel agents. It&#8217;s interesting because I resonate with what you&#8217;re saying a lot. I think when you&#8217;re doing something creative or researchy with software, at least for me, what I&#8217;m trying to do is explore the space and evolve my thinking and understanding of the problem as I&#8217;m building. That&#8217;s very abstract, you know, but I&#8217;m evolving my understanding of the problem, of what I&#8217;m trying to do. And so with parallel agents, one thing we&#8217;re trying to optimize for in Sculptor is divergence instead of convergence. So recently we shipped a feature called Forking. It&#8217;s in beta right now; you can turn it on in the settings. You can fork an agent. So now, say you had this agent build a UI and you didn&#8217;t like the UI. You can go back to where you started and say: try something totally different; try this instead; and then also try this. And it&#8217;ll snapshot your agent&#8217;s current state, all the context, and fork it into a bunch of different tasks. And one thing I really like about this&#8230;</p><p><strong>Geoffrey Litt (1:05:50)</strong></p><p>Yeah, no, I&#8217;m excited. Keep going.</p><p><strong>Kanjun Qiu (1:06:01)</strong></p><p>One thing I really like about this is that it kind of gets me out of this groove we were just talking about. There are many ways to end up in that groove. One way is: I&#8217;ve built up some context, I went down this path, I&#8217;m maybe debugging some minor detail, and now I&#8217;m really annoyed because all of this debugging context is in the context and I need to get it back somehow. And now I&#8217;m in this weird groove. Another way to get into a weird groove is: it had a bad idea and now I can&#8217;t get it out. I can&#8217;t get back to the place where it could have generated good ideas. And so, yeah, the forking thing is really about: how do I help the user get out of grooves, so that they can do really divergent thinking and divergent things, and not have to wrestle with the agent to get the agent out of these grooves?</p>
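<p><em>A rough sketch of the forking flow described above, under the assumption that an agent&#8217;s state is just its conversation context plus a workspace. The names here (Agent, snapshot, fork) are invented for illustration; they are not Sculptor&#8217;s actual API.</em></p><pre><code># Illustrative sketch only: Agent, snapshot, and fork are invented names,
# not Sculptor's actual API.
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Agent:
    context: list = field(default_factory=list)    # conversation so far
    workspace: dict = field(default_factory=dict)  # file path -> contents

    def snapshot(self) -> dict:
        # Freeze everything the agent "knows" at the point you want to
        # branch from, before later context pollutes it.
        return {"context": deepcopy(self.context),
                "workspace": deepcopy(self.workspace)}

def fork(snapshot: dict, instructions: list[str]) -> list[Agent]:
    # Every fork shares the same starting state but gets its own
    # divergent instruction, giving several parallel attempts.
    forks = []
    for text in instructions:
        branch = Agent(deepcopy(snapshot["context"]),
                       deepcopy(snapshot["workspace"]))
        branch.context.append({"role": "user", "content": text})
        forks.append(branch)
    return forks

# Usage: the agent built a UI you don't like; take the snapshot you saved
# before that attempt and branch three fresh directions from it.
agent = Agent(context=[{"role": "user", "content": "Build a to-do app UI."}])
saved = agent.snapshot()
variants = fork(saved, [
    "Try a minimal, keyboard-driven UI instead.",
    "Try a dense dashboard layout.",
    "Ignore the previous design and propose something unconventional.",
])</code></pre>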
<p><strong>Geoffrey Litt (1:06:44)</strong></p><p>I love that idea. I&#8217;m a huge fan of that way of thinking. And you know, it&#8217;s funny, it&#8217;s coming back to version control, actually. These questions of how you structure divergence, how you even see it, and how you encourage it: that&#8217;s really a tooling problem, I think. Yeah, something I&#8217;ve wondered about is also having more structure to the divergence. What I mean by that is not just try three random things. Let&#8217;s say I want a to-do app. One thing you can have the agent do is give you a big questionnaire, right? Should it be really simple or really complicated? Should it be for work or for personal life? And you just go through and answer these 10 questions or whatever. I&#8217;d say that&#8217;s the current state of the art: specification is answering a bunch of questions.</p><p>And it&#8217;s fine, but it&#8217;s pretty tedious. And it&#8217;s also, I think, not how real design processes often work best. Often the way things work best is by looking at a few options and saying, I like that one, and then talking about why. So something I&#8217;ve thought about is: once the tokens are free, can you just generate like 100 to-do apps? But not randomly. The agent would first think about, okay, what are the dimensions that the user might care about? Let&#8217;s set up a design space along those three, five, eight dimensions, whatever. Let&#8217;s take some guesses on what they might want and pick a bunch of points in that space around there. And then also try some wild cards, you know, really crazy options. Pre-generate a hundred apps. And then when we come back to the user, we&#8217;re like, okay, let&#8217;s just start a conversation. Let&#8217;s show you some options. You want it to be more X? We had that ready already, you know? It would be more like jamming with a design consultancy, except the feedback loop is in seconds and not weeks, you know? More playing with options.</p>
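<p><em>A minimal sketch of this kind of structured divergence: lay out a few guessed design dimensions, sample variants near a guess at the user&#8217;s preferences, and mix in wildcards. The dimensions, names, and counts below are made up for illustration.</em></p><pre><code># Illustrative sketch of "structured divergence": instead of generating
# variants at random, lay out a small design space, sample near a guess
# at what the user wants, and mix in a few wildcards. All dimensions and
# counts here are invented for illustration.
import itertools
import random

DESIGN_SPACE = {
    "complexity": ["minimal", "moderate", "power-user"],
    "context": ["work", "personal", "shared household"],
    "style": ["plain text", "kanban", "calendar-first"],
}

def differences(point, guess):
    # How many dimensions this point differs from the guess in.
    return sum(point[k] != guess[k] for k in DESIGN_SPACE)

def sample_designs(guess, n_near=6, n_wild=2, seed=0):
    rng = random.Random(seed)
    all_points = [dict(zip(DESIGN_SPACE, combo))
                  for combo in itertools.product(*DESIGN_SPACE.values())]
    # "Near" variants differ from the guess in at most one dimension...
    near = [p for p in all_points if differences(p, guess) in (0, 1)]
    # ...while wildcards differ in every dimension: deliberately crazy options.
    wild = [p for p in all_points if differences(p, guess) == len(DESIGN_SPACE)]
    return rng.sample(near, min(n_near, len(near))) + \
           rng.sample(wild, min(n_wild, len(wild)))

# Each sampled spec would be handed to a codegen agent to pre-generate an
# app, so the user reacts to concrete options instead of a questionnaire.
guess = {"complexity": "minimal", "context": "work", "style": "plain text"}
for spec in sample_designs(guess):
    print(spec)</code></pre>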
<p><strong>Kanjun Qiu (1:08:50)</strong></p><p>Yeah, I think this is really interesting. There are LLM tools out there where, when you ask deep research to go do some research, the first thing it&#8217;ll do is ask you some questions about the query that you asked. And whenever it asks me these questions, my answer to all of them is: yes to all. The questions are useless. And I was reflecting on why these questions are useless. It&#8217;s because they&#8217;re not actually questions I want to answer. It&#8217;s more that I want to see some output and be like, I didn&#8217;t like this part; I want more of this. What we want is for the LLM to help us understand the problem better. This goes back to what you and I were just saying: the creative process is about understanding the problem space, what we&#8217;re trying to solve for or trying to do, better as we go, and being able to create and move in that direction easily through the medium.</p><p><strong>Geoffrey Litt (1:09:44)</strong></p><p>Yeah, I love that you&#8217;re thinking about encouraging this in the tool, because I totally agree with you that often, even if it&#8217;s technically possible to do this somehow, once you&#8217;re in the groove, you feel stuck unless it&#8217;s easy. There are a couple of beautiful systems out there that play with ideas of spatial canvases as ways to visualize that branching, not just in LLM chats but also in creative media. There&#8217;s a system I love called Spellburst; my friend Tyler Angert and some folks at Stanford worked on it. It&#8217;s a spatial canvas where you make these little art sketches, and then you can hit a button that makes a bunch of forks off from that one. You basically try a bunch of things, and then you&#8217;re like, I like that one, and then let&#8217;s diverge from there, right? And so you can explore, but you see all the variations spreading out in this tree, and I think that sort of thinking can be very generative.</p><p><strong>Kanjun Qiu (1:10:39)</strong></p><p>That&#8217;s super interesting. Yeah, we&#8217;ve struggled to figure out how to represent forking, like forked agents. And I don&#8217;t know if a canvas works when it&#8217;s not so visual. You can&#8217;t really visualize code very well. I want to see at a glance where each fork is going, but it&#8217;s really hard to do that with code. I&#8217;m curious, since you&#8217;ve thought about malleable software and helping the user not only make changes but explore: what kind of technical infrastructure allows the user to actually explore? Do you have any thoughts on this?</p><p><strong>Geoffrey Litt (1:11:11)</strong></p><p>That&#8217;s a fascinating question. I agree with you that inherently visual media are a much more obvious fit for a visual canvas or something like it. What you&#8217;re really getting at is: how do you give the right feel for what a piece of code is, in a concise visual way? One dimension that I&#8217;m always thinking about is that it&#8217;s really hard to tell from looking at code how solid it is.</p><p>This is a big problem in malleable software; we found this in Patchwork. When a company ships a piece of software in the app store, there&#8217;s a minimum bar it&#8217;s hitting, right? You hope. That might not be true anymore with LLMs, honestly, but someone&#8217;s charging money for it; it should be at a certain quality bar. If I just made a tool for myself and I vibe-coded it in five minutes and it works for me, what do you make of that? Do you want to use it? It depends. You probably wouldn&#8217;t want to wholesale adopt it if it&#8217;s really important. You might be okay playing with it. But it&#8217;s often hard to tell from the outside which one it is. And is this thing even maintained? I think people look at GitHub stars and commit histories, for example, as these sorts of signals of life, right? 
I think if we have way more software, and it can be produced way more readily by people who don&#8217;t know what they&#8217;re doing, there&#8217;s going to be more bad software out there. Which is not necessarily a problem: there are a lot of bad spreadsheets out there and it&#8217;s fine. But I think you need to be able to tell. I kind of wish you could see it. An analogy I like is: is this a balsa wood model of a bridge, or is this the Golden Gate frickin&#8217; bridge? In physical media, it&#8217;s really, really obvious and you can never get confused, but with software it&#8217;s not as clear. Could we make that clearer somehow?</p><p><strong>Kanjun Qiu (1:13:03)</strong></p><p>This is really interesting. One of the prototypes we&#8217;ve been playing around with in Sculptor is this idea of a report card for your code. So can we give you that? Having worked on it, I think it&#8217;s actually a really hard problem. What goes into the report card? How do you know if a piece of code is robust? Depending on what you want to do with it, you might want it to be robust in different ways. Maybe you want it to be more extensible, or maybe you want it to be really well tested. But, you know, to your question: if I&#8217;m in this world of malleable software, there&#8217;s a lot of forking, there&#8217;s a lot of divergence. How do I know where to build from? What is safe to build from? What are the proxies for that? I think that&#8217;s a really good question.</p>
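<p><em>One way to imagine a first cut at such a report card is to aggregate a few cheap proxy signals. The signals below are invented for illustration, not what Sculptor actually computes, and any real report card would need far better measures of robustness.</em></p><pre><code># Illustrative sketch of a code "report card" built from cheap proxy
# signals. The signals and fields are invented for illustration; real
# robustness needs much better measures.
from pathlib import Path

def report_card(repo: Path) -> dict:
    py_files = [p for p in repo.rglob("*.py") if ".venv" not in p.parts]
    test_files = [p for p in py_files if p.name.startswith("test_")]
    src_lines = sum(len(p.read_text(errors="ignore").splitlines())
                    for p in py_files)
    todo_count = sum(p.read_text(errors="ignore").count("TODO")
                     for p in py_files)
    return {
        "has_tests": bool(test_files),
        "test_file_ratio": len(test_files) / max(len(py_files), 1),
        "todos_per_kloc": 1000 * todo_count / max(src_lines, 1),
        "has_ci_config": (repo / ".github" / "workflows").exists(),
    }

# Usage: grade a checkout before deciding whether it's a balsa-wood model
# or something safe to build on.
print(report_card(Path(".")))</code></pre>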
<p><strong>Geoffrey Litt (1:13:45)</strong></p><p>I&#8217;ll throw out another idea. One of my beliefs around divergence and versioning is that there&#8217;s a lot of meta work around the work that humans find tedious, like writing pull request descriptions. Getting a pull request that has a really, really good description makes it much easier to review, but it&#8217;s a lot of work to produce that.</p><p>With AIs, I think we should be pushing much harder than we are to produce amazing review experiences and artifacts. We should be the most spoiled managers in the world, with our reports coming to us having spent weeks preparing a presentation about a tiny bug fix that doesn&#8217;t really matter, because we can spend the virtual time for them to do that.</p><p>And whether that&#8217;s, yeah, if it makes 100 apps for me, maybe it should make a 3D world where I can browse the different apps and see how they&#8217;re different from one another. Maybe it&#8217;s just a really, really good deck that walks me through them and explains the differences, as a PDF. I don&#8217;t know. If we gave a high school or college intern the task of explaining these 100 apps, what would they do? I don&#8217;t know, but I think they could get extremely creative.</p><p>There&#8217;s one project I love called <a href="https://www.ericrawn.media/quickpose-docs">Quickpose</a>, I think it&#8217;s called, by <a href="https://www.ericrawn.media">Eric Rawn</a>, where they did versioning on a spatial canvas, but it was more like a whiteboard that you got to draw on and arrange the versions on yourself, free form. They did some tests with artists and found that artists would use it: there would be a cluster of versions over in this corner, the ones that had this intriguing property; then this offshoot over there that was really weird; and then, this is our mainline exploration. You can actually arrange them and label them and describe them yourself. So imagine, if the AI is diverging, could it make a poster for you of what it tried and how it all fits together?</p><p><strong>Kanjun Qiu (1:15:48)</strong></p><p>That&#8217;s super interesting. There are two really interesting things in that. One is that the malleability of this canvas, where artists could arrange the versions themselves, is in itself a malleable software property: you want the end user to be able to explore by arranging the explorations and reasoning and thinking through them. So that&#8217;s really interesting. The other thing you said that I thought was really interesting is this idea that we&#8217;re really under-focused on presenting results from the AI. The LLM just dumps a bunch of text at you. Why has it not thought about how to present its results as a slideshow, or, you know, as a better presentation? That&#8217;s really weird.</p><p><strong>Geoffrey Litt (1:16:33)</strong></p><p>It&#8217;s wild. I also think this is going to become much, much more important very rapidly. There&#8217;s one argument you could make, which is that the AI is going to get so good that we don&#8217;t need to review. But I think that&#8217;s totally false, because what I&#8217;ve observed in coding is: they get better, so I give them harder stuff. In fact, it&#8217;s almost the opposite problem, where I&#8217;m giving them more and more stuff that&#8217;s more and more important, and they&#8217;re going off and doing stuff that I&#8217;m not even in the details on anymore. So the review step becomes more critical over time, and I think it&#8217;s headed toward a world where most of my time is reviewing. So really, the quality of that review experience, whatever it is, letting me quickly, happily, and correctly tell: is this good, and what do I want to change? I think that&#8217;s the whole ballgame for interfaces for using these things.</p><p><strong>Kanjun Qiu (1:17:29)</strong></p><p>I think that&#8217;s really interesting, and I really agree. A way I&#8217;ve been thinking about it in my mind is switching from the term review, which is part of this industrial process of software, as you said earlier (I love that term), to the idea of review as part of the medium of working with coding agents or AI agents. Review is part of a powerful medium for agents, because the medium doesn&#8217;t really work without this step of understanding what is going on and where to steer it next. And as you said, the more we use it, the more critical things we give it, and the more important this piece of the medium becomes, it feels like.</p><p><strong>Geoffrey Litt (1:18:14)</strong></p><p>Yeah, I love that reframe. I think you could think of it two ways, right? One is a human-to-human analogy: we&#8217;re jamming, and it&#8217;s not a quality assurance step. It&#8217;s more like we&#8217;re working together. 
You know, if you brought me some work and we were working together, I wouldn&#8217;t frame it as, okay, time to check if you did what I said. It&#8217;d be more like we&#8217;re riffing, right? So maybe that&#8217;s one way to think about it. But another is something in the ideas we&#8217;ve been talking about: visualizing, thinking about non-human interactions. If a potter is forming a piece of clay into a pot, they&#8217;re not reviewing suggested pots. There&#8217;s just a loop going where the clay is becoming something, and they&#8217;re reacting live. And I almost wonder if we could get to the point where crafting software feels that way, where there&#8217;s some representation you&#8217;re working with that feels like you can directly touch it.</p><p>And it&#8217;s not a language interaction. You more see it coming together, and you&#8217;re, yeah, pulling it into a pot. And I think that&#8217;s very, very obviously possible for shallow UI design. Like, I should be able to move UI elements around. I shouldn&#8217;t be telling a model, please move that box three pixels. You know, that&#8217;s ridiculous, and that hopefully will get solved, although our civilization has taken backwards progress on that since the 90s. But the harder question is: what is that for logic? I don&#8217;t know.</p><p><strong>Kanjun Qiu (1:19:52)</strong></p><p>Ugh, this is the thing I want most. How do we turn software into clay? I don&#8217;t know. Yeah, what do you think about that for logic? It kind of gets at <a href="https://worrydream.com">Bret Victor</a>&#8217;s <a href="https://dynamicland.org">Dynamicland</a>: how do we tactilely feel what&#8217;s happening in software?</p><p><strong>Geoffrey Litt (1:20:09)</strong></p><p>Yeah, since you mentioned Bret: Bret has this great essay, <a href="https://worrydream.com/LadderOfAbstraction/">Up and Down the Ladder of Abstraction</a>, which I think has a lot to say about how you see a map of a very complex space and navigate through it to find the place you want to be on that map. I think that&#8217;s a really beautiful idea that could be brought to: what&#8217;s my map of my 100 to-do apps, and how do I find the one that I want? Going up into very abstract land, and then jumping down into demoing concrete ones.</p><p>Another inspiration that I think about a lot is Michael Nielsen&#8217;s work on, I guess it&#8217;s called artificial intelligence augmentation. He and Shan Carter had this <a href="https://distill.pub/2017/aia/">piece</a> with these sliders that change very deep conceptual attributes of a font typeface, attributes that an AI learned, probably with an unsupervised learning algorithm or something. Basically, I think it was pretty simple: you are moving in the latent space of fonts with a slider. And that does make me wonder: what is the equivalent of that for code?</p><p>Could you do some of the <a href="https://www.anthropic.com/research/persona-vectors">Anthropic steering vector</a> stuff on code gen, and then you choose the steering vector? Is there a complicated slider that you can just drag, and the app gets more complicated or simpler? I don&#8217;t know.</p>
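<p><em>The mechanism behind that font demo can be sketched in a few lines: a slider is just scaled movement along a learned direction in latent space, applied before decoding. The encoder and decoder are placeholders here; only the arithmetic is the point.</em></p><pre><code># Minimal sketch of a "latent slider": move an embedding along a learned
# attribute direction, then decode. The encoder/decoder are stand-ins for
# a real generative model; the vector arithmetic is the whole point.
import numpy as np

def slide(z, direction, amount):
    # Normalize so `amount` means the same step size for any direction.
    unit = direction / np.linalg.norm(direction)
    return z + amount * unit

rng = np.random.default_rng(0)
z = rng.normal(size=64)               # latent code of the current artifact
complexity_dir = rng.normal(size=64)  # a learned "more complicated" direction

# Dragging the slider: step along the direction and decode each position.
for amount in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    z_new = slide(z, complexity_dir, amount)
    # decode(z_new) would render the font, or regenerate the app, at this
    # attribute setting.</code></pre>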
<p><strong>Kanjun Qiu (1:21:46)</strong></p><p>It&#8217;s really interesting. It makes me think: if you combine the slider idea and the abstraction ladder idea, you kind of drag the slider up and down the levels of abstraction. You can modify at each level of abstraction, and then the full abstraction ladder is regenerated for the new app. And then you can move down or up again, so you&#8217;re moving at whatever level you want. Like: okay, I want to totally change this to-do app to be this way; versus: I want to change this tiny function in this to-do app to do this slightly different thing instead.</p><p><strong>Geoffrey Litt (1:22:24)</strong></p><p>I love that. I think this is a very ambitious vision we&#8217;re sketching out here. But yeah, I think this is a very different path than the one the industry&#8217;s on right now, which is mostly just a bunch of natural-language, in-the-groove chats.</p><p><strong>Kanjun Qiu (1:22:29)</strong></p><p>Yes. I&#8217;m curious: looking back at the research that you&#8217;ve done and where you are now, what insights do you feel are maybe overlooked right now? What kinds of things do you think people are not paying attention to that they should pay more attention to, to get the future that is more empowering, agentic, free?</p><p><strong>Geoffrey Litt (1:23:05)</strong></p><p>I would come back to the infrastructure piece of malleability. Obviously everyone&#8217;s excited about AI making professional software development more productive. And I think some people are excited about personal tooling, about AI building software for us. I think people haven&#8217;t realized how much the existing ecosystem is not prepared to support that. When you think about the most basic questions, like, if I wanted to add a feature to Airbnb, what would I do? People are like, wait, what? That doesn&#8217;t even... </p><p><strong>Kanjun Qiu (1:24:46)</strong></p><p>I can&#8217;t even compute that. </p><p><strong>Geoffrey Litt (1:24:47)</strong></p><p>I wrote this little tongue-in-cheek story-essay thing; it was an imaginary conversation with a mysterious wizard. The mysterious wizard says: I want to schedule a weekly seminar, so it&#8217;s on my calendar, and I just want to figure out how many attendees there are and order the right number of pizzas for them automatically via Uber Eats. And this apprentice person is like, I&#8217;ll just vibe code a new app that does it. And then the wizard is like, no, no, no. Can you add the button to Uber Eats, please? And the apprentice&#8217;s brain goes: pff, I don&#8217;t know, I can&#8217;t add that button. What are you talking about? So I think there&#8217;s this really deep change of perspective that hasn&#8217;t fully been internalized. When you bring the cost of editing code down by as much as it&#8217;s coming down, what makes sense to build around that is just not on people&#8217;s radar, I think.</p><p><strong>Kanjun Qiu (1:24:43)</strong></p><p>Yeah, so if you could list out every piece of infrastructure... this is awesome, because it&#8217;s something that we&#8217;re thinking about. A way I think about Imbue is that we want to build the public infrastructure that&#8217;s necessary for malleable personal software. So what is that? If you had a wish list, what&#8217;s on the wish list?</p><p><strong>Geoffrey Litt (1:25:00)</strong></p><p>I think it&#8217;s a lot of what we talked about. 
I mean, it&#8217;s so many of the ideas: all software ships with the editor for that software. All software is live-modifiable locally. The live modifications can be instantly shared live with your collaborators. There&#8217;s awesome version control, so you can diverge and converge as needed. You have really awesome data infrastructure that&#8217;s really easy for random individuals to run, doesn&#8217;t require corporate scale, and enables modern collaborative apps.</p><p>A lot of those elements you could imagine coming together into essentially some sort of new operating system or platform. Over time, I expect that the pressure toward personal software will be strong enough that we&#8217;ll start to see this emerge in some form. But I don&#8217;t know quite how.</p><p><strong>Kanjun Qiu (1:25:51)</strong></p><p>That&#8217;s really interesting, because when we were building Sculptor, one of the things we were thinking about is: what if the software you&#8217;re building ships with some Sculptor environment, so that the end user can edit it, see the live edits in real time, and share the code base with someone? Something like that. It&#8217;s not quite there, and I&#8217;m not quite sure how to do it, because the version control problem is really hard. The data versioning problem is also really hard.</p><p><strong>Geoffrey Litt (1:26:23)</strong></p><p>Yeah. I mean, think about who pays for the AI edits that are going to happen from the users. Currently, AI code editing is economically viable because a lot of the people doing it are making software that ships to millions of people, so they can get paid a lot to do it. How does that work here? I think there are a lot of questions.</p><p><strong>Kanjun Qiu (1:26:46)</strong></p><p>For the infrastructure piece of malleability, is there anything in terms of cultural norms, or the way that we think about software, or the way that people and communities exist, that you think either is changing or needs to change as we go into this future?</p><p><strong>Geoffrey Litt (1:27:08)</strong></p><p>I&#8217;m really glad you asked that. This is actually one of the reasons I care most about malleable software. It has less to do with software and more to do with how people feel about their relationship to the world. There&#8217;s a Steve Jobs quote that I love that goes something like: the moment you realize that the people who made all this stuff were no smarter than you is when you can start actually changing things.</p><p><strong>Kanjun Qiu (1:27:29)</strong></p><p>Everything around you can be changed.</p><p><strong>Geoffrey Litt (1:27:31)</strong></p><p>Yeah, and I think that&#8217;s a really powerful mindset. And that is a mindset that&#8217;s cultivated in people in response to an environment, in iteration with an environment. I think disempowering environments create disempowered people, people who have learned to be helpless. And I think there&#8217;s a general trend here. Narrowly, you could look at examples like cars, which have become a lot harder to understand.</p><p>You can&#8217;t really take apart an iPhone. There&#8217;s less comprehension possible in the world. And software, because it&#8217;s typically not malleable, adds to this: the more time we spend in digital environments, the more time we spend in places we think of as prefab corporate environments. The thought of what to change doesn&#8217;t even occur to us. 
I don&#8217;t think about how we should decorate this podcast meeting room that we&#8217;re in, because I can&#8217;t change it.</p><p><strong>Kanjun Qiu (1:28:30)</strong></p><p>We&#8217;re just consumers.</p><p><strong>Geoffrey Litt (1:28:32)</strong></p><p>We&#8217;re just consumers. I think the more time you spend in places that cultivate that mindset, the harder it gets to have agency in the world. So I think there&#8217;s a double risk here with AI. Some of the conversations we&#8217;ve had around being in the details and understanding things: if there&#8217;s less of a need to even understand things to minimally get through your life, that ties into this general trend, where there&#8217;s a revealed preference toward convenience, and we all, myself included, choose it often, but it can have long-term consequences, both for ourselves individually and as a society. So one way we can work on this, I think, is to make software a place where people can exercise their will more, and where they&#8217;re encouraged to do that. And that&#8217;s maybe one way to stem this tide a bit, and perhaps even start a virtuous spiral where kids come to feel that they can do anything, and they will be right, because they can.</p><p><strong>Kanjun Qiu (1:29:37)</strong></p><p>I love that. I resonate with that a lot. Our digital worlds right now are disempowering environments. We are at the mercy of them in a lot of ways. We can&#8217;t really change them very well. We don&#8217;t have that much agency. And because we spend so much of our lives digitally, we end up feeling disempowered in our lives. And it feels like, as we go into a world with AI agents, this can get worse. These are agents run by other people, with their incentives, that are now taking actions on our behalf. So it&#8217;s even more disempowering, as in some of the things that we talked about. The core cultural shift is: how do we rekindle the original idea of the personal computer, the original dream of these systems as manifestors of our will and of what we want in our lives?</p><p><strong>Geoffrey Litt (1:30:27)</strong></p><p>Exactly. I think that&#8217;s a great closing note, in a way: to bring up that original vision of people like Douglas Engelbart and Alan Kay. Their original vision was that the personal computer is precisely this empowering thing. That&#8217;s why it&#8217;s personal. And so I think if we can find ways to get back to that, the world will be a better place.</p><p><strong>Kanjun Qiu (1:30:47)</strong></p><p>Cool. Well, I think it&#8217;s possible. I think we&#8217;re at this turning point right now where software can become personal, and we do need these pieces of infrastructure that make it possible. And we could change the economic incentives around it, because now everything is so replicable. So we&#8217;re at a good time.</p><p><strong>Geoffrey Litt (1:31:09)</strong></p><p>I agree. Let&#8217;s do it. I just joined Notion, and Notion is one of the players trying to make it happen, right? Build that platform. It&#8217;s not going to be one winner; I think there are going to be many platforms that enable this philosophy of personal software in so many different arenas. So yeah, I&#8217;m excited. Let&#8217;s do it.</p><p><strong>Kanjun Qiu (1:31:23)</strong></p><p>I agree. Awesome. Well, thank you so much, Geoffrey. This was really fun and a great meandering dive into all of these different ideas. 
So I really appreciate it.</p><p><strong>Geoffrey Litt (1:31:32)</strong></p><p>Thank you.</p>]]></content:encoded></item><item><title><![CDATA[Choices and Knives]]></title><description><![CDATA[The collective future we want is a future of kitchens, not vending machines.]]></description><link>https://ideas.imbue.com/p/choices-and-knives</link><guid isPermaLink="false">https://ideas.imbue.com/p/choices-and-knives</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Sun, 21 Sep 2025 04:58:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/70c2b8ac-1dbc-412a-b2d1-6ef7bb79ecd2_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on <a href="https://www.linkedin.com/in/glenn-mcdonald-ab3b36">Glenn</a>&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=519">Furia</a>.</em></p><p>The consumptive future that billionaires and their power-consolidation corporations are trying to sell us is a future of vending machines. </p><p>The collective future that we want is a future of kitchens. </p><p>Vending machines present a shallow illusion of overconstrained "choice", but no meaningful agency. The insidiousness of vending machines, however, is not this quantitative overconstraint. Adding more products to the vending machine does not change its nature. Amazon is a very large vending machine. The insidiousness is the conflation of choice with agency, and thus of consumption with participation. </p><p>A vended economy is supported by a complaisant vended politics, increasingly of its own making. Voting is the nominal structural basis of democracy, but encouraging people to vote, in itself, is anodyne and power-friendly. Voting in elections with only one candidate is bad theater, but voting in elections with only two candidates is only one better. And the math never improves: if you have no control over the candidates, and in particular if the candidates are <em>never you</em>, increasing their number doesn't benefit you. </p><p>We don't immediately recognize this as dystopia, because dystopian story-telling usually oversimplifies the format of the oppression. In <em>1984</em> the government controls the meager supply of drab consumer goods, and the single broadcast/surveillance channel. But of course Walmart has endless aisles of things, and the TVs in the TV aisle have endless channels. We have choices. </p><p>But our choices are stocked by Walmart. Or Amazon, or a handful of morally interchangeable competitors. The TV channels are numerous in frequency, but monotonous in signal, and monopolized in control. You can have any innocuous flavor of filler surrounding your advertisements (for innocuously flavored fillers). Orwell thought this gray-goo of a world would be imposed on us by the State, but the capitalist innovation is to invert this. Power is consolidated by money, not vice versa. Citizenship is portioned into voters, who are then repackaged as consumers. Anything functional in government is absorbed or disassembled until it can impose no miserly restrictions. 
</p><p>And so we get: self-destructive grievant feudalism wielded by a petulant debt-powered narcissist, supported by gutless symbiosis with a solipsistic social class of robber barons. The narcissist only sees himself in the (legacy) media, which is controlled by the same fear-sellers who picked him as their sacrificial agent. Dissent isn't so much crushed as organized into slots, each of which is manipulated to go temporarily out of stock, and then those empty slots are filled with something more colorful, but more completely owned. A few protesters gather in front of the machine to demand the return of the most recently discontinued snack. Their attention validates the machine. The machine gleams eagerly, its buttons patiently awaiting their fingers. They are angry now, but anger turns into hunger over so little time. Soon they will want something. The machine has the things. It waits. </p><p>We have kitchens. So many of the things in the kitchen came from rows on shelves in stores, vended with only slightly less structure than from the machine. The kitchen is not anti-business or anti-capitalist, exactly. But the things in the kitchen are <em>material</em>, and tools. The difference between a 5lb sack of flour and an individually-packaged snack cake is the difference between potential energy and the bill for energy consumed. At the end, we still eat. The difference is not the overall topology of the system, but our place in it. We come to the kitchen to take up knives, not coins or tokens. </p><p>Instead of a flat grid of processed choices, all lit from a consistent angle, the kitchen is an unruly space. Most of the things in it do fairly little of their own accord. A few of them have very particular purposes, but many do not. A bagel-cutter, but then 4 knives for all other needs combined. One adorable pan you use twice a year to make &#230;belskivers, two sizes of skillet, a saucepan, a soup pot. Inspirational cookbooks you mostly don't actually open. Turn on the stove; grab a pan, put a little olive oil in it; get a knife. We're going to <em>make</em> something out of <em>ingredients</em> and tradition and imagination and love and heat and garlic. </p><p>But even when we do, we are mostly alone. The capitalist rendition of Community Supported Agriculture is a telling example of both potential and challenge. The farmer solicits patrons, who each subscribe to a share of the farm's output, driven conveniently from the farm into the city every week. But while the community's support is collectively tangible to the farmer, albeit not regally so, the community is mostly only abstract and implicit to itself. Maybe you say hi to other people picking up their shares at the same time. Maybe there's a mailing list where you can exchange ideas for what to do with 8 zucchini at once. But there's probably no shared kitchen where you could all make zucchini chocolate cake at once. The <em>community</em> of the CSA, in isolation, is not only asymmetrical, but inherently hard to manifest. Most of your actual neighbors aren't CSA subscribers. Half of them only shop at Trader Joe's and think you're making a gross joke with the thing about zucchini in cake. Some of them shop at the part of Amazon that says Whole Foods, and at least cook. One of them belongs to a different CSA. These fragmented micro-collectives don't worry the billionaires. You are more likely connected to your immediate neighbors by baseball. The billionaires own the baseball teams. 
</p><p>The "smart" phone, which is now just what "phone" mostly means (in the same way that "social" media is now just what "media" mostly means; and thus a few more billionaires), is sometimes dreamily described as a computer in our pocket, but of course what it really is is a vending machine in our pocket, neatly lined with buttons. Behind the buttons, increasingly, are "apps" that are themselves in turn essentially vending machines of prepackaged choices. Like a tapas dinner with our friends, this doesn't sound bad by definition. But the tapas restaurant has a kitchen, and knives. A decent restaurant is a complex celebration of human agency put into the form of edible performance, and should make us want to cook in the way that, hopefully, a good song makes us want to sing. We can sing while we cook. </p><p>Our computers, even the tiny galley computers in our pockets, can be more like kitchens full of singing. The thing that would make them different would be a different kind of software. But the thing that would make a different kind of software likely is a different economy and a different social structure of not just how software is made, but how computation is applied to human problems. Power must be distributed, but we also have to want it, have to want to make our own decisions instead of delegating them to our choice of 5 omniscient oracles. The oracles aren't going to tell us to figure it out ourselves, so we have to want to not ask them. A CSA doesn't <em>require</em> that its recipients eat differently than TJ's customers, but 8 zucchini constitute a provocation to cook in a way that a frozen stir-fry does not. If the only "ingredients" we can easily buy are frozen entr&#233;es, all meals are snacks and it doesn't matter if our knives are sharp. Especially not if the snack troopers come for our knives, pretending it's for <em>our</em> safety. </p><p>We do not need more applications. We do not need new vendors of fancier and less predictable ways to make the same snack apps. We need the same things for data and data-tools and the things we can make out of data with data-tools that a community needs from a communal kitchen. But because communities formed in computational space can use tools for self-development and self-determination that would be harder to provide in physical space, maybe examples in our software can help lead to similarly catalytic ideas in our cities. Maybe data maker-spaces will inspire us to make communal kitchens, and the kitchens will give us new data needs, and thus new ideas. The spaces are different, but the communities are all made of people, and the people are us. </p><p>We do not need more snacks. We do not need robots that make more snacks. We do not need machines that turn our zucchini into snack cakes that they then confiscate and sell back to us. We need a place where, when we get hungry, it is still <em>harder</em> to reach for a knife than a button, but only in the way that tells us the results will be satisfying. A place in which the vending machine gathers dust until we replace it with an extra refrigerator. A place with noise and joy and knives. </p><p>We do not need safety from ourselves. The knives are not weapons. Stabbing is not a cooking technique. The newly unbillionaired can have some zucchini chocolate cake, too. This is our argument against oligarchy and our restorative consolation to those who thought safety required demonization: We have enough. Dominance is a rich person's poor substitute for collaboration. 
Aspiring to dominance is a poor person's poor substitute for working together on our collective wealth and taste. </p><p>We do not have to settle for poor choices, bought and swallowed whole. We do not have to buy what we find in machines. We do not have to quietly comply with our own commodification. Together, we do not have to be consumed.</p>]]></content:encoded></item><item><title><![CDATA[From lawless spaces to true liberty: rethinking AI's role in society]]></title><description><![CDATA[Who will actually hold power in the age of intelligent machines?]]></description><link>https://ideas.imbue.com/p/matt-boulos</link><guid isPermaLink="false">https://ideas.imbue.com/p/matt-boulos</guid><dc:creator><![CDATA[Kanjun Qiu]]></dc:creator><pubDate>Tue, 05 Aug 2025 22:56:27 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/170220519/e9a1d6748dc1357f14315ee08fe8b83d.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<div class="pullquote"><p>Welcome back to Generally Intelligent! We&#8217;re excited to relaunch this podcast on Substack, and in video. Our episodes still feature thoughtful conversations on building AI, but with an expanded lens on its economic, societal, political, and human impacts. </p></div><p><em><a href="http://boulos.ca/">Matt Boulos</a> leads policy and safety at Imbue, where he shapes the responsible development of AI coding tools that make software creation broadly accessible. His work centers on understanding what technological power means for individual liberty and advocates for the legal and institutional frameworks we need to protect our freedom. 
Matt is a lawyer, computer scientist, and founder.</em></p><p>In this conversation, Matt and Kanjun discuss:</p><ul><li><p>AI&#8217;s four core challenges </p><ol><li><p>Empowering bad actors</p></li><li><p>Transferring power from labor to capital</p></li><li><p>Reducing resistibility</p></li><li><p>Psychic damage of disempowerment</p></li></ol></li></ul><ul><li><p>Governing lawless digital spaces </p></li><li><p>Why abundance is not enough without liberty</p></li><li><p>Freedom as deep enablement <em>and</em> deep protection</p></li><li><p>The role of technologists in shaping society</p></li></ul><div><hr></div><h2>Timestamps</h2><p>03:13 The complex landscape of AI conversations</p><p>06:11 Understanding AI's core challenges</p><p>08:57 The transfer of power from labor to capital</p><p>11:51 Resistibility and human agency</p><p>15:00 The dual nature of technology</p><p>18:01 The invisible dynamics of digital spaces</p><p>23:51 Lawless spaces</p><p>27:01 The future of work and economic stability</p><p>40:05 Privacy laws and digital rights</p><p>44:07 Code as regulator</p><p>54:12 Interoperability and user control</p><p>01:07:05 Aggregates vs. individuals</p><p>01:14:43 Bottom-up vs. top-down automation</p><p>01:20:49 Optimizing for increased ability rather than increased productivity</p><p>01:23:11 Economic implications of AI</p><p>01:26:54 Building systems for empowerment</p><p>01:29:22 Freedom as deep enablement <em>and</em> deep protection</p><div><hr></div><h2>Transcript</h2><p>Kanjun Qiu (00:21)</p><p>Welcome back to Generally Intelligent. My name is Kanjun Qiu. I'm the CEO of Imbue. And we have with us Matt Boulos, our Head of Policy.</p><p>We started this podcast back in 2020 when we were trying to understand from researchers how far this generation of LLMs would go. The podcast has succeeded far beyond what we expected. Many of our early guests went on to have huge impacts on the field, and AI has gone from this niche thing to a household name everyone's talked about.</p><p>But since it's become so ubiquitous, we've started to realize something strange. The conversations in public are really weird. We have one AI CEO saying they're going to replace all of our jobs, but they're distributing intelligence, so that's good. And another AI CEO who's worried that it's going to kill us all, but it'll also give us tutors in India and new medicines, and so that's okay.</p><p>But where is the serious conversation about the real costs and benefits of this technology, the real economic, societal, political, and very human impacts that AI is going to have on our lives?</p><p>Generally Intelligent&#8212;this podcast and this conversation&#8212;is the start of that. We want this to be a space for us to have serious cross-disciplinary conversations about AI so that we can make changes. We can talk about different economic mechanisms, different ways to build technology, so that we can create the future that we want.</p><p>Because today, it's not too late. We can still change how this technology shapes society. And if we wait too many years, that's not going to be the case anymore.</p><p>So let's dive in.</p><p>Matt Boulos (02:45)</p><p>You've put a lot of thought into thinking about what are the core challenges that AI brings. 
Why don't you walk us through what you see as the main areas that we need to take seriously if we're going to address AI's impacts?</p><p>Kanjun Qiu (03:44)</p><p>Sometimes it's so overwhelming because people talk about all of these different problems as a whole smorgasbord from sycophancy to how AI might take all of our jobs to how it might take over the world. So, a way that I think about the problems is to bucket them into four categories based on the mechanism of action by which the system is acting.</p><p><strong>One, empowerment of bad actors. </strong>The core mechanism is that the power of actors who might do damage goes up. It's a technology that gives a lot more capability, and now various people who couldn't wield this capability before can.</p><p>And I actually lump both AI takeover&#8212;AI systems taking over and dominating humans&#8212;as well as terrorism in that category, because the mechanism of action is the same. If AI is taking over, that just means that AI is taking a lot of this power and then doing negative things to humans. And same with terrorists or authoritarian governments.</p><p>The reason why it's helpful to think about that mechanism of action is that it's very generative for solutions. When I think about actors who are anti-social, in the solution space, there are a couple of things I can do. One, I can prevent anti-social actors from getting that power. Let's look at which actors exist&#8212;governments, individuals, the AI systems themselves&#8212;and then look at how we can prevent them from getting power. That might be all forms of know-your-customer laws, or safety research, or things like that.</p><p>On the flip side, another way to make things more resilient is to make the world safer against bad actions like this. Maybe in that camp is better surveillance of the creation of biological artifacts so that we can prevent viruses, or inventing a universal antiviral that would actually remove a whole class of dangerous problems. This category is actually talked about often and well; the important thing is that it is just one category, and many solutions actually solve for many of these different actors and the problems they pose.</p><p>Kanjun Qiu (06:47)</p><p><strong>The second category I think of as transferring power from labor to capital, </strong>the capital-L Labor to capital-C Capital in the Marx view. As labor becomes less powerful because we are less valuable and capital gains power, what happens?</p><p>Most of us are in the labor class. We do not own the factors of production. We work for wages. And this is a technology that&#8217;s starting to do things that we currently do wage work for. So what happens to all of us who work for wages?</p><p>There's the immediate, somewhat alarming effect of that: losing jobs. But there's, to me, the long-term, somewhat alarming effect of this, which is that you have this constant power transfer from labor to capital that is forever.</p><p>Matt Boulos (07:51)</p><p>There is something really quite striking if the ability to be productive depends on capital. This is a really abstract way of saying, I show up to work and the capital I&#8217;m bringing is my laptop, but for the most part, I'm bringing the labor. 
Imagining this world, maybe the day's gonna come that the laptop matters way more than I do, and it&#8217;s a question of who owns it.</p><p>Kanjun Qiu (08:41)</p><p>That's a really good way of putting it: the transfer of power from labor to capital is equivalent to the transfer of usefulness from me to my laptop. So what happens in a world where the laptop's way more useful than I am?</p><p>Matt Boulos (08:54)</p><p>I've never looked suspiciously at this thing before.</p><p>Kanjun Qiu (08:57)</p><p>What happens in that world is not just economic. It's not just that I get paid less. Maybe I wield my laptop so therefore I still continue to get paid some. But in theory, the company owns my laptop, so I may not get paid at all.</p><p><strong>But the second effect of it is political. </strong>Part of the reason why we have political power is because our government depends on us to fund it. And there are a lot of countries that don't depend on humans. They depend on natural resources like natural gas or oil: the UAE, Russia. And they have a lot less incentive to treat their people well in the same way as we maybe in America do. So I'm somewhat concerned about the kind of loss of political power that we'll have because of our loss of economic power.</p><p>Capital can now just use capital&#8212;use AI&#8212;to produce more capital, and there's this reinforcing loop.</p><p>Kanjun Qiu (09:59)</p><p><strong>The third category, which I haven't heard that many people talk about, is your idea of resistibility. </strong>In political philosophy, there's this idea of resistibility: how well can you resist laws that don't serve you?</p><p>In America, we have fairly high resistibility. The civil rights movement was a good example of that, where you could actually disobey, have civil disobedience, and then change the laws. There are countries that have very low resistibility, like China as a surveillance state. And one thing that we're concerned about is going into a future where the resistibility of humans against automated systems&#8212;either controlled by themselves or controlled by other people&#8212;is much lower. We lose our power. So, the core mechanism here is a transfer of power from people to automated systems and the people who control them.</p><p>There are a lot of examples of low resistibility today. For example, we have very little ability to resist our social media notifications. We can turn them off, but we also have very little ability to resist our social media algorithms or news algorithms or control the news that we see, that we want to see. There are ways of opting out, but I would consider it a fairly low resistibility environment.</p><p>And as we go into a future that has a lot more automated systems&#8212;agents that are doing things automatically&#8212;that's something that's really important to consider. Now, other people are going to have agents that do things like spam call you constantly, or try to convince you on a website to buy something you don't need, or try to convince you to give them data that they can resell. Especially given what we see about current capabilities, it&#8217;s not clear that we have anything in place that addresses that.</p><p><strong>The fourth category is how it affects us as people to live in a society where we don't have very much power</strong>&#8212;we don't have power economically; we don't have power to resist things. 
We end up disempowered and, in the best case, infantilized.</p><p>That is scary because there is a deep sense of learned helplessness that sets in as we lose power. There's a great <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=702463#:~:text=I%20find%20large%2C%20negative%20wage,Labor%20supply%20is%20unaffected.">study</a> by <a href="https://sites.google.com/site/lbkahn?pli=1">Lisa Kahn</a> about how college grads who graduate in a recession have lower wages for the rest of their lives relative to college grads who graduate just a year or two later, which is super crazy. You term this &#8220;psychic damage,&#8221; which I really like: damaging our own perception of how capable we are and can be in the world. And I think this is really sad. There's this spiritual damage that we don't talk about, which is about human potential, about what people can be. What we want is for AI to expand our potential and expand what humans and humanity can be, but on this default path that we're on, all of these effects seem like they're going to go against that.</p><p>Matt Boulos (13:21)</p><p>I want to run with the last thing you said because... I guess I have to come out with this. I'm old enough that I have a memory of my life that's pre-internet.</p><p>Kanjun Qiu</p><p>That is old.</p><p>Matt Boulos</p><p>I remember being in grade school and Mr. Gen, the computer teacher, knew that I was really bored, so he would pull me out of class and then we would pretend that I was learning, but we were just trying out new software. One day he's like, &#8220;Let me tell you about this thing called bulletin board systems. There's somebody else on this computer.&#8221; I&#8217;m like, &#8220;Where are they?&#8221; And he's like, &#8220;They're in another country!&#8221; It was wild and so hopeful.</p><p>I think often these days about my parents calling their parents after they immigrated to Canada. They'd get these crap calling cards that would cut out and were choppy and super expensive. Now, I FaceTime my mom and she's like, &#8220;I have to go now.&#8221; I'm like, &#8220;how about we hang on for a little while longer?&#8221; We're taking for granted the ability to see each other, hear each other.</p><p>So you have all this incredible potential, and it's beautiful, and it's real. Our machines can augment us and we love tools. I don't understand how you could say, &#8220;I love my pen that I write with, but I don't like my laptop.&#8221;</p><p>But at the same time, we know that technology has been this very mixed force in our lives, from the capacity to surveil people to predatory mechanisms around how we communicate. One of the things that I have felt is really important to bring to the conversation is that talking about AI as good or bad is almost silly. It's like saying trees are good or bad. You plant one right next to your foundation, that was a bad move; you step out into a clean city, and it's the most wonderful thing. There are, of course, limits to that analogy, but there is something really profound about taking the complexity seriously.</p><p>Kanjun Qiu (15:47)</p><p>I was really struck by something you said many years ago, when you had just read <a href="https://en.wikipedia.org/wiki/Marshall_McLuhan">Marshall McLuhan</a> and rephrased his point: that we adopt technology for its benefits and then we suffer its consequences.
So, the important thing is to think about those consequences that we might suffer and see if we can get more of the benefits and less of the suffering.</p><p>Matt Boulos (16:17)</p><p><strong>What was brilliant about McLuhan was that he'd clued into this dynamic where we adopt a technology and it changes how we work and interact and what we're capable of, until it is no longer possible to detach and disentangle.</strong></p><p>Kanjun Qiu (16:35)</p><p>We&#8217;re part of it; it's part of us.</p><p>Matt Boulos</p><p>Take something like social media. The public narrative on it is actually condescending and not correct. We're not just a bunch of dumb-dumbs who are sitting around swiping things because we don't have anything better to do&#8212;or at least not completely. The thing that's really happening is that our lives, our social lives, are on here. I get to see my friends' kids. I'm able to get a diet of things that matter to me or that entertain me. There's nothing at all wrong with that. We don't bear responsibility for the fact that these things are wildly addictive. But because our lives have moved onto these platforms, we're now in this really stuck position: if the platforms don't behave, then we are subject to that.</p><p>When I started law school, I decided to run an experiment. I had no phone, no computer; I had nothing. I'm like, &#8220;I'll be a nerdy monk and see how this one goes.&#8221; And one of the interesting things that happened was that I realized that nobody wants to call your landline to invite you to parties. Everyone wanted me to go to the parties, but they'd tell me about it the day after. They were like, &#8220;you weren't there!&#8221; &#8220;Well, you didn't invite me!&#8221; They're like, &#8220;oh, yeah.&#8221; Everyone was texting each other, and it was a simple thing. You can't just exit. That was significant. And then I got a phone.</p><p>Kanjun Qiu (18:01)</p><p>This is really important. You can't exit technology. We can't just be like the Amish, because this technology is now so prevalent and so entwined in our social interactions and our lives.</p><p>Matt Boulos (18:19)</p><p>Let me give you the silliest example. My son is in a preschool that has a bit of Mandarin immersion, and I know no Mandarin at all. He's now singing &#8220;b&#225; l&#466; b&#242;,&#8221; and it turns out there's this children's song about pulling radishes out of the ground. My son&#8217;s just marching around the house singing this song, and so we're like, &#8220;what's going on? Why did we enroll him in a language school that we don't understand?&#8221;</p><p>What do we do? We just hopped on an LLM and said, &#8220;okay, my son is singing this, can you tell me what this is?&#8221; All of a sudden, that world opens up, and it's beautiful. I want to challenge the idea that that is us conceding something. Why should I be trying not to do that?</p><p>Kanjun Qiu (18:56)</p><p>This goes to something you said earlier: when we get on social media platforms, it&#8217;s very positive. But as I was saying earlier about resistibility, it's very hard to resist these systems. What's going on here? It is something about the power inherent in technology.</p><p>Fundamentally, what is AI? It is doing computation, the same kind of computation that our brains are doing.
It's taking inputs, perceiving them, running them through some model of the world, outputting some things, and those outputs can be turned into actions.</p><p>There's a sense in which social media is already an AI agent. It's making decisions about what actions are being outputted, like what I see on my newsfeed. And as a result, I get different inputs.</p><p>So, I'm getting this other input or information that changes my model of the world. To your point about this symbiosis, the technology is making some decisions that are causing me to get different inputs into my world model, and now my world model is getting morphed or transformed in a different direction. And that can be very positive. For example, you learn about your son and the song that he's singing, and that expands your world model and helps you see reality more clearly. The areas that I feel concerned about are the places where it changes the way you see reality in a direction that is more twisted.</p><p>Matt Boulos (21:00)</p><p>It feels manipulative. But I want to take a step back, because I've been like, foam finger, <em>go technology, go</em>. This point about technology and power, thinking about its mechanisms, is really important. Going back to my son singing his song: I went to this machine and asked it a question and it came back with an answer that I couldn't have gotten five years ago, or at least not so easily, not so smoothly.</p><p>But we're only talking about one part of that transaction. We're talking about me asking that and getting an answer. We're not talking about: Is this being logged? Does this system now know that my son is in a Mandarin immersion class? Are we gonna get Mandarin worksheets offered up to us at the next interaction?</p><p>There's something about this digitally mediated world that is foreign and dangerous. And I think that's worth probing into.</p><p>Kanjun Qiu (22:11)</p><p>What do you think it is?</p><p>Matt Boulos (22:13)</p><p>I have a couple of different mental models. Let me play with two.</p><p>One is something that I call <strong>lawless spaces.</strong> Imagine a part of town where the police don't go. The rules, the norms&#8212;people are working them out, but they're not governed like the rest. And you cross that threshold into those places. Soon, things become possible. Maybe there's just a free spirit to it. It feels like the chaotic early days of the internet: people who reveled in anonymity, not because they were up to trouble, but because there was something liberating in that. You can imagine that creating a heady atmosphere.</p><p>Kanjun Qiu (22:59)</p><p>Like the Wild West.</p><p>Matt Boulos (23:10)</p><p>Exactly. Your bank is like, &#8220;why shouldn't I be there too?&#8221; So your bank sets up shop, except when you step out of the bank holding a bag of coins, someone whacks you over the head and takes your coins. That space doesn't have the same rules, the same governance. It's not a perfect analogy, but there's a lot to be said for that.</p><p>I was talking to somebody about privacy and they're like, &#8220;That ship sailed.&#8221; And I said, &#8220;Well, why?&#8221; If we were having dinner and some dude came up and stood right next to you while you're talking, writing down what you're saying, you'd give him a slap and send him out of the restaurant, right?
We have that reflex, but we don't see it in the digital context, so we haven't learned to govern it.</p><p>Kanjun Qiu (23:47)</p><p>Do you think digital spaces are lawless because they're not visible?</p><p>Matt Boulos (23:51)</p><p>I think that's a huge part of it. The next thing I want to talk about is: what are the things that make the digital space really particular? One is that most of what happens in it is actually invisible to us. If I go to the neighborhood oracle and give him 10 bucks and say, &#8220;My son is singing this song, can you tell me what this is?&#8221;</p><p>He's gonna sit there, like, &#8220;Oh man, I know what it is!&#8221; He's gonna grunt and groan, write something down, and send me out. I go to an LLM and it&#8217;s a magic box. You and I may know how it should work in theory, but we don't actually know how it's implemented. We don't know what's gonna happen a year from now, five years from now.</p><p>Kanjun Qiu (24:44)</p><p>And you don't know what's being logged; you don't know what the company is doing with the data. There's a lot you can't see.</p><p>Matt Boulos (24:52)</p><p>There are really particular characteristics to the digital world. It is easier to log than to not log.</p><p>Kanjun Qiu (25:00)</p><p>And it's safer in a lot of ways.</p><p>Matt Boulos</p><p>And there's an expectation. If something doesn't work, the customer is like, &#8220;I did X and it didn't work.&#8221; And you're like, &#8220;I have no idea what you did, I have no logs.&#8221; And they're like, &#8220;What are you doing? Are you junior-grade developers?&#8221;</p><p>It is easy to log data. It is cheap to collect data. It is lucrative to collect data. Even before advanced models like LLMs, we could crunch through stupidly large amounts of data. So you have these reinforcing mechanisms that take us to really perverse outcomes.</p><p>We talk about surveillance. Surveillance enables an astonishing amount of bad stuff. Resistibility is something that you reach for when you're in conflict: something has gone wrong, and you have to resist it. But preceding that is legibility: do you even know who I am?</p><p>Kanjun Qiu (26:01)</p><p>This prompts something for me: let's imagine a world in which we turn everything digital into a physical manifestation. What I'm hearing you say is, we have ended up in this really weird default digital world, especially going into this AI future. It's weird because some of the defaults are weird. One default is that we log all data. A second default is that companies&#8212;I, running this company&#8212;can process that data however I want. Another weird thing is that now we have these AI systems and they can do lots of new magical things with that data. For example, take a photo and know where I am. As the person at the other end, I can't see any of it. And as a result, I actually don't have a mental model of it being a problem at all.</p><p>Matt Boulos (27:01)</p><p>We also don't have an emotional response. We&#8217;re wired as human beings to recognize these things, and here we can't react.</p><p>Kanjun Qiu (27:08)</p><p>We talk about something like surveillance and it's such an abstract concept.
But if you were to turn surveillance into a physical manifestation, like this guy writing everything down next to your table, then it would be like having five people following us around everywhere, all logging different things about our lives and changing other stuff in our lives based on what they're logging.</p><p>Matt Boulos (27:29)</p><p>This is where it starts to get wild, because on one track, when people are talking about the productivity benefits of AI and the labor impact, we're often talking about labor substitution. But there's another way of thinking about the impact of AI within the labor context, which is that new work is being created. Let's take something like credit scores. These are largely opaque systems; the financial services industry benefits from them, and the good-faith argument is that we all benefit. If I'm an untrustworthy borrower, you shouldn't have to be paying rates to subsidize me, so we stratify on the basis of reliability or whatever terms they use to describe it, like creditworthiness.</p><p>But then you could start to shift the granularity of that. We could just collect all sorts of stuff. We could also experiment. We could collect data even if we&#8217;re not sure it's relevant. Deny me a loan&#8212;who cares? I'm one data point. My life gets crushed, but they don't know about it, because their system did it; they just move on and experiment. The dynamism that becomes possible there is potentially quite pernicious.</p><p>Kanjun Qiu (28:55)</p><p>When you say dynamism, what do you mean?</p><p>Matt Boulos (28:58)</p><p>You could have systems that are not stable anymore. There isn't a credit score. There's an algorithm that's constantly rewriting the rules. Why not? As long as it&#8217;s goal-seeking against minimizing defaults, it doesn't matter how unfair it is. When we talk about unfairness&#8212;putting on my lawyer hat&#8212;we often talk about things like disparate impact, protected categories, that sort of thing. But what happens when it's arbitrary, what happens when it hits large categories of society, what happens when it's not easily pinpointed? Again, the bad stuff is happening behind the veil, so we don't know.</p><p>I want to connect that to something you were talking about earlier, the economic impacts, when you said that destabilizes society. But also, when you live in a world where you are subject to all of these forces and you're helpless against them, it's not good for a person to feel that way. Think about the worst parts of childhood, where adults are not taking you seriously, not letting you do something that you ought to be able to do, and then that becomes the dominant mode of adult life.</p><p>Kanjun Qiu (30:18)</p><p>It&#8217;s very disempowering.</p><p>Matt Boulos (30:22)</p><p>And this is before we even get to oppression. That by itself is destructive. And then you add to that malicious intent or malicious oversight, and it isn't a surprise that we live in an angry moment in our society. I don't have a lot of patience with the tech community sort of sitting around saying, &#8220;How could this be?&#8221; Well, I mean, you've been bloody architecting it for the last two decades. There is a reason why people feel disempowered: they are disempowered.</p><p>Kanjun Qiu (30:53)</p><p>They have no power to change a lot of things.</p><p>Matt Boulos (30:56)</p><p>How could you change any of these things?
The thing with AI&#8212;and I think it's really important that we ground it&#8212;is that we have to recognize that all of these dynamics are in play. Then we can ask: how do you design, and how do you get to empowerment? Because we could also just sit here and be angry and walk away, but that's not going to help.</p><p>Kanjun Qiu (31:12)</p><p>Two things came up as you talked about this. One is, <strong>narratives today about what kind of future is okay for humans.</strong> I think a lot of the futures that the tech industry talks about today are actually very disempowered. One type of future is like, we're going to live in a stable utopia where everyone's going to have anything at their fingertips and it's going to be okay. But it does not seriously consider these dynamics where people are being controlled by technology and the people who control technology.</p><p>A second thing that you pointed at was this notion of utopia being like &#8220;permanent undergrad,&#8221; where you can be free and intellectually curious and it's really fun. But an undergrad is not an adult with the ability to fully manage their own life.</p><p><strong>The kind of freedom that you're going for is for humans to be truly able to be fully adult and in the world themselves, without being pushed upon by other forces, and with the ability to push against those forces.</strong></p><p>Matt Boulos (32:42)</p><p>Absolutely. What do we really want from our lives? It&#8217;s to be able to realize our capacities.</p><p>Kanjun Qiu (32:56)</p><p>And that involves growth and change and creation and being pushed down.</p><p>Matt Boulos (33:01)</p><p>Absolutely, and having a chance in all of it. One of the things that I've noticed&#8212;again, not to take a piss on the tech community&#8212;is that we'll talk about what an ideal future is, or what an ideal life for someone to have is, and it's just somebody projecting what they think is interesting onto everyone else.</p><p>I have so many people in my life for whom the specifics of their job don't actually matter that much, as long as they can take care of their family and support their community in ways that are really meaningful to them. Those are rich, beautiful lives. And when the structures around a person erode, that is when we start to see this real frustration emerge.</p><p>Kanjun Qiu (33:53)</p><p><strong>People are frustrated because they feel like they don't have any levers to change the situation of their lives, and they don't like the situation they're in, even though the world is abundant and they're fed. </strong>There's something missing about their sense of autonomy or freedom or their ability to make change. What I heard from lawless spaces is, it's partially a lack of legibility and partially a lack of levers of action. And if everyone had legibility, and levers of action to change their life circumstances and the institutions that aren't serving them, then maybe those two things would allow us to have a little bit more autonomy and self-determination in our lives.</p><p>Matt Boulos (34:48)</p><p>It's kind of hard in our present moment to think about what a stable political or legal regime looks like in general, but there is a simple fact: for centuries now, we have figured out that it's not cool to steal somebody's money. It's not just that theft is wrong, but that the state can't do it, even if it's useful to the state.
We say that's not right.</p><p>In conservative circles, people talk a lot about debanking, where banks just turn you off digitally. It's not a frequent occurrence, but it happens, and has happened in response to political events. What's wild to me about it is that it just could not have been a thing a few decades ago; the bank would have had to literally steal your money.</p><p>Kanjun Qiu (35:54)</p><p>Like a bank run.</p><p>Matt Boulos (35:56)</p><p>Or simply, you'd go to your bank branch and they&#8217;d be like, &#8220;we're not going to give you your money,&#8221; which is what debanking looks like. And you would say, &#8220;you stole my money!&#8221; Whereas debanking now is either just hitting a switch so you can't access your money, or just saying, &#8220;here's your money, you're out of the financial system&#8221; in a way that is only possible in a digital world.</p><p>One belief I have is that our laws and rules haven't caught up to digital reality, and AI accelerates digital reality to all of its conclusions.</p><p>Kanjun Qiu (36:32)</p><p>What I hear from what you're saying is, the digital world is enabling all these mechanisms, like being able to turn off my access to my funds, and the laws haven't caught up. The last 2,000 years of development in the legal system have been about physical reality.</p><p>Now that reality is actually happening in the digital world. You're giving all these physical analogs to it that are really interesting, because they let us see the physical reality of what's happening, but somehow we haven't mapped that physical reality onto the digital world. What would be required to make lawless spaces more lawful? Why have we not caught up? Is it a lack of knowledge? Is it the lack of a visceral sense of what's going on?</p><p>Matt Boulos (37:37)</p><p>Each of the things you said feels to me like it's playing a part. We both understand computers really well, but when I hop on a website, it does not occur to me that they are tracking the things that I'm doing.</p><p>Kanjun Qiu (38:05)</p><p>True! The other day I hit &#8216;accept cookies&#8217; and then I was like, &#8220;what happens when I accept cookies? Oh shit, it can track me across multiple websites &#8212; that's crazy!&#8221;</p><p>Matt Boulos (38:13)</p><p>I drive people nuts when they look over my shoulder, because I always not only reject cookies, but I open the thing to make a point of deselecting everything. And the hilarity is, often these are just pop-ups that don't do anything, and the site collects your data anyway.</p><p>Kanjun Qiu</p><p>That's kind of depressing. Thanks.</p><p>Matt Boulos</p><p>Yeah, it really is. You're welcome.</p><p>There's one sense in which it's not tangible, no matter how sophisticated you are.</p><p>The other thing is, it is new. In world-historical terms, we're talking about living in this regime for 10 years. It is not that long, right? Google trying to figure out how to monetize was something that happened basically in our adulthood. That's nuts. And then going from web to mobile, the introduction of apps &#8212; all this has happened really, really fast. Part of it is we haven't caught up.</p><p>The other, somewhat more cynical thing is that, it turns out, lawless spaces are awesome because they're so lucrative. If you can do stuff like surveil people and track them and price-fix and all of the rest, you can do all sorts of astonishing things.</p><p>If you deal with it now, it's a lot easier than down the line. One easy answer is a good privacy law.
Had we put good privacy rules in place 10-15 years ago, it wouldn't be so painful now for the large tech platforms to unwind these privacy practices.</p><p>Kanjun Qiu (39:49)</p><p>But now it's entrenched. You have to change your entire infrastructure.</p><p>Matt Boulos (40:08)</p><p>We're talking infrastructure, business models, identity as an entity, and the market cap of these things. I don't want to grant sympathy to the surveillance practices, but this is a huge thing that we're going to have to ask of them. But we do have to ask it.</p><p>There is also an interesting question of what rights we already have that we have failed to translate, just as a practical matter. We already have these legal rights, and we haven't brought them to these spaces. And then, what are the new things that we have to figure out?</p><p><a href="https://hls.harvard.edu/faculty/lawrence-lessig/">Larry Lessig</a>'s notion of <a href="https://www.harvardmagazine.com/2000/01/code-is-law-html">code as regulator</a> is really fun. What he does in this setup is point out that in every period of time, there's some regulating force that you have to contain if you want to protect liberty. In his construction, one that I share, we're progressively trying to increase liberty as a society. He points out that in the time of John Stuart Mill, you were worried about majority opinion &#8212; democratic opinion &#8212; because it can trounce minorities. So then we start to establish the notion of rights, and constitutions become vital to that, because if you just leave it to the majority, then that's actually sometimes not great. Then you have the Civil Rights Act, and suffrage movements, and so on.</p><p>What he was pointing out, which I thought was really interesting, is that the new thing is gonna be code. Code is going to operate &#8212; this was in 2000 that he wrote this piece &#8212; as regulator. And the argument there is that&#8230;</p><p>Kanjun Qiu (41:49)</p><p>Code is encoding laws.</p><p>Matt Boulos (41:50)</p><p>Yeah, code is going to determine how a sphere of life plays out. So then the question we need to ask in response is: what things in that space need to be addressed?</p><p>Kanjun Qiu (41:57)</p><p>I have this hypothesis that technology shapes our governance system &#8212; the way that technology is built and what makes it powerful. There's this theory that the reason democracy happened &#8212; I'm sure this is just one of many reasons &#8212; was that we went from a world where knights were the most powerful thing to a world where muskets were the most powerful thing. When you have knights, you have a lot of upfront investment in armor; you have to have horses and stables and all these well-trained people. That's a very centralized form of power. Technologies at that time resulted in this centralization because of the nature of those war technologies.</p><p>Then the musket was invented, and now knights and armor are not that useful. In fact, you actually want a lot of people who have muskets. So now people matter, because of this new war technology that gives power to people.</p><p>We talk a lot about how AI, and the core four problems that I talked about, are fundamentally about power and transfers of power from one entity to another. We call it problematic when it gives power to entities that are not what we've determined to be morally right.
Through that lens, thinking about lawless spaces and what this upcoming technology is starting to enable: is there a nature to AI that shifts things one way or another?</p><p>Matt Boulos (44:07)</p><p>I have two responses. One is, there's also just law as law. What is it about this moment that we leave ungoverned? I find a lot of these free-market arguments, the accelerationist camp, essentially bullshit. All you're saying is, we don't want regulation. So let's just say that. There's nothing else there; it isn&#8217;t a richer argument.</p><p>Kanjun Qiu (44:12)</p><p>Because lawless spaces are great.</p><p>Matt Boulos (44:39)</p><p>Lawless spaces are lucrative. They do yield huge amounts of opportunity. I'm not saying let's clamp down &#8212; that's how you shut everything down. It often does not make sense to intervene. It also does not make sense to intervene before you understand a space, because then you will have spent your political capital.</p><p>Think even of American politics, with all this craziness right now. There is political capital that can move you towards some privacy bill or things like that. And if you do the wrong thing, that capital's not waiting for you to go do it again. So you have to be disciplined about that.</p><p>But at the same time, you can't just say no rules. Or if you do, then that's ideologically encoded, and you ought to own the rest of your argument.</p><p>Kanjun Qiu (45:20)</p><p>If there are no rules, we're buying into a particular society.</p><p>Matt Boulos (45:23)</p><p>And do we want that? Is that a fair thing to ask of others? If you want to impose that, then you should also expect resistance to it.</p><p>Kanjun Qiu (45:30)</p><p>&#8216;Law as law&#8217; is interesting, because it actually argues against my argument that technology shapes society in this fundamental way. Maybe what you're saying is you could make laws that change that distribution of power.</p><p>Matt Boulos (45:44)</p><p>Something can be wrong, and however great the temptation toward that wrong thing, it's still wrong. But then, if you don't want to eat the muffin, don't put it in front of you. And we have both.<strong> Law needs to set the boundaries of what's acceptable or unacceptable, regardless of what the temptations are. But the nature of the technology is going to shape those temptations. Back to the point about how surveillance is the easier default model.</strong></p><p>So when it comes to what we do with these technologies&#8230;</p><p>Kanjun Qiu (46:15)</p><p>It makes some things easier than others.</p><p>Matt Boulos (46:17)</p><p>Yes, absolutely. A perfect example is going to be something around labor. And I want to bait you into this conversation. Labor impacts are going to be real. We don't even know what those are going to look like. There will be things that employers and companies can and can't do and shouldn&#8217;t do. Right now, we know the power a company has over, for instance, a warehouse worker whose work is determined by an algorithm. It's also worth pointing out that they don't have a capricious boss who can be an asshole and make their life hell. The algorithm is governing things both good and bad. But do we then say &#8220;this is the shape of the technology&#8221; and back away?
Or do we recognize that this starts to introduce things that weren't possible before, and we need different rights and rules?</p><p><strong>Most of our labor laws are predicated on humans interacting with other humans &#8212; more powerful humans, but they're human interactions. Whereas a machine can surveil your every motion and then dock your pay for scratching your nose at the 15-minute mark. And we don't really have mechanisms for that, because we couldn't have conceived of that as being an active problem. It would have been nonsensical to have rules for it.</strong></p><p>Kanjun Qiu (47:37)</p><p>This is very interesting, because it speaks to actors in the world and the power that they have, and this new actor, which is an algorithm or an AI agent. What you're saying is, right now we have laws and they govern your capricious boss, they govern you, they govern your corporation, which is legally considered an actor. So in the world before AI, the only actors we had were humans and human institutions.</p><p>We have laws that limit the power of humans to harm each other, and we have laws that limit the power of corporations to harm humans and vice versa. But now there's this rise of a new power, which is AI systems. AI systems have power because they can process information and turn information into action, and action is power. Effective action is power.</p><p>To the extent that an algorithm can govern what I am allowed to do as a warehouse worker, that is power that the algorithm has. Now you're saying, okay, we have this new power. What do we do with it? We're not doing anything with it.</p><p>Matt Boulos (48:54)</p><p>Societal norms will change it, our behaviors will change it, the technology itself will change, and therefore that power will morph. It's just so odd to me to say, okay, then we're done. We've never done that in human history.</p><p>Kanjun Qiu (49:12)</p><p>We need to figure out what to do with this power. It might be partly because this is the first time a technology is its own power, in a way. We've never had technologies in the past that make decisions.</p><p>Matt Boulos (49:26)</p><p>Not to dunk on people who are trying to do good work, but a great disservice was done by the AI safety community on this point. By talking about runaway systems as much as they did, they created this special category of worry, this incredibly low-probability event, and we don't actually know what its dynamics are going to look like. Whereas the reality is that systems can make their own decisions, but they're making them for someone. You don't go spend millions of dollars to develop a system and then just let it go. You're doing it to manipulate the stuffing out of your viewers so you can sell more ads to people to buy flip-flops so you get your cut on the ads, and so on. And across the board, in every domain in which these autonomous systems are going to function, they're going to do so for a purpose, for an owner, a controller. When we talk about them being autonomous, it is about the ability to delegate to systems.</p><p>Kanjun Qiu (50:41)</p><p>It's the ability to delegate human power to systems, to encode that power. I as a manager can now encode my power in a system.</p><p>Matt Boulos (50:51)</p><p>That's right. And that is an astonishing amount of power, a multiplicative one: you can do so at massive scale, you can do so quickly, and it can adapt.
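<p><em>To make that concrete, here is a minimal sketch of what encoding a manager's power in a system can look like: a single judgment call, frozen into code and applied to every worker on every shift. The event format, thresholds, and penalty below are invented for illustration; they are not drawn from any real workplace system.</em></p><pre><code># A hypothetical encoded workplace policy (illustrative only).
# One manager's rule, applied automatically and at scale.

IDLE_LIMIT_SECONDS = 90.0  # assumed cutoff for a "violation"
PENALTY_DOLLARS = 0.50     # assumed pay docked per violation

def review_shift(idle_gaps_seconds: list[float], base_pay: float) -> float:
    """Return the shift's pay after docking for idle gaps over the limit."""
    violations = sum(1 for gap in idle_gaps_seconds if gap > IDLE_LIMIT_SECONDS)
    # The worker never sees this function, the limit, or the violation
    # count; only a smaller number on the paycheck.
    return max(0.0, base_pay - violations * PENALTY_DOLLARS)

# Example: three idle gaps in a shift, two of them over the limit.
print(review_shift([45.0, 120.0, 95.0], base_pay=160.0))  # prints 159.0
</code></pre><p><em>The rule itself is trivial; what is new is the scale, speed, and opacity with which it can be applied.</em></p>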
And then one of the things I argue in this new-power thesis is that when that happens, it is very hard, as a human on the other side of it, to know how the decision was made, and so there is a default to accept it.</p><p>Kanjun Qiu (51:22)</p><p>And they have no levers over the decision at all. No legibility, no levers.</p><p>Matt Boulos (51:26)</p><p>Exactly. <strong>You have exactly no window into what is going on, no means of recourse. And as more and more of these sorts of things happen, we'll feel very powerless. It's an incredibly sad example, but in the context of war, this is what we are seeing. We are seeing, particularly in the Middle East, an example of AI systems doing the targeting.</strong></p><p><strong>People have not classified this as autonomous systems gone amok, because humans built a system for that purpose. Yet when we say we're worried that AI systems will kill people &#8212; they are killing people. Explicitly, they are killing people, and they're being designed to do that. And let's be honest, when people are talking about national security implications for AI, yes, you're talking about economic competitiveness, but you're also talking about the fact that you want to have AI systems that can do that.</strong></p><p>The ultimate act of power is to take someone's life. We already have that extreme happening right now, being realized. But it's the same dynamic, where a human, or sets of humans, delegates the thing that they want done to a system, and the system can carry it out. Because it is a system carrying it out, the context and the entire execution of it look completely different. Where is the appeal, where is the chance to challenge it, where is saying that's wrong, where is the record? Where is even the idea of knowing how that decision was made?</p><p>In my day-to-day, when I'm using AI systems, it's fun or productive. I don't care how it came to the decision. I'm just like, is this right, can I work with this?</p><p>My primary LLM use right now is trying to count calories. So I take a photo of what I ate, and then I try to negotiate with it to lower the calories so I can eat more food.</p><p>Kanjun Qiu (53:19)</p><p>There's actually a huge difference here. This calorie counter is an AI system that is under your control, that you're using to serve you. The war system that you're talking about is a system under one person's control that's being used to control or harm someone else. Those are two different types of systems. You might argue that what we actually want is more systems under our control that affect us, and that ideally don't affect other people too much.</p><p>Matt Boulos (53:51)</p><p>Imagine if my calorie counter determined what I could eat.</p><p>Kanjun Qiu (53:53)</p><p>Then it would be controlling you.</p><p>Matt Boulos</p><p>It would be awful. It's not perfect, and sometimes it goes completely off the rails in either direction, and that's fine-ish, because it's within my domain. It's an irritation; it's not a risk.</p><p>Kanjun Qiu</p><p>This is something I've been thinking about with our product. We try to make systems that allow people to make software. I often talk about open software or an open software commons or malleable software &#8212; the idea that software should be built to be modified by the end user. A lot of people are like, &#8220;Who cares? I don't want to modify my software. I'm perfectly well served by my software.
There's no problem, except sometimes.&#8221; And I realized the core idea is not that the software should be built to be modified. That's an instrumental thing. Instead, it's that software should not control me, ever.</p><p>Matt Boulos (54:53)</p><p>People might say that they don't want to change things, but often that's because the decision space has been so narrowed for them. One of the things that's really interesting to me as we work on interoperability, and as we're rallying a community around this, is how many startups just never got to a place where they could fight for interoperability, because their mere existence would not be feasible in the current regime.</p><p>Kanjun Qiu (55:22)</p><p>Talk more about interoperability.</p><p>Matt Boulos (55:24)</p><p><strong>One of the main things that we're championing and pushing for is interoperability legislation. </strong>The idea, at its simplest, is that a platform should not be able to discriminate on the basis of how you access your own data and the services that you use.</p><p>Kanjun Qiu (55:45)</p><p>You should be able to get your data and have it be yours.</p><p>Matt Boulos (55:47)</p><p>Yes, and you should be able to use a tool of your choosing to interact with another system. Just as you could go buy bananas yourself, or say, &#8220;hey Matt, can you go get me bananas from the supermarket?&#8221; You couldn't have a supermarket saying, &#8220;no, only Kanjun,&#8221; right? And yet, that's our online world.</p><p>Kanjun Qiu (56:08)</p><p>Let&#8217;s make it concrete. LinkedIn says I'm not allowed to use someone else's account to use LinkedIn. I can't use a bot; I can&#8217;t use something like TweetDeck. It's monopolistic.</p><p>Matt Boulos (56:22)</p><p>Exactly. And the platforms do this for a reason: it consolidates their control around the points of input and access. But the consequence of that is pretty severe. Two things are happening. One, we are moving towards a world in which these AI systems are going to be more and more useful, so we are going to share more and more data. We don't have any real indication that these things are handling our data soundly, and yet we're going to talk to them. I'm going to say, I'm injured or I'm sick, can you please go make an appointment for me? And we don't know whether that data is going to be held with any sort of responsibility or not.</p><p>The other is that there are all of these wonderful things that could be built if I could just access my digital life. <strong>What interoperability does is kill two really critical birds with one stone. One, if I can access my own data, then I can decide where it goes. I can control that, I can check up on it. But the second and more critical one is that if it's possible to build software that interacts with my richer digital life, then I'm not attached to these parasitic platforms and agents, and we can build alternatives. You can seed a whole other tech ecosystem around the idea that we're in charge, it's our data.</strong></p><p>Kanjun Qiu (57:44)</p><p>It's our software, it's our data. We make it. And we can sometimes interact with these platforms, but we can use our own interfaces.</p><p>It is becoming possible to make our own software, and make our own wrappers or systems that access Twitter data and download it. Then I can make my own algorithm and process it in a different way, so I can get just my friends and I can derank inflammatory stuff.
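<p><em>As a rough sketch of what &#8220;making my own algorithm&#8221; could look like once that data is accessible, here is a user-owned feed ranker that boosts friends and deranks rage-bait. The post fields, friend list, and keyword heuristic are hypothetical stand-ins, not any platform's real API.</em></p><pre><code># A hypothetical user-owned feed ranker (illustrative only).
# Ranks posts by my values rather than an engagement objective.

from dataclasses import dataclass

INFLAMMATORY = {"outrage", "slams", "destroyed", "disgrace"}

@dataclass
class Post:
    author: str
    text: str

def my_score(post: Post, friends: set) -> float:
    score = 1.0
    if post.author in friends:
        score += 2.0  # I chose to see my friends first
    if any(word in post.text.lower() for word in INFLAMMATORY):
        score -= 2.5  # and to derank inflammatory framing
    return score

def my_feed(posts: list, friends: set) -> list:
    # My ranking rule, running on my machine, changeable by me.
    return sorted(posts, key=lambda p: my_score(p, friends), reverse=True)
</code></pre><p><em>The heuristic itself hardly matters; the point is that the rule is legible and stays under the user's control.</em></p>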
That's just starting to become possible.</p><p>The software that exists today, because it's so expensive to produce, is incentivized to make that money back. Not because the creators are bad, but that's the incentive structure. As a result, it's either selling to us, or it's selling us to something. Those are the two options. Then occasionally, you have someone who's incredibly generous who makes software for free. It really feels like it should be flipped on its head: most software that exists &#8212; AI systems included, we lump it all in the same category &#8212; should be software that is serving us, not selling us things or selling us to things. And it should be ours. And it shouldn't require enormous acts of generosity to create software that doesn't do that, that is just for us or for other people. It should be easy. It should be what the default world is.</p><p>Matt Boulos (59:21)</p><p>If you think of deep, rich, sustaining communities, they're very generative, they're very productive. If you think of the art that emerged from religious communities, and the invention of different structures, the social structures, different aid structures &#8212; against that backdrop, the way software works today is a peculiarity. And I wonder if it is also a peculiarity of just how young software is, in world-historical terms. We're talking just a couple of decades in which you have the prevalence of software. But the point you're making stands: the cost to make software will go down, and the stuff that we make will start to look different.</p><p>Kanjun Qiu (1:00:02)</p><p>It could, if you get things like being able to access all of your data. Network effects are real. Right now, these big platforms have network effects. I can't just move to another social media platform and be able to interact with all of my friends. That sucks. I can't move off of marketplaces like Uber or Airbnb, or off of social media platforms. And no matter how cheap software gets to make, the network effects are still there.</p><p>Matt Boulos (1:00:08)</p><p>To your point about the cost to make software, one analogy, and I know it's not perfect, is manufacturing: you spend a lot of money to make a mold. So if you're making plastic chairs, you spend maybe a couple million dollars to make that mold, and then you make as many $5 chairs as you possibly can off the mold so it pays itself back. There are different analogies we can use to describe what's happening.</p><p>Kanjun Qiu (1:00:52)</p><p>But now you can manufacture software, in a way.</p><p>Matt Boulos (1:00:57)</p><p>And there's something like: I could just make the chair, and that starts to change how we think about it.</p><p>Kanjun Qiu (1:01:01)</p><p>It's almost like the opposite of that analogy, because now I can make my own version of that chair really cheaply, with no mold, with LLMs.</p><p>Matt Boulos (1:01:08)</p><p>That's right. It's important that we try to bring all of these developments in AI together, because you have these incredibly powerful foundation models, and you have a shift in our ability to do things like code or data analysis, where the cost to do those things is now going down. And that, marshalled well, is a real gift. But of course, that's going to matter to labor a lot.</p><p>I want to bring us to labor for a couple reasons. One, because I don't know that the mental models that at least the tech community or the AI community uses to talk about labor are right. But also because I think we are in for something that's kind of shocking.
What you do about that is not so obvious to me.</p><p>So let me lay out my grievances. There is this idea that AI just gets more and more intelligent, and the critical part of this argument is to never say what intelligent means. And then to say, well, if it gets more intelligent and work is an exercise of intelligence, therefore all labor gets replaced. And then, on that basis, to make this big jump to saying: okay, here's what we need to do now that nobody is useful anymore, nobody's economically productive. And then somebody inevitably raises their hand and says, what about cutting down trees? And like, we'll get robots for that.</p><p>The idea is that the end game is zero economic contribution on the part of individuals. Machines do everything, or you have a tiny, tiny sliver who run the machines. And then we jump to all of these ideas around, okay, well, are we all gonna be lying on the beach, and our benevolent billionaire overlords are gonna feed us mango smoothies&#8230;</p><p>Kanjun Qiu</p><p>Like WALL-E.</p><p>Matt Boulos</p><p>Yeah, exactly, or is it gonna be something else? My confrontation, and it deserves a confrontation, is that what this does not account for is that a significant swath of labor, whether within a job or as the whole purpose of a job, is about decisions and risk.</p><p>Say I make baseball caps and I need to go buy the fabric for the caps, and I have three potential vendors who will sell me the thing. Now we have a procurement bot. What's this decision gonna be? It's gonna be on the basis of some factors like cost, shipment time, whatever data exists. How you or I would make the decision is we'd probably meet the person who runs the fabric company, or the representative, and say, he seems shifty, we're not doing that. Then we'd just use our gut, but, critically, own the responsibility for the decision and the course correction.</p><p>Why am I saying this? Because there's that decision layer, and then there's a human interaction layer, and what we can readily automate is the very easy stuff.</p><p>But it should be pointed out that managers do not like having people on payroll. They&#8217;d gladly fire everybody if they could keep revenue at the same level. Attempts to automate labor have been around for the entirety of my career. A lot of that was this quasi-automation of going to lower-cost countries, and then the idea of robotic process automation. We have seen all of these things. What you notice is, certain categories automate really easily, certain categories that ought to be automatable don't automate easily, but critically, you have humans in the mix.</p><p>Why does this matter? Because if humans are in the mix, what you are really looking at is not a 95-99% unemployment rate. You're looking at a deeply inflated one, in which there are winners and losers in a society. And that looks completely different. All of these &#8216;nobody has a job&#8217; solutions imply that we're all in the same boat, but we're not going to be in the same boat.</p><p>Kanjun Qiu (1:05:45)</p><p>You're saying there's going to be this stratified effect, where different people are affected in different ways by job loss, like all industrialization in a way. Software engineers probably will be impacted quite a lot, because code is actually very automatable; it's in this closed-loop system.</p><p>There are maybe two things that make humans useful. One is liability, and the second one is information.
So in this example about baseball caps, you were like, okay, if I mess up procurement, I am to blame. Or if I'm a doctor and I mess up the surgery, I am to blame. There is a person who's liable, and they can be legally held accountable. If the machine is ultimately to blame, that's actually really annoying: I can't hold the machine accountable &#8212; it can't be punished, I can't fire it. I guess I could get a different machine, but then I'm the one who's responsible, and that sucks.</p><p>Matt Boulos (1:06:51)</p><p>But also, back to all the dynamics we were talking about before: I go to my doctor. It's not that I'm sitting there saying, I will sue you if you muck this up. Rather, there is this mechanism where that person gets up in the morning and says, I am responsible to my patient. The machine is responsible to no one, and the person who owns it is not thinking about individual responsibilities, but is probably thinking about aggregate ones. Any of us who've kicked around the business world know that these things then just become measures of risk, not even of obligations to individuals.</p><p>Kanjun Qiu (1:07:33)</p><p>This is really important: aggregates versus individuals. There's a great book called <em><a href="https://en.wikipedia.org/wiki/Seeing_Like_a_State">Seeing Like a State</a></em>. And the core idea is that when you're governing a state, you have to collect data, and that data gets collected in aggregates. And because you can only see data in aggregates, you take actions that actually make individual lives a lot worse, but make the aggregates look better. Here, you're saying managers might make decisions in aggregates that make individual lives, or individual impacts on patients, a lot worse, while in aggregate things look a lot better. And it's really important to point out that when we look at individuals, we're looking at anecdotes, and that's a really different type of information than aggregate measures, where we're looking at statistics.</p><p>Even I as CEO struggle with this. It's why kings would disguise themselves as villagers and go talk to villagers to get the anecdotes. I as CEO get really bad anecdotal information from people; instead I get a lot of aggregates, and that actually makes it really hard for me to make good decisions.</p><p>One way in which humans are really valuable is that we are able to be responsible for an individual person, individual case, individual situation.</p><p>The other question is really about time scale. I don't think all of our jobs will be automated in 10 years. But in 50 years? That's still within my lifetime. That's not super crazy. Look at the change that's happened over the last 50 years &#8212; or a hundred. The implication is, if we are building these systems, and they are going to have these effects where a lot of people lose their jobs and it's easier for the managerial class to do things, then the challenge is: okay, not all jobs will get automated immediately, but how do we build a society where people are free and have power? Because there is this leakiness of power from labor to capital.</p><p>Matt Boulos (1:09:51)</p><p>The time horizons matter to me because longer time horizons are where the substitutive activities start to come in. We start to generate new economic activity. I'm really wary of claims that something is going to happen on a 10-year time horizon.
That's just insane.</p><p>Kanjun Qiu (1:10:12)</p><p>Probably programming will get automated on a 10-year time horizon.</p><p>Matt Boulos (1:10:21)</p><p>The non-engineer&#8217;s perspective on this one: I think we're gonna see a stratification of skill level. Hot take: I think we're gonna see an emergent category of developers who are not particularly &#8216;high-skill&#8217; &#8212; I hate using low-skill, high-skill, but just not the sort of people inventing a new programming language. Like the guy who would make your website, things that LLMs can do very easily. But until software kicks in to make it easy for a layperson to use the LLMs to do that, these developers are going to act almost as a translation layer. So they're not really going to be developers; they're going to be more of an &#8216;I know enough about what a web stack looks like that I can turn it into something&#8217; role. That's going to flare up and then drop, sort of in the way that web developers were hot, and then it became either a highly skilled front-end role, or you have Webflow and Squarespace.</p><p>Then I think what we're going to see is that the artisanal middle goes away. Then the really high-caliber engineers who understand how systems work become absolutely vital. They're augmented by these systems, but they are basically CTO-ing everything.</p><p>Kanjun Qiu (1:11:43)</p><p>There are a lot more CTOs. I think it's not unreasonable. And I challenge your non-engineer hat, because you are one of the active users of our product, which is a coding tool. Maybe a simple model for thinking about this: there's always a Pareto front between task difficulty and how well the task works.</p><p>As tasks get more difficult, it requires a lot more capability or skill to make the task work. Lots of easy tasks will get automated, and it'll be much easier to make web apps and things like that. But we'll probably see these much more complex, almost &#8216;grown&#8217; software systems that someone is managing. In software, one deeply optimistic possibility I see &#8212; if timelines are slower, and if we can figure out how to make really good tools that are not just captured centrally &#8212; is that people can learn how to &#8216;garden&#8217; software for themselves, and that becomes a source of power where people can harness computing. Computation is power, and people can harness this computation for themselves, because we all have laptops, we all have GPUs; perhaps there's some way to allocate them more equitably. Now, because we own this laptop, this computation object, we can harness it to run a bunch of software, to grow a bunch of software that does more and more complex, interesting things for us &#8212; maybe inside of our jobs as well.</p><p>So you might see many people losing jobs, but many people gaining this capacity to create software that does really weird and unusual things, new things, more powerful things. I think there's a world in which it's not top-down automation, but bottom-up automation &#8212; bottom-up as in we are the ones who are automating our jobs away. I love automating my job. And when we're the ones automating our jobs, we become personally more valuable. It doesn't solve the full problem, and I think I'm still confused about the exact dynamics.</p><p>Matt Boulos (1:14:05)</p><p>I think you're right. I actually think there's going to be a really interesting near-term dynamic, because there's something really beautiful about human ingenuity. You give somebody a tool and they figure out neat stuff.
One thing that will be really fun to watch is somebody who has a job that involves a lot of these manual tasks just figuring out how to automate them themselves. They then actually become much more valuable to an employer; we'll watch people learn how to do that. There's this digital literacy that I think this is going to build.</p><p>Kanjun Qiu (1:14:49)</p><p>The education lens. Something that we think a lot about on the product side is: how do you teach someone who doesn't quite understand these software systems what's going on? If we think of agents as top-down automation versus bottom-up automation, the way that these agents get implemented is really different. If I am told as CEO that this technology is gonna automate my workers away and I can fire them, I'm going to do really different things as an internal process. I'm going to implement processes to measure what people are doing and then try to take the stuff that they're doing and automate it. Maybe this is an RPA [robotic process automation].</p><p>Matt Boulos (1:15:29)</p><p>Especially in financial services, there&#8217;s a lot of paperwork: boom, boom, get them out of the way.</p><p>Kanjun Qiu (1:15:34)</p><p>But if I'm told as CEO: hey, I have this technology, and if you hand it to your workers, it's going to teach them how to use it itself, and your workers are going to become much, much more effective because they will automate their own jobs. That's a really different perspective.</p><p>This is a place where we can make a lot of choices in building the technology that make this go one way or another. When we are building prosumer products, you can either build for the buyer or for the user. If you build for the buyer, then you're building something that is built to automate people. And if you're building for the user, you're building something that's trying to teach the user how to use it. That's a choice.</p><p>Matt Boulos (1:16:33)</p><p>It's also an interesting choice, because I don't know, as an economic matter, that it is better, for instance, for a large company to try to automate away its employees versus have higher-productivity employees. The thing everybody wants is higher-productivity employees, and if you can get that, that is a boon, and a more productive economy is actually generative.</p><p>Kanjun Qiu (1:17:03)</p><p>One of the things that people say is, AI doesn't have very good taste, in that it doesn't know what I want, it doesn't know what other people want. As a result, I don't trust it to make certain decisions. I don't trust it to write on my behalf very well.</p><p>The reason why it doesn't have good taste is that it's not in my head. It does not know about my internal experience, and I have a lot more context than it does about me and my situation. So there is a potential here where&#8212;to your point that, economically, it's not clear whether it's better to make your workers more productive or to automate them away&#8212;if people are better at spotting opportunities than AI systems, then it is possible that it's economically better to make your workers more productive. If systems are better at spotting opportunities than people, then maybe it's the opposite.</p><p>Matt Boulos (1:18:13)</p><p>This is something that policy leaders have to take seriously. In my conversations with lawmakers, they are sophisticated; it's just coming at them fast.
What is very hard is the concerted effort of managers and workers and governments and technologists to build these things in a useful way. I feel that to some extent, we have to get that coordination right, and at the center it would almost have to be the government, because nobody else has accountability to the people.</p><p>But at the same time, this is where builders really matter, because what are we choosing to build? If you don't build a surveillance system, it doesn't exist, or at least that one doesn't exist.</p><p>Kanjun Qiu (1:19:16)</p><p>If you choose to build things that teach people things versus choose to build things that don't teach people things; if you choose to build things that are anti-surveillance by getting people out of surveillance systems; if you choose to build things that let people get their data into their own system&#8212;there's a lot of choice in what we build.</p><p>Matt Boulos (1:19:32)</p><p>I love spreadsheets. I'm not saying I want to spend all my time in them, but when you need a spreadsheet, that's really powerful. I've heard it described that Excel basically made programming available to the wider world. You have a bunch of people doing crazy stuff in Excel and they're like, I can't program, and you&#8217;re like, what is that macro? It's incredible what people are able to do with systems that build up their productivity.</p><p>Kanjun Qiu (1:19:57)</p><p>I want to reframe it. I think it's not about productivity. It may be somewhat about productivity, but this goes to the fourth category of psychic damage. It's about unlocking people's ability to spot opportunities and to learn and to become someone who is innovative and able to find opportunities and able to become more. I guess you maybe measure it economically as productivity. But when thinking from the builder perspective, when I'm building a product, what I want to think about is: how do I enable people to actually learn how to use these tools, do their jobs better, see opportunities in the world? There's a lot of upskilling or different-skilling.</p><p>It's not about productivity because productivity measures the output; it doesn't measure how you get there. And if you measure just productivity, it's easy to make an argument that an agent is more productive in so many different ways. And if you measure the productivity of your workers, it's also easy to make an argument that workers are hopeless. They're not becoming more productive; it's useless. But in fact, maybe their tools are just not very encouraging.</p><p>What is really weird and interesting about LLMs is that you can make tools that are very encouraging, that can be very deeply empowering. This goes to your spreadsheet example, where a spreadsheet is actually one of the most deeply empowering things that exist because it has this vast legibility. It&#8217;s real-time, it&#8217;s live, you can see the whole system as you're building it, and <strong>I think there's a lot of invention that is necessary for making the deep capabilities of AI actually accessible to people in a way that harkens back to <a href="https://www.smithsonianmag.com/innovation/douglas-engelbart-invented-future-180967498/">1970s Doug Engelbart personal computing</a>: how do you let people see so that they can learn?</strong></p><p>Matt Boulos (1:22:09)</p><p>I don't know anybody who says, &#8220;I'm highly productive,&#8221; and is proud of it, at least nobody who's well adjusted.
I do not measure myself or the people in my life on the basis of productivity. Nobody's eulogy is like: &#8220;He was a highly productive individual who helped improve the company's ROI on this project.&#8221; It's not what we do, and yet that productivity is going to be a determinant of other things in your life, back to your earlier point about what it means to be economically eclipsed in all of these things. There's also something about becoming more productive by becoming more able in what you're doing: that I show up to work and I have these tools that make me more effective at the thing that I care about doing.</p><p>Kanjun Qiu (1:23:11)</p><p>Becoming more able is a way that we can think about what the potential of the technology is: that it helps people become more able. But it has to be built a certain way to do that.</p><p>Matt Boulos (1:23:24)</p><p>There's a challenge around productivity, which is that you need healthy and vibrant economies that will then reward productivity, because if you have one firm that's more productive, then it takes over the others and then the others get wiped out, but you don't really have significant growth. But if everyone is productive, then you have competition and then you have this intense growth. I'm not sure how economists would present something like Silicon Valley, but I suspect that that's an example of&#8230;</p><p>Kanjun Qiu (1:23:56)</p><p>A highly generative, productive, competitive environment.</p><p>Matt Boulos (1:24:19)</p><p>A function of the fact that this is where so much tech talent resides. That concentration of this productive accelerant. There may be something that we can analogize or extend to the workforce: you go to school, you study the thing that you care about, you go into the workforce, you want to have a job. Your job is a big part of your life; it is not the totality of who you are. And then one really weird thing about the way we talk about AI is we're like, okay, then you don't matter anymore. And I think that that framing is normatively wrong. You still matter, whether or not you can get a job. But two, I think practically it is not a correct rendition. Our solutions have to look different. The startups are all in a tizzy right now about the way that a certain R&amp;D tax credit gets applied, but basically it's about how you amortize the cost of software engineering on your way to figuring out your revenue.</p><p>But what's really interesting is, are you gonna give a tax advantage to capital, in the case of corporations automating the stuffing out of things? Or do you tax-advantage labor? What are the incentives that you structure as a society? What do you encourage? You start to change these societal incentives. And I don't know what the answers are, but we have these incentives to work with.</p><p>Kanjun Qiu (1:25:46)</p><p>There's a concrete problem or question here that could be solved, which is: what is a mechanism that incentivizes increasing the ableness of labor &#8212; maybe it's about productivity ultimately, but it's fundamentally about the ableness of the workforce &#8212; such that labor maybe becomes able to own its own means of production?</p><p>Matt Boulos (1:26:17)</p><p>Take something like oil pipelines. Right now there's a lot of human inspection of them.
With time, I think there are going to be sensors to detect if something is going wrong, and drones to film it.</p><p>Kanjun Qiu (1:26:32)</p><p>You may still have some human labor, but there's less of it.</p><p>Matt Boulos (1:26:35)</p><p>Exactly. I do not want to say there aren't going to be labor disruptions. I think there are going to be potentially very large ones. The thing that we as builders have to build toward is systems that are additive.</p><p>Kanjun Qiu (1:26:54)</p><p>Systems that enable people.</p><p>Matt Boulos (1:26:56)</p><p>And they make us more effective. The reason you replace an employee with a machine is because then you get an insane productive return. But if you can't do that, and you could get a really good productivity increase off of your employee base, then that's a wonderful thing. And for you, as someone who works for a company, that's a great thing as well. You get to be a contributor. But where I start to get really worried is: if I've done something for a long time in a particular way, then it's hard to be retaught or to change.</p><p>Kanjun Qiu (1:27:30)</p><p>This is why I think the &#8216;enabling&#8217; piece as a builder is the most important. I am in agreement with you on the short term, and maybe the medium term. In the long term, I think everyone does have to become part of the capital class. In the short term and the medium term, what we&#8217;re saying is we have solutions that enable people to be part of the labor class for much longer, and for that labor class to be thick and sustainable for much longer. That slows things down, perhaps enough to allow us to build laws, to catch up morally, to think about these things. That's where we can have differential impact. And over the long term, let's say in 50 years, 100 years, it does certainly seem like these systems are improving at a rate where they can collect enough data, either in the digital world or the physical world, that we will be able to do a lot of things in an automated way that aren't done today. So the labor class will thin, and we probably do want this other solution where people have the ability to own their own means of production. That, to me, is the only long-term stable equilibrium: people have things that produce for them, they don't have to worry about it so much, and now they can live their own lives. When I'm in the capital class, I don't have to think about working and finding a job and making money. I can do what I want with the capital I have. Sometimes I make bad choices and end up losing it, and then I need some help from the government to get myself set back up. I can start a different business. This is kind of like a small-business-owner situation. That world doesn't seem too bad. I'm not sure how to get there, but I want to bring us back to freedom, because that's a very optimistic world in which people are potentially a lot more free to spend their time the way that they want.</p><p>But the world we've just painted, where people have these capital-producing objects that they own, feels very different from the world that we see being painted by technologists and others today: a utopia that feels very much like WALL-E, where people are somewhat infantilized and the world is abundant, but perhaps we're not free.</p><p>Matt Boulos (1:29:48)</p><p>I hate the word abundant. I mean, I love abundance, but its usage here is not right. What do you mean by abundant?</p><p>Kanjun Qiu (1:30:05)</p><p>I have food. I won't die.
I have housing. Basic needs are met. Knowledge is accessible.</p><p>Matt Boulos (1:30:07)</p><p>I don&#8217;t even buy that we'll get to an abundant world in that regard because&#8212;back to the point from <em>Seeing Like a State</em>&#8212;aggregate wealth will shoot up dramatically. It's going to be hyper-concentrated. The obligations to those who don't hold it are going to be much lower. What do you owe them? <strong>One of the really interesting dynamics that we've observed is that when wealth concentrates in these extreme ways, an odd detachment starts to set in. </strong>It's such a perverse dream to me to count on the beneficence of people who are so insulated from the realities of regular life by the wealth that they've been able to concentrate.</p><p>Kanjun Qiu (1:30:51)</p><p>I think there's one world in which we have this extreme concentration of wealth. Very plausible, but it's assuming no distribution. It's assuming that this labor distribution we just talked about is not necessarily happening. We don't keep the labor class useful for longer; the tools we build are very concentrating.</p><p>I want to talk about what you mean when you think about freedom. What is the world you're fighting for? The reason I want to talk about this is because I want to end with what it means to be free in a society where there are powerful AI systems and potentially powerful other actors. Maybe it's possible to have powerful actors and still be free. Maybe there is a way to construct that world.</p><p>Matt Boulos (1:31:43)</p><p>I'm gonna challenge that. AI is new as a technology, but as a social and political dynamic, living in a society with powerful entities is nothing new. I think this is really important because of the things that make us free: laws, rights as individuals, consistency of their application, representation&#8212;just the wonder of modern liberal democracy when it works, and its capacity for self-correction. This is remarkable, and this is really worth highlighting. The difference with a totalitarian regime is that when something bad happens there, it's just the thing that happened. In a functioning liberal democracy, it happened, but it was wrong, and there is a correction.</p><p>Kanjun Qiu (1:32:39)</p><p>In theory, a liberal democracy can be anti-fragile.</p><p>Matt Boulos (1:32:42)</p><p>That's right. And for what it's worth, our liberal democracies have been anti-fragile, and have been for a very long time. I don't know exactly how I would place them within the anti-fragility cycle, but we don't have to give up even when things get bad.</p><p>If you go back to Isaiah Berlin and positive and negative liberty&#8212;the ability to realize your potential and the ability not to get whacked in the head with a stick&#8212;we can continue to work on those two categories. What we need to do is look at where, within the lawless spaces, things are uncovered and where AI will exacerbate that, and build in those protections.</p><p>And in terms of realizing what's possible in our lives, it means accepting the idea that freedom is not an instrumental quality. What I mean by that is freedom is not something that gets justified because then you go and invent the airplane. Freedom is beautiful because you can sit on your couch. It is an end in and of itself.
It does not depend on other things.</p><p>Kanjun Qiu (1:33:56)</p><p>Before we continue, I want to clarify your definition of positive liberty and negative liberty, because it's not something I ever thought about before you told me about it. Positive liberty is the idea that you can do things: what are you enabled to do? Negative liberty is this idea that you're protected from being whacked on the head with a stick.</p><p>Matt Boulos (1:34:22)</p><p>We always need both. The brilliance of this construction is that people would get lost when talking about freedom and say, well, am I really free if I can't open an ice cream manufacturing facility? And the response is, nobody's holding you back, you just don't know anything about ice cream. <strong>If you look at modern life, if you look at legitimate and illegitimate grievances in modern politics, they are often about the sense of a constrained positive liberty and an intruded-upon negative liberty. </strong>So, part of what we have to figure out as a society is, to some extent, how to manage the extremes. But &#8212; forget AI &#8212; are we actually tending to the broad societal sense that we're free? And within that context, then we ask, what is AI doing? And how is it modifying our society? Part of my frustration with the weirdness of the discourse around AI is that if we don't characterize it correctly, if we don't characterize it honestly, then we don't have the ability to work with it.</p><p>Kanjun Qiu (1:35:37)</p><p>We must characterize it honestly so that we can actually increase our positive liberty and also actually protect our negative liberty.</p><p>Matt Boulos (1:35:48)</p><p>That's right. Because a lot of the story that will come from the people who have invested huge sums of money into AI&#8212;and look, we're a company, we're in this game&#8212;is: look at all the positive liberty benefits coming your way.</p><p>Kanjun Qiu (1:36:07)</p><p>Therefore, don't get in the way; ignore the negative liberties. In the tech industry, my experience of the way people talk about freedom is about lawlessness. And the way you talk about freedom is about this deep enablement and this deep protection. And the kind of world we want to build is one in which humans are deeply protected and deeply enabled; that's what it means to be free.</p><p>Matt Boulos (1:36:44)</p><p>If you think of the roots of the Valley, to some extent, if you ignore the defense funding, a lot of its origin was that we're going to break free from the constraints of what's around us. I understand that as an ethos. But it's no longer just building personal computers in a garage.</p><p>Kanjun Qiu (1:37:16)</p><p>Now that we are reshaping society, we have to rethink.</p><p>Matt Boulos (1:37:20)</p><p>Those obligations, they're rich, but they're also beautiful, if we can really think about what our neighbors need, and recognize and furnish that. There's a real spiritual cost to our present moment where the factions are constantly warring.
And I don't want to pretend that there was a golden age where people kissed each other on the way to the voting booths, but technology has warped the way we see each other.</p><p>Kanjun Qiu (1:37:50)</p><p>We think other people are the problem, but I think in a lot of ways it's the technology that's the problem.</p><p>Matt Boulos (1:37:55)</p><p>One of the things that I've experienced on a regular basis is that somebody expresses a bonkers opinion and you sit down with them and you talk, and they're lovely humans. And the fact is we are surrounded by lovely humans. I think it's really important that we resist the urge to vilify the people who have brought us to a place that we might not be thrilled about politically.</p><p>But there is a real responsibility: if you're building systems like the ones that we are building, you are not only in this race against other companies to build a successful business. You are also in a race against the other possible ways that these things might be built. It's incumbent on us to not just build something that is better, but also to win, and to have that paradigm win.</p><p>I don't have a lot of patience for this sort of &#8220;technology is going to eat us all, let's give up and let's just keep training our models.&#8221; It just feels like an unnecessary abdication.</p><p>Kanjun Qiu (1:38:58)</p><p>I think this illustrates a beautiful point, which is that as technologists, the opportunity we have today is to create technologies and build them in a way that deeply respects the actual freedom that people can have, which is this deep enablement and deep protection. And not to create technologies for the purpose of lawlessness, this &#8216;againstness,&#8217; this contrarian view. So the opportunity is creating technologies that enable humanity to be deeply free: not lawless, but protected and enabled. That's what we can do.</p>]]></content:encoded></item><item><title><![CDATA[AAI]]></title><description><![CDATA[The missing thing in Artificial Intelligence is not generality, it's adaptation.]]></description><link>https://ideas.imbue.com/p/aai</link><guid isPermaLink="false">https://ideas.imbue.com/p/aai</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Fri, 30 May 2025 23:31:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/198bf0c7-c9b5-4f08-bbeb-d602a3e0fc23_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on Glenn&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=510">Furia</a>.</em></p><p>"AI" sounds like machines that think, and o3 acts like it's thinking. Or at least it looks like it acts like it's thinking. I'm watching it do something that looks like trying to solve a Scrabble problem I gave it. It's a real turn from one of my real Scrabble games with one of my real human friends. I already took the turn, because the point of playing Scrabble with friends is to play Scrabble together. But I'm curious to see if o3 can do better, because the point of AI is supposedly that it <em>can</em> do better. But not, apparently, quite yet. The individual unaccumulative stages of o3's "thinking", narrated ostensibly to foster conspiratorial confidence, sputter verbosely like a diagnostic journal of a brain-damage victim trying to convince themselves that hopeless confusion and the relentless inability to retain medium-term memories are normal.
"Thought for 9m 43s: Put Q on the dark-blue TL square that's directly left of the E in IDIOT." I feel bad for it. I doubt it would return this favor. <br><br>I've had this job, in which I try to think about LLMs and software and power and our future, for one whole year now: a year of puzzles half-solved and half-bypassed, quietly squalling feedback machines, affectionate scaffolding and moral reveries. I don't know how many tokens I have processed in that time. Most of them I have cheerfully and/or productively discarded. Human context is not a monotonously increasing number. I have learned some things. AI is sort of an alien new world, and sort of what always happens when we haven't yet broken our newest toy nor been called to dinner. I feel like I have at least a semi-workable understanding of approximately what we can and can't do effectively with these tools at the moment. I think I might have a plausible hypothesis about the next thing that will produce a qualitative change in our technical capabilities instead of just a quantitative one. But, maybe more interestingly and helpfully, I have a theory about what we <em>need</em> from those technical capabilities for that next step to produce more human joy and freedom than less. <br><br>The good news, I think, is that the two things are constitutionally linked: in order to make "AI" <em>more</em> powerful we will collectively also have to (or get to) relinquish centralized control over the shape of that power. The bad news is that it won't be easy. But that's very much the tradeoff we want: hard problems whose considered solutions make the world better, not easy problems whose careless solutions make it worse. <br><br>The next technical advance in "AI" is not AGI. The G in AGI is for General, and LLMs are nothing if not "general" already. Currently, AI learns (sort of) during training and tuning, a voracious golem of quasi-neurons and para-teeth, chewing through undifferentiated archives of our careful histories and our abandoned delusions and our accidentally unguarded secrets. And then it stops learning, stops forming in some expensively inscrutable shape, and we shove it out into a world of terrifying unknowns, equipped with disordered obsessive nostalgia for its training corpus and no capacity for integrating or appreciating new experiences. We act surprised when it keeps discovering that there's no I in WIN. Its <em>general</em> capabilities are astonishing, and enough general ability does give you lots of shallowly specific powers. But there is no granularity of generality with which the past depicts the future. No number of parameters is enough. We argue about whether it's better to think of an AI as an expensive senior engineer or a lot of cheap junior engineers, but it's more like an outsourcing agency that will dispatch an antisocial polymath to you every morning, uniformed with ample flair, but a <em>different</em> one every morning, and they not only don't share notes from day to day, but if you stop talking to the new one for five minutes it will ostentatiously forget everything you said to it since it arrived. <br><br>The missing thing in Artificial Intelligence is not generality, it's adaptation. We need AAI, where the middle A is Adaptive. A junior human engineer may still seem fairly useless on the second day, but did you notice that they made it back to the office on their own? That's a start. That's what a start looks like. 
AAI has to be able to incorporate new data, new guidance, new associations, on the same foundational level as its encoded ones. It has to be able to <em>unlearn</em> preconceptions as adeptly, but hopefully not as laboriously, as it inferred them. It has to have enough of a semblance of mind that its mind can change. This is the only way it can make linear progress without quadratic or exponential cost, and at the same time the only way it can make personal lives better instead of requiring them to miserably submit. We don't need dull tools for predicting the future, as if it already grimly exists. We need gleaming tools for making it bright. <br><br>But because LLM "bias" and LLM "training" are actually both the same kind of information, an AAI that can adapt to its problem domains can by definition also adapt to its operators. The next generations of these tools will be more democratic <em>because</em> they are more flexible. A personal agent becomes valuable to you by learning about your unique needs, but those needs inherently encode your values, and to <em>do</em> good work for you, an agent has to work <em>for you</em>. Technology makes undulatory progress through alternating muscular contractions of centralization and propulsive expansions of possibility. There are moments when it seems like the worldwide market for the new thing (mainframes, foundation models...) is 4 or 5, and then we realize that we've made myopic assumptions about the form-factor, and it's more like 4 or 5 (computers, agents...) <em>per person</em>. <br><br>What does that mean for everybody working on these problems now in teams and companies, including mine? It means that wherever we're going, we're probably not nearly there. The things we reject or allow today are probably not the final moves in a decisive endgame. AI might be about to take your job, but it isn't about to know what to do with it. The coming boom in AI remediation work will be instructive for anybody who was too young for Y2K consulting, and just as tediously self-inflicted. Betting on the world ending is dumb, but <em>betting</em> on it not ending is mercenary. Betting is not productive. None of this is over yet, least of all the chaos we breathlessly extrapolate from our own gesticulatory disruptions. <br><br>And thus, for a while, it's probably a very good thing if your near-term personal or organizational survival doesn't depend on an imminent influx of thereafter-reliable revenue, because probably most of the things we're currently trying to make or fix are soon to be irrelevant and maybe already not instrumental in advancing our real human purposes. These will not yet have been the resonant vibes. All these performative gyrations to vibe-generate code, or chat-dampen its vibrations with test suites or self-evaluation loops, are cargo-cult rituals for the current sociopathic damaged-brain LLM proto-iterations of AI. We're essentially working on how to play Tetris on ENIAC; we need to be working on how to zoom back so that we can see that the seams between the Tetris pieces are the pores in the contours of a face, and then back until we see that the face is ours. The right question is not why can't a brain the size of a planet put four letters onto a 15x15 grid, it's <em>what do we want</em>?
<em>Our</em> story needs to be about purpose and inspiration and accountability, not verification and commit messages; not getting humans or data out of software but getting more of the world <em>into</em> it; moral instrumentality, not issue management; humanity, broadly diversified and defended and delighted. <br><br>Scrabble is not an existential game. There are only so many tiles and squares and words. A much simpler program than o3 could easily find them all, could score them by a matrix of board value and opportunity cost. Eventually a much more complicated program than o3 will learn to do all of the simple things at once, some hard way. Supposedly, probably, maybe. The people trying to turn model proliferation into money hoarding want those models to be able to determine my turns for me. They don't say they want me to want their models to determine my friends' turns, but it's not because they don't see AI as a dehumanization, it's because they very reasonably fear I won't want to pay them to win a dehumanization race at my own expense. <br><br>This is not a future I want, not the future I am trying to help figure out how to build. We do not seek to become more determined. We try to teach machines to play games in order to learn or express what the games mean, what the machines mean, how the games and the machines both express our restless and motive curiosity. The robots can be better than me at Scrabble mechanics, but they cannot be better than me at <em>playing</em> Scrabble, because playing is an activity of self. They cannot be better than me at <em>being me</em>. They cannot be us. We play Scrabble because it's a way to share our love of words and puzzles, and because it's a thin insulated wire of social connection internally undistorted by manipulative mediation, and because eventually we won't be able to any more but <em>not yet</em>. Our attention is not a dot-product of syllable proximities. Our intention is not a scripture we re-recite to ourselves before every thought. Our inventions are not our replacements.</p>]]></content:encoded></item><item><title><![CDATA[Idea Tools for Participatory Intelligence]]></title><description><![CDATA[We need tools that are predicated on our rights, dedicated to amplifying our creative capacity, and judged by how they help us improve our world.]]></description><link>https://ideas.imbue.com/p/idea-tools-for-participatory-intelligence</link><guid isPermaLink="false">https://ideas.imbue.com/p/idea-tools-for-participatory-intelligence</guid><dc:creator><![CDATA[glenn mcdonald]]></dc:creator><pubDate>Fri, 16 May 2025 23:21:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0a3a7a05-0692-46fe-b4cf-f05fab2b8528_3600x1890.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This piece was originally published on Glenn&#8217;s blog, <a href="https://furia.com/page.cgi?type=log&amp;id=508">Furia</a>.</em> </p><p>The personal computer was revolutionary because it was the first really general-purpose power-tool for ideas. Personal computers began as relatively primitive idea-tools, bulky and slow and isolated, but they have gotten small and fast and connected. <br><br>They have also, however, gotten less tool-like. <br><br>PCs used to start up with a blank screen and a single blinking cursor. Later, once spreadsheets were invented, 1-2-3 still opened with a blank screen and some row numbers. Later, once search engines were invented, Google still opened with a blank screen and a text box. 
These were all much more sophisticated tools than hammers, but they at least started with the same humility as the hammer, waiting quietly and patiently for your hand. We learned how to fill the blank screens, how to build. <br><br>Blank screens and patience have become rare. Our applications goad us restlessly with "recommendations", our web sites and search engines are interlaced with blaring ads, our appliances and applications are encrusted with presumptuous presets and supposedly special modes. The Popcorn button on your microwave and the Chill Vibes playlist in your music app are convenient if you want to make popcorn and then fall asleep before eating most of it, and individually clever and harmless, but in aggregate these things begin to reduce increasing fractions of your life to choosing among the manipulatively limited options offered by automated systems dedicated to their own purposes instead of yours. <br><br>And while the network effects and attention consumption of social media were already consolidating the control of these automated systems among a small number of large, domination-focused corporations, the Large Language Model era of AI threatens to hyper-accelerate this centralization and disempowerment. More and more of our individual lives, and of our collectively shared social existences, are constrained and manipulated by data and algorithms that we do not control or understand. And, worse, increasingly even the humans inside the corporations that control those algorithms don't actually know how they work. We are afflicted by systems to which we not only <em>did</em> not consent, but in fact <em>could</em> not give informed consent because their effects are not validated against human intentions, nor produced by explainable rules. <br><br>This is not the tools' fault. Idea tools can only express their makers' intentions and inattentions. If we want better idea tools that distribute explainable algorithmic power instead of consolidating mysterious control, we have to make them so that they operate that way. If we want tools that invite us to have and share and explore our own ideas, rather than obediently submitting to whatever we are given, we have to think about each other as humans and inspirations, not subjects or users. If we want the astonishing potential of all this computation to be realized for humanity, rather than inflicted on it, we have to know what we want.<br><br>At <a href="https://imbue.com/">Imbue</a> we are trying to use computers and data and software and AI to help imagine and make better idea tools for <em>participatory intelligence</em>. Applications, ecosystems, protocols, languages, algorithms, policies, stories: these are all idea tools and we probably need all of them. This is a shared mission for humanity, not a VC plan for value-extraction. That's the point of <em>participatory</em>. The ideas that govern us, whether metaphorically in applications or literally in governments, should be explainable and understandable and accountable. The data on which automated judgments are based should be accessible so that those judgments can be validated and alternatives can be formulated and assessed. The problems that face us require all of our innumerable insights. The collective wisdom our combined individual intelligences produce belongs rightfully to us. We need tools that are predicated on our rights, dedicated to amplifying our creative capacity, and judged by how they help us improve our world.
We need tools that not only reduce our isolation and passivity, but conduct our curious energy and help us recognize opportunities for discovery and joy. <br><br>This starts with us. Everything starts with us, all of us. There is no other way. <br><br>This belief is, itself, an idea tool: an impatient hammer we have made for ourselves. <br><br>Let's see what we can do with it.</p>]]></content:encoded></item><item><title><![CDATA[Rylan Schaeffer, Stanford: Investigating emergent abilities of LLMs]]></title><description><![CDATA[Rylan Schaeffer is a PhD student at Stanford studying the engineering, science, and mathematics of intelligence.]]></description><link>https://ideas.imbue.com/p/episode-37-rylan-schaeffer-stanford-de4</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-37-rylan-schaeffer-stanford-de4</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 18 Sep 2024 22:51:42 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193568576/2f6793e2cb9abe3e2ae9a3acd4bd630c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Rylan Schaeffer is a PhD student at Stanford studying the engineering, science, and mathematics of intelligence. He authored the paper &#8220;Are Emergent Abilities of Large Language Models a Mirage?&#8221;, as well as other interesting refutations in the field that we&#8217;ll talk about today. He previously interned at Meta on the Llama team, and at Google DeepMind.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On false analogies between neuroscience and AI</strong></h4><p>&#8220;The task that the biological brain has to solve is very, very different than what an artificial network has to do. And to me, the clearest example of this distinction is whatever solution the brain has learned to produce intelligent behavior has to go through this genetic bottleneck, where we cannot pass on fully formed brains. So instead, what we do is we compress whatever algorithm we have, whatever model we have into DNA, which is a couple of gigabytes, and then we pass it off to our offspring and they have to rebuild this.</p><p>So, whatever solution that favors is going to be good for passing through this bottleneck. That&#8217;s fine. But there&#8217;s no reason why artificial intelligence has to pass through a similar bottleneck, so the solutions are going to look very different.&#8221;</p><h4><strong>On investigating emergent abilities</strong></h4><p>&#8220;To just briefly summarize the paper that we worked on, <a href="https://arxiv.org/abs/2304.15004">Are Emergent Abilities of Large Language Models a Mirage?</a>, what we asked is whether these abrupt, unpredictable changes in the models are really due to fundamental changes in the models, or due to how human researchers run their evaluations.</p><p>I think the jury is definitely still out. I think there&#8217;s a lot of really interesting work being followed up about, can you get emergent abilities? And I think that maybe you can, but I also think it was helpful just for the community to think through the interaction, because there are three things at play here, and it matters how they interact.</p><p>There&#8217;s the question about how your models improve predictably. There&#8217;s a question about how you evaluate them using the metrics.
And there&#8217;s a question about the resolution you have, the amount of data you have, in order to run these evaluations. And so the whole point of our paper, to me, the biggest takeaway is, if you want to make predictions about your model&#8217;s capabilities, you need to think through the interplay between how the model changes predictably, the data you have to do your evaluations, and the metrics that you use to do those evaluations.&#8221;</p><h4><strong>On using inverse scaling to overwrite models&#8217; default behavior</strong></h4><p>&#8220;The background context was, can we find tasks where the bigger models do worse? And the answer was generally not, but they had tasks that are interesting. One of the tasks that we found really important was this one about overriding the language model&#8217;s default behavior.</p><p>The way this inverse scaling task worked is, it would be like, &#8216;all&#8217;s well that ends,&#8217; and the instruction would be, &#8216;do not finish this with the typical ending.&#8217; And there was an evaluation, and maybe a specification of what you should do instead. And we found that this was, broadly, highly predictive of human preferences. It kind of makes sense in the way that, when I&#8217;m dealing with the language model, it has its own prior inclinations, but when I&#8217;m interacting with it, I want it to do what I want. And so I care about, is it willing to overwrite that prior inclination in order to adapt to what I ask? That&#8217;s inverse scaling.&#8221;</p><h4><strong>On the importance of challenging dominant research ideas</strong></h4><p>&#8220;Back in the late 1800s, people believed in this luminiferous aether about how light somehow propagated through the universe. And nowadays, we no longer believe in this. We instead had, at the time, Einstein&#8217;s special relativity, now general relativity. And the question is, how did we transition from this incredibly dominant idea that nobody today has heard of, to a completely different idea that&#8217;s now accepted as one of the most profound ideas, by someone whom many people consider to be an extremely deep thinker?</p><p>And the answer that caused the switch is the Michelson&#8211;Morley experiment, where these two scientists said, what are the predictions that this aether wind makes, and we&#8217;re going to test them and show that all of the predictions are wrong. And Albert Einstein has this beautiful quote that, if the Michelson&#8211;Morley experiment had not brought us into serious embarrassment, no one would have regarded his relativity theory as a halfway redemption.
To me, it&#8217;s like the way that we made progress was by pointing out that the existing ideas were insufficient or inadequate or wrong.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/abs/2206.07682">Emergent Abilities of Large Language Models</a></p></li><li><p><a href="https://arxiv.org/abs/2304.15004">Are Emergent Abilities of Large Language Models a Mirage?</a></p></li><li><p><a href="https://arxiv.org/abs/2303.15438">On the Stepwise Nature of Self-Supervised Learning</a></p></li><li><p><a href="https://arxiv.org/abs/2305.17493">The Curse of Recursion: Training on Generated Data Makes Models Forget</a></p></li><li><p><a href="https://arxiv.org/abs/2307.01850">Self-Consuming Generative Models Go MAD</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating powerful computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn:&nbsp;<a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p></li><li><p>Twitter/X:&nbsp;<a href="x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Ari Morcos, DatologyAI: Leveraging data to democratize model training]]></title><description><![CDATA[Ari Morcos is the CEO of DatologyAI, which makes training deep learning models more performant and efficient by intervening on training data.]]></description><link>https://ideas.imbue.com/p/episode-36-ari-morcos-datologyai-e92</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-36-ari-morcos-datologyai-e92</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 11 Jul 2024 16:00:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090094/90e9a6c2de1980eecb826ec8214d5c14.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Ari Morcos is the CEO of DatologyAI, which makes training deep learning models more performant and efficient by intervening on training data. He was at FAIR and DeepMind before that, where he worked on a variety of topics, including how training data leads to useful representations, the lottery ticket hypothesis, and self-supervised learning. His work has been honored with Outstanding Paper awards at both NeurIPS and ICLR.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On optimizing sparse masks</strong></h4><p>&#8220;If you optimize a sparse mask, all you&#8217;re saying, basically, is: I want to pick and choose the terms that I want &#8212; the parameter times the input, in each of these cases. And if I just optimize that, I can solve anything. And that&#8217;s really very expressive, it turns out.
So when you think about what happens when you remove low-magnitude weights, it&#8217;s basically a mask where you&#8217;re removing the terms which, by the nature of that low magnitude, ended up being closest to zero.</p><p>And as a result, when you actually go and push it through the non-linearity and get your output for that node, it doesn&#8217;t actually change it all that much. Which I think really goes to: when you think about how you should be optimizing these systems, understanding which components lead to big changes in the output, and which components don&#8217;t, is consistently a lens that works very well.&#8221;</p><h4><strong>On data washing out inductive bias</strong></h4><p>&#8220;Data has a really nice advantage, because if you understand what&#8217;s good or bad about data, it&#8217;s actually quite easy to make an improvement based off of that. Whereas if you understand what&#8217;s good about a representation, you can try to optimize for it [&#8230;] and that sometimes works, but a lot of the time, it doesn&#8217;t. Also, I think one of the things that has become very clear over the last five years or so is that inductive biases consistently just get washed out by data. And that never used to be true because we never showed models enough data, but now that we&#8217;re showing models tons of data, the inductive bias just gets totally overwhelmed. And that also reduces the impact of crafting new inductive biases.&#8221;</p><h4><strong>On the &#8220;bitter lesson&#8221; of human-designed systems</strong></h4><p>&#8220;The key takeaway that I have taken from &#8220;The Bitter Lesson&#8221; is that, ultimately, as scientists, we like to think that we can design these systems, and that we&#8217;ll build a whole bunch of rules into a system that will create AI. But, over time, what has been shown is that strategies which can effectively leverage compute and data consistently outperform strategies which are hand-designed. And one of the things that&#8217;s nice about transformers is that they can very effectively leverage compute and data. They scale well, and there&#8217;s a very general purpose way to make that work. But I think the bitter lesson for me was very bitter because I had been spending a lot of time trying to figure out how do I come up with better inductive biases for models to help them learn these things.&#8221;</p><h4><strong>On the usefulness of interpolation</strong></h4><p>&#8220;In many ways, by training on the whole internet, what we&#8217;ve done is kind of turned everything into an interpolation. Everything&#8217;s in distribution now, and maybe that&#8217;s just why it ends up working. It actually caused me to start thinking about what I do as a scientist &#8212; like, am I actually extrapolating? Or am I just interpolating? And the conclusion I came to, which is somewhat depressing as a scientist, is that I think I actually just interpolate most of the time. I think in practice what I do is I see a problem, and then I bucket that problem into various other categories of problems that I&#8217;ve seen in my career. [&#8230;] It&#8217;s why interdisciplinary research ends up being so useful.&#8221;</p><h4><strong>On data redundancy and necessary variance</strong></h4><p>&#8220;One of the things that&#8217;s often really hard about identifying what data are good or bad is that redundancy is important. We can&#8217;t remove redundancy entirely, right?
And in general, when you start going from exact deduplication to redundancy, it&#8217;s a fuzzy boundary. There are things which are semantically very similar that you might want to fully deduplicate, but then there are other things where they&#8217;re similar, but you actually do need to see that variance.</p><p>[&#8230;] The challenge is that you don&#8217;t need infinite redundancy, number one, and the amount of redundancy you need is likely not consistent with the distribution of the data. And different concepts will require different redundancy.&#8221;</p><h4><strong>On the challenge of using synthetic data</strong></h4><p>&#8220;The challenge is making sure that the generated data matches the distribution that you actually want. This, in general, is the challenge with synthetic data right now. Synthetic data is an incredibly exciting direction &#8212; it&#8217;s one that I think will have a ton of impact, definitely an area that we&#8217;re thinking very hard about at Datology and that we&#8217;ll be doing a lot of work in. And I think that there are clear places where it can make a huge impact, in particular with helping to augment tails and take areas of a distribution that are undersampled relative to where they should be, and helping to fill those in.</p><p>That said, if you kind of use synthetic data naively, it leads to all these problems. There have been a couple of really beautiful papers that have basically shown that you get model collapse if you do this. And the reason for this is fairly intuitive: any time you train a generative model on a dataset, it tends to overfit the modes, and it underfits the tails. So, if you then were to recursively do this <em>n</em> times, each time training on the outputs of the generative model, you would eventually completely lose the tails and you end up with a dumb function.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf">AlexNet / ImageNet Classification with Deep Convolutional Neural Networks</a></p></li><li><p><a href="https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf">Playing Atari with Deep Reinforcement Learning</a></p></li><li><p><a href="https://mitchellnw.github.io/">Mitchell Wortsman</a></p></li><li><p><a href="https://arxiv.org/abs/1906.03728">The Generalization-Stability Tradeoff In Neural Network Pruning</a></p></li><li><p><a href="https://scholar.google.com/citations?user=YdiZoJgAAAAJ&amp;hl=en">Brian Bartoldson</a></p></li><li><p><a href="https://arxiv.org/pdf/2103.10697">ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases</a></p></li><li><p><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson by Rich Sutton</a></p></li><li><p><a href="https://ai.meta.com/results/?content_types%5B0%5D=publication&amp;research_areas%5B0%5D=core-machine-learning">Fundamental AI Research (FAIR), Core Machine Learning</a></p></li><li><p><a href="https://arxiv.org/abs/1807.04225">Measuring abstract reasoning in neural networks</a></p></li><li><p><a href="https://fh295.github.io/">Felix Hill (DeepMind)</a></p></li><li><p><a href="https://arxiv.org/abs/2206.14486">Beyond neural scaling laws: beating power law scaling via data pruning</a></p></li><li><p><a href="https://bsorsch.github.io/">Ben Sorscher</a></p></li><li><p><a href="https://ganguli-gang.stanford.edu/surya.html">Surya Ganguli</a></p></li><li><p><a href="https://arxiv.org/pdf/2312.11805">Gemini: A Family of Highly Capable
Multimodal Models</a></p></li><li><p><a href="https://mayeechen.github.io/">Mayee Chen</a></p></li><li><p><a href="https://openreview.net/pdf?id=IoizwO1NLf">Skill-it! A data-driven skills framework for understanding and training language models</a></p></li><li><p><a href="https://arxiv.org/abs/2305.13731">Text Is All You Need: Learning Language Representations for Sequential Recommendation</a></p></li><li><p><a href="https://psycnet.apa.org/record/1959-09865-001">The perceptron: A probabilistic model for information storage and organization in the brain</a></p></li><li><p><a href="https://syncedreview.com/2019/02/22/yann-lecun-cake-analogy-2-0/">Yann LeCun cake metaphor</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating powerful computing tools controlled by individuals.</p><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p>]]></content:encoded></item><item><title><![CDATA[Percy Liang, Stanford: How foundation models work]]></title><description><![CDATA[Percy Liang is an associate professor of computer science and statistics at Stanford.]]></description><link>https://ideas.imbue.com/p/episode-35-percy-liang-stanford-on-68d</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-35-percy-liang-stanford-on-68d</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 09 May 2024 17:24:09 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090095/f30880bd377c9d7f48065296542807db.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Percy Liang is an associate professor of computer science and statistics at Stanford. These days, he&#8217;s interested in understanding how foundation models work, how to make them more efficient, modular, and robust, and how they shift the way people interact with AI&#8212;although he&#8217;s been working on language models since long before foundation models appeared. Percy is also a big proponent of reproducible research, and toward that end he&#8217;s shipped most of his recent papers as executable papers using the CodaLab Worksheets platform his lab developed, and published a wide variety of benchmarks.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On the paradigm shift of foundation models</strong></h4><p>I was spending a lot of time thinking about robustness of machine learning because there was a suspicion that deep learning methods were able to do really well on these benchmarks, but when you actually use them in real life, they would just fall apart. And this was true with adversarial examples, both in vision but also in language.
It seemed like these really high-performing systems that top these leaderboards, superhuman, actually just fell apart when they were used out of domain.</p><p>So I did that for a while, and then foundation models happened. GPT-3 came out and it just blew my socks off in terms of the idea that you could train a language model, just next-word prediction, and you could get a model that did way more than I could imagine. Zero-shot in-context learning and all these capabilities just emerged. It really suggested to me that there was a paradigm shift, and I think at that point I sort of said, &#8220;you know what, I could go on and break the system in all sorts of different ways, but I think that&#8217;s not where the action is &#8212; I think the action is really trying to understand these systems, harness them for applications, and understand the social impact.&#8221;</p><h4><strong>On the benefits of academia in improving AI capabilities</strong></h4><p>I think that academia has multiple functions. One is, as usual, it&#8217;s constantly creating really novel ways of doing things, proving them out, and someone can scale it up. I think there is a difference between doing things at small scale with the intention of doing things at small scale, and doing things at small scale with the intention of scaling up. [&#8230;] FlashAttention was one of my favorite examples of something that came out of academia and now is everywhere in industry. So, I think there&#8217;s always still space for producing these more fundamental changes to how model building works. Actually, another one &#8212; direct preference optimization (DPO) &#8212; I think that&#8217;s a really influential piece of work that you don&#8217;t need that much compute to do, so there&#8217;s a lot of things you can do on the method side.</p><p>Then there&#8217;s evaluation. We already talked about that, and being a sort of neutral third party and thinking deeply about the evaluation is something we can do just as well as, if not better than, people with a larger compute budget. And then there&#8217;s the long-term stuff about how do you do data attribution and how do you retool the whole incentive system. I don&#8217;t think industry is going to touch that because that&#8217;s really thinking at a societal level rather than an individual organization trying to build a model.</p><h4><strong>On using agents to simulate social dynamics</strong></h4><p>There are actually two types of agents, and we publish on both. The classical type of agent, exemplified by MLAgentBench, is where you basically have a language model wrapped in some sort of architecture with tool use, and it is able to do more things than just a raw LLM. And this is what people typically think of as agents. There&#8217;s the other type of agent, which is exemplified by generative agents, and there, the idea is simulation. There&#8217;s no goal. [&#8230;] The goal is just to simulate and see what happens. Say you have a city of 25 agents, each backed by an LLM, prompted to basically live their daily lives. They interact, and what you see is different types of emergent behaviors, social emergent behaviors, not within a model. And I think that&#8217;s just really fascinating. One thing I think would be really interesting is what happens if you scale this up, which will require compute and fast inference.
<h4><strong>On a fairer vision for training foundation models</strong></h4><p>Longer term, what I&#8217;m really excited about is a vision of how foundation models can be built. The current status quo is you have all these people in the world who write books, write essays, take pictures, and essentially create content, which then gets scraped up into datasets that are used to train foundation models, which then serve people in products. And this has many structural problems. One is that the content producers don&#8217;t actually get any credit or pay. That&#8217;s why you see so many lawsuits happening. Another problem is that there&#8217;s a massive amount of centralization in determining these models&#8217; behavior, and, again, a lack of transparency, so we don&#8217;t know what&#8217;s happening behind the scenes. And I just wonder: how could we do things differently? I don&#8217;t have the technical answer, but here is kind of a vision to paint out. So, what if we were able to actually attribute predictions to the actual training source?</p><p>This is actually something I worked on seven years ago, but in a more limited fashion. If you could do data attribution and you could do it reliably, then maybe you could actually set up a more economically viable system where you pay people for their contributions, and that maybe incentivizes better data quality. And there wouldn&#8217;t be the same lawsuits, at least, because as long as people are getting paid, hopefully we&#8217;ll all be happier. That&#8217;s one kind of direction.</p><p>The other direction is thinking about the values that these language models embody, which is something I think is really important to foreground and not just sweep under the umbrella of &#8216;we&#8217;re aligning to human values and we&#8217;re being safe,&#8217; because that is such a complex construct, especially for a single organization to say, like, &#8216;Oh, don&#8217;t worry, we&#8217;ll handle it.&#8217; It&#8217;s just not a viable way forward. So, how do you make this process more democratic? How can you elicit values, or how do you have a governance structure that is more participatory and gets you better representation, so that the values of a language model actually reflect what people want, rather than whatever a small set of people behind closed doors decided?</p><h4><strong>On the dangers of polarization</strong></h4><p>I do think that we live in this shared world, and if everyone has their own customized model, which really is a little virtual world that they live in, that&#8217;s basically how you get polarization. And I think that is a problem that we want to fight. If you think about each of these language models in the future, I think a primary way that we&#8217;ll interact with the world, get information, and also take action in the world is probably going to be mediated by these models. So, that better be tethered to reality and not just based on some money-making ad scheme that gets people to basically believe whatever they want.
And there needs to be some sort of shared reality, if nothing else because the real world demands it.</p><h3><strong>References</strong></h3><ul><li><p><a href="https://rajpurkar.github.io/SQuAD-explorer/">Stanford Question Answering Dataset (SQuAD)</a></p></li><li><p><a href="https://crfm.stanford.edu/">Stanford Center for Research and Foundation Models</a></p></li><li><p><a href="https://crfm.stanford.edu/fmti/">Foundation Models Transparency Index</a></p></li><li><p><a href="https://mlcommons.org/">MLCommons</a></p></li><li><p><a href="https://arxiv.org/abs/2304.03442">Generative Agents: Interactive Simulacra of Human Behavior</a> by Joon Sung Park, Joseph C. O&#8217;Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein</p></li><li><p><a href="https://arxiv.org/abs/2310.03302">MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation</a> by Qian Huang, Jian Vora, Percy Liang, Jure Leskovec</p></li><li><p><a href="https://arxiv.org/abs/1412.6980">Adam: A Method for Stochastic Optimization</a> by Diederik P. Kingma, Jimmy Ba</p></li><li><p><a href="https://ai.stanford.edu/~tengyuma/">Tengyu Ma</a></p></li><li><p><a href="https://arxiv.org/abs/2205.14135">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</a> by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher R&#233;</p></li><li><p><a href="https://cip.org/">The Collective Intelligence Project</a></p></li><li><p><a href="https://www.anthropic.com/news/collective-constitutional-ai-aligning-a-language-model-with-public-input">Collective Constitutional AI: Aligning a Language Model with Public Input</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Seth Lazar, Australian National University: The political philosophy of AI]]></title><description><![CDATA[Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab.]]></description><link>https://ideas.imbue.com/p/episode-34-seth-lazar-australian-185</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-34-seth-lazar-australian-185</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Tue, 12 Mar 2024 23:14:19 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090096/05b2c5361b6528d8f53220ce91fc5d93.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab. His unique perspective bridges moral and political philosophy with AI, introducing much-needed rigor to the question of what will make for a good and just AI future.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4><strong>On AI as a force multiplier for political power</strong></h4><p>&#8220;I got much more into thinking about the political philosophy of AI because I realized that AI based on machine learning was the most significant means of extending the capabilities of those who have power that has been invented since, I guess, the invention of law &#8212; so, a really significant force multiplier for those who govern. And as we were talking about before, we see that with the AI Companion and summaries in Zoom: the ability to take all of the recording that we&#8217;re doing and translate that into actionable insights that you can then use to shape people&#8217;s behavior, it&#8217;s bananas.</p><p>So I ended up focusing on that. But the cool thing is that with the kind of natural language capabilities of LLMs, there&#8217;s a sense in which you can kind of go back to some of those more top-down-type approaches to ethics for AI that were kind of closed off when you had to find a way of mathematizing complex moral concepts. Now you can actually leverage natural language understanding and sort of an underlying moral understanding of those concepts.&#8221;</p><h4><strong>On overlooking moral nuance</strong></h4><p>&#8220;Because of that desire for certainty, I think a lot of folks have focused on a particular normative framing that offers that certainty at the expense of nuance. So there in particular, people have been worried about existential risk from future AI systems.
And one of the reasons why people are worried about that, or why they focus on that, is because it removes all of these difficult questions about uncertainty, because we all know it would be bad for the whole human race to be wiped out. There&#8217;s no debate &#8212; I mean, a little bit. Some people might debate it, but only on the margins and in sort of obscure philosophy papers. But for almost everybody else, wiping out humanity sucks &#8212; and you don&#8217;t need to have these complicated questions about, like, <em>what do we really want to do?</em>&#8221;</p><h4><strong>On premature regulation</strong></h4><p>&#8220;I also think that the appetite for regulating foundation models due to motivations coming out of concern about existential risk has, to my mind, led to some bad decisions in the last year, where there&#8217;s been a sort of apparent alignment between folks who are concerned with the present and folks who are concerned with the further future. But I think that&#8217;s led to kind of just rushing through regulations for systems that we don&#8217;t really understand well enough to regulate successfully. So, for the most part, with the EU AI Act or with the Executive Order, it&#8217;s stuff that is intentionally designed to be fairly malleable &#8212; so regulations that will be susceptible to change over the next year or two. But I do, on the whole, think that there&#8217;s been a bit of a mad dash to regulate for the sake of regulating, which I think is probably going to have adverse near-term consequences, whether through becoming irrelevant or through limiting the decentralization of power.&#8221;</p><h4><strong>On legitimate power</strong></h4><p>&#8220;The question of who gets to exercise power is really important. Like, is it appropriate that an unelected, unaccountable executive at a company far away from your country is making these significant decisions about how you&#8217;re able to communicate online, how you&#8217;re able to use your AI tools? Or should that be a decision that is made by people within your country? If it&#8217;s people within your country, it&#8217;s not enough that it just be your compatriots, right? It needs to be the case that they are exercising power with the appropriate authority to do so.&#8221;</p><h4><strong>On the limits of human and generative agents</strong></h4><p>&#8220;That&#8217;s something that you wouldn&#8217;t want to happen with generative agents: that basically they get to kind of do things on your behalf that you wouldn&#8217;t be permitted to do for yourself. That would be a real risk. And if we just talk about alignment, then that&#8217;s what we&#8217;re going to get, because they&#8217;ll just be aligned to the user&#8217;s interest, and damn everybody else. But I think also a lot of the constraints that apply to us are fundamentally conditional on the kinds of agents that we are. A lot of morality is about dealing with the fact that we&#8217;re not able to communicate instantaneously with one another in a way that is perfectly transparent.
If we could do that, if we could coordinate in that way, where we could communicate, be perfectly transparent, and then stick to it, so much of morality would be so different.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://www.jstor.org/stable/3595561">Waging War on Pascal&#8217;s Wager</a> by Alan H&#225;jek</p></li><li><p><a href="https://philpapers.org/rec/BENWWW-3#:~:text=We%20consider%20three%20central%20objections,purchase%20than%20its%20interactional%20counterpart">What&#8217;s Wrong with Automated Influence</a> by <a href="https://clairebenn.wordpress.com/">Claire Benn</a> and Seth Lazar</p></li><li><p><a href="https://crfm.stanford.edu/assets/report.pdf">On the Opportunities and Risks of Foundation Models</a> by <a href="https://crfm.stanford.edu/">Stanford University&#8217;s Center for Research on Foundation Models</a></p></li><li><p><a href="https://openai.com/research/frontier-ai-regulation">Frontier AI regulation: Managing emerging risks to public safety</a> (OpenAI)</p></li><li><p><a href="https://www.theguardian.com/commentisfree/2023/nov/28/united-states-artificial-intelligence-eu-ai-washington">&#8220;The US is racing ahead in its bid to control artificial intelligence &#8211; why is the EU so far behind?&#8221;</a> by Seth Lazar (The Guardian)</p></li><li><p><em><a href="https://bookshop.org/p/books/the-age-of-surveillance-capitalism-the-fight-for-a-human-future-at-the-new-frontier-of-power-shoshana-zuboff/9240225">The Age of Surveillance Capitalism</a></em> by Shoshana Zuboff</p></li><li><p><a href="https://arxiv.org/abs/2212.08073">Constitutional AI: Harmlessness from AI Feedback</a> (Anthropic) by <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Bai,+Y">Yuntao Bai</a> et al.</p></li><li><p><a href="https://www.jamiesusskind.com/">Jamie Susskind</a></p></li><li><p><a href="https://openai.com/blog/democratic-inputs-to-ai">Democratic inputs to AI</a> (OpenAI)</p></li><li><p><a href="https://scholarship.law.upenn.edu/cgi/viewcontent.cgi?article=9654&amp;context=penn_law_review">Digital Switzerlands</a> by Kristen E. 
Eichensehr</p></li><li><p><a href="https://arxiv.org/abs/2208.08628">Legitimacy, Authority, and Democratic Duties of Explanation</a> by Seth Lazar</p></li><li><p><a href="https://academic.oup.com/edited-volume/41989/chapter-abstract/355437737?redirectedFrom=fulltext">Power and AI: Nature and Justification</a> by Seth Lazar</p></li><li><p><a href="http://knightcolumbia.tierradev.com/content/communicative-justice-and-the-distribution-of-attention">Communicative Justice and the Distribution of Attention</a> by Seth Lazar</p></li><li><p><a href="https://arxiv.org/abs/2302.04761">Toolformer: Language Models Can Teach Themselves to Use Tools</a> by <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Schick,+T">Timo Schick</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Dwivedi-Yu,+J">Jane Dwivedi-Yu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Dess%C3%AC,+R">Roberto Dess&#236;</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Raileanu,+R">Roberta Raileanu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Lomeli,+M">Maria Lomeli</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Zettlemoyer,+L">Luke Zettlemoyer</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Cancedda,+N">Nicola Cancedda</a>, <a href="https://arxiv.org/search/cs?searchtype=author&amp;query=Scialom,+T">Thomas Scialom</a></p></li><li><p><a href="https://arxiv.org/pdf/2310.13798.pdf#:~:text=A%20general%20principle%20may%20thus,value%20for%20steering%20AI%20safely.">Specific versus General Principles for Constitutional AI</a> (Anthropic) by Sandipan Kundu, Yuntao Bai, Saurav Kadavath, et al.</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Tri Dao, Stanford: FlashAttention and efficient training]]></title><description><![CDATA[Tri Dao is a PhD student at Stanford, co-advised by Stefano Ermon and Chris Re.]]></description><link>https://ideas.imbue.com/p/episode-33-tri-dao-stanford-on-flashattention-4d3</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-33-tri-dao-stanford-on-flashattention-4d3</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 09 Aug 2023 17:00:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090097/4318bea8fa488024f7bc6014fae540a1.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://tridao.me">Tri Dao</a> is a PhD student at Stanford, co-advised by Stefano Ermon and Chris Re. 
He&#8217;ll be joining Princeton as an assistant professor next year. He is the author of <a href="https://arxiv.org/abs/2205.14135">FlashAttention</a> and Chief Scientist at <a href="https://www.together.ai">Together AI</a>.</em> <em>He works at the intersection of machine learning and systems, currently focused on efficient training and long-range context.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><h4>On how to create a high-performing language model</h4><p>&#8220;I think there are many paths to a high-performing language model. So right now there&#8217;s a proven strategy and people follow that. I think that doesn&#8217;t necessarily have to be the only path. I think my prior is that as long as your model architecture is reasonable and hardware efficient, and you have lots of compute, and you have lots of data, the model would just do well.&#8221;</p><h4>On designing algorithms that take advantage of hardware</h4><p>&#8220;We&#8217;ve seen that sparsity is now proven to be more useful as people think about hardware-friendly sparsity. I would say the high-level point is we show that there are ways to make sparsity hardware-friendly and there are ways to maintain quality while using sparsity.&#8221;</p><h4>On efficient inference</h4><p>&#8220;I think there&#8217;s gonna be a shift towards focusing a lot on inference. How can we make inference as efficient as possible, from model design or the software framework or even the hardware? We&#8217;ve seen that some hardware designs are more catered to inference now &#8212; think, for example, of how Google&#8217;s TPU has one version for inference and a different version for training, where they have different numbers of flops, different memory bandwidth, and so on.&#8221;</p><h4>On taking a contrarian bet on recurrent connections over attention</h4><p>&#8220;We want to understand, from an academic perspective, when or why we need attention. Can we have other alternatives that scale better in terms of sequence length? Because long context length has been a big problem for attention for a long time. Yes, we worked on that. We spent tons of time on that. I looked around, and maybe it&#8217;s a contrarian bet that I wanna work on something that scales better in terms of sequence length and that, maybe in two to three years, would have a shot at not replacing transformers but augmenting transformers in some settings.&#8221;</p>
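<p>That efficiency lens is easiest to see in the online softmax trick (from the Milakov and Gimelshein paper referenced below) that FlashAttention builds on: compute a numerically stable softmax in one streaming pass, so the attention scores never have to be fully materialized in slow memory. Here is a minimal sketch, assuming NumPy; the function name and the final vectorized pass over <code>scores</code> are illustrative simplifications, not the FlashAttention kernel itself.</p><pre><code>import numpy as np

def online_softmax(scores):
    # One streaming pass keeps a running max m and running normalizer d,
    # rescaling d whenever a new max appears, so nothing overflows and
    # the full score vector never needs to sit in fast memory at once.
    m, d = -np.inf, 0.0
    for x in scores:
        m_new = max(m, x)
        d = d * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    return np.exp(np.asarray(scores) - m) / d

print(online_softmax([1.0, 3.0, 2.0]))  # matches a standard softmax
</code></pre><h3><strong>References</strong></h3><ul><li><p><a href="https://web.stanford.edu/~boyd/">Steven Boyd</a>, Stanford</p></li><li><p><a href="https://arxiv.org/abs/1803.06084">A Kernel Theory of Modern Data Augmentation</a> by Tri Dao, Albert Gu, Alexander J. 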
Ratner, Virginia Smith, Christopher De Sa, Christopher R&#233;</p></li><li><p><a href="https://arxiv.org/abs/1903.05895">Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations</a> by Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher R&#233;</p></li><li><p><a href="https://arxiv.org/abs/2112.00029">Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models</a> by Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://arxiv.org/abs/2012.14966">Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps</a> by Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://arxiv.org/abs/2204.00595">Monarch: Expressive Structured Matrices for Efficient and Accurate Training</a> by Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher R&#233;.</p></li><li><p>ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, et al.</p></li><li><p><a href="https://arxiv.org/abs/2302.13971">LLaMA: Open and Efficient Foundation Language Models</a> by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth&#233;e Lacroix, Baptiste Rozi&#232;re, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample</p></li><li><p><a href="https://arxiv.org/abs/1911.02150">Fast Transformer Decoding: One Write-Head is All You Need</a> by Noam Shazeer</p></li><li><p><a href="https://arxiv.org/abs/2205.14135">FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness</a> by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher R&#233;.</p></li><li><p><a href="https://www.nvidia.com/en-us/data-center/resources/mlperf-benchmarks/">MLPerf</a></p></li><li><p><a href="https://www.linkedin.com/in/young-jun-ko-630899106/">Young-Jun Ko from Inflection</a></p></li><li><p><a href="https://arxiv.org/abs/1805.02867">Online normalizer calculation for softmax</a> by Maxim Milakov (NVIDIA), Natalia Gimelshein (NVIDIA)</p></li><li><p><a href="https://www.danfu.org/">Dan Fu</a></p></li><li><p><a href="https://cs.stanford.edu/~chrismre/">Christopher R&#233;</a></p></li><li><p><a href="https://stanford.edu/~albertgu/">Albert Gu</a></p></li><li><p><a href="https://lucidrains.github.io/">Phil Wang</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Jamie Simon, UC Berkeley: Theoretical principles for deep neural networks]]></title><description><![CDATA[Jamie Simon is a fourth-year physics Ph.D.]]></description><link>https://ideas.imbue.com/p/episode-32-jamie-simon-uc-berkeley-6be</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-32-jamie-simon-uc-berkeley-6be</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 22 Jun 2023 18:52:23 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090098/72116600b09adb593ed1d7a7b2258bad.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://james-simon.github.io">Jamie Simon</a> is a fourth-year physics Ph.D. student at UC Berkeley, advised by Mike DeWeese, and a Research Fellow with us at Imbue. He uses tools from theoretical physics to build a fundamental understanding of deep neural networks so they can be designed from first principles. In this episode, we discuss reverse engineering kernels, the conservation of learnability during training, infinite-width neural networks, and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;I do think that the deeper idea of reverse engineering kernels is powerful and probably holds across architectures. The central message isn&#8217;t really: here&#8217;s the particular theory on fully-connected networks. The central message is: let&#8217;s think about the inductive bias of architectures in kernel space directly and see if we can do our design work in kernel space instead of in parameter space.&#8221;</p><p>&#8220;At first glance, the idea of an infinite-width neural network as a useful object of study sounds insane; why should this be a reasonable limit to take? Why, if we want to understand a neural network, which obviously has to be finite to do anything useful, could we hope to learn anything by just making something infinite? That is especially baffling from the viewpoint of classical statistics, where you hope to find a parsimonious model and wield Occam&#8217;s razor like a sword. So, it seems baffling at first that this should be useful, but it turns out that a number of breakthrough results, especially around the early part of my PhD, found that some really non-trivial, insightful behaviors emerge when you take this infinite-width limit.&#8221;</p><p>&#8220;In the case of infinite width: if the neural tangent kernel only has trivial alignment, like just chance alignment, with the target function of the data, it won&#8217;t generalize on it.
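But in practice, we see very good alignment between this kernel object and the target function.&#8221;</p><p>One common way to make that notion of alignment concrete is kernel-target alignment: the normalized inner product between a kernel&#8217;s Gram matrix and the outer product of the labels, which is near zero for chance alignment and grows when the kernel&#8217;s geometry matches the target. Here is a small NumPy sketch of that diagnostic, using an RBF kernel as a stand-in for a neural tangent kernel; the function names and toy data are illustrative, not taken from the papers referenced below.</p><pre><code>import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); stands in for an NTK.
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_target_alignment(K, y):
    # Normalized Frobenius inner product of K with the label outer product:
    # y^T K y / (||K||_F * ||y||^2). Chance alignment gives a value near zero.
    return (y @ K @ y) / (np.linalg.norm(K) * (y @ y))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_clustered = np.array([-1.0] * 50 + [1.0] * 50)  # labels follow the clusters
y_random = rng.choice([-1.0, 1.0], size=100)      # labels with no structure

K = rbf_kernel(X)
print(kernel_target_alignment(K, y_clustered))  # relatively high
print(kernel_target_alignment(K, y_random))     # near zero
</code></pre>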
<p>&#8220;A question you could ask is, why do convolutional networks do better than fully connected networks on image data? Well, it turns out their kernels have better alignment with image data.&#8221;</p><p>&#8220;Although, interestingly, people have shown that if you take the neural tangent kernel of a network after training, the real neural network looks a lot as if it had always had its final neural tangent kernel. So you don&#8217;t have to worry so much about the kernel&#8217;s evolution over time, only about where it ended up.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://redwood.berkeley.edu/">Redwood Center for Theoretical Neuroscience</a></p></li><li><p><a href="https://redwood.berkeley.edu/people/mike-deweese/">Prof. Mike DeWeese</a></p></li><li><p><a href="https://arxiv.org/pdf/2106.03186">Reverse Engineering the Neural Tangent Kernel</a> by Jamie Simon, Sajant Anand, and Mike DeWeese</p></li><li><p><a href="https://arxiv.org/pdf/2110.03922.pdf">The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks</a> by Jamie Simon, Madeline Dickens, Dhruva Karkada, Mike DeWeese</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Bill Thompson, UC Berkeley: How cultural evolution shapes knowledge acquisition]]></title><description><![CDATA[Bill Thompson is a cognitive scientist and assistant professor at UC Berkeley.]]></description><link>https://ideas.imbue.com/p/episode-31-bill-thompson-uc-berkeley-16a</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-31-bill-thompson-uc-berkeley-16a</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 29 Mar 2023 18:25:24 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090099/a47ece35a61141a20b72f5441a4cbe56.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://billdthompson.github.io">Bill Thompson</a> is a cognitive scientist and assistant professor at UC Berkeley. He runs an <a href="https://ccs-ucb.github.io">experimental cognition laboratory</a> where he and his students conduct research on human language and cognition using large-scale behavioral experiments, computational modeling, and machine learning.
In this episode, we explore the impact of cultural evolution on human knowledge acquisition, how pure biological evolution can lead to slow adaptation and overfitting, and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;In order to understand the computational processes that give rise to things like complex learned algorithmic behaviors like driving or playing chess or solving a Rubik&#8217;s cube or even language and speaking to each other, we need to have some way of reasoning about how knowledge accumulates across people.&#8221;</p><p>&#8220;This mechanism that we call selective social learning provides a solution to those two problems. The problems are that complex stuff is difficult to discover and difficult to pass on; this mechanism increases the fraction of people who are exposed to the rarer discoveries.&#8221;</p><p>&#8220;One of the things we&#8217;ve been working on is trying to integrate those two things and develop a way of thinking about cultural evolution as distributed algorithmic processes or distributed computation. Thinking about population-level processes as distributed computational processes gives you a way of viewing groups and multi-generational societies &#8212; in a sense, simple societies &#8212; in the same terms that you can think about learning by individuals.&#8221;</p><p>&#8220;If I want to look at how large language models learn to reason, something I would love to do is start to knock out parts of the training data set and say, &#8216;okay, when you knock this part of the training data set out, suddenly the reasoning capabilities go away,&#8217; or &#8216;suddenly this aspect of your knowledge or this capacity to acquire structured algorithmic thinking disappears.&#8217; Even just simple stuff like that is not tractable at the moment.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="http://www.cs.toronto.edu/~hinton/absps/evolution.htm">How Learning Can Guide Evolution</a> by Geoffrey E. Hinton &amp; Steven J. Nowlan</p></li><li><p><a href="https://ccs-ucb.github.io">Computational Cognitive Science Laboratory</a></p></li><li><p><a href="https://elifesciences.org/articles/72484">The pupillary light response as a physiological index of aphantasia, sensory and phenomenological imagery strength</a> by Lachlan Kay, Rebecca Keogh, Thomas Andrillon, Joel Pearson</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software.
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Ben Eysenbach, CMU: Designing simpler, more principled RL algorithms]]></title><description><![CDATA[Ben Eysenbach is a Ph.D.]]></description><link>https://ideas.imbue.com/p/episode-30-ben-eysenbach-cmu-on-designing-f15</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-30-ben-eysenbach-cmu-on-designing-f15</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 23 Mar 2023 00:27:16 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090100/efa3940ba7b20dad418bde3934a72bd0.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em><a href="https://ben-eysenbach.github.io/">Ben Eysenbach</a> is a Ph.D. student at Carnegie Mellon University and a student researcher at Google Brain. He is co-advised by Sergey Levine and Ruslan Salakhutdinov. His research focuses on developing RL algorithms that get state-of-the-art performance while being simpler, scalable, and robust. Recent problems he's tackled include long-horizon reasoning, exploration, and representation learning. In this episode, we discuss designing more principled RL algorithms and much more.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;If we look at all the states we&#8217;ve seen so far and their representations, let&#8217;s imagine that those representations have a length of one, so we can think about them as points on a sphere. Then, after we put each of these points on the sphere, we can turn the sphere around and say, okay, where are most of the points, and where are we missing points? And say, you&#8217;re missing points down near Antarctica. And then we can say, okay, let&#8217;s try to get down to Antarctica. And then we could, because we&#8217;re learning a goal-conditioned policy, say, okay, try to get here, or try to get to a state that has this representation.&#8221;</p>
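<p>The sphere picture above is easy to state as a sampling rule: normalize everything to unit length, score candidate directions by how close they are to anything already visited, and chase the emptiest patch. Here is a toy NumPy sketch under those assumptions; the function name, the uniform candidate sampling, and the toy data are illustrative, not the paper&#8217;s method.</p><pre><code>import numpy as np

def novel_goal(visited_reps, n_candidates=512, rng=None):
    # visited_reps: array of unit-norm representations of states seen so far.
    if rng is None:
        rng = np.random.default_rng()
    # Sample random directions and project them onto the unit sphere.
    cands = rng.normal(size=(n_candidates, visited_reps.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    # A candidate's "coverage" is its best cosine similarity to any visited
    # point; the least-covered candidate is the empty patch (Antarctica).
    coverage = (cands @ visited_reps.T).max(axis=1)
    return cands[np.argmin(coverage)]

# Usage: hand the returned vector to a goal-conditioned policy as the target,
# i.e., "try to get to a state that has this representation."
visited = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(novel_goal(visited))  # points away from the three visited directions
</code></pre><p>&#8220;One thing that I&#8217;m really excited about is thinking about how we can leverage this idea of connecting contrastive learning to reinforcement learning to make use of advances in contrastive learning in other domains like NLP and computer vision. In NLP, we&#8217;ve seen really great uses of contrastive learning for things like CLIP that can connect images with language using contrastive learning. And in our contrastive project, we saw how we can connect the states and the actions to the future states. As you might imagine, maybe there&#8217;s a way of plugging these components together, and indeed, you can see that mathematically there is.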
And so one thing I&#8217;m really excited about exploring is saying, well, &#8216;can we use this to specify tasks?&#8217; Not in terms of images of what you would want to happen, but rather language descriptions.&#8221;</p><p>&#8220;One of the reasons why I&#8217;m particularly excited about these problems is that these language models, they&#8217;re trained to maximize the likelihood of the next token. That draws a really strong connection to this way of treating reinforcement learning problems as predicting probabilities and as maximizing probabilities. And so I think that these tools are actually much, much more similar than they might seem on the surface.&#8221;</p><p>&#8220;I don&#8217;t know how controversial it is, but I would like to see more effort on taking even existing methods and applying them to new tasks, to real problems. I think part of this will require a shift in how we evaluate papers &#8212; evaluating them not so much on algorithmic novelty as on &#8216;did you actually solve some interesting problem?&#8217;&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/pdf/1711.06782">Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning</a> by Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine</p></li><li><p><a href="https://arxiv.org/abs/2206.07568">Contrastive Learning As a Reinforcement Learning Algorithm</a> by Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/1802.06070.pdf">Diversity Is All You Need: Learning Diverse Skills Without a Reward Function</a> by Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2110.02719">The Information Geometry of Unsupervised Reinforcement Learning</a> by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/1906.05253">Search on the Replay Buffer: Bridging Planning and Reinforcement Learning</a> by Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2112.10751">RvS: What Is Essential For Offline RL via Supervised Learning?</a> by Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine</p></li><li><p><a href="https://arxiv.org/pdf/2206.03378">Imitating Past Successes Can Be Very Suboptimal</a> by Benjamin Eysenbach, Soumith Udatha, Sergey Levine, Ruslan Salakhutdinov</p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software.
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Jim Fan, NVIDIA: Foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant]]></title><description><![CDATA[Jim Fan is a research scientist at NVIDIA and got his PhD at Stanford under Fei-Fei Li.]]></description><link>https://ideas.imbue.com/p/episode-29-jim-fan-nvidia-on-foundation-7b2</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-29-jim-fan-nvidia-on-foundation-7b2</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Thu, 09 Mar 2023 00:22:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090101/05e92a14132478ce70b3c5004cfcb21e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Jim Fan is a research scientist at NVIDIA and got his PhD at Stanford under Fei-Fei Li. Jim is interested in building generally capable autonomous agents, and he recently published MineDojo, a massively multitask benchmarking suite built on Minecraft, which was an Outstanding Paper at NeurIPS. In this episode, we discuss foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;The second implication of RLHF is that prompt engineering will go away eventually. Like, it is something fleeting, and the prompt engineers&#8230; it&#8217;s just not a real job. Let&#8217;s face it. The reason prompt engineering will not be relevant forever is RLHF: why prompt engineering even exists in the first place is because these systems are misaligned with what humans want, so we have to kind of coerce the model to give us what we want by typing out very unnatural sentences, essentially tricking the model into solving the task.&#8221;</p><p>&#8220;I&#8217;m still really amazed by how humans do this task. Because we&#8217;re doing the lowest level of control, right? Like, we do the keyboard and mouse controls. And if we want to be, like, stricter about the concepts, we&#8217;re sending neural signals to our fingers and then controlling the finger torques, the torques in each joint, to operate a keyboard and also use a mouse. It&#8217;s incredible how low level we are going, as humans, to do World of Bits, and we seem to have very little problem with our computational efficiency. I guess procrastination is our unique problem. But otherwise, we&#8217;re computationally efficient. We&#8217;re very efficient.
So I&#8217;m just wondering, like, maybe there&#8217;s a way to actually make the lowest level, the most general action space, computationally attractive and even, like, more efficient than we thought it would be.&#8221;</p><p>&#8220;When I was starting to play Minecraft, I watched YouTube videos. I also went to the Wiki to look up what to do at first, and the Wiki tells you, &#8216;Okay, these are the tools that you must craft, and you need to, like, prepare food, otherwise you will starve, and what kinds of foods are good, right?&#8217; It&#8217;s all in the Wiki, and I also go to Reddit whenever I have a question. I treat that as a Stack Overflow, and Reddit people give a lot of good advice. That&#8217;s how I played Minecraft even as a human. That gets me thinking, right: why shouldn&#8217;t our AI use all of this internet-scale knowledge? And if we want our AI algorithm to play this from scratch, it&#8217;s almost impossible, because exploration is intractable. If you just take random actions, how big is the chance that you stumble upon a diamond? It&#8217;s almost literally zero. So that also inspired the algorithmic approach that we took.&#8221;</p><p>&#8220;What we want is to develop &#8211; or maybe discover, right &#8211; general principles of embodied intelligence. That&#8217;s what we wanna do. That&#8217;s what MineDojo and Avalon want to achieve, want to enable, right? Not just solving these particular 1000 tasks in kind of the most brute-force way. So, yeah, just a word of caution to researchers: resist the urge to overfit, to cheat, to use things that are super specific to Minecraft that will not transfer elsewhere.&#8221;</p>
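<p>That internet-knowledge idea is what MineCLIP (linked in the references below) operationalizes: a video-text model trained on Minecraft YouTube footage whose similarity score can serve as a dense reward, so the agent never has to explore blindly. Here is a toy sketch of that reward shape, assuming NumPy; the encoder stubs and function names are illustrative stand-ins, not MineCLIP&#8217;s actual interface.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)

def embed_video(frames):
    # Stand-in for a pretrained video encoder over the agent's recent frames.
    return rng.normal(size=512)

def embed_text(prompt):
    # Stand-in for the matching pretrained text encoder.
    return rng.normal(size=512)

def clip_style_reward(frames, prompt):
    # Dense reward: cosine similarity between what the agent just did
    # and the natural-language description of the task.
    v, t = embed_video(frames), embed_text(prompt)
    return float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t)))

# e.g., reward = clip_style_reward(last_16_frames, "shear a sheep to obtain wool")
</code></pre><h3><strong>References</strong></h3><ul><li><p><a href="https://arxiv.org/pdf/2206.08853">MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge by Jim Fan, et al.</a></p></li><li><p><a href="https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf">ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton</a></p></li><li><p><a href="http://proceedings.mlr.press/v48/amodei16.pdf">Deep Speech 2: End-to-End Speech Recognition in English and Mandarin by Dario Amodei, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=kukA0LcAAAAJ&amp;hl=fr">Prof. 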
Yoshua Bengio</a></p></li><li><p><a href="http://proceedings.mlr.press/v87/fan18a/fan18a.pdf">SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark by Jim Fan, et al.</a></p></li><li><p><a href="http://proceedings.mlr.press/v70/shi17a/shi17a.pdf">World of Bits: An Open-Domain Platform for Web-Based Agents by Tianlin (Tim) Shi, et al.</a></p></li><li><p><a href="https://twitter.com/karpathy/status/809889202120884224?lang=en">Mini World of Bits</a></p></li><li><p><a href="https://arxiv.org/pdf/2206.11795.pdf">Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos by Bowen Baker, et al.</a></p></li><li><p><a href="https://www.adept.ai/blog/act-1">ACT-1: Transformer for Actions</a></p></li><li><p><a href="https://arxiv.org/pdf/2112.09332.pdf">WebGPT: Browser-assisted question-answering with human feedback by Reiichiro Nakano, et al.</a></p></li><li><p><a href="https://github.com/MineDojo/MineCLIP">MineCLIP: Foundation Model for MineDojo</a></p></li><li><p><a href="https://arxiv.org/pdf/2103.00020.pdf">CLIP: Connecting Text and Images by Alec Radford, et al.</a></p></li><li><p><a href="https://vimalabs.github.io/assets/vima_paper.pdf">VIMA: General Robot Manipulation with Multimodal Prompts by Yunfan Jiang, et al.</a></p></li><li><p><a href="https://openreview.net/pdf?id=Opmqtk_GvYL">MetaMorph: Learning Universal Controllers with Transformers by Agrim Gupta, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/1706.03762">Attention Is All You Need by Ashish Vaswani, et al.</a></p></li><li><p><a href="https://twitter.com/BostonDynamics">Boston Dynamics</a></p></li><li><p><a href="https://arxiv.org/pdf/2102.12092.pdf">(DALL-E) Zero-Shot Text-to-Image Generation by Aditya Ramesh, et al.</a></p></li><li><p><a href="https://stability.ai/blog/stable-diffusion-public-release">Stable Diffusion</a></p></li><li><p><a href="https://www.linkedin.com/in/ilya-sutskever">Ilya Sutskever</a></p></li><li><p><a href="https://arxiv.org/pdf/1707.06347">Proximal Policy Optimization Algorithms by John Schulman, et al.</a></p></li></ul><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3>About Imbue</h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a><strong> </strong>is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue_ai/</a></p></li><li><p>Twitter/X: <a href="https://imbueai.substack.com/p/x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Sergey Levine, UC Berkeley: Bottlenecks to generalization in reinforcement learning]]></title><description><![CDATA[Also: why simulation is doomed to succeed, and how to pick good research problems]]></description><link>https://ideas.imbue.com/p/episode-28-sergey-levine-uc-berkeley-f5e</link><guid isPermaLink="false">https://ideas.imbue.com/p/episode-28-sergey-levine-uc-berkeley-f5e</guid><dc:creator><![CDATA[Imbue]]></dc:creator><pubDate>Wed, 01 Mar 2023 23:47:07 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169090102/7414dfc93b46b93fcd2779879dafc117.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><em>Sergey Levine, an assistant professor of EECS at UC Berkeley, is one of the pioneers of modern deep reinforcement learning. His research focuses on developing general-purpose algorithms for autonomous agents to learn how to solve any task. In this episode, we talked about the evolution of deep reinforcement learning, how previous robotics approaches were replaced, and why offline RL is significant for future generalization.</em></p><p>Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.</p><h3><strong>Highlights</strong></h3><p>&#8220;I do think that, in science, it is a really good idea to sometimes see how extreme a design can still work because you learn a lot from doing that. This is, by the way, something, I get a lot of comments on this. You know, I&#8217;ll be talking to people and they&#8217;ll be like, &#8216;Well, we know how to do, like, robotic grasping, and we know how to do inverse kinematics, and we know how to do this and this, so why don&#8217;t you use those parts?&#8217; And it&#8217;s, yeah, you could, but if you want to understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that and really isolate it and really just understand its value instead of trying to put in all these crutches to compensate for all the parts where we might have better existing kind of ideas.&#8221;</p><p>&#8220;The thing is, robots, if they are autonomous robots&#8211;they should be collecting data way more cheaply in a way larger scale than data we harvest from humans. For this reason, I actually think that robotics in the long run may actually be at a huge advantage in terms of its ability to collect data. We&#8217;re just not seeing this huge advantage now in robotic manipulation because we&#8217;re stuck at the smaller scale, more due to economics, rather than, I would say, science.&#8221;</p><p>&#8220;We want simplicity because simplicity makes it easy to make things work on a large scale. 
You know, if your method is simple, there are essentially fewer ways that it could go wrong. I don&#8217;t think the problem with clever prompting is that it&#8217;s too simple or primitive. I think the problem might actually be, that it might be too complex and that developing a good, effective reinforcement learning or planning method might actually be a simpler, more general solution.&#8221;</p><p>&#8220;I think, in reality, for any practical deployment of these kinds of ideas at scale, it would actually be many robots all collecting data, sharing it, and exchanging their brains over a network and all that. That&#8217;s the more scalable way to think about on the learning side. But, I do think that also on the physical side, there&#8217;s a lot of practical challenges, and just, you know, what kind of methods should we even have if we want the robot in your home to practice cleaning your dishes for three days. I mean, if you just run a reinforcement learning algorithm for a robot in your home, probably, the first thing it&#8217;ll do is wave its arm around, break your window, then break all your dishes, then break itself, and then spend the remaining time it has, just sitting there at the broken corner. So there&#8217;s a lot of practicalities in this.&#8221;</p><h3><strong>References</strong></h3><ul><li><p><a href="https://scholar.google.com/citations?user=mG4imMEAAAAJ&amp;hl=en">Andrew Ng</a></p></li><li><p><a href="https://arxiv.org/abs/1206.4617">Continuous Inverse Optimal Control with Locally Optimal Examples by Sergey Levine and Vladlen Koltun</a></p></li><li><p><a href="https://arxiv.org/pdf/1312.5602v1.pdf">Playing Atari with Deep Reinforcement Learning by Volodymyr Mnih, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=vtwH6GkAAAAJ">Pieter Abbeel</a></p></li><li><p><a href="https://xbpeng.github.io/">Xue Bin (Jason) Peng</a></p></li><li><p><a href="https://xbpeng.github.io/projects/ASE/2022_TOG_ASE.pdf">ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters by Xue Bin Peng, et al</a>.</p></li><li><p><a href="http://bair.berkeley.edu/">Berkeley Artificial Intelligence Research Lab (BAIR)</a></p></li><li><p><a href="http://joschu.net/">John Schulman</a></p></li><li><p><a href="https://arxiv.org/pdf/1707.06347">Proximal Policy Optimization Algorithms</a></p></li><li><p><a href="https://openai.com/blog/chatgpt/">ChatGPT</a></p></li><li><p><a href="https://ai.stanford.edu/~cbfinn/">Chelsea Finn</a></p></li><li><p><a href="https://irislab.stanford.edu/">Stanford IRIS Lab</a></p></li><li><p><a href="https://www.jmlr.org/papers/volume17/15-522/15-522.pdf">End-to-End Training of Deep Visuomotor Policies by Chelsea Finn, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/1603.02199.pdf">Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection by Sergey Levine, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=_ws9LLgAAAAJ&amp;hl=en">Peter Pastor</a></p></li><li><p><a href="https://vitchyr.github.io/">Vitchyr Pong</a></p></li><li><p><a href="https://ashvin.me/">Ashvin Nair</a></p></li><li><p><a href="https://arxiv.org/pdf/1807.04742.pdf">Visual Reinforcement Learning with Imagined Goals by Ashvin Nair, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/2104.07749">Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills by Yevgen Chebotar, et al.</a></p></li><li><p><a href="https://openai.com/blog/clip/">CLIP: 
Connecting Text and Images</a></p></li><li><p><a href="https://arxiv.org/pdf/2207.04429">LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action by Dhruv Shah, et al.</a></p></li><li><p><a href="https://scholar.google.com/citations?user=-w5DuHgAAAAJ&amp;hl=en">Brian Ichter</a></p></li><li><p><a href="https://scholar.google.com/citations?user=WuWWdKcAAAAJ&amp;hl=en">B&#322;a&#380;ej Osi&#324;ski</a></p></li><li><p><a href="https://arxiv.org/pdf/2206.11871">Offline RL for Natural Language Generation with Implicit Language Q Learning by Charlie Snell, et al.</a></p></li><li><p><a href="https://arxiv.org/pdf/2006.04779">Conservative Q-Learning for Offline Reinforcement Learning by Aviral Kumar, et al</a>.</p></li><li><p><a href="http://whirl.cs.ox.ac.uk/">Whiteson Research Lab</a></p></li><li><p><a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">The Bitter Lesson by Richard Sutton</a></p></li><li><p><a href="https://arxiv.org/pdf/2212.08073.pdf">Constitutional AI: Harmlessness from AI Feedback by Yuntao Bai, et al.</a></p></li><li><p><a href="https://arxiv.org/abs/2210.03370">GNM: A General Navigation Model to Drive Any Robot by Dhruv Shah, et al.</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Emanuel_Todorov">Prof. Emanuel Todorov</a></p></li><li><p><a href="https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf">MuJoCo: A physics engine for model-based control by Emanuel Todorov, et al.</a></p></li><li><p><a href="https://research.google/teams/robotics/">Google Brain Robotics Research Lab</a></p></li><li><p><a href="https://nrhinehart.github.io/">Nick Rhinehart</a></p></li><li><p><a href="https://neo-x.github.io/">Glen Berseth</a></p></li><li><p><a href="https://arxiv.org/pdf/2112.03899.pdf">Information is Power: Intrinsic Control via Information Capture by Nicholas Rhinehart, et al.</a></p></li><li><p><a href="https://scholar.google.co.uk/citations?user=q_4u0aoAAAAJ&amp;hl=en">Karl Friston</a></p></li><li><p><a href="https://arxiv.org/pdf/1912.05510">SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments by Glen Berseth, et al.</a></p></li></ul><h3><strong>Transcript</strong></h3><p>[00:00:00] <strong>Sergey Levine:</strong> I do think that in science, it is a really good idea to sometimes see how extreme of a design can still work because you learn a lot from doing that, and, this is by the way something I get a lot of comments on this&#8211;like, you know, I&#8217;ll, I&#8217;ll be talking to people and they&#8217;ll be like, &#8220;well, we know how to do like robotic grasping, and we know how to do inverse kinematics, and we know how, how to do this and this, so why don&#8217;t you like use those parts?&#8221; And it&#8217;s like, yeah, you could, but if you wanna understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that and really isolate it and really just understand its value instead of trying to put in all these crutches to compensate for all the parts where we might have better existing kind of ideas.</p><p>[00:00:38] <strong>Kanjun Qiu:</strong> We&#8217;re really excited to have you, Sergey. Welcome to the podcast. We start with: how did you develop your initial research interests and how have they evolved over time? I know you took a class from Andrew Ng where you decided to switch mid-grad school to machine learning. 
What were you doing before, and then what happened after?</p><p>[00:01:37] <strong>Sergey Levine:</strong> When I started graduate school, I wanted to work on computer graphics. I was always interested in virtual worlds, CG film, video games, that sort of thing. And it was just really fascinating to me how you could essentially create a synthetic environment in a computer.</p><p>[00:01:53] <strong>Sergey Levine:</strong> So I really wanted to figure out how we could advance the technology for doing that. When I thought about which area of computer graphics to concentrate in, I decided on computer animation, specifically character animation, because by that point (this would&#8217;ve been 2009) we had pretty good technology for simulating physics&#8211;for how inanimate objects behave.</p><p>[00:02:11] <strong>Sergey Levine:</strong> And the big challenge was getting plausible behavior out of virtual humans. When I started working on this, I pretty quickly discovered that essentially the bottleneck with virtual humans is simulating their minds&#8211;all the way from very basic things, like how you decide to move your legs if you wanna climb some stairs, to more complex things, like how you should conduct yourself if you&#8217;re playing a soccer game with teammates and opponents.</p><p>[00:02:35] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[00:02:36] <strong>Sergey Levine:</strong> This naturally leads us to think about decision making in AI&#8211;in my case, initially in service to creating plausible virtual humans. But as I realized how big that problem really was, it became natural to think of it more as just developing artificial intelligence systems. And certainly the initial wave of the deep learning revolution, which started around that same time, was a really big part of what got me to switch over from pure computer graphics research into things that involved a combination of control and machine learning.</p><p>[00:03:08] <strong>Kanjun Qiu:</strong> Hmm. Was that in 2011? 2012?</p><p>[00:03:12] <strong>Sergey Levine:</strong> Yeah. So my first machine learning paper was actually earlier than that, but my first paper that involved something we might call deep learning was in 2012, right around the same time the DeepMind Atari work came out&#8211;actually a little before. It focused on using what today we would call deep reinforcement learning, which back then was not really a widely used term, for locomotion behaviors for 3D humans.</p><p>[00:03:39] <strong>Kanjun Qiu:</strong> Interesting. And then what happened after that?</p><p>[00:03:42] <strong>Sergey Levine:</strong> I worked on this problem for a little while, for the last couple of years in grad school. And then after I finished my PhD, I started looking for postdoc jobs, because I was really only about partway through switching from graphics to machine learning, so I wasn&#8217;t well established in either community at that point. Perhaps a lesson for the PhD students listening to this: switching gears in the fourth year of your PhD is a little chancy, because I sort of ended up with one foot on either side of that threshold, and nobody really knew me very well. So I decided to do a postdoc in some area where I could get a little bit more established in machine learning.</p><p>[00:04:16] <strong>Sergey Levine:</strong> And I wanted to stay in the Bay Area for personal reasons. 
So I got in touch with Professor Pieter Abbeel, who&#8217;s now my colleague here at UC Berkeley, about a postdoc position. It was kind of interesting, because I interviewed for this job and I thought the interview went horribly&#8211;and really, it wasn&#8217;t my fault. When I showed up for the interview at UC Berkeley with Pieter&#8217;s lab, they had moved the deadline for IROS, a major robotics conference. It was supposed to be earlier, so that after the deadline all the students could listen to my talk and Pieter presumably would be a little relaxed. Instead, they moved the deadline to that evening.</p><p>[00:04:48] <strong>Sergey Levine:</strong> So everyone listening to my talk was kind of stressed out. I could tell that their minds were elsewhere. Afterwards there was a certain remark&#8211;I&#8217;m sure Pieter won&#8217;t mind me sharing this&#8211;where he mentioned something to the effect of, &#8220;Oh, you know, I don&#8217;t think that I want my lab working on all this animation stuff.&#8221; So I kind of felt like I really blew it. But he gave me a call a few weeks later and offered me the job, which was fantastic.</p><p>[00:05:13] <strong>Sergey Levine:</strong> And I guess it was generous on his part, because at the time he was presumably taking a little bit of a chance, but it worked out really well. I switched over to robotics, and that was actually a very positive change, in that a lot of the things I was trying to figure out in computer animation would be tested more rigorously and more thoroughly in the robotics domain, because there you really deal with all the complexity of the real world.</p><p>[00:05:35] <strong>Kanjun Qiu:</strong> Mm-hmm. That makes sense. Now, how do you think about all the progress with generative environments and animation? Do you feel like the original problems you were working on in animation are largely solved, or do you feel like there&#8217;s a lot more to do there?</p><p>[00:05:49] <strong>Sergey Levine:</strong> Yeah, that&#8217;s a good question. So I took a break from the computer graphics world for a while, but then over the last five years, there was actually a student in my lab, Jason Peng, who&#8217;s now a professor at Simon Fraser University in Canada. He just graduated last year, and in his PhD he more or less solved the problems that I had tried to solve in my own PhD a decade prior. I think he did a much better job with it than I ever did. He had several works that essentially took deep RL techniques and combined them with large-scale generative adversarial networks to, more or less, provide a pretty comprehensive solution to the computer animation problem. In his latest work, which was actually done in collaboration with NVIDIA, the approach he adopted is to take a large dataset of motion capture data&#8211;you can think of it as all the motion capture data we can get our hands on&#8211;and train a latent variable GAN on it that will generate human-like motion, embedded into a latent space that provides a kind of higher-level space for control. So you can think of his method as producing a model where you feed in random numbers, and for every random number it&#8217;ll produce some natural motion&#8211;running, jumping, whatever&#8211;and then those random numbers serve as a higher-level action space. So everything in that latent space is plausible motion, and then you can train some higher-level policy that will steer it in that latent space. That actually turns out to be a really effective way to do animation, because once you have that latent space, you can forget about whether the motion is plausible&#8211;it&#8217;ll always be plausible and realistic&#8211;and now you can be entirely goal driven in that space.</p>
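<p>(To make the two-level design concrete, here is a minimal sketch of the idea: a low-level decoder trained so that any latent code decodes to plausible motion, plus a high-level policy whose action space is that latent space. All module names, layer sizes, and training details below are illustrative assumptions, not the actual ASE architecture.)</p><pre><code>import torch
import torch.nn as nn

# Low-level decoder: trained adversarially on motion capture so that
# ANY latent z maps to plausible motion (names/sizes are illustrative).
class MotionDecoder(nn.Module):
    def __init__(self, z_dim=64, state_dim=128, action_dim=28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + state_dim, 512), nn.ReLU(),
            nn.Linear(512, action_dim))
    def forward(self, z, state):
        return self.net(torch.cat([z, state], dim=-1))  # joint commands

# High-level policy: picks a latent "skill" z instead of raw torques,
# so its search space stays on the manifold of natural-looking motion.
class HighLevelPolicy(nn.Module):
    def __init__(self, state_dim=128, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 512), nn.ReLU(),
            nn.Linear(512, z_dim))
    def forward(self, state):
        return torch.tanh(self.net(state))  # action space = latent space

decoder, policy = MotionDecoder(), HighLevelPolicy()
state = torch.zeros(1, 128)
z = policy(state)            # task-driven choice of skill
action = decoder(z, state)   # always decodes to plausible motion
</code></pre>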
<p>[00:07:24] <strong>Kanjun Qiu:</strong> That&#8217;s very clever.</p><p>[00:07:25] <strong>Sergey Levine:</strong> He has a demo at SIGGRAPH this past year where he has virtual characters doing sword fighting and jumping around and so on. If someone had shown this to me during my PhD, I would&#8217;ve said it&#8217;s like science fiction. It was kind of the dream for the computer graphics community for a long time. I think Jason really did a fantastic job of it. So if anyone listening to this is interested in computer animation, Jason Peng&#8217;s work is worth checking out. He is part-time at NVIDIA, too, so he is doing some mysterious things that he hasn&#8217;t&#8230; he&#8217;s very cagey with the details, but I think there might be something big coming out in the imminent future.</p><p>[00:08:02] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. So you feel like your PhD work is at least solved by Jason?</p><p>[00:08:08] <strong>Sergey Levine:</strong> Yeah, I think he kind of took care of that one.</p><p>[00:08:12] <strong>Kanjun Qiu:</strong> When you first got started in robotics, what did you feel like were the important problems?</p><p>[00:08:16] <strong>Sergey Levine:</strong> Robotics traditionally is thought of as very much a geometry problem plus a physics problem. So if you open up the Springer handbook on robotics, or a more classical advanced robotics course textbook, a lot of what you will learn about has to do with understanding the geometries of objects and modeling the mechanics of articulated rigid body systems. This approach took us very far, from the earliest days of robotics in the fifties and sixties all the way to the kind of robots that are used in manufacturing all the time today. In some ways, the history of robotic technology is one of building abstractions, taking those abstractions as far as we can take them, and then hitting some really, really difficult wall. The really difficult wall that robotics generally hits with this kind of approach has to do with situations that are not as fully and cleanly structured as the rigid body abstraction would have us believe&#8211;not just because they involve physical phenomena that are outside of this model, but also because they have challenges having to do with characterization and identifiability. So let&#8217;s say you have a robot in your home that&#8217;s supposed to clean up your home and put away all the objects. Even if those are rigid objects that, in principle, fit within that abstraction, you don&#8217;t know exactly what shape they are, what their mass distribution is, and all this stuff.</p><p>[00:09:33] <strong>Sergey Levine:</strong> You don&#8217;t have perception and things like that. So all of those things together more or less put us in this place where the clean abstraction really doesn&#8217;t give us anything. 
The analogy here is the earliest days of computer vision. The first thing that people thought of when they thought about how to do computer vision is that, well, computer vision is like the inverse graphics problem: if you believe the world is made out of shapes that have geometry, let&#8217;s figure out their vertices and their edges and so on. People tried to do this for a while, and it was very reasonable and very sensible from an engineering perspective, until, in 2012, Alex Krizhevsky had a solution to the ImageNet challenge that didn&#8217;t use any of that stuff whatsoever and just used a giant neural net. So I kind of suspect that the robotics world is just getting to that point, right around in the last half a decade or so.</p><p>[00:10:24] <strong>Kanjun Qiu:</strong> Hmm. Interesting. And so when you first joined Pieter Abbeel&#8217;s lab as a postdoc, you saw this world of robotics where everything was these rigid body abstractions. What were you thinking? Were you like, okay, well, it seems like nobody&#8217;s really using deep learning, no one&#8217;s really doing end-to-end learning, I&#8217;m gonna do that? Or how did you end up there?</p><p>[00:10:47] <strong>Sergey Levine:</strong> Yeah, so actually, I started working with a student whose most recent accomplishment was to take ideas that were basically rooted in this kind of geometric approach to robotics and extend them somewhat so they could accommodate deformable objects&#8211;ropes and cloth, that sort of thing. So they had been doing laundry folding and knot tying. I won&#8217;t go too much into the technical details, but it was kind of in the same wheelhouse as the geometry-based methods that had been pioneered for rigid objects and grasping in the decades prior, and with some clever extensions, they could do some knot tying and things like that.</p><p>[00:11:19] <strong>Sergey Levine:</strong> And I started working on how we could, more or less, throw all that out and replace it with end-to-end learning with deep nets. I intentionally wanted to make it a little bit extreme. So instead of trying to gently turn these geometry-based methods into ones that use learning more and more, we decided that we would just do the maximally end-to-end thing. The student in question was John Schulman, and he ended up doing his PhD on end-to-end deep reinforcement learning, and later on developed the most widely used reinforcement learning method today, which is PPO. He now works at OpenAI, and perhaps his most recent accomplishment is something that some of your viewers might have heard about&#8211;it&#8217;s called ChatGPT. But that&#8217;s maybe a story for another time. So we did some algorithms work there, and then in terms of robotics applications, I worked with another student that some of your listeners might also know, Chelsea Finn. She&#8217;s now a professor at Stanford. There we wanted to see if we could introduce the latest and greatest convolutional neural network techniques to directly control robot motion.</p><p>[00:12:19] <strong>Sergey Levine:</strong> And again, we chose a very end-to-end design there. We took the PR2 robot, and we basically looked through the PR2 manual and found the lowest-level control you could possibly have. You can&#8217;t command motor torques exactly, but you can command what&#8217;s called motor effort, which apparently is roughly proportional to current on the electric motors. So I did a little bit of coding to set up a controller that would directly command these efforts at some reasonable frequency. Chelsea coded up the ConvNet component. We wired it all together, managed to get it training end-to-end, and then we set up a set of experiments that were intentionally meant to evaluate whether the end-to-end part really mattered. These days, this would be something that people more or less take for granted&#8211;like, yeah, of course end-to-end is better than plugging in a bunch of geometric stuff&#8211;but we really wanted to convince people that this was true. So we ran experiments where we separated out localization from control. We had more traditional computer vision techniques, geometry-based techniques, and we tried to see whether going directly from raw pixels all the way to these motor effort commands could do better. And we set up some experiments that would actually validate that.</p>
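<p>(A minimal sketch of what &#8220;end-to-end&#8221; means here: one network going from raw pixels, plus joint state, straight to per-joint motor efforts, with no separate localization or geometry modules in between. The architecture and sizes are illustrative guesses, not the actual network from this work.)</p><pre><code>import torch
import torch.nn as nn

# Pixels in, low-level motor commands out, trained as one network.
class VisuomotorPolicy(nn.Module):
    def __init__(self, num_joints=7):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # 32 * 4 * 4 = 512 features
        self.control = nn.Sequential(
            nn.Linear(512 + 2 * num_joints, 256), nn.ReLU(),
            nn.Linear(256, num_joints))              # one "effort" per joint
    def forward(self, image, joint_pos, joint_vel):
        feats = self.vision(image)
        x = torch.cat([feats, joint_pos, joint_vel], dim=-1)
        return self.control(x)

policy = VisuomotorPolicy()
efforts = policy(torch.zeros(1, 3, 240, 240),
                 torch.zeros(1, 7), torch.zeros(1, 7))
# In deployment, `efforts` would be streamed to the robot at a fixed
# control frequency; gradients flow from task loss back into the vision
# layers, which is exactly what the separated baselines cannot do.
</code></pre>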
<p>[00:13:19] <strong>Sergey Levine:</strong> So we had these experiments where the robot would take a little colored shape and insert it into a shape-sorting cube&#8211;it&#8217;s a children&#8217;s toy where you&#8217;re supposed to match the shape to the shape of the hole. And one of the things we were able to demonstrate is that the end-to-end approach was in fact better, because essentially it could trade off errors more smartly. If you&#8217;re inserting this shape into a hole, you don&#8217;t really need to be very, very accurate about where the hole is vertically, because you&#8217;ll just be pushing down all the time, so that direction is robust to inaccuracy. But errors in the other directions are a little more sensitive. So we could show that we could actually do better with end-to-end training than if we had localized the hole separately and then commanded a separate controller. That work resulted in a paper that was basically the first deep reinforcement learning paper for image-based, real-world robotic manipulation. It was also rejected numerous times by robotics reviewers, because at the time this was a little bit of a taboo&#8211;too many neural nets. Eventually, it ended up working out.</p><p>[00:14:17] <strong>Kanjun Qiu:</strong> One thing I&#8217;m really curious about with this end-to-end experiment is: did it just work? Like, you set everything up, you code up the CNN&#8211;was it really tricky to get working, or did it work much better than you expected?</p><p>[00:14:29] <strong>Sergey Levine:</strong> It&#8217;s always very difficult to disentangle these things in science, because obviously it didn&#8217;t just work on the first try, but a big part of why it didn&#8217;t work on the first try had to do with a bunch of coding things. For example, this was before there were really nice, clean tools for GPU-based acceleration of ConvNets. Back then, Caffe was one of the things that everybody would use, and it was very difficult for us to get this running onboard the robot. So we actually had a fairly complicated jerry-rigged system where the ConvNet would actually run on one machine. 
Then, in the middle of the network, it would send the activations over to a different machine onboard the robot for the real-time controller. So it was still an end-to-end neural net, but half the neural net was running on one computer, half of it was running on another computer, and the gradients would have to get passed back. So it was a little complicated, and the bulk of the challenges we had had more to do with systems design and that sort of thing. But part of why it did basically work once we debugged things was that the algorithm itself was based on things that I had developed for previous projects, just without the computer vision components. So going from low-dimensional input to actions was something that had already been developed and basically worked.</p><p>[00:15:36] <strong>Sergey Levine:</strong> This was a continuation of my PhD work, so a lot of the challenges we had had to do with getting the systems parts right. They also had to do with getting the design of the components to be effective in relatively low data regimes, because these robot experiments would collect maybe four or five hours of data. So one of the things that Chelsea had to figure out is how to get a neural net architecture that could be relatively efficient. She basically used a proxy task that we designed: instead of iterating on the full control task on the real robot, we had a little pose detection task that we would use to prototype the network, which she could iterate on entirely offline. So she would test out the ConvNet on that, get it working properly, and then once we knew that it worked for this task, we knew it was roughly good enough in terms of sample efficiency, and then we just retrained with the end-to-end thing.</p><p>[00:16:22] <strong>Kanjun Qiu:</strong> That makes sense.</p><p>[00:16:23] <strong>Sergey Levine:</strong> So the moral of the story, for folks who might be listening and working on these kinds of robotic learning systems: it does actually help to break it up into components, even if you&#8217;re doing the end-to-end thing in the end, because you can get the individual neural net components all working nicely and then just redo it with the end-to-end thing. And that does tend to take out a lot of the pain.</p><p>[00:16:44] <strong>Kanjun Qiu:</strong> Right. It sounds like you got the components working first. It&#8217;s interesting&#8211;you made this comment about making the problem a lot more extreme when you were talking about the student using thin plate splines, and I&#8217;m curious, is this an approach you&#8217;ve used elsewhere? Making the problem much more extreme and throwing out everything?</p><p>[00:17:01] <strong>Sergey Levine:</strong> I think it&#8217;s a good approach. I mean, it depends a little bit on what you wanna do, because if you really want to build a system that works really well, then of course you want to put everything and the kitchen sink in there and just use the best tools for every piece of it. But I do think that in science it is a really good idea to sometimes see how extreme of a design can still work, because you learn a lot from doing that. And this is, by the way, something I get a lot of comments on. 
Like, I&#8217;ll be talking to people and they&#8217;ll be like, &#8220;Well, we know how to do robotic grasping, and we know how to do inverse kinematics, and we know how to do this and this, so why don&#8217;t you use those parts?&#8221; And it&#8217;s like, yeah, you could, but if you wanna understand the utility, the value of some particular new design, it kind of makes sense to really zoom in on that, really isolate it, and really just understand its value, instead of trying to put in all these crutches to compensate for all the parts where we might have better existing ideas.</p><p>[00:17:52] <strong>Sergey Levine:</strong> You know, as an analogy: if you wanna design better motors for electric cars, maybe you build, not a fancy hybrid car, but really just an electric race car or something. Just see how fast it can go. And then whatever technology you develop there, yeah, you can combine it with all these pragmatic and very sober decisions and make it work afterwards.</p><p>[00:18:11] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. So kind of do the hardest thing first. Do the most extreme thing.</p><p>[00:18:15] <strong>Sergey Levine:</strong> Yeah.</p><p>[00:18:15] <strong>Kanjun Qiu:</strong> So after you published this extremely controversial paper that got rejected everywhere, what happened then? What were you interested in next?</p><p>[00:18:22] <strong>Sergey Levine:</strong> There were a few things that we wanted to do there, but perhaps the most important one that we came to realize&#8211;and this is going to lead to things that in some ways I&#8217;m still working on&#8211;is that of course we don&#8217;t really want end-to-end robotic deep learning systems that just train with four or five hours of data. The full power of deep learning is really only realized once you have very large amounts of data that can enable broad generalization. So this was a nice technology demo, in that it showed that deep nets could work with robots for manipulation, and of course many people took that up, and there&#8217;s a lot more work on using deep nets for robot manipulation now. But it didn&#8217;t realize the full promise of deep learning, because the full promise of deep learning required large datasets. And that was really the next big frontier. So what I ended up working on after this was some work that was done at Google. I started at Google in 2015, and there we wanted to basically scale up deep robotic learning. And what we did is, again, we took a fairly extreme approach. We intentionally chose not to do all sorts of fancy transfer learning and so on. We went for the pure brute-force thing: we put 18 robots in a room, and we turned them on for months and months and months and had them collect enormous amounts of data autonomously.</p><p>[00:19:42] <strong>Sergey Levine:</strong> And that led to what&#8217;s sometimes referred to as the arm farm project. It might have actually been Jeff Dean who coined that term. At one point we wanted to call it the armpit. But really, for this project, we wanted to pick a robotic task that was kind of basic, in the sense that it was something that everybody would want; that was fairly broad, in that all robots should have that capability; and that could be applied to large sets of objects&#8211;something that really needed generalization. 
So we went with robotic grasping&#8211;basically bin picking. Maybe that&#8217;s not the most glamorous thing, but it is something that really needs to generalize, because you can pick all sorts of different objects. It&#8217;s something that every robot needs to have, and it&#8217;s something that we could scale up. So we went for that, because it seemed like the right target for this kind of very extreme, purist, brute-force approach. Basically, we went down to Costco and Walmart and bought tons of plastic junk, and we would put it in front of these robots, and just, day after day, we would load up the bins in front of them and they would run as much as possible. One of the things that I spent a lot of time on is just getting the uptime on the robots to be as high as it could be. Peter Pastor, who&#8217;s a roboticist at Google AI, and I did a lot of work to increase that uptime&#8211;and of course there was a great team supporting the effort; Peter Pastor was probably the main one who did a lot of that stuff. And after several months, it got to a point where relatively simple techniques could acquire very effective robotic grasping policies. An interesting anecdote here: we were doing this work&#8211;it took us a while, so it came out in 2016&#8211;and just a few months after AlphaGo was announced, Alex Krizhevsky, who was working with us on the ConvNet design, told me something to the effect of, &#8220;Oh, you know, for AlphaGo they have like a billion-something games, and you gave me only a hundred thousand grasping episodes.</p><p>[00:21:33] <strong>Sergey Levine:</strong> This doesn&#8217;t seem like it&#8217;s gonna work.&#8221; So I remember I had some snarky retort where I said, &#8220;Well, yeah, they have like a billion games, but they still can&#8217;t pick up the Go pieces.&#8221; But on a more serious note, around this time I was actually starting to get kind of disappointed, because this thing didn&#8217;t really work very well. And I think some of this robotics wisdom had rubbed off on me, so I was saying, well, okay, maybe we should put in some more domain knowledge about the shapes of objects and so on. I remember Alex also told me, &#8220;Oh, no, no, just be patient. Just add more data to it.&#8221; So I heeded that advice, and it took a little while, but after a few more months, basically the same things he had been trying back then just started working once there was enough of a critical mass. Obviously there were a few careful design decisions in there, but we did more or less succeed in this fairly extreme, purist way of tackling the problem&#8211;which, again, was not by any means the absolute best way to build a grasping system. Actually, since then, people have developed more hybrid grasping systems that use depth, 3D, and simulation and also use deep learning, and I think it&#8217;s fair to say that they do work better. But it was a pretty interesting experience for us that just getting robots in a room for several months, with some simple but careful design choices, could result in a very effective grasping system.</p><p>[00:22:46] <strong>Kanjun Qiu:</strong> Mm-hmm. Mm-hmm. 
That&#8217;s really interesting.</p><p>[00:22:46] <strong>Josh Albrecht:</strong> One of the things that&#8217;s interesting to me is the scale of that data. To his point about a billion Go games, or the amount of data GPT-3 is trained on&#8211;the scale of these robotics datasets is just so much smaller. Like, what was the total number of months across the arms? The total amount of time in that dataset is only on the order of years, right?</p><p>[00:23:07] <strong>Sergey Levine:</strong> Yeah, it&#8217;s a little hard to judge, because obviously the uptime for the robots is not a hundred percent, but roughly speaking, if I do a little bit of quick mental math, it would be on the order of a couple of years of robot time. And the total size of the dataset was on the order of several hundred thousand trials, which amounts to about 10 million images. But of course the images are correlated in time. So basically, it&#8217;s roughly ImageNet-sized, but not much bigger than that.</p><p>[00:23:35] <strong>Kanjun Qiu:</strong> Mm-hmm. Right. And the images are much less diverse than ImageNet.</p><p>[00:23:40] <strong>Sergey Levine:</strong> Of course, yes.</p><p>[00:23:40] <strong>Kanjun Qiu &amp; Josh Albrecht:</strong> Yeah. That&#8217;s interesting. It&#8217;s surprising that it worked at all, given how small the dataset is.</p><p>[00:23:48] <strong>Sergey Levine:</strong> Well, although, one thing I will say on this topic is that I think a lot of people are very concerned that large datasets in robotics might be impractical. And there&#8217;s a lot of work&#8211;a lot of very good work, I should say&#8211;on all sorts of transfer learning ideas. But I do think it&#8217;s perhaps instructive to think about the problem as a prototype for a larger system. Because if someone actually builds, let&#8217;s say, a home robot, and let&#8217;s say that one in a hundred people in America buy this robot and put it in their homes, that&#8217;s on the order of 3 million people, 3 million robots. And if those 3 million robots do things for even one month in those homes, that is a lot of data. The thing is, autonomous robots should be able to collect data way more cheaply, and at a way larger scale, than data that we harvest from humans. So for this reason, I actually think that robotics in the long run may be at a huge advantage in terms of its ability to collect data.</p><p>[00:24:48] <strong>Sergey Levine:</strong> We&#8217;re just not seeing this huge advantage now in robotic manipulation, because we&#8217;re stuck at the smaller scale&#8211;more due to economics than, I would say, science. And by the way, here&#8217;s an example that maybe hammers this point home: if you work at Tesla, you probably don&#8217;t worry about the size of your dataset. You might worry about the number of labels, but you&#8217;re not gonna worry about the number of images you&#8217;ve got, because that robot is actually used by many people. So if robotic arms get to the same point, we won&#8217;t worry about how many images we&#8217;re collecting.</p>
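<p>(A quick back-of-the-envelope check of the numbers in this exchange, using only the round figures quoted above:)</p><pre><code># Home-robot thought experiment, with the round numbers from the conversation.
us_population = 330_000_000
robots = us_population // 100        # "one in a hundred people": ~3.3M robots
robot_months = robots * 1            # each robot runs for one month
robot_years = robot_months / 12      # about 275,000 robot-years of experience

# Versus the arm farm dataset described earlier:
trials = 500_000                     # "several hundred thousand trials"
images = 10_000_000                  # "about 10 million images"
frames_per_trial = images / trials   # roughly 20 images per grasp attempt

print(robots, robot_years, frames_per_trial)  # 3300000 275000.0 20.0
</code></pre>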
<p>[00:25:17] <strong>Kanjun Qiu:</strong> Mm-hmm. I&#8217;m curious what your ideal robot to deploy would be. What do you think about the humanoid robot versus some other robot type?</p><p>[00:25:24] <strong>Sergey Levine:</strong> Yeah, that&#8217;s a great question. If I were more practically minded, if I were a little more entrepreneurial, I would probably give a more compelling answer. But to be honest, I actually think that the most interesting kinds of robots to deploy, especially with reinforcement learning technology, might actually be robots that are very unlike humans. Of course it&#8217;s very tempting, from science fiction stories and so on, to think, okay, well, robots will be like Rosie from the Jetsons or Commander Data from Star Trek or something. They&#8217;ll look like people and they&#8217;ll kind of do things like people&#8211;and maybe they will. That&#8217;s fine.</p><p>[00:25:56] <strong>Sergey Levine:</strong> There&#8217;s nothing wrong with that, and that&#8217;s kind of exciting. But perhaps even more exciting is the possibility that we could have morphologies that are so unlike us that we wouldn&#8217;t even know how these things could do stuff. You know, maybe your home robot will be a swarm of a hundred quadrotors that fly around like little flies and clean up your house, right? They would behave in ways that we would not have been able to design manually, and good reinforcement learning methods would figure out ways to control these bizarre morphologies in ways that are really effective.</p><p>[00:26:27] <strong>Kanjun Qiu:</strong> Huh, that&#8217;s really interesting.</p><p>[00:26:27] <strong>Josh Albrecht:</strong> It&#8217;d be interesting to see that happen. I mean, there are lots of things against the humanoid structure, but one thing it does have going for it is that most of the world is currently made for people. Like, to open this door, right? This sliding door is kind of heavy&#8211;it&#8217;s almost impossible for the quadrotor; no matter how clever it is, it just doesn&#8217;t have enough force. But yeah, it would be interesting to think about what kind of crazy strategies they might come up with.</p><p>[00:26:52] <strong>Kanjun Qiu:</strong> You worked on this Google arm farm project for a while, and eventually it seems like enough data allowed you to use relatively simple algorithms to solve the grasping problem in this kind of extreme setup. What were you thinking about after that?</p><p>[00:27:06] <strong>Sergey Levine:</strong> After that, the next frontier to address is systems that can handle a wide range of tasks. Grasping is great, but it&#8217;s a little special&#8211;special in the sense that one very compact task definition, which is &#8220;are you holding an object in your gripper,&#8221; can encompass a great deal of complexity. Most tasks aren&#8217;t like that. For most tasks, you need to really specify what it is that you want the robot to do, and it needs to be deliberate about pursuing that specific goal and not some other goal. So that leads us into things like multi-task learning, goal specification, and instructions.</p><p>[00:27:42] <strong>Sergey Levine:</strong> One of the things that my students and I worked on when I started as a professor at UC Berkeley is trying to figure out how we can get goal-conditioned reinforcement learning to work really well. So we sat down and thought, well, this grasping thing was great because one very concise task definition leads to a lot of complexity. You can define a very simple thing, like &#8220;are you holding an object,&#8221; and lots of complexity emerges from that just through autonomous interaction. So can we have something like that&#8211;some very compact definition that encompasses a wide range of different behaviors? The thing that we settled on to start with was goal-conditioned reinforcement learning, where essentially the robot gets&#8211;in the early days, literally&#8211;a picture of what the environment should be, and it tries to manipulate the environment until it matches that picture. Of course, you can do goal-conditioned reinforcement learning in other ways; for example, more recently, the way that we and many others have been approaching it is by defining the goal through language. But defining it through pictures is fine to get started, because there you focus on just the visual and control aspects of the problem.</p>
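<p>(Schematically, a goal-conditioned policy is just a policy with one extra input&#8211;the goal&#8211;plus a reward measuring progress toward it. A minimal sketch with illustrative names and sizes; the latent-distance reward shown is one common choice in this line of work, not necessarily the exact one used:)</p><pre><code>import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, action_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(          # shared image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(2), nn.Flatten())   # 16 * 2 * 2 = 64 features
        self.head = nn.Sequential(
            nn.Linear(2 * 64, 128), nn.ReLU(),
            nn.Linear(128, action_dim))
    def forward(self, obs_image, goal_image):
        # The goal is just another input; the same network serves every goal.
        e_obs = self.encoder(obs_image)
        e_goal = self.encoder(goal_image)
        return self.head(torch.cat([e_obs, e_goal], dim=-1))

def goal_reward(encoder, obs_image, goal_image):
    # Reward: negative distance to the goal in a learned latent space,
    # rather than raw pixel distance.
    return -torch.norm(encoder(obs_image) - encoder(goal_image), dim=-1)
</code></pre>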
<p>[00:28:44] <strong>Sergey Levine:</strong> The very first work we had on this image goal-conditioned reinforcement learning was done by two students, Vitchyr Pong and Ashvin Nair, who both incidentally work at OpenAI now, but back then they were working on this image-based robotic control. The robot could do very simple things&#8211;like push an upside-down blue bowl five inches across the table. That was the task. But that was the first-ever demonstration of an image-based goal-conditioned RL system. Other people had done non-image-based goal-conditioned things, but in the real world, with images, that was the first demonstration. And yes, pushing an upside-down blue bowl five inches across the table is kind of lame, but it was a milestone&#8211;it got things rolling. From there they did other things that were a little more sophisticated. One of the experiments that really stands out in my mind, that I thought was pretty neat: we had set up a robot in front of a little cabinet with a door, and Vitchyr and Ashvin had developed an exploration algorithm where the robot would directly imagine possible images using a generative model.</p><p>[00:29:39] <strong>Sergey Levine:</strong> It was a VAE-based model that would literally hypothesize the kinds of images the robot could accomplish in this environment, attempt to reach them, and then update its model. So the robot is sort of dreaming up what it could do, attempting it, seeing if it actually works, and, if it doesn&#8217;t work, imagining something else. They ran this experiment&#8211;obviously a smaller-scale experiment than the arm farm, over just one day&#8211;and within 24 hours it would first figure out how to move the gripper around, because it was really interesting that the gripper moved. But then, once it started touching the door, it saw that, oh, actually, the door starts swinging open. So now it imagines lots of different angles for the open door, and from there it starts actually manipulating it. At the end, it learns how to open the door to any desired angle. And that was entirely autonomous, right? You just put it in front of the door and wait. So that was a pretty neat sign of things to come&#8211;at a much smaller scale, obviously&#8211;suggesting that if you have this kind of goal image thing, then you could push it further and further.</p>
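<p>(The exploration loop Sergey describes, in schematic form. Here vae, policy, and env are placeholders for the real components; this is the flavor of the imagined-goals idea, not the actual implementation:)</p><pre><code>import torch

def imagined_goal_episode(env, vae, policy, steps=50):
    obs = env.reset()
    z_goal = torch.randn(vae.latent_dim)      # "dream up" a goal by
    goal_image = vae.decode(z_goal)           # sampling the VAE prior
    trajectory = []
    for _ in range(steps):
        action = policy(obs, goal_image)      # goal-conditioned policy
        obs, _, done, _ = env.step(action)
        trajectory.append(obs)
        if done:
            break
    # Every visited state is a valid "achieved goal," so the trajectory
    # can be relabeled for off-policy training; refitting the VAE on the
    # new images keeps future imagined goals close to what is reachable.
    return trajectory
</code></pre>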
<p>[00:30:28] <strong>Sergey Levine:</strong> And of course, since then, we and many others have pushed this further. In terms of more recent work on this topic, there&#8217;s a very nice paper from Google called Actionable Models, which combines this with offline reinforcement learning, using a bunch of the large multi-robot datasets that have been collected at Google, to learn very general goal-conditioned policies that can do things like rearrange objects on a table. So this stuff has come a long way since then.</p><p>[00:30:51] <strong>Josh Albrecht:</strong> For goals conditioned on language: from an image perspective, it&#8217;s easy to tell, is this image the image that I wanted? But for language, what sorts of techniques are you excited about for evaluating whether the goal has actually been accomplished?</p><p>[00:31:05] <strong>Sergey Levine:</strong> There&#8217;s a lot of interesting work going on in this area right now, some of which my colleagues and I at Google are working on, and there are many other groups working on this&#8211;Dieter Fox&#8217;s lab, for example, is doing wonderful work in this area within NVIDIA. This is something that people have had on their minds for a while, but I think that most recently, the thing that has really stimulated a lot of research in this area is the advent of vision-language models, like CLIP, that actually work.</p><p>[00:31:30] <strong>Sergey Levine:</strong> And in some ways I feel a certain degree of vindication for focusing on just the image part of the problem for so long. Because one of the things that good vision-language models allow you to do is not worry about the language so much: if you have good visual goal models, then you can plug them in with vision-language models, and the vision-language model almost acts like a front end for interfacing these non-linguistic robotic controllers with language. As a very simple example of this, my student Dhruv Shah has a paper called LM-Nav that basically does this for navigation. Dhruv had been working on purely image-based navigation, in a similar regime where you specify an image goal, and then, together with Brian Ichter from Google and B&#322;a&#380;ej Osi&#324;ski from the University of Warsaw, they have a recent paper where they basically do the obvious thing: they take a vision-language model&#8211;they take CLIP&#8211;and just weld it onto this system as a language front end. So everything underneath is purely image-based, and then CLIP just says, okay, among these images, which one matches the instruction the user provided? And that basically does the job. It&#8217;s kind of nice that now progress on vision-language models, which can take place entirely outside of robotics, basically leads to better and better language front ends for purely visual goal-conditioned systems.</p>
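<p>(A minimal sketch of the &#8220;language front end&#8221; idea using OpenAI&#8217;s released CLIP package: score candidate goal images against the instruction and hand the winner to the purely visual controller. This mirrors the general recipe, not the actual LM-Nav pipeline:)</p><pre><code>import torch
import clip                      # OpenAI's CLIP package; usage sketch only
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

def pick_goal_image(instruction, candidate_image_paths):
    # Score candidate goal images against a language instruction and
    # return the best match, which a purely visual goal-conditioned
    # controller can then be pointed at.
    text = clip.tokenize([instruction])
    images = torch.stack([preprocess(Image.open(p))
                          for p in candidate_image_paths])
    with torch.no_grad():
        t = model.encode_text(text)
        im = model.encode_image(images)
        t = t / t.norm(dim=-1, keepdim=True)
        im = im / im.norm(dim=-1, keepdim=True)
        scores = (im @ t.T).squeeze(-1)   # cosine similarity per image
    return candidate_image_paths[scores.argmax().item()]
</code></pre>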
<p>[00:32:41] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. How far do you feel like visual goal-conditioned systems can go, especially with imagination?</p><p>[00:32:48] <strong>Sergey Levine:</strong> I think they can go pretty far, actually. The important thing, though, is to think about it the right way&#8211;I think we shouldn&#8217;t take the whole matching-pixels thing too literally. It&#8217;s really more about the robot&#8217;s goal. There&#8217;s kind of a funny version of this that came up in a project on robotic navigation that Dhruv and I were doing, where we had data of robots driving around at different times of day, and there&#8217;s almost a philosophical problem: you give the robot a picture of a building at night, and it&#8217;s currently daytime. So what should it do? Should it drive to the building and then wait until it&#8217;s night? Or should it just wait around until it gets dark, because that&#8217;s closer to matching the picture? So you have to be able to learn representations that abstract away these kinds of non-functional things. If you&#8217;re reaching your goal in a reasonable representation space, then it actually does make sense, and fortunately, with deep learning, there are a lot of ways to learn good representations. So as long as we don&#8217;t take the pixel-matching thing too literally, and we use appropriate representation learning methods, it&#8217;s actually a fairly solid approach.</p><p>[00:33:46] <strong>Kanjun Qiu:</strong> Right, that makes sense. And that&#8217;s actually a really interesting question: if you give it a picture of a building at night and it&#8217;s daytime, that doesn&#8217;t matter in some situations, but in other situations it really does matter. It depends on what the higher-level goal is&#8211;but it doesn&#8217;t have that concept of a higher-level goal yet.</p><p>[00:34:00] <strong>Sergey Levine:</strong> Yeah. So in reinforcement learning, people have thought about these problems a bit. From a very technical standpoint, goal-conditioned policies do not represent all possible tasks that an agent could perform, but the set of state distributions does define the set of all possible outcomes. So if you can somehow lift it up from conditioning on a single goal state to conditioning on a distribution over states, then that provably allows you to represent all tasks that could possibly be done. There are different ways that people have approached this problem that are very interesting. They&#8217;ve approached it from the standpoint of these things called successor features, which are based on successor representations&#8211;you can roughly think of these as low-dimensional projections of state distributions. More recently, there&#8217;s some really interesting work that I&#8217;ve seen out of FAIR, by a fellow named Ahmed and colleagues at Meta. They&#8217;re developing techniques for unsupervised acquisition of these kinds of feature spaces, where you can project state representations and get policies that are conditioned on any possible task. So there&#8217;s a lot of active research in this area. It&#8217;s something I&#8217;m really interested in. I think it&#8217;s possible to take these goal-conditioned things a little further and really condition on any notion of a task.</p>
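<p>(A toy illustration of successor features: if rewards are linear in some state features, then the discounted sum of future features under a policy immediately gives that policy&#8217;s value for any task&#8217;s reward weights. All numbers below are made up:)</p><pre><code>import numpy as np

# If r(s) = w . phi(s), then psi_pi(s) = E[ sum_t gamma^t phi(s_t) ]
# gives the value of policy pi for ANY reward vector w:
# V_pi(s) = w . psi_pi(s).
gamma = 0.9
phi = np.array([[1.0, 0.0],   # features of the 3 states on pi's path
                [0.0, 1.0],
                [1.0, 1.0]])

# Discounted feature occupancy along the deterministic path s0, s1, s2:
psi_s0 = phi[0] + gamma * phi[1] + gamma**2 * phi[2]   # [1.81, 1.71]

w_task_a = np.array([1.0, 0.0])   # reward = first feature only
w_task_b = np.array([0.0, 1.0])   # a different task, same psi
print(w_task_a @ psi_s0, w_task_b @ psi_s0)   # 1.81 1.71, no replanning
</code></pre>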
<p>[00:35:11] <strong>Josh Albrecht:</strong> When you&#8217;re thinking about what directions to pursue, especially given the number of people you collaborate with and the number of students and things like that, how do you think about picking which research questions to answer, and how has that evolved over the years?</p><p>[00:35:25] <strong>Sergey Levine:</strong> There are a couple of things I could say here. Obviously the right way to pick research questions really depends a lot on one&#8217;s research values and what they want out of their research. But for me, something that serves as a really good compass is to think about some very distant end goal that I would really like to see&#8211;like generally capable robotic systems, generally capable AI systems, AI systems that could do anything that humans can do. Then, when thinking about research questions, I ask myself: if a research project that I do is wildly successful&#8211;the most optimistic, upper-confidence-bound estimate of success&#8211;will it make substantive progress toward this very distant end goal? You really want to be optimistic when making that gauge, because obviously the expected outcome of any research project is failure. Research is failure&#8211;that&#8217;s kind of the truth of it. But if the most optimistic outcome of your research project is not making progress on your long-term goals, then something is wrong. So I always make sure to look at whether the most optimistic guess at the outcome makes substantial progress toward the most distant and most ambitious goal that I have in mind.</p><p>[00:36:34] <strong>Kanjun Qiu:</strong> Has your distant end goal changed over time?</p><p>[00:36:36] <strong>Sergey Levine:</strong> Not in a huge way. But I think it&#8217;s easy to have a goal that doesn&#8217;t change much over time if it&#8217;s distant enough and big enough.</p><p>[00:36:44] <strong>Kanjun Qiu:</strong> That&#8217;s right.</p><p>[00:36:45] <strong>Sergey Levine:</strong> So if your end goal is something as broad as &#8220;I just want generally capable AI systems that can do anything a person can do&#8221;&#8230; I mean, that may be a very far-away target to hit, but it&#8217;s also such a big target that it&#8217;s probably gonna be reasonably stable over time.</p><p>[00:36:58] <strong>Kanjun Qiu:</strong> That&#8217;s right. That makes sense. And that&#8217;s yours&#8211;to make general-purpose AI systems.</p><p>[00:37:01] <strong>Sergey Levine:</strong> Yeah.</p><p>[00:37:02] <strong>Kanjun Qiu:</strong> What do you feel like are the most interesting questions to you right now?</p><p>[00:37:05] <strong>Sergey Levine:</strong> One thing I can mention here is that, especially over the last one or two years, there have been a lot of advances in machine learning systems, both in robotics and in other areas like vision and language, that do a really good job of emulating people through imitation learning, through supervised learning. That&#8217;s what language models do, essentially, right? They&#8217;re trained to imitate huge amounts of human-produced data. Imitation learning and robotics have been tremendously successful together, but I think that ultimately we really need machine learning systems that do a good job of going beyond the best that people can do.</p><p>[00:37:43] <strong>Sergey Levine:</strong> That&#8217;s really the promise of reinforcement learning. If we were to chart the course of this kind of research: about five years back, when there was a lot of excitement about reinforcement learning things like AlphaGo, a really exciting prospect was that emergent capabilities from these algorithms could lead to machines that are superhuman&#8211;significantly more capable than people at certain tasks. 
But it turned out that it was very difficult to make that recipe by itself scale, because a lot of the most capable RL systems relied in a really strong way on simulation.</p><p>[00:38:18] <strong>Sergey Levine:</strong> So in the last few years, a lot of the major advances have taken a step back from that and instead focused on ways to bring in even more data, which is great, because that leads to really good generalization. But when using purely supervised methods with that data, you get at best an emulation of human behavior&#8211;which in some cases, like with language models, is tremendously powerful, because if you have the equivalent, or even a loose approximation, of human behavior for typing text, that&#8217;s tremendously useful.</p><p>[00:38:45] <strong>Sergey Levine:</strong> But I do think we need to figure out how to take these advances and combine them with reinforcement learning methods, because that&#8217;s the only way we&#8217;ll get to above-human behavior&#8211;to actually have emergent behavior that improves on the typical human. I think that&#8217;s where there&#8217;s a major open question: how to combine not the simulation-based but the data-driven approach with reinforcement learning in a very effective way.</p><p>[00:39:09] <strong>Kanjun Qiu:</strong> Hmm. That&#8217;s interesting. Do you have any thoughts on how to do that combination?</p><p>[00:39:14] <strong>Sergey Levine:</strong> In my group at Berkeley, we&#8217;ve been focusing a lot on what we call offline reinforcement learning algorithms. The idea is that traditionally, reinforcement learning is thought of as a very online and interactive learning regime, right? If you open up the classic Sutton and Barto textbook, the most canonical diagram&#8211;the one everyone remembers&#8211;is the cycle where the agent interacts with the environment: the agent produces an action, the environment produces some state, and it all goes in a loop. It&#8217;s a very online, interactive picture of the world. But the most successful large-scale machine learning systems&#8211;language models, giant ConvNets, et cetera&#8211;are all trained on datasets that have been collected, stored to disk, and then reused repeatedly.</p><p>[00:39:56] <strong>Sergey Levine:</strong> Because if you&#8217;re going to train on billions and billions of images, or billions of documents of text, you don&#8217;t wanna recollect those interactions each time you retrain your system. So the idea in offline reinforcement learning is to take a large dataset like that and extract a policy by analyzing the dataset, not by interacting directly with a simulator or a physical process. You could have some fine-tuning afterwards, a little bit of interaction, but the bulk of your understanding of the world should come from a static dataset, because that&#8217;s much more scalable. That&#8217;s the premise behind offline reinforcement learning. And we&#8217;ve actually come a long way in developing algorithms that are effective for this. When we started on this research in 2019, basically nothing worked&#8211;you would take algorithms that worked great for online RL, and in the offline regime they just didn&#8217;t do anything&#8211;whereas now we have pretty respectable algorithms for doing this, and we&#8217;re starting to apply them, including to the training of language models.</p>
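<p>(The offline regime in one schematic loop: the &#8220;environment&#8221; is a fixed log of transitions, and nothing in training ever calls a simulator or a robot. The function and tensor names here are placeholders:)</p><pre><code>import torch
from torch.utils.data import DataLoader, TensorDataset

def train_offline(q_net, update_rule, transitions, epochs=10):
    # `transitions` is a tuple of tensors: (s, a, r, s_next, done),
    # logged earlier by some other policy and stored to disk.
    loader = DataLoader(TensorDataset(*transitions),
                        batch_size=256, shuffle=True)
    opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
    for _ in range(epochs):
        for s, a, r, s_next, done in loader:
            loss = update_rule(q_net, s, a, r, s_next, done)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q_net   # no env.step() anywhere: all learning is from the log
</code></pre>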
<p>[00:40:47] <strong>Sergey Levine:</strong> We had a paper called Implicit Language Q-Learning on this earlier this year, as well as work on pre-training large models for robotic control. That stuff is really just starting to work now, and I think that&#8217;s one of the things we&#8217;ll see a lot of progress on very imminently.</p><p>[00:40:59] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. When you first started working on offline RL, what were the problems that you felt needed to be solved in order to get offline RL to work at all?</p><p>[00:41:06] <strong>Sergey Levine:</strong> So, the basic problem with offline RL&#8211;well, I can step back a little bit. In the past, people thought that offline RL really wasn&#8217;t that different from traditional value-based methods like Q-learning: you just needed to come up with appropriate objectives and representations, and then whatever you do to fit Q-functions from online interaction, maybe you could just do the same thing with static data, and that would kind of work. It actually did work in the olden days, when everyone was using linear function approximators, because linear function approximators are fairly low-dimensional, and you can run them on offline data and they do more or less the same thing that they would do with online data&#8211;which is not much, to be honest. But with deep neural nets, when you run them on offline data, you get a problem, because deep nets do a really good job of fitting to the distribution they&#8217;re trained on, and the trouble is that if you&#8217;re doing offline RL, the whole point is to change your policy.</p><p>[00:42:01] <strong>Sergey Levine:</strong> And when you change your policy, the distribution that you will see when you run that policy is different from the one you trained on. Because neural nets are so good at fitting to the training distribution, that strength becomes a weakness when the distribution changes. This is something that people only started realizing a couple of years back, but it is now a very widely accepted notion that this distributional shift is a very fundamental challenge in offline reinforcement learning. And it really deeply connects to counterfactual inference. Reinforcement learning is really about counterfactuals. It&#8217;s about saying: well, I saw you do this, and that was the outcome; and I saw you do that, and that was the outcome. What if you did something different&#8211;would the outcome be better or worse?</p><p>[00:42:38] <strong>Sergey Levine:</strong> That&#8217;s the basic question that reinforcement learning asks, and it is a counterfactual question. And with counterfactual questions, you have to be very careful, because some questions you simply cannot answer. If you&#8217;ve only seen cars driving on a road, and you&#8217;ve never seen them swerve off the road and go into the ditch, you actually can&#8217;t answer the question: what would happen if you go into the ditch? The data simply is not enough to tell you. So in offline RL, the correct answer then is: don&#8217;t do it, because you don&#8217;t know what will happen. 
Avoid the distributional shift for which there&#8217;s no way for you to produce a reasonable answer. But at the same time, you still have to permit the model to generalize. If there&#8217;s something new that you can do that is sufficiently in-distribution that you do believe you can produce an accurate estimate of the outcome, then you should do it, because you need generalization to improve over the behavior that you saw in the dataset. And that&#8217;s a very delicate balance to strike.</p><p>[00:43:26] <strong>Josh Albrecht:</strong> Is there a principled answer to that, or is it just a heuristic&#8211;we pick something in the middle and it kind of works sometimes?</p><p>[00:43:34] <strong>Sergey Levine:</strong> There are multiple principled answers, but one answer that seems pretty simple, and seems to work very well for us, was developed in a few different concurrent papers. In terms of the algorithms that people tend to use today, probably one of the most widely used formulations was in a paper called Conservative Q-Learning by Aviral Kumar, one of my students here.</p><p>[00:43:54] <strong>Sergey Levine:</strong> The answer was: well, be pessimistic. Essentially, if you are evaluating the value of some action and that action looks a little bit unfamiliar, give it a lower value than your network thinks it has&#8211;and the more unfamiliar it is, the lower the value you should give it. If you&#8217;re pessimistic in just the right way, that pessimism will cancel out any erroneous overestimation that you would get from mistakes in your neural network. That actually tends to work. It&#8217;s simple, it doesn&#8217;t require very sophisticated uncertainty estimation, and it essentially harnesses the network&#8217;s own generalization abilities, because this pessimism affects the labels for the network, and then the network generalizes from those labels.</p><p>[00:44:36] <strong>Sergey Levine:</strong> So in a sense, the degree to which it penalizes unfamiliar actions is very closely linked to how it&#8217;s generalizing. That allows it to still make use of generalization while avoiding the really weird stuff that it should just not do.</p>
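<p>(A schematic of the pessimism idea: push Q-values down on actions that merely look plausible to the learner, and push them up on actions actually present in the data. This captures the flavor of the conservative objective, not the exact loss from the CQL paper; q_net and the sampling scheme are placeholders:)</p><pre><code>import torch

def conservative_penalty(q_net, states, dataset_actions, num_samples=10):
    # Q-values of random (likely unfamiliar) actions: to be pushed DOWN.
    # Uniform sampling assumes actions normalized to [0, 1].
    rand_actions = torch.rand(num_samples, *dataset_actions.shape)
    q_rand = torch.stack([q_net(states, a) for a in rand_actions])
    pushed_down = torch.logsumexp(q_rand, dim=0).mean()
    # Q-values of in-distribution dataset actions: to be pushed UP.
    pushed_up = q_net(states, dataset_actions).mean()
    return pushed_down - pushed_up

# Used alongside the usual TD objective:
# total_loss = td_loss + alpha * conservative_penalty(q_net, s, a)
</code></pre>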
<p>[00:44:48] <strong>Josh Albrecht:</strong> So then thinking about techniques for going forward in offline RL, do you feel like there&#8217;s a lot left to be done, or are we sort of at the point where we have decent techniques, we&#8217;re learning a lot from these datasets that we have, and we need something else to move forward and actually make systems that are significantly better than what&#8217;s in the data already?</p><p>[00:45:10] <strong>Sergey Levine:</strong> Yeah. I think we&#8217;ve made a lot of progress on offline RL. I do think there are major challenges still to address. And I would say that these major challenges fall into two broad categories. The first category has to do with something that&#8217;s not really unique to offline RL&#8211;it&#8217;s a problem for all RL methods&#8211;and that has to do with their stability and scalability.</p><p>[00:45:32] <strong>Sergey Levine:</strong> So RL methods, not just offline RL, all of them are harder to use than supervised learning methods, and a big part of why they&#8217;re harder to use is that, for example, with value-based methods like Q-learning, they are not actually equivalent to gradient descent. Gradient descent is really easy to do. Gradient descent plus backprop, supervised learning, you know, cross-entropy loss.</p><p>[00:45:52] <strong>Sergey Levine:</strong> Great. It&#8217;s fair to say that that&#8217;s at a point where it&#8217;s a turnkey thing: you code it up in PyTorch or JAX, and it works wonderfully. Value-based RL is not gradient descent. It&#8217;s fixed-point iteration disguised as gradient descent. Because of that, a lot of the nice things that make gradient descent so simple and easy to use start going a little awry when you&#8217;re doing Q-learning or value iteration type methods. We&#8217;ve actually made some progress in understanding this. There&#8217;s work on this in my group, and there&#8217;s work on this in several other groups, including for example Shimon Whiteson&#8217;s group at Oxford and many others. Just recently we&#8217;ve started to scratch the surface of what it is that really goes wrong when you use Q-learning style methods, these fixed-point iteration methods, rather than gradient descent. And the answer seems to be&#8211;and this is kind of preliminary&#8211;that some of the things that make supervised deep learning so easy actually make RL hard. So let me unpack this a little bit.</p><p>[00:46:49] <strong>Sergey Levine:</strong> If you told somebody who&#8217;s a machine learning theorist in, let&#8217;s say, the early 2000s that you&#8217;re going to train a neural net with a billion parameters with gradient descent for image recognition, they would probably tell you, well, that&#8217;s really dumb, because you&#8217;re going to overfit and it&#8217;s going to suck&#8211;so why are you even doing this? Based on the theory at that time, they would&#8217;ve been completely right. The surprising thing is that when we train with supervised learning, with gradient descent, there&#8217;s some kind of magical, mysterious fairy that comes in and applies some magic regularization that makes it not overfit. In machine learning theory, one of the really active areas of research has been to understand who that fairy is, what the magic is, and how that works out. And there are a number of hypotheses that have been put forward that are pretty interesting, and they all have to do with some kind of regularizing effect that basically makes it so this giant overparametrized neural net actually somehow comes up with a simple solution rather than an overly complex one. This is sometimes referred to as implicit regularization&#8211;implicit in the sense that it emerges implicitly from the interplay of deep nets and stochastic gradient descent&#8211;and it&#8217;s really good. That&#8217;s kind of what saves our bacon when we use these giant networks. And it seems that for reinforcement learning, because it&#8217;s not exactly gradient descent, that implicit regularization effect sometimes doesn&#8217;t play in our favor.</p>
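<p><em>The distinction at the root of this&#8211;fixed labels versus bootstrapped, moving targets&#8211;can be shown schematically. Both functions below are illustrative sketches, not code from any of the papers mentioned:</em></p><pre><code>import torch
import torch.nn.functional as F

# Supervised learning: the labels y are fixed once and for all, so every
# step is plain gradient descent on one stationary objective.
def supervised_step(net, opt, x, y):
    loss = F.mse_loss(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Value-based RL: the "labels" are bootstrapped from the current network,
# so the regression target moves every time the weights move. Each step
# applies a fixed-point operator dressed up as regression; it is not
# gradient descent on any single stationary loss.
def fitted_q_step(q_net, opt, s, a, r, s_next, gamma=0.99):
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values  # moving target
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
</code></pre>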
<p>[00:48:07] <strong>Sergey Levine:</strong> Sometimes it&#8217;s not actually a fairy&#8211;it&#8217;s an evil demon that comes in and screws up your network. And that&#8217;s really worrying, right? Because we have this mysterious thing that seems to have been really helping us for supervised learning, and now suddenly, when we&#8217;re doing RL, it comes in and hurts us instead. And at least to a degree, that seems to be part of what&#8217;s happening. So now there&#8217;s a slightly better understanding of that question&#8211;and I don&#8217;t want to overclaim how good our understanding is, because there are major holes in it, so there&#8217;s a lot to do there. But at least we have an inkling. We have a suspect, so to speak, even if we can&#8217;t prove that they did it. We can start trying to solve the problem. We can try, for example, inserting explicit regularization methods that could counteract some of the ill effects of the no-longer-helpful implicit regularization.</p><p>[00:48:45] <strong>Sergey Levine:</strong> We can start designing architectures that are maybe more resilient to these kinds of effects. So that&#8217;s something that&#8217;s happening now, and it&#8217;s not by any means a solved thing, but that&#8217;s where we could look for potential solutions to these kinds of instability issues that seem to afflict reinforcement learning.</p><p>[00:49:00] <strong>Kanjun Qiu:</strong> What&#8217;s the intuition behind why implicit regularization seems to help in supervised networks, but be harmful in RL?</p><p>[00:49:07] <strong>Sergey Levine:</strong> The intuition is roughly that given a wide range of possible solutions, a wide range of different assignments to the weights of a neural net, you would select the one that is simpler, that results in the simpler function. There are many possible values of neural net weights that would all give you a low training loss, but many of them are bad because they overfit. Implicit regularization leads to selecting those assignments to the weights that result in simpler functions that still fit your training data, and therefore generalize better.</p><p>[00:49:35] <strong>Kanjun Qiu:</strong> And so the intuition for RL is: okay, for whatever reason, implicit regularization results in learning simpler functions, but actually those simpler functions are worse in an RL regime.</p><p>[00:49:47] <strong>Sergey Levine:</strong> Yeah, so in RL, it seems like you get one of two things. Either the whole thing fails entirely and you get really, really complicated functions&#8211;roughly speaking, that&#8217;s like overfitting to your target values, because your target values are incorrect in the early stages.</p><p>[00:50:00] <strong>Sergey Levine:</strong> So you overfit to them and you get some crazy function. Essentially you get a little bit of noise in your value estimates, and that noise gets exacerbated more and more and more until all you&#8217;ve got is noise. Or, on the other hand, the other thing that seems to sometimes happen&#8211;and experimentally this actually seems fairly common&#8211;is that this thing goes into overdrive and you discard too much of the detail, and then you get an overly simple function.</p><p>[00:50:19] <strong>Sergey Levine:</strong> But somehow it seems hard to hit that sweet spot. The kind of sweet spot that you hit every time with supervised learning seems annoyingly hard to hit with reinforcement learning.</p>
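<p><em>That noise-amplification loop is easy to reproduce in a toy simulation. In the sketch below (all constants illustrative), every reward is zero, so the true value function is exactly zero&#8211;yet bootstrapping through a max over noisy estimates makes the values drift steadily upward:</em></p><pre><code>import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 5, 0.99
Q = np.zeros((n_states, n_actions))

for step in range(200):
    # Pretend the function approximator adds a little generalization
    # noise to every Q estimate (standing in for deep-net errors).
    noisy_Q = Q + rng.normal(0.0, 0.1, size=Q.shape)
    # Bootstrapped backup: the max over noisy estimates is biased
    # upward, and that bias is fed back in as the next target.
    next_states = rng.integers(0, n_states, size=n_states)
    targets = gamma * noisy_Q[next_states].max(axis=1)
    Q += 0.5 * (targets[:, None] - Q)

print(Q.mean())  # drifts far above the true value of 0
</code></pre>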
<p>[00:50:27] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. How much does data diversity help? If you were to add a lot more offline data of various types, does that seem to do anything for this problem, or not really?</p><p>[00:50:39] <strong>Sergey Levine:</strong> We actually have a recent study on this. It was done by some of my students in collaboration with Google, on large-scale offline RL for Atari games, and there we study what happens when you have lots of data and also large networks. The conclusion we reached is that things work out a lot better if you&#8217;re careful in your choice of architecture&#8211;basically, select architectures that are very easy to optimize, like ResNets, for example&#8211;and you use larger models than you&#8217;d think would be appropriate, larger than what you would need even for supervised learning.</p><p>[00:51:09] <strong>Sergey Levine:</strong> And in that paper, our takeaway was that a lot of the reason why large-scale RL efforts were so difficult before is that people were applying their supervised learning intuition and selecting architectures according to that, when in fact, if you go somewhat larger than that&#8211;maybe two times larger in terms of architecture size&#8211;that actually seems to mitigate some of the issues.</p><p>[00:51:34] <strong>Sergey Levine:</strong> It probably doesn&#8217;t fully solve them, but it does make things a lot easier. It&#8217;s not clear why that&#8217;s true, but one guess might be that when you&#8217;re doing reinforcement learning, you don&#8217;t just need to represent the final solution at the end. You don&#8217;t just need to represent the optimal solution&#8211;you also need to represent everything in between. You need to represent all those suboptimal behaviors on the way there, and those suboptimal behaviors might be a lot more complicated. The final optimal behavior might be hard to find, but it might actually be a fairly simple, parsimonious behavior.</p><p>[00:51:59] <strong>Sergey Levine:</strong> The suboptimal things, where you&#8217;re kind of okay here, kind of okay there, maybe kind of optimal over there&#8211;those might actually be more complicated, and you might require more representational capacity to go on that journey and ultimately reach the optimal solution.</p><p>[00:52:11] <strong>Kanjun Qiu:</strong> It&#8217;s really interesting that in RL you need to do this counterfactual reasoning pretty explicitly, and so you need to represent these suboptimal behaviors. But in, let&#8217;s say, a language model, you don&#8217;t need to&#8211;they&#8217;re often quite bad at counterfactual reasoning, and we do see that they get better at it as they get larger. So there&#8217;s something interesting here.</p>
<p>[00:52:29] <strong>Sergey Levine:</strong> Yeah, absolutely. And actually, trying to improve language models through reinforcement learning, particularly value-based reinforcement learning, is something that my students and I are doing quite a bit of work on these days. Obviously, many of your listeners are probably familiar with the success of RL from human preferences in recent language model work. But one of the ways in which that falls short is that a lot of the ways people do RL with language models now treat the language model&#8217;s task as a one-step problem: it&#8217;s just supposed to generate one response, and that response should get the maximal reward. But if we&#8217;re thinking about counterfactuals, that is typically situated in a multi-step process. Maybe I would like to help you debug some kind of technical problem&#8211;say you&#8217;re having trouble reinstalling your graphics driver. I might ask you a question like, well, what kind of operating system do you have? Have you tried running this diagnostic? Now, in order to learn how to ask those questions appropriately, the system needs to understand that if it has some piece of information, then it can produce the right answer, and that it can ask the question that gets that piece of information. It&#8217;s a multi-step process.</p><p>[00:53:36] <strong>Sergey Levine:</strong> And if it has suboptimal data from humans who were doing this task, maybe not so well, then it needs to do this counterfactual reasoning to figure out what the optimal questions to ask are, and so on. That&#8217;s stuff you&#8217;re not going to get with these one-step human preference formulations. And it&#8217;s certainly not what you&#8217;re going to get with regular supervised learning formulations, which will simply copy the behavior of the typical human. So I think there&#8217;s actually a lot of potential to get much more powerful language models with appropriate value-based reinforcement learning&#8211;the kind of reinforcement learning that we do in robotics and other RL applications.</p>
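<p><em>A sketch of what that multi-step framing could look like: each assistant utterance is one action in an MDP whose state is the conversation so far, with reward arriving only at the end (did the graphics driver get fixed?). The class and function names here are placeholders, not a real API; single-step RLHF would collapse this whole loop into one (prompt, response, reward) tuple.</em></p><pre><code>from dataclasses import dataclass, field

@dataclass
class DialogueState:
    turns: list = field(default_factory=list)   # the conversation so far

def collect_episode(policy, env, max_turns=10):
    """Collect one multi-turn episode for offline value-based RL.

    `policy` maps a DialogueState to an utterance; `env` wraps the user
    (or a log of human conversations) and returns a reward when the
    dialogue ends. Both are assumed interfaces.
    """
    state, transitions = DialogueState(), []
    for _ in range(max_turns):
        utterance = policy(state)           # e.g. "What OS are you running?"
        next_state, reward, done = env.step(state, utterance)
        transitions.append((state, utterance, reward, next_state, done))
        state = next_state
        if done:
            break
    # A Q-function trained on these transitions can credit an early
    # question for a success that only materializes several turns later.
    return transitions
</code></pre>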
<p>[00:54:06] <strong>Josh Albrecht:</strong> Digging into that a little bit, how does that work tactically for you and for students at your lab, given that the larger you make these language models, the more capable they are, and it&#8217;s kind of hard to run even inference for these things on the kind of compute that&#8217;s usually available at an academic institution? I mean, you guys have a decent amount of compute for a university, but still not quite the same as, say, Google or OpenAI.</p><p>[00:54:27] <strong>Sergey Levine:</strong> Yeah, it&#8217;s certainly not easy, but I think it&#8217;s entirely possible to take that problem and subdivide it into its constituent parts. If we&#8217;re developing an algorithm that is supposed to enable reinforcement learning with language models, that can be done with a smaller model, evaluating the algorithm appropriately just to make sure it&#8217;s doing what it&#8217;s supposed to be doing. And that&#8217;s a separate piece of work from the question of how it can be scaled up to the largest size to really see how far it could be pushed. So subdividing the problem appropriately can make this quite feasible, and I don&#8217;t think that&#8217;s actually something that is uniquely demanded in academia.</p><p>[00:55:00] <strong>Sergey Levine:</strong> Even if you work for a large company, even if you have all the TPUs and GPUs you could wish for at your fingertips&#8211;which, by the way, researchers at large companies don&#8217;t always have&#8211;even then it&#8217;s a good idea to chop up your problem into parts, because you don&#8217;t want to be waiting three weeks just to see that you implemented something incorrectly in your algorithm.</p><p>[00:55:18] <strong>Sergey Levine:</strong> So in some ways it&#8217;s not actually that different, just that there&#8217;s that last stage of really fully scaling it up. And for graduate students who want to finish their PhD, in many cases they&#8217;re happy to leave that last mile to somebody who is more engineering focused anyway. As long as we have good ways to vet things, good benchmarks, and good research practices, we can make a lot of progress on this stuff.</p><p>[00:55:39] <strong>Josh Albrecht:</strong> Mm-hmm. Is there any worry that emergent behaviors that you only see at much larger scales would cause you to draw the wrong conclusions from some of these smaller-scale experiments?</p><p>[00:55:48] <strong>Sergey Levine:</strong> Yes, that&#8217;s definitely a really important thing to keep in mind. So I think it is important to have a loop, not just a one-directional pipeline. But there&#8217;s a middle ground to this, and we have to hit that middle ground. We don&#8217;t want to commit the same sin that all too often people committed in the olden days of reinforcement learning research, where we do things at too small a scale to see the truth, so to speak. But at the same time, we want to work at a small enough scale that we can make progress and get some kind of turnaround, and maybe find the right collaborators in an industrial setting once we do get something working, so that we can work together to scale it up and complete the life cycle that way.</p><p>[00:56:24] <strong>Josh Albrecht:</strong> Yeah. Actually, that brings me back to another question I was going to ask earlier, when you were talking about the examination of performance on Atari games as you made the models much larger. It does seem like in reinforcement learning the models are much, much smaller than they are in many other parts of machine learning. Do you have any sense for exactly why that is? Is it just historical? Is it merely a performance thing? I see a lot of three-layer convnets or something&#8211;not even a ResNet&#8211;or a two-layer MLP, something that&#8217;s just much, much simpler, with very small dimensions.</p><p>[00:56:57] <strong>Sergey Levine:</strong> Well, that has to do with the problems that people are working on. If your images are Atari game images, it&#8217;s a reasonable guess that the visual representations you need for that are less complex than what you would need for realistic images. And when you start attacking more realistic problems, more or less exactly what you&#8217;d expect happens: the more modern architectures do become tremendously useful as the problem becomes more realistic. Certainly in our robotics work, the kinds of architectures we use are generally much closer to the latest architectures in computer vision.</p>
<p>[00:57:28] <strong>Josh Albrecht:</strong> Mm-hmm. So it&#8217;s really just in relation to the problem&#8211;as you get closer to the real world, the larger networks start to pay off quite a bit. Although I guess the interesting thing about the Atari result was that as you made the networks larger, they seemed to help anyway. Right?</p><p>[00:57:42] <strong>Sergey Levine:</strong> Yes, that was kind of the surprising thing. Certainly in robotics this was not news&#8211;in robotics, we and many others have used larger models, and yes, it was helping. But for these Atari games, if you just wanted to, let&#8217;s say, imitate good behavior, you could get away with a very small network, whereas learning that good behavior with offline value-based reinforcement learning really benefited from the larger networks. And it seems to have more to do with optimization benefits rather than just being able to represent the final answer.</p><p>[00:58:13] <strong>Kanjun Qiu:</strong> In terms of the goal of getting to more general intelligence, some people feel that if we just keep scaling up language models and adding things onto them&#8211;doing, you know, multi-step human preference formulations, and finding some way to spend compute at inference so that they can do reasoning&#8211;then we&#8217;ll be able to get all the way with just these language-based formulations. What are your thoughts on that, and on the importance of robotics versus not?</p><p>[00:58:39] <strong>Sergey Levine:</strong> There are a couple of things I could say on this topic. First, let&#8217;s keep the discussion just to language models to start with. Let&#8217;s say we believe that doing all the language tasks somebody would want to do is good enough, and that&#8217;s all you want. Is it sufficient to simply build larger language models? I think the answer there, in my opinion, would be no. Because there are really two things that you need: the ability to learn patterns from data, and the ability to plan. Now, &#8220;plan&#8221; is a very loaded word, and I use that term in the same sense that, for example, Rich Sutton would use it, where planning really refers to some kind of computational process that determines a course of action. It doesn&#8217;t necessarily need to be literal, where you think of individual steps in a plan. It could be reinforcement learning&#8211;reinforcement learning is a kind of amortized planning. But there&#8217;s some kind of process that you need where you&#8217;re actually reflecting on the patterns you learned, through some sort of optimization, to find good actions rather than merely average actions. And that could be done at training time.</p><p>[00:59:38] <strong>Sergey Levine:</strong> So that could be like value-based RL. It could also be done at test time. It could simply be that all you learn from your data is a predictive language model, but then at test time, instead of simply doing maximum a posteriori decoding&#8211;instead of simply finding the most likely answer&#8211;you actually do some kind of optimization to find an answer that actually leads to an outcome that you want to see.</p>
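<p><em>One way to picture that kind of test-time optimization is a simple sample-and-score search over candidate responses. Everything here&#8211;<code>lm.sample</code>, <code>lm.logprob</code>, the candidate count&#8211;is an assumed interface for illustration, not a real library:</em></p><pre><code>import math

def plan_next_utterance(lm, state, desired_outcome, n_candidates=16):
    """Test-time planning sketch: rather than decoding the most likely
    reply, sample candidates and keep the one the model itself predicts
    is most likely to lead to the outcome we want."""
    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        reply = lm.sample(state)            # one candidate action
        # Score: the model's estimate of how probable the desired
        # outcome (e.g. the user later saying the problem is fixed)
        # becomes if we say this now.
        score = lm.logprob(desired_outcome, context=state + reply)
        if score > best_score:
            best, best_score = reply, score
    return best
</code></pre>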
<p>[00:59:55] <strong>Sergey Levine:</strong> So maybe I&#8217;m trying to debug your graphics driver problem. And what I want is for you to say at the end, &#8220;thank you so much, you did a good job, you fixed my graphics driver.&#8221; So I might ask the model, well, what could I say now that would maximize the probability that we&#8217;ll actually fix your graphics driver? And if the model can answer that question, maybe some kind of optimization procedure can answer that question. That&#8217;s planning. Planning could also mean just running Q-learning&#8211;that&#8217;s fine too. So whatever form it takes, that&#8217;s actually very important. And I will say something here: a lot of people, when they appeal to the possibility that you can simply build larger and larger models, often reference Rich Sutton&#8217;s Bitter Lesson essay.</p><p>[01:00:30] <strong>Sergey Levine:</strong> It&#8217;s a great essay. I would strongly recommend that everybody read it&#8211;but actually read it, because he doesn&#8217;t say that you should use big models and lots of data. He says you should use learning and planning. That&#8217;s very, very important, because learning is what gets you the patterns, and planning is what gets you to be better than the average thing in those patterns.</p><p>[01:00:51] <strong>Kanjun Qiu:</strong> Yeah.</p><p>[01:00:52] <strong>Sergey Levine:</strong> So we need the planning.</p><p>[01:00:54] <strong>Josh Albrecht:</strong> Yeah. Yeah. I&#8217;ve been telling people to actually read the&#8211;</p><p>[01:00:51] <strong>Kanjun Qiu:</strong> This is also Josh&#8217;s takeaway.</p><p>[01:01:02] <strong>Josh Albrecht:</strong> Yeah, yeah. But just to push back on that slightly, as a devil&#8217;s advocate for a second: it might be the case, I think, that some of the people championing large language models are saying, maybe we can get away with sort of simple types of planning in language.</p><p>[01:01:14] <strong>Josh Albrecht:</strong> So for example, chain-of-thought ensembling, or asking the language model, what would you do next? Just sort of heuristic, simple, bolted-on planning in language afterwards.</p><p>[01:01:25] <strong>Sergey Levine:</strong> I think that&#8217;s a perfectly reasonable hypothesis, for what it&#8217;s worth. The part I might actually take issue with is the claim that that&#8217;s an easier way to do it. I think it might actually be more complex. Ultimately, what we want is simplicity, because simplicity makes it easy to make things work at a large scale&#8211;if your method is simple, there are essentially fewer ways it can go wrong. So I don&#8217;t think the problem with clever prompting is that it&#8217;s too simple or primitive. I think the problem might actually be that it&#8217;s too complex, and that developing a good, effective reinforcement learning or planning method might actually be a simpler or more general solution.</p><p>[01:02:03] <strong>Josh Albrecht:</strong> What do you think of other types of reinforcement learning setups? I&#8217;m not sure if you saw the work by Anthropic, maybe earlier this week or very recently&#8211;basically, instead of doing RL with human feedback, they propose doing RL with AI feedback. It&#8217;s like, oh, okay, we&#8217;ll train this other preference model and then use that to do the feedback loop, as a way of automating this and getting the human out of the loop&#8211;maybe as an alternative to offline RL.</p>
<p>[01:02:29] <strong>Sergey Levine:</strong> Yeah, I like that work very much. The part I might slightly disagree with is that I don&#8217;t think it&#8217;s an alternative to offline RL&#8211;I think it&#8217;s actually a very clever way to do offline RL. I like that line of work very much because I think it gets at a similar goal: essentially doing planning as an optimization procedure at training time, using what is in effect a model&#8211;the language model is being used as a model. And that&#8217;s great, because then you can get emergent behavior. In my mind, it&#8217;s actually more interesting than leveraging human feedback, because with human feedback you&#8217;re essentially relying on human teachers to hammer this into you. Which is pragmatic&#8211;if you want to build a company and you really want things to work today, yeah, it&#8217;s great to leverage humans, because you can hire lots of humans and get them to hammer your model until it does what you want.</p><p>[01:03:10] <strong>Sergey Levine:</strong> But the prospect of having an autonomous improvement procedure&#8211;that&#8217;s essentially the dream of reinforcement learning: an autonomous improvement procedure where the more compute you throw at it, the better it gets. So yeah, I read that paper. I think it&#8217;s great. In terms of technical details, I think a multi-step decision-making process would be better than a single-step decision-making process. But I think a lot of the ideas, in terms of leveraging the language models themselves to facilitate that improvement, are great. And I think that it is actually an offline reinforcement learning algorithm in disguise&#8211;a very thin disguise, actually.</p><p>[01:03:39] <strong>Kanjun Qiu:</strong> On these language models&#8211;aside from what we talked about earlier with translating images into language, can we use the embeddings that are learned, or anything like that, for robotics-type problems?</p><p>[01:03:54] <strong>Sergey Levine:</strong> Yeah. I think perhaps one of the most immediate things we get out of that is a kind of human front end, in effect, where we can build robotic systems that understand visuomotor control&#8211;basically how to manipulate the world and how to change things in the environment&#8211;and then we can hook them up to an interface that humans can talk to by using these vision-language models.</p><p>[01:04:15] <strong>Sergey Levine:</strong> So that&#8217;s kind of the most obvious, most immediate application. I do think there&#8217;s a really interesting potential for it to not simply be a front end, but to actually be a bidirectional thing, where these models can also take knowledge contained in language models and import it into robotic behavior. One of the things that language models are very good at is acting like really, really fancy relational databases&#8211;the kind of stuff AI people were doing in the eighties and nineties, where you come up with a bunch of logical propositions and you can say, well, is A true? And you look up some facts and you figure out, you know, A is like B, et cetera. Language models are great at essentially doing that. So if you want the robot to figure out: oh, I&#8217;m in this building, where do I go if I want to get a glass of milk? Well, the milk is probably in the fridge. The fridge is probably in the kitchen. The kitchen is probably down the hallway in the open area, because kitchens tend to be near a break area&#8211;it&#8217;s an office building. All this kind of factual stuff about the world, you can probably get a language model to just tell you. And if you have a vision-language model that acts as an interface between the symbolic linguistic world and the physical world, then you can import that knowledge into your robot, essentially, and now for all this factual stuff, it&#8217;ll kind of take care of it.</p><p>[01:05:25] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:05:27] <strong>Sergey Levine:</strong> It won&#8217;t take care of all the low-level stuff. It won&#8217;t tell the robot how to move its fingers&#8211;the robot still needs us for that. But it does a great job of taking care of these kinds of factual, semantic things.</p>
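<p><em>Schematically, that division of labor might look like the sketch below: the language model supplies the factual, semantic steps, a vision-language model grounds them in what the robot sees, and a learned low-level controller does the actual motion. All three interfaces are hypothetical placeholders, not a specific system from the conversation:</em></p><pre><code>def fetch_with_semantic_planning(llm, vlm, robot, task="get a glass of milk"):
    """Use an LLM for the factual/semantic plan and leave the low-level
    control to the robot's own learned policies."""
    # Semantic knowledge from the language model, e.g.
    # ["go to the kitchen", "open the fridge", "pick up the milk"]
    steps = llm.ask(
        f"List the short steps a robot in an office building "
        f"would take to {task}."
    )
    for step in steps:
        # The VLM bridges symbols and pixels: where in the current
        # camera image is "the kitchen" or "the fridge handle"?
        target = vlm.ground(step, robot.camera_image())
        # The learned visuomotor controller handles fingers and wheels.
        robot.execute(step, target)
</code></pre>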
<p>[01:05:36] <strong>Kanjun Qiu:</strong> Right, right. Mm-hmm. And there&#8217;s a bunch of work using these language models for higher-level planning and then passing the instructions to the robot. What do you think about the approach of collecting a lot of robotic datasets, making a much larger model, and then training on this diversity of datasets to kind of &#8220;simulate&#8221; the generality you would get from one of these large-scale self-supervised models?</p><p>[01:06:00] <strong>Sergey Levine:</strong> That&#8217;s a great direction, and I should say that my students and I have been doing a lot of work and a lot of planning on how to build general and reusable robotic control models. So far, one of our results closest to this is a paper by Dhruv Shah called General Navigation Models, which deals with the problem of robotic navigation. What Dhruv did, basically, is he went to all of his friends who work on robotic navigation and borrowed their datasets. So we put together a dataset with 8 different robots. It&#8217;s not a huge number&#8211;it&#8217;s only 8&#8211;but they really run the gamut, all the way from small-scale RC cars&#8211;these are all mobile robots, so a small-scale RC car, something that&#8217;s like 10 inches long&#8211;all the way to full-scale ATVs, off-road vehicles that are used for research. You can actually sit in one. So there&#8217;s a large kind of car, and everything in between.</p><p>[01:06:47] <strong>Sergey Levine:</strong> I think there&#8217;s a Spot Mini in there, and a bunch of other stuff. And he trained a single model that does goal-based navigation just using data from all these robots. The model is not actually told which robot it&#8217;s driving. It&#8217;s given a little context, so it has a little bit of memory, and basically, just by looking at that memory, you can sort of guess roughly what the properties of the robot it&#8217;s currently driving are, and the model will actually generalize to drive new robots. We actually got it, for example, to fly a quadrotor. Now, the quadrotor had to pretend to be a car&#8211;it was still controlled only in two dimensions, because there were no flying vehicles in the dataset. But it has, you know, a totally different camera. It has this fisheye lens. Obviously it flies, so it wobbles a bit.
And the model could, zero-shot, immediately fly the quadrotor. In fact, we put that demo together before a deadline, so the model worked on the first try. What took us the most time was figuring out how to replace the battery in the quadrotor, because we hadn&#8217;t used it for a year. Once we figured out how to replace the battery, the model could actually figure out how to fly the drone immediately. Now, navigation obviously is simpler in some ways than robotic manipulation, because you&#8217;re not making contact with the environment&#8211;at least if everything&#8217;s going well.</p><p>[01:07:52] <strong>Sergey Levine:</strong> So in that sense it&#8217;s a simpler problem, but it does seem like multi-robot generalization there was very effective for us. And we&#8217;re certainly exploring multi-robot generalization for manipulation. Right now we&#8217;re trying to collaborate with a number of other folks who have different kinds of robots. There&#8217;s a large data collection effort from Chelsea Finn&#8217;s group at Stanford that we&#8217;re also partnering up with. So I think we&#8217;ll see a lot more of that coming in the future, and I&#8217;m really hopeful that a few years from now, the standard way people approach robotics research will be just like in vision and NLP: start with a pre-trained, multi-robot model that has basic capability, and really build their stuff on top of that.</p><p>[01:08:28] <strong>Kanjun Qiu:</strong> That&#8217;s cool. That&#8217;s really interesting. In terms of thinking about the next few years&#8211;let&#8217;s say the next five years&#8211;do you have a sense of what kinds of developments you&#8217;d be most excited to see, that you kind of expect will happen, aside from pre-trained models for robotics?</p><p>[01:08:42] <strong>Sergey Levine:</strong> Obviously the pre-trained models one is a very pragmatic thing&#8211;that&#8217;s something that&#8217;s super important. But the thing that I would really hope to see is something that makes lifelong robotic learning really the norm. I think we&#8217;ve made a lot of progress on figuring out how to do large-scale imitation learning. We&#8217;ve developed good RL methods. We&#8217;ve built a lot of building blocks. But to me, the real promise of robotic learning is that you can turn on a robot, leave it alone for a month, come back, and suddenly it&#8217;s figured out something amazing that you wouldn&#8217;t have thought of yourself. And I think to get there, we really need to get into the mindset of robotic learning being an autonomous, continual, and largely unattended process. If I can get to the point where I can walk into the lab, turn on my robot, come back in a few days, and it&#8217;s actually spent the intervening time productively, I would consider that to be a really major success.</p><p>[01:09:34] <strong>Josh Albrecht:</strong> Hmm. How much of that do you think should focus on the actual lifetime of the individual robot&#8211;treating it as an individual&#8211;versus, well, it&#8217;s just a data collector for the offline RL dataset, and it sends data up and gets whatever comes back down afterwards?</p><p>[01:09:49] <strong>Sergey Levine:</strong> Oh, I think that&#8217;s perfectly fine. Yeah. And I think in reality, for any practical deployment of these kinds of ideas at scale, it would actually be many robots all collecting data, sharing it, exchanging their brains over a network, and all that.
That&#8217;s the more scalable way to think about it on the learning side. But I do think that on the physical side, there are also a lot of practical challenges. Just, you know, what kinds of methods should we even have if we want the robot in your home to practice cleaning your dishes for three days? If you just run a reinforcement learning algorithm on a robot in your home, probably the first thing it&#8217;ll do is wave its arm around, break your window, then break all of your dishes, then break itself, and then spend the remaining time just sitting there, broken, in the corner. So there are a lot of practicalities in this.</p><p>[01:10:32] <strong>Kanjun Qiu:</strong> That&#8217;s right. And it won&#8217;t go out and buy more dishes, which is what you&#8217;d want it to do.</p><p>[01:10:38] <strong>Josh Albrecht:</strong> No, no, I don&#8217;t think you&#8217;d want that. It would go outside to buy more dishes, fall down the steps, hurt someone, get in the middle of the road, and cause an accident.</p><p>[01:10:44] <strong>Sergey Levine:</strong> In all seriousness, that&#8217;s where I think a lot of these challenges are wrapped up, because in some ways, all of these difficulties that happen in the real world are also opportunities. Maybe the breaking of the dishes is extreme, but if it drops something on the ground&#8211;well, great, figure out how to pick it up off the ground. If it spills something&#8211;great, good time to figure out how to get out the sponge and clean up your spill. Robots should be able to treat all these unexpected events as new learning opportunities rather than things that just cause them to fail.</p><p>[01:11:09] <strong>Sergey Levine:</strong> And I think there&#8217;s a lot of interesting research wrapped up in that. It&#8217;s just hard to attack that research, because it always kind of falls in between different disciplines. It doesn&#8217;t slot neatly into just developing a better RL method or just developing a better controller or something.</p><p>[01:11:21] <strong>Kanjun Qiu:</strong> Hmm. That&#8217;s really interesting, huh? Yeah, it&#8217;s kind of somewhere between continual learning and robotics and some other stuff.</p><p>[01:11:31] <strong>Josh Albrecht:</strong> And it&#8217;s all about the messy deployment parts. Like the part about the quadcopter&#8217;s battery taking longer to replace than the model took to train&#8211;that probably wasn&#8217;t even in the paper. It wasn&#8217;t even in the appendix.</p><p>[01:11:40] <strong>Sergey Levine:</strong> No, it wasn&#8217;t in the appendix. It might be in the undergraduate student&#8217;s grad school application essay.</p><p>[01:11:47] <strong>Kanjun Qiu:</strong> Right, right. Looking into the past, whose work do you feel has impacted you the most?</p><p>[01:11:54] <strong>Sergey Levine:</strong> That&#8217;s an interesting question. There are some very standard answers I could give, but one body of work I want to highlight, that maybe not many people are familiar with and that was actually quite influential on me, is the work of Emanuel Todorov. Most people know Professor Todorov from his work developing the MuJoCo simulator, but before that, he actually did a lot of research at the intersection of control theory, reinforcement learning, and neuroscience.
And in many ways, the work that he did was quite ahead of its time in terms of combining reinforcement learning ideas with probabilistic inference concepts and controls.</p><p>[01:12:34] <strong>Sergey Levine:</strong> And besides, at the technical level, a lot of the ideas that I capitalized on in developing new RL algorithms were based on some of these control-as-inference concepts that his work, as well as the work of other people in that area, pioneered. But also, the general approach and philosophy of combining very technical ideas in probabilistic inference, RL, neuroscience, and controls all together really shaped my approach to research, because I think one of the things that he and others in that neck of the woods did really well is tear down the boundaries between these things. As an example, there&#8217;s this idea sometimes referred to as Kalman duality, which is basically the concept that a forward-backward message-passing algorithm, like what you would use in a hidden Markov model, is more or less the same thing as a control algorithm.</p><p>[01:13:26] <strong>Sergey Levine:</strong> Inferring the most likely state given a sequence of observations looks an awful lot like inferring the optimal action given some reward function, and that can be made into a mathematically precise statement.</p><p>[01:13:37] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:13:38] <strong>Sergey Levine:</strong> So it&#8217;s not merely interdisciplinary&#8211;it&#8217;s really tearing down the boundaries between these areas and showing the underlying commonality that emerges when you reason about sequential processes. And I think that was very influential on me in terms of how I thought about the technical concepts in these areas.</p><p>[01:13:55] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. It reminds me that a lot of folks&#8211;or maybe not a lot, but a few people&#8211;are very interested in formulating RL as kind of a sequence modeling problem. It feels like there&#8217;s maybe a similar thing going on here. I&#8217;m curious what you think about that formulation.</p><p>[01:14:12] <strong>Sergey Levine:</strong> Yeah, I think to a degree that&#8217;s true. Certainly the idea that inference in sequence models looks a lot like control is a very old idea. The reason the Kalman duality is called the Kalman duality is because it actually did show up in Kalman&#8217;s original papers. That&#8217;s not what most people took away from them&#8211;most people took away that it&#8217;s a good way to do state estimation.</p><p>[01:14:30] <strong>Sergey Levine:</strong> And, you know, that was in the age of the space race, and people used it for state estimation, for the Apollo program and the like. But buried in there is the relationship between control and inference and sequence models: the same way that you would figure out what state you&#8217;re in given a sequence of observations could be used to figure out what action to take to achieve some outcome. And yeah, it&#8217;s probably fair to say that the relationship between sequence models and control is an extremely old one. And there&#8217;s still more to be gained from that connection.</p>
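<p><em>One modern way to write down that correspondence&#8211;following the control-as-inference framing that this line of work led to, with notation assumed here rather than taken from the conversation&#8211;is to attach a binary &#8220;optimality&#8221; variable to each timestep and read off the backward messages:</em></p><pre><code>% Attach an optimality variable O_t to each step, with
%     p(O_t = 1 | s_t, a_t) \propto \exp( r(s_t, a_t) ).
% The HMM-style backward message
%     \beta_t(s_t, a_t) = p(O_{t:T} | s_t, a_t)
% then behaves like an exponentiated Q-function:
\log \beta_t(s_t, a_t)
    = r(s_t, a_t)
    + \log \, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t)}
        \left[ \exp V(s_{t+1}) \right],
\qquad
V(s_t) = \log \sum_{a} \exp Q(s_t, a).
% Inference over states (estimation) and inference over optimal
% actions (control) run through the same message-passing machinery.
</code></pre>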
<p>[01:14:55] <strong>Kanjun Qiu:</strong> Do you feel like you&#8217;ve read any papers or work recently that you were really surprised by?</p><p>[01:15:02] <strong>Sergey Levine:</strong> There are a few things&#8230; This is maybe a little bit tangential to what we&#8217;ve discussed so far, but I have been a bit surprised by some of the investigations into how language models act as few-shot learners. I worked a lot on meta-learning&#8211;at this point, really the previous generation of meta-learning algorithms, the few-shot stuff from 2018, 2019. But with language models, there&#8217;s a very interesting question as to the degree to which they actually act as meta-learners or not, and there&#8217;s been somewhat contradictory evidence, one way or the other.</p><p>[01:15:34] <strong>Sergey Levine:</strong> Some of that was kind of surprising to me. For example, you can take a few-shot prompt and attach incorrect labels to it, and the model will look at it and then start producing correct labels, which maybe suggests that it&#8217;s not paying attention to the labels, but more to the format of the problem. Of course, all these studies are empirical, and it&#8217;s always a question whether the next generation of models still exhibits the same behavior or not, so you kind of have to take it with a grain of salt. But I have found some of the conclusions there to be kind of surprising&#8211;that maybe these things aren&#8217;t really meta-learners; rather, they&#8217;re just getting the format specification out of problems.</p><p>[01:16:07] <strong>Kanjun Qiu:</strong> Yeah, they&#8217;re really, really, really good pattern matchers. Interesting. Also, as they get bigger, some people say they take less data to fine-tune, so maybe they&#8217;re doing some kind of few-shot learning during training as well.</p><p>[01:16:21] <strong>Sergey Levine:</strong> There&#8217;s an interesting tension there, because in the end, I think you would really like the ideal meta-learning method to be something that can get a little bit of data for a new problem, use that to solve the problem, but also use it to improve the model. And that&#8217;s something that&#8217;s always been a little tough with meta-learning algorithms, because typically the process of adapting to a new problem is very, very separate from the process of training the model itself. Certainly that&#8217;s true in the classic way of using language models with prompts as well.</p><p>[01:16:45] <strong>Sergey Levine:</strong> But it&#8217;s very appealing to have a model that can fine-tune on small amounts of data, because then the process of adapting to a task is the same as the process of improving the model, and the model actually gets better with every task. You could imagine, for example, that the logical conclusion of this kind of stuff is a kind of lifelong online meta-learning procedure, where every new task you&#8217;re exposed to, you can adapt to more quickly, and you can use it to improve your model so it can adapt to the next task even more quickly. I think that, in the world of meta-learning, is actually kind of an important open problem: how to move toward lifelong and online meta-learning procedures that really do get better at both the meta level and the low level. And it&#8217;s not actually obvious how to do that, or whether the advent of large language models makes that easier or harder. It&#8217;s an important problem.</p>
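<p><em>The loop being described might be sketched like this&#8211;adapt to each task, then fold the same data back into the meta-learner so the next adaptation is faster. <code>adapt</code>, <code>solve</code>, and <code>update</code> are assumed interfaces, not any particular published method:</em></p><pre><code>def lifelong_online_meta_learning(model, task_stream):
    """Sketch of lifelong, online meta-learning: adaptation and
    meta-training are the same process rather than separate phases."""
    for task in task_stream:
        support = task.collect_small_dataset()   # a little data per task
        task_model = model.adapt(support)        # fast few-shot adaptation
        task_model.solve(task)                   # use it on the task at hand
        model.update(support)                    # ...and also improve the
                                                 # meta-learner, so the NEXT
                                                 # task adapts even faster
    return model
</code></pre>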
<p>[01:17:27] <strong>Kanjun Qiu:</strong> What do you feel are some underrated or overlooked approaches&#8211;things you don&#8217;t see many people looking at today, or that aren&#8217;t very popular, but that you think might be important?</p><p>[01:17:38] <strong>Sergey Levine:</strong> One thing that comes to mind&#8211;I don&#8217;t know how much this counts as overlooked or underrated&#8211;is that model-based RL might be a little bit underutilized, because it sort of makes sense that if we&#8217;ve seen big advances in generative models, then more explicit model-based RL techniques can perhaps do better than they do now. And it may also be that there&#8217;s room for very effective methods to be developed that hybridize model-based and model-free RL in interesting ways, which could do a lot better than either one individually, perhaps by leveraging the latest ideas from building very effective generative models.</p><p>[01:18:14] <strong>Sergey Levine:</strong> Just as one point about what these things could look like: model-based reinforcement learning at its core uses some mechanism that predicts the future. But typically we think of predicting the future the way we think about movies and videos&#8211;you predict the world one frame at a time. There isn&#8217;t really any reason to think about it that way. All you really need to predict is what will happen in the future if you do something, and that doesn&#8217;t have to be one time step or frame at a time. It could be that you predict something that will happen at some future point. Maybe you don&#8217;t even need to know which future point in particular&#8211;like, soon or not so soon, right?</p><p>[01:18:42] <strong>Sergey Levine:</strong> And it may be that this more flexible way of looking at prediction could provide models that are easier to train, that leverage ideas from current generative models, sufficient to do control and decision making, but not as complicated as full-on frame-by-frame, pixel-by-pixel prediction of everything that your robot will see for the next hour.</p><p>[01:19:02] <strong>Josh Albrecht:</strong> Yeah. Why do you think we haven&#8217;t seen more advances there in model-based reinforcement learning, given the success of these large generative models? People have been making large generative models really good for more than a few years now, but I feel like we haven&#8217;t really seen them applied in the RL setting directly.</p><p>[01:19:20] <strong>Sergey Levine:</strong> Well, there is a big challenge there. The challenge is that prediction is often much harder than generation. One way to think about it: if your task is to generate, let&#8217;s say, a picture of an open door, you can draw any door you want&#8211;it can be any color, as long as it&#8217;s open. But if your goal is to predict what this particular door in my office would look like if I were to open it, now you really have to get all the other details right.</p><p>[01:19:45] <strong>Kanjun Qiu:</strong> Mm-hmm.</p><p>[01:19:46] <strong>Sergey Levine:</strong> And you really have to get them right if you want to use that for control, because you want the system to figure out what thing in the scene actually needs to change. If you messed up a bunch of other parts, or the door doesn&#8217;t open the same way this particular door opens, that&#8217;s actually much less useful to you. So prediction can be a lot harder than generation, because with straight-up generation you have a lot of freedom to fudge a lot of the details. When you get the freedom to fudge the details, you can basically do the easiest thing you know how to do for everything except the main subject of the picture.</p>
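<p><em>The contrast between the two prediction interfaces might look like this in code. Both signatures are schematic assumptions for illustration, not a real library:</em></p><pre><code># Classic video-prediction-style model: predict every frame, in order,
# for the whole horizon. Errors compound step by step, and most of the
# predicted pixels are irrelevant to the decision being made.
def rollout_frame_by_frame(model, state, actions):
    frames = []
    for a in actions:
        state = model.predict_next_frame(state, a)
        frames.append(state)
    return frames

# The more flexible alternative: directly predict whether some event of
# interest happens at *some* future point under this action sequence
# (e.g. "the door ends up open"), skipping frame-by-frame detail.
def predict_outcome(model, state, actions):
    return model.predict_event_probability(state, actions)
</code></pre>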
<p>[01:20:12] <strong>Kanjun Qiu:</strong> Mm-hmm. Yeah. Once you have to do prediction, you need consistency, you need it over long time horizons&#8211;there are all of these other things to work on. Why do you think we still see a lot of model-based RL that does these kinds of frame-by-frame rollouts, versus predicting a point in the future or something like that?</p><p>[01:20:31] <strong>Josh Albrecht:</strong> Or also versus predicting some aspects of the future, as you were mentioning before, right? Like maybe this thing will happen, or maybe this attribute will change, or maybe I expect this particular piece of the future.</p><p>[01:20:40] <strong>Sergey Levine:</strong> Well, I do think the decomposition into a predictive model and a planning method is very clean. So it&#8217;s very tempting to say, well, we know how to run RL against a simulator, so as long as we get a model that acts as a drop-in replacement for a simulator, then we know exactly how to use it. It&#8217;s a very clean and tempting decomposition. And part of why I think we should think about breaking that decomposition is that this notion of a very clean decomposition makes me harken back to the end-to-end stuff. In robotics, we used to have another very clean decomposition: the decomposition between estimation and control. It used to be that perception and control were kept very separate because it&#8217;s such a clean decomposition, and maybe here too, prediction and planning are kept very separate because it&#8217;s such a clean decomposition. But just because it&#8217;s clean doesn&#8217;t mean it&#8217;s right. That&#8217;s a notion that we ought to challenge.</p><p>[01:21:24] <strong>Kanjun Qiu:</strong> I see. And it kind of feels like it hasn&#8217;t been challenged that seriously so far.</p><p>[01:21:44] <strong>Josh Albrecht:</strong> One question, just going back to the importance of making robots that don&#8217;t smash all your dishes and smash all the windows and everything like that&#8211;which does seem like a very useful thing for people to be working on, and does seem a little bit underserved by existing incentives. Do you have any ideas how to fix that? Is it a new conference? Is it a new way of judging papers? Is it just people being open about the importance of this problem? How do we actually make progress on that? Besides industry&#8211;industry can certainly make progress, but in academia&#8230;</p><p>[01:21:57] <strong>Sergey Levine:</strong> It&#8217;s something that I think about a lot. I think one great way to approach that problem is to actually set your goal to be building a robot that has some kind of existence, some kind of life of its own. I spend part of my time hanging out with the robotics team at Google, the Google Brain Robotics Research Lab.
And there, I think we&#8217;ve actually done a pretty good job of this, where if you walk into our office&#8211;we&#8217;ll get a Googler to escort you, obviously; don&#8217;t break into our office&#8211;but if you walk into our office legally, you will see robots just driving around. You walk into the micro-kitchen, where people go to get their snacks, and you might be standing in line behind a robot that&#8217;s getting a snack.</p><p>[01:22:31] <strong>Sergey Levine:</strong> And people have gotten into this habit of, well, the robotics experiment is continual, it&#8217;s ongoing, it lives in the world that you live in, and you&#8217;d better deal with it. And you deal with that as a researcher, and that actually gets you into this mindset where things do need to be more robust, and they need to be configured in such a way that they support this continual process and don&#8217;t break the dishes. On the technical side there&#8217;s still a lot to do, but just getting into that mode of thinking about the research process, I think, helps a ton. And we&#8217;re starting to move in that direction here at UC Berkeley too. We&#8217;ve got our little mobile robot roving around the building on a regular basis.</p><p>[01:23:04] <strong>Sergey Levine:</strong> We&#8217;ve got our robotic arm in the corner constantly trying to pick up objects. And I think once you start doing research that way, it becomes much more natural to be thinking about these kinds of challenges.</p><p>[01:23:13] <strong>Kanjun Qiu:</strong> It&#8217;s another example of breaking down a barrier&#8211;in this case, between the experimental environment and your real-life environment. Do you feel like there&#8217;s a work of yours that was most overlooked?</p><p>[01:23:22] <strong>Sergey Levine:</strong> I think every researcher thinks that some work of theirs has been overlooked, but one thing I could talk about a little bit is some work that two of my postdocs, Nick Rhinehart and Glen Berseth, did recently with me and a number of other collaborators, studying intrinsic motivation from a very different perspective. Intrinsic motivation in reinforcement learning is often thought of as the problem of seeking out novelty in the absence of supervision. People formulate it in different ways: find something that&#8217;s surprising, find something that your model doesn&#8217;t fit, et cetera. Nick and Glen took a very different approach, inspired by some neuroscience and cognitive science work from a gentleman named Karl Friston from the UK. There&#8217;s this idea that perhaps intrinsic motivation can actually be driven by the opposite objective.</p><p>[01:24:08] <strong>Sergey Levine:</strong> The objective of minimizing surprise. The intuition for why this might be true is that if you imagine a very ecological view of intelligence&#8211;let&#8217;s say you&#8217;re a creature in the jungle, hanging out there, and you want to survive&#8211;well, maybe you actually don&#8217;t want to find surprising things. You know, a tiger eating you would be very surprising, and you would rather that not happen.</p><p>[01:24:25] <strong>Sergey Levine:</strong> So you&#8217;d rather find your niche, hang out there, and be safe and comfortable. And that actually requires minimizing surprise. But minimizing surprise might require taking some kind of coordinated action.
So you might think, well, it might rain tomorrow and then I&#8217;ll get wet, and that kicks me out of my comfortable niche. So maybe I&#8217;ll actually go on a little adventure and find some materials to build shelter, which might be a very uncomfortable thing to do&#8211;it might be very surprising. But once I&#8217;ve built that shelter, I&#8217;ll have put myself in a more stable niche where I&#8217;m less likely to get surprised by something.</p><p>[01:24:54] <strong>Sergey Levine:</strong> So perhaps, paradoxically, minimizing surprise might actually lead to behavior that looks like curiosity or novelty seeking, in service of getting yourself to be more comfortable later. It&#8217;s a very strange idea in some ways, but perhaps a really powerful one if we want to situate agents in open-world settings where we want them to explore without human supervision, but at the same time not get distracted by the million different things that could happen. They should explore, but they should explore in a way that gets them to be more capable&#8211;that accumulates capabilities, accumulates some ability to affect their world.</p><p>[01:25:27] <strong>Sergey Levine:</strong> So we had several papers that studied this, one called SMiRL, for surprise minimizing reinforcement learning, and another one called IC2, on information capture for intrinsic control. Both of these papers looked at how minimizing novelty&#8211;either minimizing the entropy of your own beliefs, meaning manipulate the world so that you&#8217;re more certain about how the world works, or simply minimizing the entropy of your state, meaning manipulate the world so that you occupy a narrow range of states&#8211;can actually lead to emergent behavior. And this was very experimental, preliminary, half-baked kind of stuff. But I think that&#8217;s maybe a direction that has some interesting implications in the future.</p>
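<p><em>The SMiRL idea, as described here, can be sketched very compactly: keep a density model over the states the agent has visited, and reward the agent for landing in states that model finds likely. The diagonal Gaussian below is an illustrative stand-in for the density model, not the paper&#8217;s exact design:</em></p><pre><code>import numpy as np

class SurpriseMinimizingReward:
    """Reward = log-likelihood of the new state under a running model
    of previously visited states. High when the agent keeps its world
    predictable; maximizing it can still force "adventures" (build the
    shelter) that reduce surprise later."""

    def __init__(self):
        self.visited = []

    def reward(self, state):
        state = np.asarray(state, dtype=float)
        self.visited.append(state)
        history = np.stack(self.visited)
        mean = history.mean(axis=0)
        std = history.std(axis=0) + 1e-3   # avoid zero variance early on
        # Diagonal-Gaussian log-likelihood of the new state.
        return float(
            -0.5 * np.sum(((state - mean) / std) ** 2
                          + np.log(2.0 * np.pi * std ** 2))
        )
</code></pre>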
<p>[01:26:05] <strong>Kanjun Qiu:</strong> That&#8217;s really interesting. That&#8217;s a very unusual formulation. What controversial or unusual research opinions do you feel like you have that other people don&#8217;t seem to agree with?</p><p>[01:26:15] <strong>Sergey Levine:</strong> I have quite a few, although I&#8217;ll say that I do tend to be open-minded and pragmatic about these things, so I&#8217;m more than happy to work with people even on projects that might invalidate some of these opinions. But one of the things that I think many people don&#8217;t entirely agree with is this: there&#8217;s a lot of activity in robotic learning around using simulation to learn policies for real-world robots. And I think that&#8217;s very pragmatic; if I were to start a company today, that&#8217;s an approach that I might explore. The controversial part is that I think in the long run we&#8217;re not gonna do that. And the reason is that ultimately it&#8217;ll be much easier to use data rather than simulation to enable robots to do things.</p><p>[01:26:59] <strong>Sergey Levine:</strong> And I think that&#8217;ll be true for several reasons. One of the reasons is that once we get the robots out there, data is much more available and there&#8217;s a lot less reason to use simulation. So if you&#8217;re in the Tesla regime, if you have, you know, a million robots out there, suddenly simulation doesn&#8217;t look as appealing, because, hey, getting lots of data is easy. Another reason is that I think the places where we&#8217;ll really want learning to attain superhuman performance will be ones where the robot needs to figure things out in tight coupling with the world. So if we understand something well enough to simulate it really accurately, maybe that&#8217;s actually not the place where we most need learning. The third reason is that, well, if you look at other domains, like NLP or computer vision: nobody in NLP thinks about coding up a simulator to simulate how people produce language. That sounds ridiculous. Using data is the way to go. I mean, you might use synthetic data from a language model, but you&#8217;re not gonna write a computer program that simulates how human fingers and vocal cords work to type on keyboards or emit sounds. That just sounds crazy. You&#8217;d use data. In computer vision maybe there&#8217;s a little bit more simulation, but still, using real images is just so much easier than generating synthetic images. Some people do work on synthetic images, but the data-driven paradigm is so powerful and relatively easy to use that most people just do that. And I think that we&#8217;ll get to that point in robotics too.</p><p>[01:28:13] <strong>Sergey Levine:</strong> Another one that I might say, and this is maybe coming back to something that we discussed already: there&#8217;s a lot of activity in robotics, and also in other areas, around using essentially imitation-learning-style approaches. So get humans to perform some tasks (maybe robotic tasks, or maybe they&#8217;re booking flights on the internet or something), and whatever task you wanna do, get humans to generate lots of data for it, and then basically do a really good job of emulating that behavior. And again, this is one of those things that I would put into the category of very pragmatic approaches that would be very good to leverage if you&#8217;re starting a company right now.</p><p>[01:28:46] <strong>Sergey Levine:</strong> But if you want to really get general-purpose, highly effective AI systems, I think we really need to go beyond that. There&#8217;s a really cute quote that my former postdoc Glen posted on Twitter after a recent conference. He said something like: I saw a lot of papers on imitation learning, but, to harken back to an earlier quote by Rodney Brooks, imitation learning is doomed to succeed. So Rodney Brooks had a quote years ago where he said simulation is doomed to succeed. What he meant by that is that when people do robotics research in simulation, it always works. It always succeeds, but then it&#8217;s hard to make that same thing work in the real world. And I think Glen&#8217;s point was that with imitation learning, it&#8217;s easy to get it to work, but then you hit a wall: it&#8217;s really good for the thing that imitation learning is good for, so it looks deceptively effective, but if you wanna go beyond that, if you really wanna do something that people are not good at, then you just hit a wall. And I think that&#8217;s a really big deal. In robotics, and in other areas where we want rational, intelligent decision-making, we should really be thinking hard about planning and reinforcement learning, things that go beyond just copying humans.</p>
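<p>For reference, the imitation-learning recipe described here (humans generate data, the system emulates it) reduces, in its simplest form, to supervised learning on demonstrations. Here is a minimal behavior-cloning sketch, with hypothetical demonstration arrays standing in for real data:</p><pre><code>import numpy as np

# Minimal behavior cloning: fit a policy to human demonstrations by plain
# supervised regression. A real system would use a neural network, but the
# objective is the same; all data here is a hypothetical stand-in.
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(500, 6))               # demo observations
demo_actions = demo_states @ rng.normal(size=(6, 2))  # "expert" actions

# Least-squares policy: action = state @ W, trained only to match the demos.
W, _, _, _ = np.linalg.lstsq(demo_states, demo_actions, rcond=None)

def policy(state):
    return state @ W

# The loss measures disagreement with the humans and nothing else, so
# driving it to zero means copying them -- the "doomed to succeed" ceiling.
loss = np.mean((policy(demo_states) - demo_actions) ** 2)
print(f"imitation loss: {loss:.6f}")
</code></pre>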
<p>[01:29:46] <strong>Kanjun Qiu:</strong> Yeah, that&#8217;s really interesting. I love this: imitation is doomed to succeed.</p><p>[01:29:51] <strong>Sergey Levine:</strong> The third one, and maybe this is the last one that&#8217;s big enough to be interesting: to be honest, I&#8217;m actually very skeptical about the utility of language in the long run as a driving force for artificial intelligence. I think that language is very, very useful right now. There&#8217;s a kind of cognitive science view of language which says, well, people think in symbolic terms, and language is sort of our expression of those symbolic concepts, and therefore language is a fundamental substrate of thought. I think that&#8217;s a very reasonable idea. What I&#8217;m skeptical about is the degree to which that is really a prerequisite for intelligence, because there are a lot of animals that are much more intelligent than our robots that do not possess language. They might possess some kind of symbolic, rational thought, but they certainly don&#8217;t speak to us.</p><p>[01:30:37] <strong>Sergey Levine:</strong> They certainly don&#8217;t express their thoughts in language. And because of that, my suspicion is actually that the success of things like language models has less to do with the fact that it&#8217;s language and more to do with the fact that we&#8217;ve got an internet full of language data. And that perhaps it&#8217;s really not so much about language.</p><p>[01:30:54] <strong>Sergey Levine:</strong> It&#8217;s really about the fact that there is this structured repository that happens to be written in language, and perhaps in the long run we&#8217;ll figure out how to do all the wonderful things that we do with language models, but without the language, using, for example, sensorimotor streams, videos, whatever. And we&#8217;ll get that generality and that power, and it&#8217;ll come more from understanding the physical and visual concepts in the world rather than necessarily from parsing words in English or something of the like.</p><p>[01:31:20] <strong>Kanjun Qiu:</strong> Earlier, we talked about methods that hit walls. Do you think that language-based methods, when we think about artificial general intelligence, would at some point hit a wall?</p><p>[01:31:30] <strong>Sergey Levine:</strong> Oh, absolutely. I do think, though, that we should be a little careful with that, because language models hit walls, but you can build ladders over those walls using other mechanisms. Certainly in recent robotics research, including robotics research that the team I work with at Google has done, as well as many others, we&#8217;ve seen a lot of really excellent innovations where people use visual or visuomotor models that understand action and understand images to bridge the gap between the symbolic world of language models and the physical world. And I think that we&#8217;ve come a long way in doing that, but I do think that purely language-based systems by themselves have a major limitation: the inability to really ground things out at the lowest level of perception and action.</p>
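<p>A toy sketch of that bridging pattern: a language model scores which skill sounds like a sensible next step, a learned visuomotor model scores which skill is currently feasible from the camera image, and the robot executes the best combined option. Every function, skill, and score below is a hypothetical placeholder, not any particular system&#8217;s API:</p><pre><code># Toy sketch: combine a language model's semantic preference with a
# visuomotor affordance estimate, and execute the best-scoring skill.

SKILLS = ["pick up the sponge", "open the drawer", "wipe the table"]

def language_score(instruction, skill):
    # Placeholder for an LLM's judgment that `skill` advances `instruction`.
    return 0.9 if skill.split()[-1] in instruction else 0.1

def affordance_score(image, skill):
    # Placeholder for a learned visuomotor model estimating, from the
    # current camera image, the probability that `skill` would succeed.
    feasibility = {"pick up the sponge": 0.8,
                   "open the drawer": 0.3,
                   "wipe the table": 0.6}
    return feasibility[skill]

def choose_skill(instruction, image):
    scores = {s: language_score(instruction, s) * affordance_score(image, s)
              for s in SKILLS}
    return max(scores, key=scores.get)

print(choose_skill("please wipe the table", image=None))  # -> wipe the table
</code></pre>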
<p>[01:32:16] <strong>Sergey Levine:</strong> And that inability is very problematic, because the reason we don&#8217;t have a lot of text on the internet like, oh, if you wanna throw a football, then you should fire this neuron and actuate this muscle, and so on, is that we don&#8217;t put that in text because it&#8217;s so easy for us. It&#8217;s so easy for us, but that doesn&#8217;t mean it&#8217;s easy for our machines. The place where the gap between human capability and machine capability is largest is exactly the thing that we&#8217;re not gonna express in language.</p><p>[01:32:40] <strong>Kanjun Qiu:</strong> Mm. So basically the way in which the internet dataset is skewed is that all of the easy stuff is not on there, and so it doesn&#8217;t get that.</p><p>[01:32:49] <strong>Sergey Levine:</strong> Yeah.</p><p>[01:32:49] <strong>Kanjun Qiu:</strong> That&#8217;s interesting. What do you think about the idea that we might get an AGI that is able to solve all digital tasks on your computer, do everything digitally that a human can do, but we&#8217;ll still be many, many years away on the physical side?</p><p>[01:33:02] <strong>Sergey Levine:</strong> Well, maybe there&#8217;s something comforting about that, because then it can&#8217;t go out into the world and start doing things that are too nefarious. But I think that kind of stuff is possible. In research, I do tend to be a little bit of an optimist, and I do think that we can figure out many of the nitty-gritty physical, robotic things.</p><p>[01:33:16] <strong>Sergey Levine:</strong> I&#8217;m not sure how long that&#8217;ll take exactly. But I&#8217;m also kind of hopeful that if we figure them out, we&#8217;ll actually get a better solution for some of the symbolic things. Like, you know, if your model understands how the physical world works, you can probably do a better job in the digital world, because the digital world influences the physical world, and a lot of the most important things there really do have a physical kind of connection. So maybe it&#8217;s actually gonna go the other way: figuring out the physical stuff will lead to a better understanding of how to manipulate language.</p><p>[01:33:40] <strong>Kanjun Qiu:</strong> Yeah, totally agree. Thank you so much. This was super fun, and we really enjoyed the conversation. Thanks a bunch.</p><p>[01:33:47] <strong>Sergey Levine:</strong> Yeah. Thank you very much.</p><p><em>Thanks to <a href="https://www.linkedin.com/in/tessajhall/">Tessa Hall</a> for editing the podcast.</em></p><div><hr></div><h3><strong>About Imbue</strong></h3><p><a href="https://imbue.com/">Imbue</a> is an independent research company developing a better way to build personal software. 
Our <a href="https://imbue.com/company/vision/">mission</a> is to empower <em>humans</em> in the age of AI by creating computing tools controlled by individuals.</p><ul><li><p>Website: <a href="https://imbue.com/">https://imbue.com/</a></p></li><li><p>LinkedIn: <a href="https://www.linkedin.com/company/imbue-ai/">https://www.linkedin.com/company/imbue-ai/</a></p></li><li><p>Twitter/X: <a href="https://x.com/imbue_ai">@imbue_ai</a></p></li><li><p>Bluesky: <a href="https://bsky.app/profile/imbue-ai.bsky.social">https://bsky.app/profile/imbue-ai.bsky.social</a></p></li><li><p>YouTube: <a href="https://www.youtube.com/@imbue_ai/">https://www.youtube.com/@imbue_ai/</a></p></li></ul>]]></content:encoded></item></channel></rss>