Pattern Card #1: Attention
How my one-year-old taught me more about AI transformers than Google’s research paper ever could
My son just turned one. He doesn’t understand transformers or attention mechanisms or the information economy. But he understands attention better than most executives I’ve worked with.
I was sitting on the floor with him a few weeks ago, building a block tower. The kind of thing where you stack four blocks and he knocks them down and you both laugh and you do it again. Except I wasn’t really there. My phone was on the carpet next to me, face up, and every few seconds my eyes would drift to it. Unconsciously. A notification flickers and triggers a reflex too quick to register. You feel the haptic buzz, and the body obeys.
He noticed before I did.
Within ten minutes he’d abandoned the blocks and was reaching for the phone. He didn’t want to play a game or watch something. He wanted it because his father’s eyes kept going there, and he is, like all of us, a creature who learns what matters by watching what the people around him attend to.
I put the phone in the other room. We finished the tower. But I kept thinking about what had just happened, because it wasn’t a parenting moment. It was a pattern. One I started seeing everywhere once I had a name for it.
PATTERN CARD #1: ATTENTION
What you attend to, you amplify. What you amplify, others orient toward. This is true at the scale of a living room floor and at the scale of civilizations, and the mechanism is identical in both cases. Attention is a force. It reshapes the thing that receives it and the person who directs it, simultaneously, in a feedback loop that most of us never notice because we’re inside it.
William James saw this in 1890. “My experience is what I agree to attend to,” he wrote in The Principles of Psychology. The emphasis falls on agree. There’s a transaction in that verb. Agreement, consent, a kind of contract between the self and the world. James also called the ability to voluntarily return a wandering attention “the very root of judgment, character, and will.” The root. Everything else branches from there.
Simone Weil, writing from a different tradition entirely, arrived at the same place fifty years later: “Attention is the rarest and purest form of generosity.” She meant it literally. To attend to someone fully, without agenda, without splitting your gaze between them and the glowing rectangle in your pocket, is to give them something that cannot be manufactured, automated, or scaled. It costs you. You spend it. And you don’t get it back.
The phrase itself is worth sitting with. Paying attention. We borrowed the language of economics without noticing what we were confessing: attention is a budget. It depletes. It has opportunity costs. Where you spend it is, arguably, the most consequential allocation you make. More than money, more than time, because attention is the mechanism by which time is converted into meaning. An hour with full attention and an hour with fragmented attention are not the same hour. They produce different memories, different relationships, different versions of you.
I started tracking this in my own life, loosely, after the block tower incident. I just started noticing. Where does my gaze go first in the morning? What pulls it? How often do I choose where to look versus follow wherever the brightest signal leads? The answers were uncomfortable. Most of my attention wasn’t being directed. It was being captured. And the things capturing it were, almost without exception, the things that had been engineered to capture it.
In 2017, a team at Google Brain published a paper called “Attention Is All You Need.”
They were talking about machine learning. Specifically, they were introducing the transformer architecture, the technical foundation beneath every large language model, every AI chatbot, every system that can read your email and summarize it and draft a reply that sounds disconcertingly like you. The core innovation was to rely entirely on a mechanism called self-attention: a way for each piece of information in a sequence to look at every other piece and decide what matters. What to weight heavily. What to compress. What to, effectively, ignore.
The name was a technical description. But it landed like prophecy.
Because here’s what the transformer does, stripped of the linear algebra: it takes a field of information, and it decides, through learned patterns of weighting, what to attend to. The things that receive high attention scores get amplified. They shape the output. The things that receive low scores get suppressed. They’re still there, technically, but they don’t influence what comes next.
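If you want to see the mechanism without the paper’s full machinery, here is a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. It is a toy, not the multi-head, batched version from the paper: the projection matrices are random rather than learned, and the sequence is four made-up tokens. But the weighting behavior it demonstrates is the real thing: every token scores every other token, the scores become a budget that sums to one, and the output is a mix in which high-scoring pieces are amplified and low-scoring pieces are all but ignored.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X.

    Each row of X is one token's vector. Every token scores every other
    token (including itself); softmax turns the scores into weights that
    sum to 1; the output is a weighted mix of the value vectors.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each token "looks at" each other token
    weights = softmax(scores, axis=-1)   # each row is one token's attention budget
    return weights @ V, weights

# A toy sequence: 4 tokens, 8-dimensional embeddings, random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Each row of `weights` is a probability distribution: a fixed budget of attention, spent across the sequence. Spend more on one token and, by construction, every other token gets less. The essay’s economic metaphor is built into the math.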
This is how human cognition works too, formalized in mathematics a few thousand years after monks and contemplatives figured it out through practice.
Vipassana meditation is attention training. You sit. You notice sensation. You resist the urge to react. You train the mind to observe its own weighting patterns instead of being driven by them. Lectio divina, the Benedictine practice of slow, repeated reading of sacred text, is attention training. You read the same passage four times, each time attending differently, each pass surfacing something the previous pass compressed. The Zen instruction “When you eat, eat; when you walk, walk” is a specification for what an engineer might call single-task attention allocation with zero context-switching overhead.
The transformer paper formalized something ancient. It proved, computationally, what the contemplatives had been saying for millennia: attention is the mechanism by which raw information becomes structured understanding. It’s all you need. Attention, the act of choosing what to weight, is the thing that turns noise into signal.
I find it beautiful, actually. That the most important machine learning paper of the decade accidentally restated the oldest insight about being human. The monks already knew. Vaswani et al. just wrote it in Python.
And once you see it this way, you can’t unsee it. The attention mechanism is consciousness itself, or at least the engine of it. Everything downstream, perception, memory, identity, is just the residue of what got attended to and what got filtered out. Transformers don’t think. But they do something eerily close to what we do when we think: they decide, from a sea of possible signals, which ones matter right now. The rest falls away.
I think about this when I watch how organizations decay.
I’ve worked in healthcare technology for years, and I can tell you the exact moment a company starts to lose its soul: the moment the attention shifts.
Here is how it happens. A manager starts their tenure focused on customer outcomes. Are patients getting better care? Are the people on the front lines equipped and supported? The metrics they track reflect this: satisfaction scores, outcome data, the qualitative texture of what users are experiencing. The attention of the organization, directed by this leader, is pointed at something real.
Then the pressure builds. Board meetings. Quarterly numbers. A new VP who speaks exclusively in acronyms. Slowly, and it is always slowly, which is why nobody notices, the attention migrates. SLAs replace satisfaction. Ticket counts replace outcomes. The dashboard still has numbers on it, and the numbers still go up and to the right, but what’s being measured has changed. And because what you measure is what you attend to, and what you attend to is what you become, the entire organism begins to shift. The attention moved, and everything downstream moved with it.
I’ve watched this happen at three different companies. The pattern is identical every time. You can predict, almost to the quarter, where the values of an organization will go based on what its leaders are attending to. If you want to know what a company actually cares about, don’t read the mission statement. Look at what gets discussed in the first ten minutes of the Monday meeting. That’s the real mission statement. Everything else is decoration.
Attention is a teaching signal. Every time a leader opens a meeting by reviewing ticket closure rates instead of patient stories, they are teaching the team what matters. Every time the all-hands celebrates speed-to-resolution instead of quality-of-resolution, the collective attentional weight shifts a little. And the people in the room, like my son on the floor with the blocks, learn what’s important by watching what the people above them attend to.
Herbert Simon named this in 1971: “A wealth of information creates a poverty of attention.” He was talking about individuals, but it applies to organizations too. As the information surface expands, more dashboards, more Slack channels, more reports, more data piped in from more systems, the attention budget doesn’t expand with it. It stays fixed. And so the question that matters is: what are we choosing to attend to, and what are we teaching everyone else to attend to by that choice?
The leader’s real job is attention stewardship. You are the person who points the collective gaze. And everything you point it toward grows, and everything you point it away from withers. Whether or not you meant it to.
Which means the most important design problem in any organization is the attention architecture. What loops are you building? What feedback cycles are you reinforcing? If you track ticket velocity, you’ll get ticket velocity. If you track patient outcomes, you’ll get patient outcomes. Both require roughly the same amount of effort to measure. The difference is entirely in what you choose to attend to. And that choice, repeated daily across hundreds of meetings and dashboards and one-on-ones, compounds. It becomes the culture. The real one, I mean. The one that lives in what people actually do, which may have nothing to do with the one painted on the lobby wall.
This gets more personal, and stranger, when you apply it inward.
When I love something, I attend to it. That’s almost a tautology. But the second-order effect is less obvious: when I love something, I also stop attending to things that threaten or complicate that love. I over-weight information that harmonizes with the feeling and under-weight information that doesn’t. This is how devotion works. It’s also how addiction works. The mechanism is the same. The only difference is in what’s being attended to, and whether the resulting feedback loop expands your life or contracts it.
Think about that for a second.
A person in love sees everything as confirmation. A person consumed by resentment sees everything as provocation. Neither is perceiving reality accurately. Both are running an attention loop that amplifies its own inputs. The transformer architecture has a term for this: cross-attention, the mechanism by which one sequence shapes the processing of another. My attention to my phone shaped my son’s attention to screens. My manager’s attention to metrics shaped the team’s attention to what counted as success. My attention to the things I love shapes my perception of the world into a place that seems to confirm the importance of those things.
This runs everywhere, once you start looking. Militaries have long understood it. When you take a country, you seize the media first. The banks can wait. The government buildings can wait. The media can’t, because if you control what a population attends to, you control the population. Early strikes in Operation Iraqi Freedom targeted state television infrastructure. Russian information doctrine explicitly targets attention and narrative. The technical military term is PSYOP, but the honest term is attention capture at population scale.
We’re all running cross-attention, all the time. The only question is whether we know it.
This brings me to the thing I keep circling. The thing I think about when people ask me what it’s like to work with AI every day.
When you delegate your research to an AI, when you type a question and get a synthesized answer in four seconds instead of spending forty minutes reading primary sources, something happens to your attention. The obvious thing: you save time. The less obvious thing: you lose the experience of not-knowing. The sitting-with-it. The slow accumulation of context that happens when you read three conflicting papers and have to hold the contradiction in your head long enough to form your own view.
I use these tools constantly. They are, in many cases, extraordinary. And I want to be honest about the trade. Every attention delegation reshapes the delegator. When you let an algorithm curate your feed, you outsource the serendipity of encountering something you never would have searched for. When you let a model draft your emails, you lose the micro-practice of translating your own thoughts into language. And that practice, tedious as it sometimes is, is one of the ways you discover what you actually think.
Every tool you use to manage attention changes the shape of your attention. A hammer doesn’t just let you drive nails. It makes you start seeing the world in terms of things that need to be hammered. A search engine doesn’t just find information. It trains you to think in queries. A language model doesn’t just generate text. It trains you to think in prompts. And a prompt is a very specific kind of thought. It’s a thought shaped for delegation. Over time, if you’re not careful, that becomes the default shape of all your thoughts.
Right now, in this particular moment, a lot of people don’t have a choice about any of this. AI is restructuring how work gets done, and many people are being told, explicitly or through the quiet pressure of watching their colleagues get replaced, to shift their attention and energy toward these tools or face consequences. “Learn to prompt or find another job.” I’ve heard that sentence, or sentences like it, more times than I can count in the last two years.
For some people, this is a gift. A genuine expansion of what’s possible. A junior developer who can now build things that would have taken a team. A researcher who can synthesize a literature review in an afternoon. For others, it’s a tragedy. The thing they spent a decade getting good at, devalued overnight by a system that can approximate it at scale. For most people, honestly, it’s both. Simultaneously. In ways that are hard to articulate because we don’t have good language yet for “I’m grateful for this tool and also mourning the version of my work that didn’t need it.”
But the real story is about what these tools do to your attention. What happens to your formation, the slow daily process of becoming who you are, when the things you attend to shift because the ground shifted beneath you. I keep coming back to that question: what is AI doing to my attention, and what is my attention doing to me?
You can’t always control your situation. I know that. I know people who’ve been told to retool their entire professional identity around technology they didn’t ask for, and they didn’t get a vote. The conditions of attention aren’t always chosen. Sometimes they’re imposed by economies, by employers, by the simple brute fact that the world changed and you’re in it.
But there’s a difference between what you attend to and what you make of what you attend to.
Viktor Frankl wrote about this from inside a concentration camp, so I’ll spare you the motivational poster version. The point is narrower than “choose your attitude.” It’s that attention has a directional component. Two people in the same meeting, exposed to the same information, can walk out having attended to entirely different things based on what they were looking for, what they were afraid of, what they were curious about. The raw inputs were identical. The attention patterns, and therefore the outputs, the formation, the becoming, were not.
The contemplatives knew this. That’s why they built practices. They trained the mechanism by which they engaged the world. Meditation is the act of noticing what the mind attends to when left to its own devices, and then, gently, without violence, choosing to redirect. Over and over. James’s “voluntarily bringing back a wandering attention” is the practice of selfhood. The oldest technology we have for being deliberate about who we become.
Here’s the fork, then. The one I think about late at night after my son is asleep and the house is quiet and I’m sitting with the question of what I’m building, for him and for myself.
Attention can be an empowering thing or a corrosive one, and the mechanism is identical either way. Cross-attention, amplification of what’s weighted, loops that reinforce themselves. These operate identically whether they’re making you more present or less. Contemplative practice, intentional focus, choosing with care what gets your first and best hours: these use the attention mechanism to deepen your engagement with reality. Infinite scroll, outsourced judgment, surrendering your gaze to whatever is loudest or most algorithmically optimized to capture it: these use the same mechanism to thin your engagement until you’re skimming the surface of your own life.
Same architecture. Different weights. Different formation. Different you.
I called this newsletter Pattern Engine because I think patterns are the most important thing we can learn to see. Seeing a pattern doesn’t mean you can master it. But seeing it gives you a choice you didn’t have when you were inside it and blind. The attention pattern is the first one because it’s the one that governs all the others. What you attend to determines what patterns you can see at all.
So if you’ve made it this far. Before you close this tab. Before you check the next notification. Before the attention moves to whatever’s next.
Notice what you’re attending to right now.
Just notice. Feel the weight of your gaze, wherever it’s pointed. Ask whether it’s pointed where you’d choose to point it if you were choosing deliberately. Sit with whatever answer comes.
That’s the pattern. Everything else follows from there.
Our AI builders community, Generative Growth Labs (G2L), is running a two-week Focus Architecture challenge, building personal systems for governing attention. If you want the practice alongside the philosophy, that’s the place.