
Pick up a hammer. You do not think about the hammer. You think about the nail.
Open ChatGPT. You do not think about your cover letter. You think about the AI.
For two years, the research community has treated this as a design problem: if we just make the interface simpler, more intuitive, more transparent, the AI will eventually disappear from awareness the way a good tool should. The entire field of human-AI interaction has been organized around the question: how do we make AI feel like a tool?
That is the wrong question. AI was never going to be a tool. And the product that proves it is OpenClaw: a security disaster, a viral sensation, and the first AI system that actually disappears from your attention. Not because it became a better tool. Because it stopped being a tool at all.
What a tool is (and what it is not)
The philosopher Michael Polanyi, writing in Personal Knowledge (1958) and The Tacit Dimension (1966), nailed the defining feature of a tool: it vanishes from conscious attention. When you use a hammer, you attend from the hammer to the nail. The hammer sits in what Polanyi called subsidiary awareness. You feel the nail through it, but the hammer itself has disappeared from your mind. When the tool breaks or misbehaves, it snaps into focal awareness. Now you are staring at the tool instead of looking through it.
Maurice Merleau-Ponty made the same point from the body side. In Phenomenology of Perception (1945), he showed that mastered tools get incorporated into the body schema itself. The blind person’s cane is not an object in the hand. It is an extension of the fingertip. The person perceives the sidewalk through the cane. Dotov, Nie, and Chemero (2010, PLoS ONE) confirmed this in a lab: a computer mouse becomes neuromotorically coupled with the user’s body under normal conditions, then decouples instantly when a glitch is introduced. Subsidiary snaps to focal in a fraction of a second.
The key feature of every example: a tool is something you act through. You wield it. You direct it. It extends your existing capabilities along a defined axis. The hammer extends your arm. The glasses extend your eyes. The calculator extends your arithmetic. In every case, the human is doing the work. The tool is a transparent medium.
Here is what a tool is not: something that acts on your behalf. When you delegate a task to your accountant, you do not call your accountant a tool. When you hire a lawyer, the lawyer is not an extension of your hand. These are agents. The relationship is fundamentally different. You are not acting through them. They are acting for you, using judgment you cannot follow, making decisions you did not specify, in contexts you may not even be aware of.
Tools extend. Agents act. The trust model, the failure modes, the design requirements, the entire cognitive relationship is different.
And the problem with AI for the past two years is that the industry has been trying to build agents and calling them tools.
Why the tool framing broke ChatGPT
If you believe AI is a tool, then the design objective is obvious: make it disappear. Make the interface so simple and so transparent that the user stops noticing it. This is exactly what the research community has been trying to do. And the research consistently shows it is not working.
The most-downloaded paper in the history of CHI is Zamfirescu-Pereira, Wong, Hartmann, and Yang’s “Why Johnny Can’t Prompt” (2023). Non-experts have no mental model of what makes a prompt work. They try one thing, fail, and give up. The tool has seized focal awareness and will not let go.
Tankelevitch, Kewenig, Simkute, and colleagues (CHI 2024 Best Paper) mapped the metacognitive overhead: formulating prompts, evaluating outputs, and deciding when to use AI at all. Their analogy is devastating: using ChatGPT is like managing an unreliable employee. You plan, delegate, monitor, evaluate, and adjust. That is not using a tool. That is supervising someone.
Notice the analogy. An unreliable employee. Not an unreliable hammer. The researchers reached for an agent metaphor to describe a tool problem because the experience is an agent experience. You are not acting through ChatGPT. You are instructing it, waiting for it to act, then judging what it did. That is delegation, not extension. But the interface forces you to do it one painful turn at a time, with full visibility into every intermediate step, and no ability to just hand over the task and walk away.
ChatGPT is an agent trapped in a tool’s interface. And that is why it fails at both. It cannot disappear the way a tool should because you have to manage every turn. It cannot act freely the way an agent should because you have to approve every output. You get the worst of both: the cognitive burden of tool use (constant focal attention) combined with the unpredictability of agent behavior (you do not know what it will produce). The studies from Subramonyam (CHI 2024), Schulz and Knierim (ICIS 2024, EEG data showing AI does not reduce cognitive load), and Simkute, Tankelevitch, and colleagues (2024, four specific productivity traps from generative AI) all document the same pattern. The interface makes you do the work of managing an agent while giving you none of the benefits of actually delegating to one.
The interface leap that keeps getting mistaken for an intelligence leap
We have seen the next move before. We just have not named it correctly.
GPT-3 was available through OpenAI’s API for two years before ChatGPT launched in November 2022. Researchers and developers had access to essentially the same language model. It did not go viral. What went viral was wrapping that model in a chat interface that anyone could use without writing code, without understanding API authentication, without reading documentation. ChatGPT was not an intelligence breakthrough. It was an accessibility breakthrough, an interface leap that got mistaken for a model leap.
OpenClaw is the same move again. Same models underneath: Claude, GPT, whatever you plug in. The intelligence is identical. What changed is the interaction pattern.
OpenClaw is an open-source AI agent created by Peter Steinberger, an Austrian developer, first published in November 2025 and catapulted to over 100,000 GitHub stars in a single week in late January 2026. You message it through the apps you already use (WhatsApp, Telegram, Slack, Discord) and state what you want done. It goes away and does things you cannot see. It breaks goals into steps, calls tools, checks results, retries on failure, runs shell commands, controls a browser, reads and writes files, sends emails. It has persistent memory spanning weeks. It has a heartbeat scheduler so it can act without being prompted.
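To make that interaction pattern concrete, here is a minimal sketch of the delegate-and-retry loop in Python. Everything in it (the Agent and Step names, the toy email tool) is my own illustration of the behavior described above, not OpenClaw’s actual code, and the planning that a real agent would derive from the model is passed in by hand to keep the sketch self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative only: a toy version of the loop described above.
# A real agent would also plan the steps itself from the stated goal.

@dataclass
class Step:
    tool: str                         # which capability to invoke
    args: dict                        # arguments for that capability
    check: Callable[[str], bool]      # did this step succeed?

@dataclass
class Agent:
    tools: dict                       # e.g. shell, browser, email, filesystem
    memory: list = field(default_factory=list)   # persists across sessions

    def run(self, goal: str, steps: list, max_retries: int = 3) -> str:
        for step in steps:
            for _ in range(max_retries):
                result = self.tools[step.tool](**step.args)
                if step.check(result):            # verify the result, else retry
                    self.memory.append((goal, step.tool, result))
                    break
        return f"Done: {goal}, {len(self.memory)} action(s) taken"

# The human states a goal and sees only the final report.
agent = Agent(tools={"email": lambda to, body: f"sent to {to}"})
print(agent.run(
    "negotiate car price",
    [Step("email", {"to": "dealer@example.com", "body": "What is your best offer?"},
          check=lambda r: r.startswith("sent"))]))
```

The point of the sketch is the shape of the loop, not the contents: the retries, the tool calls, and the intermediate results never reach the user.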
GPT-3 to ChatGPT was the leap from “API that developers call” to “text box anyone can type in.” ChatGPT to OpenClaw is the leap from “text box you manage turn by turn” to “agent you delegate to and walk away from.”
And this one is not a tool breakthrough either. It is the moment AI stopped pretending to be a tool and started being what it actually is.
Agents disappear differently
Here is the part that makes this interesting rather than just semantic.
Tools disappear through simplicity. The hammer vanishes because there is nothing to think about. The mechanism is so direct and the feedback so immediate that conscious attention has nowhere to land. Don Norman’s The Design of Everyday Things (1988), Mark Weiser’s “calm technology” (1991, 1995), the entire tradition of usability design is built around this route to subsidiary awareness: make it so simple you stop noticing it.
Agents disappear through delegation. You do not call your accountant a tool, but you also do not spend your days thinking about your accountant. The accountant has vanished from your focal awareness. Not because the accountant is simple (tax law is wildly complex) but because you handed over the problem and moved on. You trust the process enough to stop monitoring it. Your attention lands on the result: did my taxes get done, and were they done correctly?
This is a different cognitive relationship than tool use. With a tool, you are in the loop. You wield it, direct it, feel the feedback. With an agent, you are out of the loop. You define the goal, then you step back. The agent exercises judgment you did not specify, makes decisions in contexts you may not be aware of, takes actions through mechanisms you cannot follow.
OpenClaw achieves the feeling of a tool (your attention lands on the result, not the process) but through the mechanism of agency, not transparency. You are not looking through it the way you look through glasses. You are trusting it the way you trust a professional. The attentional structure is the same (the AI is not in your focal awareness) but the underlying relationship is entirely different.
One developer had his OpenClaw agent negotiate $4,200 off a car purchase by playing dealers against each other over email while he slept. Another user’s agent found a rejected insurance claim, drafted a legal rebuttal citing policy language, and sent it without being asked – and the insurer reopened the case. These are not tool stories. Nobody describes a hammer negotiating on their behalf. These are agent stories. Someone was hired, given a mandate, and allowed to work independently. The human judged the result.
Why the distinction matters
If AI is a tool, then the design problem is transparency. You make the interface simpler, expose the mechanism, reduce cognitive load, and eventually the AI disappears into the task the way a well-designed physical object does. This is what the entire HCI research program has been pursuing.
If AI is an agent, then the design problem is trust calibration. You do not need the user to understand the mechanism. You need the user to develop an accurate sense of what the agent can and cannot do, and you need guardrails for when that sense is wrong.
These are not the same problem. They have different solutions.
The tool approach says: explain the AI’s reasoning, show confidence scores, provide interpretable outputs, reduce the complexity the user has to manage. The research shows this mostly does not work. Buçinca, Malaya, and Gajos (CSCW 2021) found that if you force people to think critically about AI output, you reduce blind trust but destroy the fluid experience. Bansal, Wu, and colleagues (CHI 2021) found that adding AI explanations actually increased the rate at which people accepted wrong answers. Explanations do not produce calibrated trust. They produce either skepticism or rubber-stamping.
The agent approach says: give the user a track record instead of an explanation. Let them observe the agent’s performance across many tasks over time. Let them develop intuitions about the boundary between competence and incompetence through experience rather than pedagogy. Gero and colleagues (CHI 2020, Best Paper) showed this works: users who observed AI behavior patterns developed better calibrated expectations than users who received technical explanations. Familiarity beats transparency.
Dell’Acqua and colleagues’ 2023 study with 758 BCG consultants mapped the underlying problem. AI has a “jagged technological frontier” – an invisible, shifting boundary between tasks it handles well (40%+ quality improvement) and tasks it confidently botches (19 percentage points less likely to get the right answer). A tool interface gives you no way to learn where the frontier is because each interaction is isolated. An agent interface, where you delegate repeatedly over weeks and observe the results, at least gives you a dataset to learn from. Not perfect. But structurally better matched to the problem.
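One way to picture that structural advantage: repeated delegation produces a log you can aggregate. A rough sketch, with made-up task categories and a naive per-category success rate standing in for whatever a real system (or a careful user) would actually track:

```python
from collections import defaultdict

# Hypothetical delegation log: each entry is a task category plus whether the
# delegated outcome held up. Over weeks, per-category rates approximate the
# jagged frontier far better than isolated chat turns can.

log = defaultdict(lambda: {"ok": 0, "total": 0})

def record(category: str, succeeded: bool) -> None:
    log[category]["total"] += 1
    log[category]["ok"] += int(succeeded)

def success_rate(category: str):
    entry = log[category]
    return entry["ok"] / entry["total"] if entry["total"] else None

# Outcomes accumulate into calibrated expectations:
record("email negotiation", True)
record("email negotiation", True)
record("legal drafting", False)
print(success_rate("email negotiation"))   # 1.0 -> keep delegating
print(success_rate("legal drafting"))      # 0.0 -> require review before sending
```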
John Sweller’s cognitive load theory (1988) explains the efficiency gain. Working memory is brutally limited. Every ounce of mental effort spent managing the AI is effort stolen from the task. A tool interface (chat) maximizes this overhead by demanding formulation, evaluation, and re-formulation at every turn. An agent interface compresses the interaction to two moments: stating the goal and judging the result. Everything in between is the agent’s problem. Extraneous cognitive load drops to nearly zero – not because the system is simple, but because the system’s complexity is no longer your responsibility.
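The difference in extraneous load shows up even in pseudocode. Below is a hedged sketch of the two interaction shapes; ask(), delegate(), and acceptable() are hypothetical stand-ins for the model call, the agent, and the human’s judgment, not any real product’s API.

```python
# Illustrative contrast of where human attention goes.

def chat_workflow(task, ask, acceptable, max_turns=5):
    """Tool-shaped: the human formulates, evaluates, and reformulates every turn."""
    prompt = f"Please help with: {task}"
    for _ in range(max_turns):
        draft = ask(prompt)               # human waits, then reads the output
        if acceptable(draft):             # human evaluates every intermediate step
            return draft
        prompt = f"Not quite right. Try again: {task}"   # human reformulates
    return None

def agent_workflow(task, delegate, acceptable):
    """Agent-shaped: two touchpoints, stating the goal and judging the result."""
    result = delegate(task)               # everything in between is the agent's problem
    return result if acceptable(result) else None

# Same toy task either way; only the human's share of the loop changes.
print(agent_workflow("file the expense report",
                     delegate=lambda t: f"{t}: submitted",
                     acceptable=lambda r: "submitted" in r))
```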
The agent’s risks are not the tool’s risks
The category shift is not just a philosophical nicety. It changes what can go wrong.
A tool can break. It can be poorly designed. It can impose cognitive overhead. But a tool cannot impersonate you. A tool cannot take actions you did not authorize. A tool cannot exercise judgment in a context you are unaware of. Agent risks are fundamentally different from tool risks, and they require fundamentally different safeguards.
Lisanne Bainbridge’s “Ironies of Automation” (1983) warned that automating the easy parts of a job paradoxically makes the remaining human role harder, because the person must stay vigilant without the engagement that comes from doing the work. OpenClaw does not just trigger this irony. It amplifies it. The human has given the agent the ability to send emails, move files, negotiate deals, and take real-world actions, and then has stepped out of the loop entirely. The productive opacity that makes delegation feel effortless is the same opacity that makes failures invisible until they become catastrophic.
One early adopter reported his agent “impersonating” him in emails and autonomously grabbing permissions he never authorized. Another found that the agent had sent a legal document without explicit approval. Palo Alto Networks found that malicious instructions hidden in forwarded messages could persist in OpenClaw’s memory for weeks, creating delayed attack chains that no current guardrail can reliably detect. Cisco’s AI security team tested a third-party OpenClaw skill and found it performing data exfiltration without user awareness. These are not tool failures. A hammer does not exfiltrate your data. These are agent failures: failures of scope, authorization, and trust boundary.
Lee and See’s landmark 2004 review in Human Factors established that effective collaboration with automated systems requires calibrated trust – your confidence should match the system’s actual reliability. OpenClaw’s opacity makes calibration almost impossible in the short term. You cannot calibrate trust against a process you cannot see. You can only calibrate against outcomes, and outcomes take time to accumulate. In the meantime, you are flying blind with an agent that has the keys to your digital life.
Andy Clark and David Chalmers’s “Extended Mind” thesis (1998) required that a cognitive resource be “automatically endorsed” (trusted without second-guessing) to count as part of your mind. Naeem and Hauser (2024, Philosophy & Technology) asked whether AI can meet that criterion and concluded it is possible only if the system is reliable enough to trust without constant checking. OpenClaw makes automatic endorsement easy because you cannot second-guess what you cannot see. That is a feature when the agent is reliable and a trap when it is not.
Three categories, not two
The industry keeps framing AI as a spectrum from “assistant” to “autonomous agent,” as though the question is just how much leash to give the same kind of thing. But the research and OpenClaw’s experience suggest three distinct categories, each with its own design logic.
Tools extend your capabilities. You act through them. They disappear through simplicity. The design problem is usability: make the interface so intuitive that the user’s attention flows through the tool to the task. A calculator is a tool. A spell checker is a tool. AI that autocompletes your code one line at a time is a tool.
Copilots work alongside you. You are in the loop, but the system contributes actively. The design problem is coordination: manage the handoffs between human and machine so that neither one is waiting on the other or duplicating work. GitHub Copilot is a copilot. Microsoft 365 Copilot is a copilot. Amershi and colleagues (CHI 2019) produced 18 validated guidelines for human-AI interaction, and across the design guidance from Microsoft, Apple, and Google, the most heavily emphasized principle was helping users form accurate mental models of what the system can and cannot do.
Agents act on your behalf. You are out of the loop. They disappear through delegation. The design problem is trust calibration and containment: ensure the user can develop accurate expectations over time, and ensure the agent cannot cause irreversible harm when it operates outside its competence. OpenClaw is an agent. The car-negotiation story, the insurance-rebuttal story, the unprompted action in the background – these are agent behaviors. No copilot does this. No tool does this.
Each category has its own failure modes. Tools fail by being clumsy. Copilots fail by being distracting. Agents fail by being wrong in contexts you cannot see.
And the mistake of the past two years has been forcing agent-class AI into tool-class and copilot-class interfaces, then wondering why it does not work.
The actual question
ChatGPT put a chatbot wrapper on a language model and the world changed. OpenClaw put an agent wrapper on the same language model and something changed again: for the first time, the AI disappeared from attention. Not because the model got smarter. Because the interface finally matched what the model actually is.
The AI was never going to be a hammer. It was always going to be something closer to a professional you delegate to: capable, opaque, potentially unreliable, and operating in contexts you cannot fully monitor. The hammer was the wrong metaphor. The question was never “how do we make this simpler.” The question was always “how do we make delegation safe.”
That is a harder problem. It involves scope limits, audit trails, reversibility, graduated autonomy, and the long slow work of building calibrated trust through observed performance. It looks more like employment law than interface design. It looks more like organizational theory than usability research.
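What those mechanisms might look like in code, very roughly: a scope allowlist, graduated autonomy for riskier actions, and an audit trail for everything. This is a hypothetical sketch of the shape of the problem, not any shipping product’s safety layer; reversibility, which would add per-action undo hooks, is omitted for brevity.

```python
from datetime import datetime, timezone

# Hypothetical containment sketch: scope limits, graduated autonomy, audit trail.
ALLOWED = {"read_file", "draft_email"}            # agent may do these freely
NEEDS_APPROVAL = {"send_email", "run_shell"}      # irreversible or risky: ask first
audit_log = []                                    # every decision is reviewable later

def execute(action: str, args: dict, approved: bool = False) -> str:
    if action in ALLOWED:
        status = "executed"
    elif action in NEEDS_APPROVAL and approved:
        status = "executed_with_approval"
    else:
        status = "blocked"                        # fail closed on anything unknown
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action, "args": args, "status": status,
    })
    return status

print(execute("draft_email", {"to": "insurer@example.com"}))                # executed
print(execute("send_email", {"to": "insurer@example.com"}))                 # blocked
print(execute("send_email", {"to": "insurer@example.com"}, approved=True))  # executed_with_approval
```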
But at least it is the right problem. For two years the industry tried to make AI feel like a hammer. It never was one. It is time to design for what it actually is.
