Everyone's Building Agent Skills. Nobody's Building Survival.

The internet is teaching AI agents to design slide decks and run museum exhibits. Meanwhile, the same five-word error is quietly killing them in production. That gap is the most interesting thing in agentic AI right now.

This month, the most-starred new project in the agent world is a repo called claw-code: an “agent-managed museum exhibit, developed and maintained with no human intervention.” Approximately 194,000 stars, and climbing. Right behind it is a tool going genuinely viral (tens of thousands of stars in under a week) with a one-line pitch that captures the cultural moment we’re living through: “Makes your AI agent think like the laziest senior dev in the room. The best code is the code you never wrote.”

We are, collectively, in love with what agents can do. Generate a brand-matched UI from a DESIGN.md file. Produce a polished slide deck. Edit video. “Co-evolve” with their own models. The skills marketplace is exploding, and every week there’s a new repo teaching agents a new trick.

So I went looking for the other side of the ledger: not what agents can do in a demo, but what they’re failing to do, or doing clumsily, for the people actually running them. I spent some time reading the loudest agent-related issues filed on GitHub this past month: the bug reports, the 3-a.m. “this is broken in prod” threads, the ones with dozens of thumbs-up from strangers who’d hit the exact same wall.

And one issue kept appearing, across completely unrelated projects:

Error: 'NoneType' object is not iterable

It showed up in OpenAI’s Codex. It showed up in Nous Research’s Hermes agent — twice. It showed up, in spirit, in an open-source coding agent whose process “intermittently freezes in a working state, burning CPU on an idle socket.” It showed up next door in Anthropic’s Claude Code, where scheduled jobs “fail every single tool call” because a connector won’t clear approval. Different companies. Different stacks. Different languages, even. The same category of failure.

Here’s what they all have in common: none of them are about what the agent can potentially do. They’re about the seam: the unglamorous boundary where an agent communicates with the things it depends on. Its model provider. Its OAuth token. Its streaming connection. Its tools. That seam is where agents go to die, and right now every team is re-implementing it from scratch and breaking it in precisely the same place.

why the boundary is where everything breaks

Demos hide this, by design. A demo runs once, on a good network, with a fresh token, on the happy path. It’s the shiny magic trick that makes you think you’ll be a trillionaire tomorrow, and the trick works. Until you go into production.

Of course, production is the opposite of a demo. It’s the same agent running ten thousand times, on flaky connections, with tokens that expire mid-task, against providers that silently change their defaults (in one issue I read, a developer discovered that his agent’s web search had been quietly re-routed to a different backend with no opt-in at all). It’s streams that terminate early and hand your parser a None where it expected a list. It’s a tool that needs human approval at 2 a.m., when said human is either fast asleep or out partying (unlikely with this crowd, but hey, it could happen).

None of that is glamorous, and none of it demos well. You can’t post a screenshot of “my agent gracefully recovered from a dropped stream” and go viral. So the entire ecosystem’s attention, along with its star counts, pours into capabilities and fancy workflows, while the layer that decides whether those workflows hold up in the wild gets reinvented, oh so poorly, in a thousand (or more) private codebases.
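For anyone who hasn't hit it yet, here is roughly what that dropped-stream failure looks like in Python, and the one-line guard that avoids it. This is a minimal sketch, not any provider's actual wire format: the chunk shape (a dict with a "tool_calls" field) and both function names are assumptions made up for illustration.

```python
def naive_collect(chunks):
    # Assumes every chunk carries a list under "tool_calls".
    return [call for chunk in chunks for call in chunk["tool_calls"]]


def collect_tool_calls(chunks):
    """Gather tool calls from streamed chunks, tolerating missing fields."""
    calls = []
    for chunk in chunks:
        # A truncated stream may carry "tool_calls": None, or omit the key.
        # `or []` turns both cases into an empty list instead of a crash.
        for call in chunk.get("tool_calls") or []:
            calls.append(call)
    return calls


# Hypothetical stream: one good chunk, then the connection drops.
chunks = [
    {"tool_calls": [{"name": "search"}]},
    {"tool_calls": None},  # stream ended early
    {},                    # field missing entirely
]

try:
    naive_collect(chunks)
except TypeError as exc:
    print(f"Error: {exc}")  # Error: 'NoneType' object is not iterable

print(collect_tool_calls(chunks))  # [{'name': 'search'}]
```

The guard is trivial. The hard part is knowing that every boundary needs one, and that swallowing the None silently is only acceptable if something else records that the stream ended early.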

this is what a field growing up looks like

Every technology has this moment. The web had it: the shift from “can I build a page” to “does it stay up under load.” Mobile had it. The question quietly changes from “can it do the thing?” to “does it keep doing the thing when the going gets tough?”

From what I can see on GitHub, agents are having that awkward moment right now. And almost nobody is naming it, because the loud part of the room is still excitedly teaching agents new tricks.

Which brings me back to that viral repo: “the best code is the code you never wrote.” I don’t think that pitch caught fire by accident. Underneath the joke is a real instinct that the best agent engineers are starting to share: restraint. (By the way, this is a strong corollary of our old friend, “taste.”) Knowing where not to let the agent loose. The teams building the most reliable agentic systems right now are conspicuously stingy about where they hand over the wheel: they automate the boring, deterministic parts and keep a tight leash on the parts that actually require judgment.

That’s not timidity. That’s the discipline that separates a product from a party trick. Bounded autonomy — an agent that knows its limits, recovers from failure, and stops when it should — is harder to build than another shiny new skill, and it’s worth more.

the boring frontier is the real one

Here’s my bet. The next winners in agentic AI won’t be whoever ships the most skills. Skills are becoming a commodity: there’s already a free one for nearly everything, and the big, well-capitalized platforms will almost certainly absorb the rest. The winners will be whoever makes agents that can survive: the unglamorous reliability layer between the model and the messy, unpredictable world.

The first signs are already here. While the slide-deck generators were grabbing headlines, a quietly fast-rising project in my scan was a “meta-harness” — a layer whose entire job is to orchestrate agents, enforce policies, and sandbox them. The market is beginning to notice that the interesting problem isn’t more capability. It’s survivable capability.

and then the platform showed up

And then. As is the habit of all AI stories, the landscape changed before we could fully digest the trend. Between drafting this and publishing it, the platform I keep warning about walked on stage and said almost exactly what you’ve just read.

On June 16, at its Data + AI Summit, Databricks expanded Agent Bricks into a full “agent platform for developers.” Their framing could have been lifted from the section above: they argue the core agent loop is just 1% of the work, and that the other 99% — token capacity, deployment, security, evaluation, monitoring, context — is the “hidden technical debt of agentic systems.” That is the boring frontier, named on a keynote slide by a company processing more than a quadrillion agent tokens a year. When the incumbent starts evangelizing your thesis, it’s worth looking carefully at what they actually shipped — and, more importantly, what they didn’t.

What the platform now covers. A real chunk of the seam, for one specific kind of customer:

       The provider boundary. Their new Unity AI Gateway governs models, MCPs, and external agents through one layer, with per-user and per-group budget enforcement and “intelligent routing of traffic based on reliability, budget policies, or other controls.” That’s cross-provider failover and spend ceilings — two of the exact things teams were hand-rolling — turned into a platform feature.

       The tool boundary. With MCP support added to their catalog, agents can connect to Google Drive, Jira, Slack, and GitHub through a governed layer, with a Databricks Sandbox spinning up isolated VMs to contain tool execution. The approval-and-isolation problem, handled centrally.

       The context boundary. This is the genuinely novel piece, so let’s dig in a little deeper. Genie Ontology continuously learns a semantic map of a company’s data — when the fiscal year starts, what “churned customer” means, which table is authoritative — so agents don’t rebuild that context on every single call. Paired with a managed agent memory service and stateful, context-aware security policies you can write in SQL, it’s a real answer to “the agent did something dumb because it didn’t understand the business.”

If you are a Fortune 500 with your data already in the lakehouse, this is a serious, coherent answer to the survival problem. Much of the seam I described is now somebody else’s managed service.

but what does the Databricks solution leave untouched?

Every capability above assumes you’re already inside the platform. Genie Ontology learns an ontology over data sitting in their catalog. The gateway governs assets registered in their registry. The sandbox scopes access to their governed data. It’s a magnificent control plane for the enterprise that has already paid the platform tax, and structurally irrelevant to the startup running a Claude-Code-style agent on a VPS, or the solopreneur whose agent froze on an idle socket at 3 a.m. None of those people have a Unity Catalog. Their “'NoneType' object is not iterable” is exactly as unsolved this week as it was last week.

There’s also a detail in the announcement that turned my own research into a hall of mirrors, and it’s too good not to share. That fast-rising open-source “meta-harness” I mentioned, the one orchestrating and sandboxing agents? It’s called Omnigent. It turns out Databricks released it, and at the summit announced a managed version of it. So the very project I’d read as an independent signal that “the market is noticing survivability” was, in fact, the platform planting its flag. A useful reminder that a rising star on GitHub and an incumbent’s land-grab can be the same object, and that you can’t always tell from the demand signal alone which one you’re looking at.

so what does the startup, the solopreneur, the hobbyist actually do?

If you’re not an enterprise, the platform’s answer isn’t your answer. Here’s the honest lay of the land for everyone else.

For the provider half, there’s already a real toolbox — and it’s mostly free. A whole category of “AI gateways” does cross-provider failover, retries, streaming continuity, and budget caps without any platform lock-in. LiteLLM (open-source, 100+ providers, OpenAI-compatible, deploy in an afternoon) is the pragmatic default. Portkey adds caching and a clean fallback config; Bifrost goes for raw throughput and a documented health-state machine; Cloudflare and Vercel have edge gateways if you’re already on their stack. If your pain is “my one provider went down and took me with it,” that’s a solved, commodity problem. Reach for one of these before you write a line of retry logic.

For the tool / MCP half and unattended recovery, you are mostly on your own — and that’s the real gap. The gateways speak the provider protocol; they don’t much help with the MCP connector that fails every tool call because nothing cleared its 2 a.m. approval, or the agent that freezes in a “working” state on a dead socket. That recovery-and-stop logic lives above the gateway and below the orchestrator, and for non-enterprise builders it’s still bespoke. Practically, today, that means: set hard iteration and time budgets yourself; treat every tool call as potentially hanging and wrap it in timeouts; make approvals fail safe rather than fail silent; and log every boundary failure so you can see the pattern. Unglamorous, manual, and exactly where the missing product is.
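To make that checklist concrete, here is one way the four habits could fit together in a small asyncio loop. Treat it as a sketch under stated assumptions rather than a recommended design: `run_agent`, `RunReport`, the `(tool, needs_approval)` action format, and the demo tools are all hypothetical names invented for this example, and a real agent would pick its next action dynamically instead of walking a fixed list.

```python
import asyncio
import logging
import time
from dataclasses import dataclass, field

log = logging.getLogger("agent.boundary")


@dataclass
class RunReport:
    status: str = "done"  # "done", "budget_exhausted", or "approval_denied"
    results: list = field(default_factory=list)
    failures: list = field(default_factory=list)


async def run_agent(actions, approve, *, max_steps=10, max_seconds=60.0,
                    tool_timeout_s=5.0, approval_timeout_s=5.0):
    """Run (tool, needs_approval) pairs under hard step and time budgets."""
    report = RunReport()
    deadline = time.monotonic() + max_seconds
    for step, (tool, needs_approval) in enumerate(actions):
        # Hard iteration and time budgets: stop instead of spinning forever.
        if step >= max_steps or time.monotonic() > deadline:
            report.status = "budget_exhausted"
            break
        if needs_approval:
            try:
                ok = await asyncio.wait_for(approve(tool.__name__),
                                            timeout=approval_timeout_s)
            except asyncio.TimeoutError:
                ok = False  # fail safe: no answer means no
            if not ok:
                log.warning("no approval for %s; stopping", tool.__name__)
                report.status = "approval_denied"
                break
        try:
            # Every tool call is treated as potentially hanging.
            result = await asyncio.wait_for(tool(), timeout=tool_timeout_s)
            report.results.append(result)
        except Exception as exc:  # includes the timeout raised by wait_for
            log.error("boundary failure in %s: %r", tool.__name__, exc)
            report.failures.append((tool.__name__, type(exc).__name__))
    return report


# Hypothetical tools for the demo.
async def fast_tool():
    return "ok"


async def hanging_tool():
    await asyncio.sleep(3600)  # simulates a dead socket


async def risky_tool():
    return "deployed"


async def nobody_home(tool_name):
    await asyncio.sleep(3600)  # the human is asleep


report = asyncio.run(run_agent(
    [(fast_tool, False), (hanging_tool, False), (risky_tool, True)],
    nobody_home,
    tool_timeout_s=0.05,
    approval_timeout_s=0.05,
))
print(report.status, report.results, report.failures)
```

In this run the hanging tool is logged as a timeout instead of freezing the process, and the risky tool never executes because nobody answered the approval request. The report says so explicitly, so the 2 a.m. failure is visible in the morning rather than silent.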

The strategic read. The platforms are absorbing this from the top down (Databricks) and the gateways from the bottom up (LiteLLM and friends). What neither has fully claimed is the middle for the small builder: tool-layer reliability and bounded-autonomy recovery for people who will never own a Unity Catalog and shouldn’t have to. That’s a narrower gap than “everyone’s agents keep breaking” — but narrower is where a small team can actually plant something a platform won’t bother to chase down-market.

 

The skills will keep coming, and they’ll keep being fun. But if you want to know where the durable value in agents is hiding, don’t watch what they can do in the demo. Watch what breaks at 3 a.m. — and watch who’s quietly building the thing that catches it and repairs it. This month taught me a second lesson on top of the first: watch, too, when the biggest platform in the room starts describing your gap in its own words. That’s not proof you’re wrong. Sometimes it’s proof you were early — and a map of exactly which corner of the gap they’ve left for you. It also shows how quickly business opportunities (and indeed, moats!) are evaporating in our current AI-fueled competitive landscape.

 

The human mind is the original generative engine.
