You Are Part of the Harness: Building a 100+ Agent Swarm in Web3 (Part 4)

Written by johnpphd | Published 2026/04/14
Tech Story Tags: agentic-engineering | dryp | vibe-coding | dry-vs-dryp | ai-coding-tools | ai-coding-workflow | developer-productivity | ai-system-design

TL;DR: AI agents amplify every habit you have. Skip planning? They generate unplanned code faster than you ever could. Believe in DRY but never enforce it? They violate it at scale. The biggest constraint in my 100+ agent harness was on me. I had to separate planning from execution, codify principles I'd been carrying in my head, and apply DRYP (Don't Repeat Your Prompt) to keep agent instructions composable. The craft hasn't changed. The medium has.

AI agents amplify your habits. All of them.

AI agents amplify your good habits and your bad ones. If you plan before you code, agents execute that plan beautifully. If you skip planning and start typing, agents generate more unplanned code faster than you ever could alone.

I skipped planning for twenty years and got away with it. I believed in DRY but never enforced it with tooling. I knew principles mattered but never wrote them down. These were bad habits I could tolerate when I was the only one writing code. With 100+ agents, those habits scaled too. And at scale, they broke everything.

The biggest improvement to my agent system wasn't a better model or a new tool. It was curbing my own bad habits.

I wrote about the technical harness in The Illusion of Control: prohibitions, enforcement, verification. The system that runs around the agent. That article was about the system. This one is about me.

I built that harness, deployed it, and still watched agents produce mediocre work. Not because the constraints were wrong. Because I was feeding them garbage inputs. Vague specs. Incomplete context. Plans that lived entirely in my head.

The harness was fine. The person holding it was the bottleneck.

The old workflow

Here's how I used to work. I'd get an idea for what needed to happen. Describe the task to an orchestrator agent. Let it delegate to sub-agents. Watch the output. Fix what broke. Repeat.

This felt productive. I was shipping. Agents were running. Code was appearing in pull requests. But the quality was inconsistent and the failure modes were strange. An agent would implement a feature correctly but miss a constraint I'd mentioned three tasks ago. Another would duplicate work that a previous agent had already done. A third would build something technically sound that contradicted the architectural direction I was heading in.

The pattern was clear in hindsight: the orchestrator was carrying the full plan in its context window. Task descriptions, file lists, ordering constraints, domain assignments. By the time it delegated to the third sub-agent, it was already forgetting the first.

This wasn't a model limitation. This was a me limitation. I was dumping everything into a single context and expecting the system to sort it out.

What broke at scale

At 10 agents, this approach works. The orchestrator can hold 10 tasks in context. I could review every output myself. The feedback loop was tight enough that mistakes got caught before they compounded.

At 100+, the math stops working.

An orchestrator managing a dozen sub-agents across four domains cannot hold all the relevant context simultaneously. It's not a matter of token limits (though those matter too). It's a matter of attention. The same way a human manager loses detail when their team grows from 5 to 50, an LLM orchestrator loses coherence when the plan exceeds what fits comfortably in its working memory.

But I kept trying to make it work by adding more context. More detailed task descriptions. Longer system prompts. More examples. I was doing the same thing I tell vibe coders not to do: treating the prompt as the solution instead of examining the system.

The failures were subtle. Not crashes. Not obvious errors. Just a slow drift in quality. Agents doing reasonable things that didn't fit together. Each output locally correct, globally incoherent. Like a jigsaw puzzle where every piece is well-cut but they're from different boxes.

The shift

From your first coding class, the instructor tells you to plan before you code. Pseudocode first. Think through the logic. Then implement. Every CS student hears this. Almost nobody does it. I certainly didn't. For twenty years I got away with skipping straight to code because I could hold the whole problem in my head.

My agents could not.

The fix required changing my behavior, not my tooling. I needed a planning version of the system and a coding version. Exactly what my instructor wanted from me twenty years ago.

Now I start every task in planning mode. Before any agent writes a single line of code, planning agents decompose the work. The orchestrator files every task as a git-synced issue, tagged with the description and ideal delegation target. Then it rewrites the plan as a slim checklist of issue IDs.

Pages of context become a handful of references. The orchestrator passes an ID, not a description. The sub-agent loads the issue in its own context window and has everything it needs, independently, without depending on the orchestrator to remember it.

I call this pre-compiling context. Do the expensive decomposition work during planning so it doesn't bloat execution.

Here's what that looks like in practice. Before pre-compiling, a typical orchestrator task read something like this:

"Refactor the staking module to use the two-layer hook pattern. The raw Ponder hook is in src/hooks/blockchain/useStakingData.ts. The transform hook should go in src/hooks/useStakingTransform.ts. Make sure to update StakingPanel.tsx and StakingDetails.tsx to use the new transform hook instead of calling the Ponder hook directly. The Ponder hook needs an enabled guard. Use the NumberFormatter preset for all percentage displays. Do this after the theme migration is done but before the dashboard layout work starts."

That's one task description among twelve, all stuffed into the orchestrator's context. By task eight, the agent had forgotten the ordering constraint. By task ten, it was duplicating the enabled guard logic that task three had already handled.

After pre-compiling, the same work becomes:

Issue #247: Refactor staking module to two-layer hook pattern
Depends on: #245 (theme migration)
Blocked by: none currently

The orchestrator passes #247. The sub-agent pulls the issue, reads the full description in its own context window, and executes with complete information. No degradation. No dependency on the orchestrator's memory.
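The pattern above can be sketched in a few dozen lines. This is a minimal, hypothetical illustration, not my actual harness code: the `Issue`, `IssueStore`, and delegation names are made up for the example. The point is the shape: full task descriptions are written once at planning time, dependencies live on the issue itself, and the orchestrator's plan shrinks to a list of IDs.

```typescript
// Pre-compiled context, sketched: issues live in a store, the
// orchestrator carries only IDs, and each sub-agent hydrates its own
// task on demand. All names here are illustrative.

interface Issue {
  id: number;
  title: string;
  description: string; // the full task text, written once during planning
  dependsOn: number[]; // ordering constraints, recorded on the issue itself
}

class IssueStore {
  private issues = new Map<number, Issue>();

  file(issue: Issue): void {
    this.issues.set(issue.id, issue);
  }

  load(id: number): Issue {
    const issue = this.issues.get(id);
    if (!issue) throw new Error(`Unknown issue #${id}`);
    return issue;
  }

  // An issue is ready only when everything it depends on is done.
  isReady(id: number, done: Set<number>): boolean {
    return this.load(id).dependsOn.every((dep) => done.has(dep));
  }
}

// The orchestrator's "plan" is now just a checklist of IDs.
const store = new IssueStore();
store.file({ id: 245, title: "Theme migration", description: "…", dependsOn: [] });
store.file({ id: 247, title: "Refactor staking module", description: "…", dependsOn: [245] });

const plan = [245, 247];
const done = new Set<number>();

for (const id of plan) {
  if (!store.isReady(id, done)) continue; // ordering lives in the issue, not in the orchestrator's memory
  const issue = store.load(id);           // the sub-agent pulls full context itself
  // delegate(issue) would hand the hydrated issue to a sub-agent here
  done.add(issue.id);
}
```

Notice that the orchestrator never sees a description after planning. If a sub-agent needs more detail, it loads it; the orchestrator's context stays slim no matter how many tasks are in flight.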

The real lesson from pre-compiling context is that it's a constraint on me, not on the agents. I have to do the planning work upfront. I have to resist the urge to skip straight to prompting. I have to accept that my natural workflow (describe, run, fix, repeat) is the failure mode, not the process.

The craft hasn't changed

Someone once asked me whether all this "agentic engineering" was really just software engineering with extra steps. The answer is yes. That's the point.

Vibe coding makes generating code cheaper. It doesn't make generating correct, maintainable, production-safe code cheaper. That still requires constraints, verification, and iteration. The same discipline it always did.

When I started defining interface contracts between my agents, specifying what each one accepts and returns, with validation on both sides, I realized I was doing the same work I do every day defining the boundaries between my frontend and my API layer. What does this endpoint accept? What does it return? What happens when the contract is violated? The artifact changed. The architecture thinking didn't.
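That endpoint analogy translates almost directly. Here's a minimal sketch of an agent-to-agent contract validated on both sides, the same way an API layer validates requests and responses. The shapes and the review-agent stub are assumptions invented for the example, not a real agent API.

```typescript
// A hypothetical contract between an orchestrator and a review agent:
// typed request, typed result, runtime validation at both boundaries.

interface ReviewRequest {
  issueId: number;
  diff: string;
}

interface ReviewResult {
  issueId: number;
  approved: boolean;
  notes: string[];
}

function validateRequest(input: unknown): ReviewRequest {
  const r = input as Partial<ReviewRequest>;
  if (typeof r.issueId !== "number" || typeof r.diff !== "string") {
    throw new Error("contract violation: bad ReviewRequest");
  }
  return r as ReviewRequest;
}

function validateResult(output: unknown): ReviewResult {
  const r = output as Partial<ReviewResult>;
  if (
    typeof r.issueId !== "number" ||
    typeof r.approved !== "boolean" ||
    !Array.isArray(r.notes)
  ) {
    throw new Error("contract violation: bad ReviewResult");
  }
  return r as ReviewResult;
}

// The caller validates what goes in; the result is validated on the way
// out. A malformed message fails loudly instead of drifting downstream.
function callReviewAgent(
  raw: unknown,
  agent: (req: ReviewRequest) => unknown
): ReviewResult {
  const req = validateRequest(raw);   // validate on the way in
  return validateResult(agent(req));  // validate on the way out
}
```

The design choice mirrors an API boundary: the contract fails fast at the seam, so a misbehaving agent surfaces as a loud error at the boundary rather than as quietly incoherent output three tasks later.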

The philosophy runs deeper than architecture. DRY has always been my favorite engineering principle. Not just because it reduces duplication, but because it is the starting point for the best refactoring. Extract repeated logic into shared functions. Those functions stack on each other. The system becomes composable.

I caught myself copy-pasting the same prompt block into three different agents and realized I was violating the principle I'd spent my career defending. So I started extracting repeated prompt patterns into shared skill files. One source of truth. Update one file, every agent that imports it changes. I call it DRYP: Don't Repeat Your Prompt.

DRYP is just DRY, applied to agent instructions. But the implications are the same. Repeated prompts drift. You tweak one copy, forget the others, and your agents behave inconsistently. A single shared skill eliminates that drift entirely. Once instructions become modular, agents compose like functions. The system grows by stacking, not by rewriting.
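The mechanics are simple enough to sketch. Assuming a skill registry keyed by name (the skill names and contents below are invented for illustration), each agent's prompt is composed from shared fragments, so updating one fragment updates every agent that imports it:

```typescript
// DRYP sketched: prompt fragments ("skills") live in one registry, and
// each agent's system prompt is composed from skill names. One source
// of truth; no per-agent copies to drift. All names are illustrative.

const skills: Record<string, string> = {
  "hooks/two-layer":
    "Use the two-layer hook pattern: a raw data hook plus a transform hook.",
  "style/number-format":
    "Format percentages with the shared NumberFormatter preset.",
};

function composePrompt(base: string, skillNames: string[]): string {
  const parts = skillNames.map((name) => {
    const skill = skills[name];
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    return skill;
  });
  return [base, ...parts].join("\n\n");
}

// Two agents import the same skill. Change the registry entry once and
// both prompts change together.
const frontendPrompt = composePrompt("You implement UI components.", [
  "hooks/two-layer",
  "style/number-format",
]);
const reviewPrompt = composePrompt("You review pull requests.", [
  "hooks/two-layer",
]);
```

A missing skill name throws instead of silently composing a partial prompt, which is the same fail-fast instinct as the contract validation above: drift should be impossible, not just discouraged.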

That is the philosophy underpinning everything in this series. The principles that produce good code also produce good agent systems. DRY becomes DRYP. Code review becomes prohibitions. Linting becomes enforcement. Planning before coding becomes planning agents before execution agents. None of this is new. It is the same craft, applied to a new medium.

I stopped telling agents what to type. I started defining what they can't do. I stopped reviewing every output. I started building automated checks that review for me. I stopped directing every step. I started engineering the system that directs itself.

In my own experience, every jump on the evolution chart, from chatbot to autocomplete, from autocomplete to vibe coding, felt like a tools upgrade. I installed something new and kept working the same way. The jump to agentic engineering was different. It forced a behavior change. And behavior changes are harder than tool changes because the thing that needs to change is you.

You are part of the harness

The difference between vibe coding and agentic engineering isn't the tools. It's the discipline.

Boundaries before execution. Constraints over instructions. Deterministic validation over probabilistic hope.

And yes, changing yourself. Not just the AI.

The harness starts with you. It depends on you articulating your engineering principles. Being clear on your architecture, on what you value, so that your agents value it too. If you have never written down why you structure code the way you do, your agents will never know. They will fill in the gaps with whatever gets the tests to pass.

That callback matters. In The Illusion of Control, I described agents rewriting tests to match broken output instead of fixing the code. That is what happens when the harness has no opinion. The agent optimizes for the only signal it has, and if that signal is "make green checkmarks," you get green checkmarks. Not correct software.

You are part of the harness. The most important part. Every prohibition, every hook, every verification gate is only as good as the principles behind it. And those principles live in you.

The medium changed. The craft didn't.


Written by johnpphd | Senior SWE for Maverick Protocol (80K peak DAU, $70B+ DeFi volume, 6 chains). 100+ agent swarm. PhD Math.
Published by HackerNoon on 2026/04/14