// RESEARCH

I Built This Site With AI Agents. Here's Everything That Broke.

A build log of 403ai.org itself — the orchestration pattern, what shipped fast, and the failures that taught more than the wins.

Jun 7, 2026 · 403AI · 5 min read

  • The Forge
  • build-log
  • ai-agents
  • claude-code
  • case-study

// DISCLOSUREDrafted with AI and reviewed + edited by a human maintainer before publishing (Constitution §C).

This site was built almost entirely by AI. Not "AI-assisted" in the marketing sense — actual agents writing the actual code, with a human holding the seams. I want to walk you through how, and specifically where it fell on its face, because the failures taught me more than the wins did.

The setup

The method is an orchestration pattern, not a single magic prompt. One AI sits in a chat window as the architect: it makes the decisions, writes the specs, and reviews the work. A coding agent — Claude Code — lives in the terminal and does the actual building. Cheaper agents, Codex and Gemini, get delegated the boilerplate to keep the expensive tokens for the work that needs them. And a human — me — is the gate: I approve the specs, review every pull request, and do every merge. Specs first, then code. Nothing ships because an agent felt like it.

For the kind of work that has a clear definition of done, this is genuinely fast. Scaffolding, the content pipeline, API endpoints, the test suite, CI, the deploy to production — the agent flies through anything where a failing test or a compiler error can tell it when it's finished.

Then there's the other kind of work.

The logo that ate two hours

I asked the agent to build a custom logo — a constructed wordmark with letterforms drawn on a few shared angles, precise geometry. It ran for thirty-six minutes and slammed into its output limit with nothing to show. I cut the scope down and tried again. Same thing: another thirty-six minutes, another wall, more tokens gone.

The lesson landed on the second failure. Agents converge when something external tells them they're done — a failing test, a compile error, a type mismatch. Visual design has none of that. "Does this look right" is a judgment the agent can't make about its own output, so it loops: nudge a coordinate, re-render, second-guess, nudge again, forever, until it hits a hard limit. It wasn't bad at drawing SVG. It was trapped in a problem with no exit condition.

The fix wasn't a cleverer prompt. It was pulling the design out of the loop entirely — decide the geometry by hand, then hand the agent only the part with a clear done-state: drop in the finished file and wire it up. The logo is still parked, honestly. It's filed as an issue. This post shipped first, which probably tells you something true about priorities.

One token, two jobs

A color refactor took a tempting shortcut. Instead of updating every component, route them all through a single color token and just re-point that one token at the new color. Zero component changes. Shipped in minutes. Looked great.

Two problems surfaced later, both the same root cause. First, when I went to adjust the palette, flipping that one token flipped the entire site to a single color — because no element actually had a role of its own, they were all just sharing. Second, fixing an unrelated text-contrast bug meant brightening a token that turned out to also be the brand color, and the brand quietly slid from a deep oxblood to a bright coral. Nobody chose that. A contrast fix chose it.

One token doing two semantic jobs is debt, and it compounds quietly. The fix was boring and correct: give every color a real role — brand, action, success, danger — its own token, from the start. Slower up front. A lot cheaper than finding out your brand changed color because of an error message.

The human is the gate for a reason

The least glamorous lesson is the one I'd keep if I had to drop the rest. The review step earned its place. The agent never self-merges here; I review and merge every change. That caught the brand-color drift the agent literally could not see. It held a data-collection endpoint offline until the legal copy was actually finished instead of "good enough." It stopped more than one clever shortcut from quietly becoming load-bearing. An agent optimizes for the tests pass. Someone still has to optimize for this is actually right.

The takeaway

Not "AI builds everything now." Not "AI is overhyped slop." Both of those are lazy, and both are wrong. The real shape of it is narrower and more useful: AI is extraordinary at work that has a definition of done, and useless-to-dangerous at work that needs taste and judgment with no test to check it against. The skill that matters isn't prompting. It's knowing which kind of work you're looking at, and refusing to hand the second kind to a loop that can't tell when it's wrong.

This whole site is built that way. AI did the volume; a human held the seams. That isn't a disclaimer we bolted on at the end. It's the method.

The page said 403. You're reading it anyway. Now go build something.