Same Architecture, Different Verdicts: Eight Signals for Scale-Contingent Design
Written with Claude. Writing the Counterfactual Made the Answer Obvious The problem we were solving is simple to state: every piece of equipment in an engineering assessment report needs a defensible replacement cost. Where did that number come from? How confident are we in it? Could a client or a peer reviewer trace it back to a source? Answering those questions for hundreds of items across dozens of projects is the whole job. The system we built to do it had one sentence at its core: put the correct number into a file safely. ...
Spec-to-Kill, Forward
Written with Claude. The direction I wasn’t using Last week I wrote about the spec-to-kill method. Short version: I had an MCP server I’d built six months earlier, cargo-culted into eleven tools, genuinely useful but Rube Goldberg. I couldn’t decide whether to clean it up or rip it out. So Claude and I specced the best possible version (handler separation, schema validation, four clean tools), and then I asked whether even the ideal version earned its keep. The answer was no. grep handled the dataset. The spec became the teardown blueprint. ...
ELI7 Is Load-Bearing
Written with Claude. A Session I Mostly Didn’t Run I noticed something today while closing out a Claude session. The epic I finished implementing, #349 Parallel Agent Dispatch Quality, was three sub-issues I had not written. Prior sessions of mine had written them. I had no real memory of the specifics. The code-reviewer agent sometimes ships findings that are wrong. Three parallel reviewers produce prose the coordinator has to manually merge. Parallel implementation agents rename things in their own files and leave stale references in sibling files. Each of those is a filed issue with a proposed mechanical fix in the body. ...
The Bot With No Browser
Written with Claude. I was about to build the wrong filter I have a lot of GitHub issues on my field-service app. A hundred and sixty-one open, mostly written by me, about my own tool. Most of them I still want to fix. Some of them became stale before I got to them. ...
The Spec-to-Kill Method: Pressure-Testing Vibe-Coded Infrastructure
Written with Claude. The Swiss army knife you have to hold a certain way I run a field data collection platform for building assessments. The app has AI agents that look up equipment costs, write condition narratives, validate data, and generate reports. Those agents need to find cost records in a database, do deterministic math (no LLM arithmetic on dollar amounts), and write results back to a JSON file. ...
Inline Development on Rocket Skates
Written with Claude. The wall I keep hitting I’m building a field data collection app. SvelteKit PWA, offline-first, syncs equipment photos and condition data from a tablet back to a server where the analysis pipeline picks it up. Standard tooling for my day job as a mechanical engineer doing facility audits, property condition assessments, and energy assessments. ...
I Can't See If My Pants Are Down
The criticism that sticks There’s a version of the AI-assisted infrastructure story that goes like this: you’re walking around in public with your pants at your ankles, and you don’t even know it. You pasted configs from a chatbot, typed systemctl enable, and called it production. You don’t know what you shipped. You don’t know what’s exposed. ...
73 Findings, 15 Issues
Written with Claude. The entire process described here — the spec generation, the triage conversation, and this blog post — was done in one Claude Code session. The Wall of Findings I have a home network infrastructure repo. Raspberry Pi running Pi-hole, Caddy, WireGuard, CrowdSec. About 20 scripts, a Go TUI dashboard, some systemd units, a security dashboard. The kind of thing that accumulates config and code over 280+ sessions of Claude Code work. ...
The Three Misses
Written with Claude. The field visit, the data analysis, the issue filing, and this blog post were all done in one session. The Core Problem The worst thing an AI pipeline can do is put a plausible falsehood in a delivered report that nobody catches. Not a typo. Not a formatting issue. A statement that reads like a field observation, passes review because it sounds right, and lands in front of a client as fact. ...
Context as a Public Good
The Tax You Don’t Notice My workspace CLAUDE.md was 210 lines. It loaded into every session, across every project, every time. That’s roughly 3,500 tokens of instructions Claude reads before I say a word. I never questioned it. The file grew over months. A gotcha here, a workflow pattern there, a rule I wrote after getting burned once. Each line made sense when it was added. In aggregate, it was mostly noise. ...