Written with Claude. The field visit, the data analysis, the issue filing, and this blog post were all done in one session.

[Image: a golf course diagram showing three types of misses]

The Core Problem

The worst thing an AI pipeline can do is put a plausible falsehood in a delivered report that nobody catches. Not a typo. Not a formatting issue. A statement that reads like a field observation, passes review because it sounds right, and lands in front of a client as fact.

AI doesn’t produce truth. It produces plausibility. Those overlap when the input is tight. They diverge when the input is loose. So everything in the pipeline has to be designed around one question: how do we make it impossible for the AI to say something we can’t verify?

How I Found This

I build assessment tools for commercial buildings. An engineer walks a property, documents equipment, takes photos, and that data flows through a pipeline that produces cost tables and report narratives. After a recent site visit, I traced every data point through the pipeline and found it drifting toward plausibility at three different points.

The condition notes field auto-filled “No significant issues observed.” on 29 of 36 items when the engineer selected “Good” condition. A reasonable time-saving default. But downstream, the condition skill read each one as a field observation and would have generated prose like “Equipment was observed to be in good condition.” The system was talking to itself.

The narrative engine generated text from half-filled fields and produced broken sentences. A cost agent wrote coordination notes referencing equipment that didn’t exist in the building.

None of these were bugs. Each was the system trying to be helpful. Filling in a reasonable default. Generating a preview. Making a professional-sounding footnote. But each one introduced content that didn’t trace back to something someone actually observed.

I started thinking about it like golf.

Three Ways to Miss

Every stage of the pipeline is a shot toward a factual report. There are three ways to miss the pin.

Short. Incomplete data with visible gaps. A narrative that says [manufacturer] where the value should be. This is fine. You know what’s missing. One more pass fills it in. Honest incompleteness is the normal state of a project in progress.

Long. The system polishes incomplete data into something that looks finished. Auto-filled notes. Generated narratives from sparse fields. I measured report readiness at 55%, but the file looked like 80%. Nobody realizes there's a problem because the output reads like it's done. The recovery requires figuring out which parts are real and which the system supplied. Harder than starting from a blank page.

If you know machine learning, this is overfitting. The AI learned what buildings usually look like and filled in what this building probably has. Confident, specific, wrong. An underfit report has visible gaps. An overfit report has no gaps because the AI filled every one with a pattern from other buildings. Underfitting is short of the pin. Overfitting looks like a better answer. It’s a more dangerous answer because nobody questions it.

Wrong fairway. The AI generates plausible content that’s factually wrong. A coordination note about equipment that doesn’t exist. It’s 300 yards in the wrong direction, and the recovery isn’t 300 yards back. It’s the hypotenuse. Finding the fabrication in a delivered report, retracting it, rewriting it, re-delivering it.

        Pin (factual report)
         *
         |  \
         |    \
    220  |      \  372 yds to recover
    yds  |        \
         |          \
         *-----------*
       Tee          Where the AI landed
                    300 yds, wrong direction

A visible gap embarrasses you. A plausible fabrication in a delivered report is a professional liability.

Constrain the Plausibility

Everything flows from one principle: every value in the pipeline must trace to a verifiable source. Where that chain breaks, the AI fills the gap with plausible fiction. So don’t break the chain.

Each stage has a specific role. The discipline is in not overdoing it.

Field capture is deterministic. Structured inputs. Dropdowns, number fields, photo uploads. The interface collects what the engineer observes and nothing more. No generated prose. No auto-filled observations. If a field is empty, it stays empty.
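A minimal sketch of what that looks like in code, with hypothetical field names: every field is either what the engineer entered or empty, and nothing supplies observation text on the engineer's behalf.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EquipmentCapture:
    """One captured item. All names here are illustrative."""
    asset_id: str
    condition: Optional[str] = None       # dropdown: "Good", "Fair", "Poor"
    condition_notes: Optional[str] = None  # free text, never auto-filled
    install_year: Optional[int] = None

# The engineer selected "Good" and wrote no notes.
item = EquipmentCapture(asset_id="RTU-3", condition="Good")

# The gap stays visible. None is not an observation.
assert item.condition_notes is None
```

The point of the `None` default is that downstream stages can distinguish "not observed" from "observed to be fine," which is exactly the distinction the auto-filled notes destroyed.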

Data enrichment is arithmetic. Equipment age is a subtraction. Remaining useful life is a subtraction. You can’t hallucinate 2026 - 2021 = 5.
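The enrichment stage, sketched with assumed function names, is just that subtraction made explicit:

```python
def equipment_age(current_year: int, install_year: int) -> int:
    """Deterministic enrichment: age is a subtraction, not a guess."""
    return current_year - install_year

def remaining_useful_life(expected_life: int, age: int) -> int:
    """Clamped at zero: equipment past its expected life has none left."""
    return max(expected_life - age, 0)

age = equipment_age(2026, 2021)        # 5, exactly as in the text
rul = remaining_useful_life(20, age)   # 15
```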

Cost calculation is deterministic. A lookup against a vetted cost database, run through a calculator that shows its work. Every number traces to a source and a formula.
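One way to make a calculation "show its work," assuming a hypothetical cost table and illustrative numbers: return the source and the formula alongside the total, so a reviewer can trace every figure.

```python
# Hypothetical vetted cost table; the values are illustrative only.
COST_DB = {"rooftop_unit": 1850.0}  # USD per ton

def replacement_cost(equip_type: str, tons: float) -> dict:
    """Lookup plus arithmetic. The result carries its own audit trail."""
    unit_cost = COST_DB[equip_type]  # missing type raises, never guesses
    return {
        "source": f"COST_DB[{equip_type!r}]",
        "formula": f"{unit_cost} * {tons}",
        "total": unit_cost * tons,
    }

result = replacement_cost("rooftop_unit", 10)
# → {"source": "COST_DB['rooftop_unit']", "formula": "1850.0 * 10", "total": 18500.0}
```

Letting the lookup raise on a missing equipment type, instead of falling back to a typical value, is the same discipline as the empty field: a visible failure beats a plausible number.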

Report generation is constrained copywriting. This is where AI earns its keep. But by this point the data is so tight that the only plausible output is the factual output. The AI isn’t creating. It’s assembling. Our report agent has one hard rule: “Every sentence in the output must trace back to source text.” If the source is missing, it writes [DATA NEEDED], not a guess.
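A toy version of that rule, using Python's standard `string.Template` (the template text and field names are my own, not the production agent): any slot without a source value renders as a visible placeholder instead of a guess.

```python
from string import Template

def render(template: str, data: dict) -> str:
    """Fill slots from source data; missing slots become [DATA NEEDED]."""
    class Marking(dict):
        def __missing__(self, key):
            return "[DATA NEEDED]"
    return Template(template).substitute(Marking(data))

line = render("The $tons-ton $make unit was installed in $year.",
              {"tons": 10, "year": 2021})
# → "The 10-ton [DATA NEEDED] unit was installed in 2021."
```

The gap goes back to being short of the pin: visible, honest, and fixable with one more pass.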

The tap-in looks effortless because every previous shot was aimed correctly.

Iteration

The AI doesn’t get it right the first time. It never will. Domain-specific technical prose that has to be factual, correctly formatted, and read like an engineer wrote it is not a one-shot problem.

But before AI, each iteration was a full cycle of human labor. Read the draft. Find the problems. Rewrite. Re-read. Find more problems. By the third revision you’re tired, you’re less careful, and you start accepting “good enough” because the cost of another pass isn’t worth it.

With AI, the cost of iteration drops to almost nothing. Generate. Review. Adjust the input. Regenerate. Five iterations in the time one used to take, and each one is as fresh as the first.

The pipeline can’t be designed for one-shot perfection. It has to be designed for fast, cheap iteration with human checkpoints. The condition skill has two review gates. The cost agent shows its work so you can adjust inputs and re-run. The report agent regenerates a section in seconds when you add a missing field.

The AI’s job isn’t to be right. It’s to be cheap to correct. Each cycle tightens the gap between plausible and factual. Not because the AI learned anything, but because the data got tighter with each pass.

That’s the real shift. Not that the AI writes the report. That you can afford to write it five times.

Late Data

This changes your relationship with new information entirely.

Before, getting the equipment list from the property manager on Thursday when the report is due Friday was a problem. The report is written. The cost table is formatted. The executive summary references specific numbers. Threading new equipment through every layer by hand takes hours.

With the pipeline, late data is a gift. Drop the equipment list into the source file. Re-run the pipeline. Cost tables recalculate. Narratives rewrite themselves. The executive summary pulls updated numbers. Ten minutes.
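The reason a re-run is cheap: when the pipeline is a pure function of the source file, late data is just a new input. A minimal sketch with invented fields:

```python
def run_pipeline(source: list[dict]) -> dict:
    """Everything downstream is recomputed from source. No manual threading."""
    total = sum(item["cost"] for item in source)
    return {"cost_total": total, "line_items": len(source)}

report = run_pipeline([{"cost": 12000}, {"cost": 4500}])

# Thursday: the property manager's equipment list arrives. Append, re-run.
late = [{"cost": 12000}, {"cost": 4500}, {"cost": 9800}]
report = run_pipeline(late)  # totals, counts, and narratives update together
```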

The friction determined how we felt about new information. When integration was manual, late data was a burden. When integration is automated, late data is better data. Same information, completely different emotional response, because the cost of incorporating it went from hours to minutes.

You stop defending the current draft and start welcoming anything that makes it more accurate. Because accuracy is free now. It used to cost a weekend.

The Point

Don’t smash the driver on every shot. AI isn’t a power tool you aim at the problem and fire. It’s one club in a bag, and the bag matters more than any single club.

Structured inputs. Deterministic calculations. Single sources of truth. A staged pipeline where each phase does its job and doesn’t try to do the next phase’s job too. Constrain the data. Constrain the plausibility. Let the AI iterate cheaply. Welcome new data because integration is free.

That’s where AI generates real value. Not by being creative. By having nothing left to create.

