The Setup
I have a Raspberry Pi running Pi-hole, Home Assistant, WireGuard, and a bunch of other stuff for my home network. I built a security audit command for Claude that spawns 16 specialized agents to check different things — SSH hardening, CrowdSec status, backup verification, that kind of thing. At the end, it maps everything to NIST CSF categories and gives you a score.
Today I ran it. Took about 12 minutes. It came back and declared me 96% NIST CSF compliant.
I felt pretty good about that. For about ten minutes.
What It Actually Checked
The audit wasn’t nothing. It ran real commands on real systems:
- Verified Pi-hole blocklists were fresh and DNSSEC was working
- Checked that the router’s IDS was in blocking mode (not just detect)
- Ran Lynis on both Pis and extracted the hardening scores
- Confirmed SSH was configured correctly: Dropbear with `-w -s` flags on the infrastructure Pi, OpenSSH with key-only auth elsewhere
- Verified CrowdSec was actually acquiring logs from the right sources
- Tested that all documented IPs were reachable and services were listening on expected ports
- Checked TLS cert expiry, HSTS headers, backup recency
- Mapped SSH trust relationships between hosts
- Scanned for world-writable files, unusual SUID binaries, Docker containers running as root
It found real things. The kernel was 15 versions behind — updated it. Sysctl hardening was missing — applied it. It also flagged my Docker containers as running as root, but when I checked, postgres was actually running as postgres and redis as redis. The entrypoint scripts drop privileges. False positive.
So the checks were legitimate. The technical work was real.
The Problem
Then I started thinking about what “96% NIST CSF compliant” actually meant.
Who decided what to check? Claude.
Who decided how to weight the findings? Claude.
Who ran the checks? Claude.
Who calculated the score? Claude.
Who declared it 96%? Also Claude.
It graded its own homework. The whole thing was a student writing their own exam, taking it, and giving themselves an A.
“Did You Google This?”
I asked Claude what a NIST CSF score even was. It confidently explained that NIST CSF doesn’t have official percentage scores — organizations just assess themselves qualitatively against the five core functions.
Then I pressed: “Are you SURE? Did you google this?”
It hadn’t.
Turns out NIST CSF uses tiers, not percentages:
- Tier 1 (Partial): Ad hoc, reactive
- Tier 2 (Risk-Informed): Approved practices, inconsistently applied
- Tier 3 (Repeatable): Formalized, consistent policies
- Tier 4 (Adaptive): Proactive, continuously improving
And there IS a legitimate scoring methodology — you identify relevant controls, assign weights based on importance, and evaluate effectiveness with evidence. Third-party tools translate this into percentages using actual rubrics.
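That methodology is simple enough to sketch. A toy version of weighted control scoring; the control names, weights, and effectiveness values below are illustrative, not from any real rubric:

```python
def weighted_score(controls: list[tuple[str, float, float]]) -> float:
    """Weighted compliance score: sum(weight * effectiveness) / sum(weight).

    Each control is (name, weight, effectiveness in [0, 1]), where
    effectiveness is supposed to be backed by actual evidence.
    """
    total_weight = sum(w for _, w, _ in controls)
    return sum(w * e for _, w, e in controls) / total_weight

controls = [
    ("SSH key-only auth", 3.0, 1.0),  # verified in sshd_config
    ("Backups tested",    3.0, 0.5),  # running, but restore never tested
    ("IDS in block mode", 2.0, 1.0),
]
score = weighted_score(controls)  # 0.8125 — a number with a paper trail
```

The arithmetic is trivial; the work is in choosing the controls, justifying the weights, and producing evidence for each effectiveness value. That last part is exactly what my command skipped.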
I did none of that. The command template had `| Identify | X% |` placeholders. Claude filled them in with vibes.
What I Actually Got
Despite the theater, some real things happened:
- Kernel updated from 6.12.47 to 6.12.62
- Sysctl hardening applied (log_martians, send_redirects, unprivileged_bpf)
- Confirmed backups are actually running
- Verified CrowdSec acquisition matches log destinations
- Documented the SSH trust mesh (turns out Dev Pi can reach everything — noted)
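The sysctl piece is easy to make repeatable instead of one-off. A minimal drift check; the keys match the settings mentioned above, but the target values are my assumptions — check them against your own baseline:

```python
# Hardened targets (assumed values; adjust to your baseline)
HARDENING = {
    "net.ipv4.conf.all.log_martians": "1",
    "net.ipv4.conf.all.send_redirects": "0",
    "kernel.unprivileged_bpf_disabled": "1",
}

def sysctl_drift(current: dict[str, str]) -> list[str]:
    """Return the keys whose live value differs from the hardened target."""
    return [key for key, want in HARDENING.items() if current.get(key) != want]

def read_sysctl(key: str) -> str:
    """Read a live value from /proc/sys (net.ipv4.x becomes net/ipv4/x)."""
    with open("/proc/sys/" + key.replace(".", "/")) as f:
        return f.read().strip()

# live = {key: read_sysctl(key) for key in HARDENING}
# print(sysctl_drift(live) or "no drift")
```

Run it from cron and the hardening stays checked instead of checked-once.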
The 96% was meaningless. But the housekeeping was real.
The Honest Summary
Here’s what an AI security audit actually is:
“Claude checked the things Claude thought to check and found them mostly configured the way Claude expected.”
That’s not a security audit. A real pentester would nmap from outside, try default credentials, actually attempt the exploits the IDS claims to block. I read config files and said “looks good.”
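Even a minimal outside-in check beats reading config files. A sketch using plain TCP connects — the host and ports are placeholders, and this is the crudest possible first step, not a substitute for a real scan:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect from the outside, like a pentester's first probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from a machine OUTSIDE your LAN against your public IP
# (203.0.113.7 is a documentation address, not mine):
# for port in (22, 53, 80, 443):
#     print(port, port_open("203.0.113.7", port))
```

If anything answers that you didn’t expect, that single fact is worth more than any self-assigned percentage.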
I created an issue to fix the command — use actual NIST tiers instead of made-up percentages, define specific controls, be honest about what this is.
But I’m keeping the kernel update.
Written with Claude, who also graded itself on this blog post. It gave it a 94%.