The Setup
I have a Raspberry Pi running Pi-hole, Home Assistant, WireGuard, and a bunch of other stuff for my home network. I built a security audit command for Claude that spawns 16 specialized agents to check different things — SSH hardening, CrowdSec status, backup verification, that kind of thing. At the end, it maps everything to NIST CSF categories and gives you a score.
Today I ran it. Took about 12 minutes. It came back and declared me 96% NIST CSF compliant.
I felt pretty good about that. For about ten minutes.
What It Actually Checked
The audit wasn’t nothing. It ran real commands on real systems:
- Verified Pi-hole blocklists were fresh and DNSSEC was working
- Checked that the router’s IDS was in blocking mode (not just detect)
- Ran Lynis on both Pis and extracted the hardening scores
- Confirmed SSH was configured correctly: Dropbear with `-w -s` flags on the infrastructure Pi, OpenSSH with key-only auth elsewhere
- Verified CrowdSec was actually acquiring logs from the right sources
- Tested that all documented IPs were reachable and services were listening on expected ports
- Checked TLS cert expiry, HSTS headers, backup recency
- Mapped SSH trust relationships between hosts
- Scanned for world-writable files, unusual SUID binaries, Docker containers running as root
It found real things. The kernel was 15 versions behind — updated it. Sysctl hardening was missing — applied it. It also flagged my Docker containers as running as root, but when I checked, postgres was actually running as postgres and redis as redis. The entrypoint scripts drop privileges. False positive.
So the checks were legitimate. The technical work was real.
The Problem
Then I started thinking about what “96% NIST CSF compliant” actually meant.
Who decided what to check? Claude.
Who decided how to weight the findings? Claude.
Who ran the checks? Claude.
Who calculated the score? Claude.
Who declared it 96%? Also Claude.
It graded its own homework. The whole thing was a student writing their own exam, taking it, and giving themselves an A.
“Did You Google This?”
I asked Claude what a NIST CSF score even was. It confidently explained that NIST CSF doesn’t have official percentage scores — organizations just assess themselves qualitatively against the five core functions.
Then I pressed: “Are you SURE? Did you google this?”
It hadn’t.
Turns out NIST CSF uses tiers, not percentages:
- Tier 1 (Partial): Ad hoc, reactive
- Tier 2 (Risk-Informed): Approved practices, inconsistently applied
- Tier 3 (Repeatable): Formalized, consistent policies
- Tier 4 (Adaptive): Proactive, continuously improving
And there IS a legitimate scoring methodology — you identify relevant controls, assign weights based on importance, and evaluate effectiveness with evidence. Third-party tools translate this into percentages using actual rubrics.
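That methodology is simple enough to sketch. A toy version of weighted control scoring; the control names, weights, and effectiveness values below are illustrative, not from any real rubric:

```python
def weighted_score(controls: list[tuple[str, float, float]]) -> float:
    """Weighted compliance score: sum(weight * effectiveness) / sum(weight).

    Each control is (name, weight, effectiveness in [0, 1]), where
    effectiveness is supposed to be backed by actual evidence.
    """
    total_weight = sum(w for _, w, _ in controls)
    return sum(w * e for _, w, e in controls) / total_weight

controls = [
    ("SSH key-only auth", 3.0, 1.0),  # verified in sshd_config
    ("Backups tested",    3.0, 0.5),  # running, but restore never tested
    ("IDS in block mode", 2.0, 1.0),
]
score = weighted_score(controls)  # 0.8125 — a number with a paper trail
```

The arithmetic is trivial; the work is in choosing the controls, justifying the weights, and producing evidence for each effectiveness value. That last part is exactly what my command skipped.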
I did none of that. The command template had `| Identify | X% |` placeholders. Claude filled them in with vibes.
What I Actually Got
Despite the theater, some real things happened:
- Kernel updated from 6.12.47 to 6.12.62
- Sysctl hardening applied (log_martians, send_redirects, unprivileged_bpf)
- Confirmed backups are actually running
- Verified CrowdSec acquisition matches log destinations
- Documented the SSH trust mesh (turns out Dev Pi can reach everything — noted)
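The sysctl piece is easy to make repeatable instead of one-off. A minimal drift check; the keys match the settings mentioned above, but the target values are my assumptions — check them against your own baseline:

```python
# Hardened targets (assumed values; adjust to your baseline)
HARDENING = {
    "net.ipv4.conf.all.log_martians": "1",
    "net.ipv4.conf.all.send_redirects": "0",
    "kernel.unprivileged_bpf_disabled": "1",
}

def sysctl_drift(current: dict[str, str]) -> list[str]:
    """Return the keys whose live value differs from the hardened target."""
    return [key for key, want in HARDENING.items() if current.get(key) != want]

def read_sysctl(key: str) -> str:
    """Read a live value from /proc/sys (net.ipv4.x becomes net/ipv4/x)."""
    with open("/proc/sys/" + key.replace(".", "/")) as f:
        return f.read().strip()

# live = {key: read_sysctl(key) for key in HARDENING}
# print(sysctl_drift(live) or "no drift")
```

Run it from cron and the hardening stays checked instead of checked-once.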
The 96% was meaningless. But the housekeeping was real.
The Honest Summary
Here’s what an AI security audit actually is:
“Claude checked the things Claude thought to check and found them mostly configured the way Claude expected.”
That’s not a security audit. A real pentester would nmap from outside, try default credentials, actually attempt the exploits the IDS claims to block. I read config files and said “looks good.”
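Even a minimal outside-in check beats reading config files. A sketch using plain TCP connects — the host and ports are placeholders, and this is the crudest possible first step, not a substitute for a real scan:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect from the outside, like a pentester's first probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from a machine OUTSIDE your LAN against your public IP
# (203.0.113.7 is a documentation address, not mine):
# for port in (22, 53, 80, 443):
#     print(port, port_open("203.0.113.7", port))
```

If anything answers that you didn’t expect, that single fact is worth more than any self-assigned percentage.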
I created an issue to fix the command — use actual NIST tiers instead of made-up percentages, define specific controls, be honest about what this is.
But I’m keeping the kernel update.
Written with Claude, who also graded itself on this blog post. It gave it a 94%.