2025-10-24 - Discovering Mistral 7B on Mac Mini M4 with Apple Silicon Monitoring

🎯 Key Insights

  • Undocumented Infrastructure: Ollama + Mistral 7B installed 4 days prior but completely missing from Memory Bank documentation
  • Apple Silicon Monitoring Gap: btop GPU monitoring is Linux-only; asitop is the correct tool for M4 GPU/Neural Engine observation
  • M4 Efficiency Characteristics: Inference so fast (40-80 tokens/sec) that GPU activity appears as brief power blips rather than sustained load

πŸ“ What Happened

Session started with a simple question: “Did we install Mistral on here?” Memory Bank had no record of it, but systematic search revealed Ollama v0.12.6 with Mistral 7B fully operational, installed approximately October 20th.

Initial searches of PATH and the Homebrew package list came up empty, but ollama list revealed the 4.4 GB model. Testing confirmed full functionality - simple queries returned instant responses, and a 500-word networking essay was generated in ~30 seconds with accurate technical details about ARPANET, TCP/IP, and DNS history.
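
A sketch of what that discovery looked like (output columns approximate, not a verbatim capture from the session):

ollama list
# NAME        ID              SIZE      MODIFIED
# mistral:7b  6577803aa9a0    4.4 GB    4 days ago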

The discovery exposed a monitoring gap: btop was configured with gpu0 for GPU monitoring, but research into the btop GitHub repo revealed that its GPU support is Linux-only (NVIDIA/AMD/Intel). For Apple Silicon, asitop is the proper tool - it uses powermetrics to show GPU utilization, E/P-Cluster CPU, Neural Engine activity, and power consumption.

Installed asitop v0.0.24 via Homebrew and configured optimal settings (--interval 1 --avg 3 for 1-second updates with 3-second rolling average). During long inference tests, GPU activity became visible as brief power spikes in asitop - the M4 is so efficient that it bursts to 100% GPU, generates tokens, then returns to idle in under a second.

Working in: /Users/chris/2_project-files/projects/active-projects/chungus-net/

🔧 Technical Details

Ollama Installation Discovered

  • Version: v0.12.6 via Homebrew
  • Model: Mistral 7B (4.4 GB, ID: 6577803aa9a0)
  • Location: ~/.ollama/models/blobs/
  • Service: Running since Sunday 10 PM (PID 58472)
  • Performance: 40-80 tokens/sec, 100% GPU utilization on M4
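
A quick way to re-verify all of this in a future session (assuming the Homebrew install described above):

ollama --version              # reports the version, e.g. 0.12.6
brew list --versions ollama   # confirms the Homebrew-managed install
ollama list                   # downloaded models with ID and on-disk size
pgrep -lf ollama              # confirms the server daemon is running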

Resource Usage Pattern

# Ollama processes
PID: 58472 | CPU: 0.0% | MEM: 0.2% | RSS: 35 MB      # Server daemon
PID: 67771 | CPU: 0.0% | MEM: 27.0% | RSS: 4.4 GB    # Model runner

# During inference
ollama ps
# NAME       ID           SIZE    PROCESSOR    CONTEXT    UNTIL
# mistral:7b 6577803aa9  5.0 GB  100% GPU     4096       4 minutes
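# Note: the 5.0 GB loaded size exceeding the 4.4 GB on-disk size is most
# likely the KV cache allocated for the 4096-token context (assumption)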

asitop Configuration

# Optimal command for M4 GPU monitoring
sudo asitop --interval 1 --avg 3

# Shows: GPU %, E/P-Cluster CPU, ANE (Neural Engine), Power, Memory bandwidth
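
For reproducibility, the install step that preceded this configuration (via Homebrew, as noted above):

brew install asitop
# asitop reads Apple's powermetrics, which needs root - hence the sudo above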

Test Results

# Simple validation
echo "What is 2+2?" | ollama run mistral:7b
# Output: "The sum of 2 and 2 is 4."

# Long inference test (500-word essay)
ollama run mistral:7b "Write a detailed 500-word essay about computer networking history"
# Generated accurate technical content in ~30-40 seconds
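
To put numbers behind the 40-80 tokens/sec figure, ollama's --verbose flag prints timing stats after each response; a sketch (stat values illustrative, exact format may vary by version):

ollama run mistral:7b --verbose "Explain DNS in one paragraph"
# After the response, look for lines like:
#   eval count:    180 token(s)
#   eval duration: 3.1s
#   eval rate:     58.06 tokens/s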

βš™οΈ Technology Stack Analysis

Primary Technologies

  • LLM Platform: Ollama v0.12.6 (local model serving)
  • Model: Mistral 7B (open-source LLM optimized for efficiency)
  • Acceleration: Apple Metal GPU acceleration on M4 chip
  • Monitoring: asitop v0.0.24 for Apple Silicon performance metrics

System Monitoring Tools

  • btop: v1.4.5 - CPU/memory/network only (GPU monitoring unsupported on macOS)
  • asitop: v0.0.24 - Apple Silicon-specific (GPU, ANE, power, clusters)
  • Command Line: ollama ps, ollama list for model management

Technology Integration

  • Unified Memory Architecture: 5 GB loaded model sits in shared RAM/VRAM; the GPU accesses it directly via Metal
  • Zero CPU Overhead: Inference runs 100% on GPU; CPU usage stays near-zero even during generation
  • Efficient Resource Usage: 27% RAM (~4.4 GB of 16 GB), instant idle recovery after inference
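
A shell-level spot check of those numbers (macOS ps; RSS is reported in KB):

ps axo pid,%cpu,%mem,rss,comm | grep -i ollama
# ~4.4 GB resident for the model runner lines up with the 27% figure
# on a 16 GB machine (27% of 16 GB ≈ 4.3 GB)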

💭 Reflections

What Worked Well

  • Systematic Search: Started with obvious locations (PATH, Homebrew), expanded to ollama list when initial searches failed
  • Documentation Research: Checked btop GitHub repo to understand GPU monitoring limitations rather than assuming configuration error
  • Tool Validation: Tested multiple inference scenarios to confirm functionality and observe resource patterns

Learning & Growth

Discovered that Apple Silicon monitoring requires platform-specific tools - btop’s GPU support is Linux-only, a limitation I initially missed. The M4’s efficiency creates a monitoring challenge: inference happens so fast that GPU activity appears as brief blips rather than sustained load. Understanding that this is normal behavior (not a configuration issue) required checking asitop’s power graphs during active inference.

Process Insights

Memory Bank documentation gaps became apparent - a significant tool installation from four days earlier had zero documentation. This reinforces the importance of updating tech-context.md immediately after installations rather than assuming future sessions will remember.

The btop configuration attempt (adding gpu0 to shown_boxes) was wasted effort due to platform limitations. Checking platform compatibility in documentation BEFORE configuring features would have saved time.
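
For the record, the dead-end configuration (a sketch of the relevant line in ~/.config/btop/btop.conf):

shown_boxes = "cpu mem net proc gpu0"
# gpu0 has no effect on macOS - btop's GPU boxes are Linux-only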

🔄 Patterns Noticed

Tool Discovery Pattern

When tools aren’t in PATH or Homebrew listings, check specialized package managers (ollama list, docker images, pip list) before concluding they’re not installed.
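
That escalating search, as shell commands (grep patterns generalize to whatever tool is missing):

command -v ollama             # 1. on PATH?
brew list | grep -i ollama    # 2. Homebrew-managed?
ollama list                   # 3. tool-specific registries: Ollama models...
docker images                 #    ...container images...
pip list | grep -i <package>  #    ...or Python packages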

Apple Silicon Monitoring

M4 GPU efficiency means traditional monitoring shows minimal activity - power consumption graphs are more reliable indicators than utilization percentages for brief ML inference workloads.

Documentation Discipline

Installations and system changes need immediate Memory Bank updates. Relying on memory across sessions doesn’t work when context resets completely.


Session Metadata:

  • Duration: ~45min | Project: chungus-net | Type: Discovery + Validation
  • Progress: Tool discovery complete, monitoring configured | Achievement Score: 3.5/5 | Productivity: High
  • Technologies: Ollama, Mistral 7B, asitop, btop | Claude Tools: Read(7), Bash(15), WebFetch(2)
  • Focus Quality: Excellent | Learning: Apple Silicon monitoring, M4 efficiency patterns | Next: Document in Memory Bank