Whisper on Device: AI Without the Cloud
The recording finished. 47 minutes of meeting audio.
I clicked transcribe.
Nothing left the machine.
That was the whole point of NBP from day one. Capture and transcribe without handing meetings to a cloud API.
What NBP Is
NBP (No Bullshit Pipeline) is a local audio recording tool for macOS. It records Zoom calls — microphone and system audio simultaneously. It transcribes them on-device. It stores everything as plain files in ~/nbp-data/. No cloud. No servers. No accounts.
This is part four of the series. I’ve covered privacy architecture, audio capture, and file storage. Now: how the transcription actually works when there’s no API to call.
OpenAI Made a Local Model
OpenAI released Whisper in 2022. The model itself. The weights. The code.
You can run it on your own hardware.
That’s the part most people miss. They see “OpenAI Whisper” and assume API. There’s no API requirement. The model runs locally. The audio never leaves.
I use whisper.cpp — a C++ port that runs fast on Apple Silicon. No Python dependency hell. No virtual environments. Just a compiled binary and a model file.
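The invocation is a single command. A minimal sketch of how a wrapper might assemble it; the binary name `whisper-cli` and the file paths are assumptions for illustration, while `-m`, `-f`, and `--output-txt` are whisper.cpp's actual flags:

```python
from pathlib import Path

def build_transcribe_cmd(audio: Path, model: Path,
                         binary: str = "whisper-cli") -> list[str]:
    """Build the argv for a local whisper.cpp transcription run."""
    return [
        binary,
        "-m", str(model),   # ggml model file on local disk
        "-f", str(audio),   # audio to transcribe (16 kHz WAV)
        "--output-txt",     # write a plain-text transcript
    ]

cmd = build_transcribe_cmd(
    Path.home() / "nbp-data" / "meeting.wav",
    Path.home() / ".nbp" / "models" / "ggml-base.bin",
)
# subprocess.run(cmd, check=True) would execute it.
# Every path in that argv is local. Nothing leaves the machine.
```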
The Model Files
This is where it gets real. Models aren’t small.
tiny — 75 MB — rough, fast, good for notes
base — 145 MB — decent, still fast
small — 465 MB — good quality, reasonable speed
medium — 1.5 GB — excellent quality, slower
large — 2.9 GB — best quality, slow on CPU
Pick one. Download it once. It lives on your disk.
NBP defaults to base. It’s fast enough and accurate enough for most meetings. You can switch to small or medium in settings. The download happens once and never again.
One-Click Download
I didn’t want to explain model management to users.
You open settings. You see a model selector. You pick one. You click “Download.”
That’s it.
The app fetches from Hugging Face. Progress bar. Done. The file lands in ~/.nbp/models/. Your disk. Your files. No login. No subscription.
~/.nbp/models/
├── ggml-base.bin # 145 MB
└── ggml-small.bin # 465 MB
Delete them anytime. Re-download anytime. No registration required.
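Under the hood, the download is just a URL and a destination. A sketch of how that resolution might look; the `ggerganov/whisper.cpp` Hugging Face repo is where whisper.cpp's own download script fetches models from, and the `~/.nbp/models/` layout is NBP's, but the function names are illustrative:

```python
from pathlib import Path

HF_BASE = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
MODELS_DIR = Path.home() / ".nbp" / "models"

def model_url(name: str) -> str:
    """Public URL for a ggml model file, e.g. name='base'."""
    return f"{HF_BASE}/ggml-{name}.bin"

def model_path(name: str) -> Path:
    """Where the model lands on your disk."""
    return MODELS_DIR / f"ggml-{name}.bin"

# urllib.request.urlretrieve(model_url("base"), model_path("base"))
# would perform the fetch. Deleting is os.remove. No account involved.
```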
Privacy By Physics
The cloud Whisper API is fast. ~30 seconds for a 1-hour file.
Local Whisper on M1 takes 4-6 minutes for the same file.
That’s the real tradeoff. Not quality. Speed.
The quality is identical for a given model size. The weights are the same. Local inference on Apple Silicon produces the same transcript OpenAI's servers would. They run the same model.
The difference is where the computation happens.
Cloud: your audio file travels to OpenAI’s servers. A machine you don’t own processes it. The results come back.
Local: your audio file stays in ~/nbp-data/. A model file you downloaded processes it. Nothing travels anywhere.
This isn’t a privacy policy. It’s physics.
Airplane Mode Test
Land at JFK. Pull out your laptop.
Open NBP. Load a recording from yesterday’s meeting. Click transcribe.
It works.
No WiFi. No hotspot. No signal. The model is on your disk. The inference runs on your CPU. The transcript appears.
You don’t get a “feature unavailable offline” message. You don’t get a spinner that never resolves. You don’t get rate limited.
You get a transcript.
The Failure Mode
I’ll be honest. Local Whisper has one problem.
Long recordings on the large model are slow.
Three hours of audio on large takes around 45 minutes to process. On an M1 MacBook Pro. That’s the reality.
For most use cases — 30-60 minute meetings — base or small is fine. Under ten minutes of processing. Accurate enough.
For all-day recordings, you make a choice. Queue it overnight. Or accept the quality tradeoff and use base.
The cloud API is faster. That’s the honest answer.
The Choice Architecture
NBP defaults to local.
You open the app. You record. You transcribe. Everything is local. Nothing leaves.
If you want speed — or you’re processing a three-hour session and don’t want to wait — you add an OpenAI API key in settings. One field. One key. Done.
With the key present, NBP routes to the API. Faster. Roughly $0.36 per hour of audio at $0.006 a minute.
Delete the key. It falls back to local. Immediately. No degraded experience.
You own the default. Cloud is the exception, not the assumption.
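That switch is small enough to sketch. This is not NBP's actual code, just the routing rule described above, with illustrative names: the presence of an API key is the only thing that selects the backend.

```python
from typing import Optional

def pick_backend(api_key: Optional[str]) -> str:
    """Cloud only when a key is configured; local otherwise."""
    if api_key and api_key.strip():
        return "cloud"
    return "local"

# No key configured: local by default.
# Delete the key: the next transcription falls back to local.
```

There is no third state and no degraded mode, which is why deleting the key takes effect immediately.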
The Numbers That Matter
- base model: 145 MB disk, ~3 min/hour of audio on M1
- small model: 465 MB disk, ~6 min/hour of audio on M1
- medium model: 1.5 GB disk, ~12 min/hour of audio on M1
- Cloud API: ~200 MB upload, ~30 sec/hour of audio, ~$0.006/minute
For a 45-minute daily standup: local base model takes about 2 minutes. That’s acceptable. That’s the default.
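The arithmetic behind that is worth making explicit. A back-of-the-envelope check using the per-hour rates above; these are the article's M1 figures, not benchmarks of any particular machine:

```python
# Local processing minutes per hour of audio (M1 figures from above).
MIN_PER_AUDIO_HOUR = {"base": 3, "small": 6, "medium": 12}
CLOUD_RATE_PER_MIN = 0.006  # USD per audio minute

def local_minutes(model: str, audio_minutes: float) -> float:
    """Estimated local processing time for a recording."""
    return MIN_PER_AUDIO_HOUR[model] * audio_minutes / 60

def cloud_cost(audio_minutes: float) -> float:
    """Estimated cloud API cost for the same recording."""
    return CLOUD_RATE_PER_MIN * audio_minutes

# 45-minute standup on base: 3 * 45 / 60 = 2.25 minutes locally.
# Same file through the cloud API: 45 * $0.006 = $0.27.
```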
Offline AI Isn’t a Gimmick
The phrase “on-device AI” got hijacked by marketing.
Apple uses it to sell chips. Microsoft uses it to justify Copilot requirements. Every hardware vendor has an “AI PC” announcement.
For NBP, it means one concrete thing: the model runs whether or not you have internet.
That’s all it means. No philosophical statements. No manifesto.
The model file is 145 MB. It’s on your disk. The binary runs. The transcript appears.
The internet is optional. That’s worth something.