Sergey Kopanev: you sleep — agents ship

No Bullshit Pipeline · Part 3

Files Beat Databases for Personal Tools


Every tutorial reaches the same moment.

You’re building a tool that stores data. The tutorial says: “Now let’s set up the database.”

SQLite. PostgreSQL. MongoDB. Doesn’t matter which one. The assumption is the same.

Data lives in a database.

I didn’t do that.

What I Did Instead

Every recording is a folder.

~/nbp-data/8f3d2a10-c4b9-4e2f-9a1c-5d8f9e3c2b4a/
├── raw_mic.ogg
├── raw_system.ogg
├── audio_mix.ogg
├── metadata.json
└── transcript.md

That’s it.

No database. No schema. No migrations. No connection pool.

A UUID folder. Files inside. Done.
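The whole "schema" fits in two commands. A minimal sketch of what creating a recording folder could look like (illustrative, not NBP's actual code; `uuidgen` ships with macOS and most Linux distros):

```shell
# Create a new recording folder: one UUID, one mkdir. The app then
# writes raw_mic.ogg, metadata.json, etc. into it as it goes.
dir="$HOME/nbp-data/$(uuidgen | tr '[:upper:]' '[:lower:]')"
mkdir -p "$dir"
echo "$dir"
```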

Why Not SQLite

SQLite is excellent. I’m not arguing against SQLite.

I’m arguing against SQLite for this kind of tool.

Here’s what you get with SQLite for a recording app:

A .db file somewhere in ~/Library/Application Support/. Binary format. Can’t open it in a text editor. Can’t grep it. Can’t rsync specific recordings to a USB drive without writing export logic. Can’t inspect what’s inside without a DB browser or custom queries.

Your data is technically yours. You still need the app to touch it.

The Grep Test

With files:

grep -r "standup meeting" ~/nbp-data/*/transcript.md

Found. Instantly.

With a database, you need:

  • App running
  • Query interface open
  • Or a custom CLI you had to build

I built NBP to own my data. Ownership means I can touch it without the app.

The Backup Test

rsync -av ~/nbp-data/ /Volumes/ExternalDrive/nbp-backup/

Done. Every recording. All files. Complete.

Restore? Same command in reverse.

With a database, you write backup logic. You test restore logic. You hope the backup isn’t corrupted. You discover it’s corrupted when you need it most.

The Portability Test

I want to open a recording from six months ago on a machine that has never seen NBP.

With files: copy the folder, open audio_mix.ogg in VLC, open transcript.md in any text editor.

With a database: export from old machine, import to new machine, pray the schema hasn’t changed, discover migration broke three fields.

The Debugging Test

Something went wrong. A transcript is empty. The mix is corrupted.

With files:

ls -lh ~/nbp-data/8f3d2a10/

File sizes. Timestamps. Which file is zero bytes.

Found the problem in five seconds.

With a database: open DB browser, write SELECT query, check table rows, cross-reference with logs.

The Performance Question

“Files don’t scale.”

I have 500 recordings.

Indexing all of them from scratch:

time find ~/nbp-data -name "metadata.json" | xargs grep -l "2026-01"

48ms.

No database warmup. No query optimizer. No connection overhead.

Just the filesystem doing what it’s been doing since 1970.

What Files Can’t Do

I’m not pretending files are perfect.

Full-text search across all transcripts is slow past a few thousand files.

Atomic multi-file updates are harder — if the app crashes mid-write, you get partial data.
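One common mitigation for the single-file case (a sketch, not necessarily what NBP does): write to a temp file in the same directory, then rename. `rename(2)` is atomic on POSIX filesystems, so a reader never sees a half-written transcript, only the old file or the new one.

```shell
# Stand-in for a recording folder.
d="$(mktemp -d)"

# Write the new content to a temp file first...
printf 'full transcript\n' > "$d/transcript.md.tmp"

# ...then atomically swap it into place. A crash before this line
# leaves only the .tmp file behind; never a truncated transcript.md.
mv "$d/transcript.md.tmp" "$d/transcript.md"

cat "$d/transcript.md"
```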

Querying by duration + date + tag requires scanning every metadata.json.
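Here's what that scan looks like in practice, sketched against a throwaway data dir with an assumed metadata schema (the field names are illustrative; NBP's real `metadata.json` may differ). It's a linear pass over every file, which is exactly the cost being described:

```shell
# Fixture: two fake recordings with assumed metadata fields.
base="$(mktemp -d)"   # stand-in for ~/nbp-data
mkdir -p "$base/a" "$base/b"
printf '{"date":"2026-01-05","tags":["meeting"],"duration":1800}\n' > "$base/a/metadata.json"
printf '{"date":"2025-12-30","tags":["music"],"duration":300}\n'   > "$base/b/metadata.json"

# The "query": touch every metadata.json, filter by date, then by tag.
grep -l '"date":"2026-01' "$base"/*/metadata.json | xargs grep -l '"meeting"'
```

With hundreds of files this is instant; the pain only shows up when the scan itself gets long.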

For a personal tool with hundreds of recordings, none of this matters.

If I’m ever at 100,000 recordings, I’ll reconsider.

The Real Rule

Use a database when you need database features: transactions, joins, complex queries, concurrent writers.

Use files when you need ownership, portability, and the ability to grep your own data at 2am without launching an app.

Personal tools are personal.

Your data should work without your software.


Next: Running Whisper locally — offline transcription that actually works.