Architecture - Augent

Augent is a pipeline. Each stage does one thing, persists its result on disk, and passes it forward.

Pipeline

Download

download_audio grabs audio from any URL that yt-dlp supports, using yt-dlp + aria2c (16 parallel connections, 4 concurrent fragments). Only the audio stream is downloaded: no video, no conversion. The raw file lands in ~/Downloads by default.
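A minimal sketch of an audio-only download, assuming yt-dlp's Python API is called directly (Augent may shell out to the CLI instead). The option keys are real `YoutubeDL` parameters; the aria2c flags `-x 16 -s 16` are one plausible way to get 16 connections, and `build_download_options` and `download_audio_sketch` are hypothetical names, not Augent's code.

```python
from pathlib import Path


def build_download_options(out_dir: str = "~/Downloads") -> dict:
    """yt-dlp options for an audio-only download via aria2c (assumed settings)."""
    target = Path(out_dir).expanduser()
    return {
        # Best audio-only stream; fall back to "best" if the site has none.
        "format": "bestaudio/best",
        # Keep the native container (.webm, .m4a, ...): no postprocessors, no transcoding.
        "outtmpl": str(target / "%(title)s.%(ext)s"),
        # Hand transfers to aria2c; the 16-connection flags are an assumption.
        "external_downloader": {"default": "aria2c"},
        "external_downloader_args": {"aria2c": ["-x", "16", "-s", "16"]},
        # Allow up to 4 fragments in flight for fragmented (DASH/HLS) streams.
        "concurrent_fragment_downloads": 4,
    }


def download_audio_sketch(url: str, out_dir: str = "~/Downloads") -> str:
    """Download the audio of `url` and return the local file path (hypothetical helper)."""
    import yt_dlp  # lazy import so the options helper works without yt-dlp installed

    with yt_dlp.YoutubeDL(build_download_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

Leaving out `postprocessors` entirely is what keeps the file in its original format; adding an audio-extraction postprocessor would trigger an ffmpeg transcode.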

Separate (optional)

separate_audio runs Meta’s Demucs v4 to isolate vocals from music, background noise, and other sounds. The clean vocal stem then feeds into transcription, which improves accuracy on noisy recordings. Results are cached by file hash at ~/.augent/separated/.
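A sketch of hash-keyed separation caching, assuming SHA-256 as the file hash and the Demucs command-line interface with its v4 `htdemucs` model and `--two-stems vocals` mode. The per-hash directory layout under ~/.augent/separated/ and the helper names are hypothetical.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

CACHE_ROOT = Path("~/.augent/separated").expanduser()


def file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demucs_command(audio_path: str, out_dir: Path) -> list[str]:
    """Demucs CLI call that splits the input into 'vocals' and 'no_vocals' stems."""
    return [
        sys.executable, "-m", "demucs",
        "-n", "htdemucs",         # Demucs v4 hybrid transformer model
        "--two-stems", "vocals",  # vocals vs. everything else
        "-o", str(out_dir),
        audio_path,
    ]


def separate_vocals(audio_path: str, cache_root: Path = CACHE_ROOT) -> Path:
    """Return the vocal stem, running Demucs only on a cache miss (hypothetical helper)."""
    out_dir = cache_root / file_hash(audio_path)
    # Demucs writes <out>/<model>/<track name>/<stem>.wav by default.
    vocals = out_dir / "htdemucs" / Path(audio_path).stem / "vocals.wav"
    if not vocals.exists():
        subprocess.run(demucs_command(audio_path, out_dir), check=True)
    return vocals
```

Keying on file contents rather than the file name means a renamed or re-downloaded copy of the same audio still hits the cache.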

Transcribe

transcribe_audio runs the file through faster-whisper locally. The full transcript, with word-level timestamps, is stored in a SQLite database at ~/.augent/memory/, and a human-readable .md copy is saved to ~/.augent/memory/transcriptions/. Nothing leaves your machine.
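A minimal sketch of local transcription using faster-whisper's `WhisperModel.transcribe(..., word_timestamps=True)`, plus a renderer for a human-readable copy. The model size, the `Word` type, and the 30-second line grouping in `to_markdown` are assumptions for illustration, not Augent's actual .md format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: float  # seconds
    end: float    # seconds
    text: str


def transcribe_words(audio_path: str, model_size: str = "small") -> list[Word]:
    """Run faster-whisper locally and flatten its segments into timestamped words."""
    from faster_whisper import WhisperModel  # lazy import: heavy dependency

    model = WhisperModel(model_size, device="auto", compute_type="default")
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    # `segments` is a generator: decoding happens as it is consumed here.
    return [Word(w.start, w.end, w.word.strip()) for seg in segments for w in seg.words]


def _stamp(seconds: float) -> str:
    return f"[{int(seconds // 60):02d}:{int(seconds % 60):02d}]"


def to_markdown(title: str, words: list[Word], line_seconds: float = 30.0) -> str:
    """Render words as [mm:ss]-stamped lines, each spanning about `line_seconds` (assumed layout)."""
    lines = [f"# {title}", ""]
    current: list[str] = []
    line_start = 0.0
    for w in words:
        # Flush the current line once the next word starts past the window.
        if current and w.start - line_start >= line_seconds:
            lines.append(f"{_stamp(line_start)} {' '.join(current)}")
            current = []
        if not current:
            line_start = w.start
        current.append(w.text)
    if current:
        lines.append(f"{_stamp(line_start)} {' '.join(current)}")
    return "\n".join(lines) + "\n"
```

Usage: `to_markdown("demo", transcribe_words("talk.m4a"))` produces the text that would be written next to the database entry.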

Memory

Every transcription is keyed by file hash. If you search, analyze, or re-transcribe the same file, the stored result is returned instantly. Embeddings computed by deep_search and chapters are also stored and shared between tools.
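The hash-keyed lookup can be sketched as a get-or-compute cache in SQLite. This assumes SHA-256 over the file bytes and a single hypothetical `transcripts` table storing words as JSON; Augent's real schema and hash function are not documented here.

```python
import hashlib
import json
import sqlite3
from typing import Callable


def content_hash(path: str) -> str:
    """SHA-256 of the file bytes (assumed hash function)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class TranscriptMemory:
    """Minimal hash-keyed transcript cache (hypothetical schema)."""

    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            " file_hash TEXT PRIMARY KEY, words_json TEXT NOT NULL)"
        )

    def get_or_transcribe(self, path: str, transcribe: Callable[[str], list]) -> list:
        key = content_hash(path)
        row = self.db.execute(
            "SELECT words_json FROM transcripts WHERE file_hash = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: the model is never called
        words = transcribe(path)  # cache miss: the slow step runs once per file
        with self.db:  # commits the insert
            self.db.execute(
                "INSERT INTO transcripts (file_hash, words_json) VALUES (?, ?)",
                (key, json.dumps(words)),
            )
        return words
```

Because the key is the content hash, two differently named copies of the same audio share one transcript.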

Search & Analyze

Multiple tools operate on the stored transcript:

search_audio: exact keyword matching with timestamps and context
deep_search: semantic (meaning-based) search using embeddings
search_memory: search across all stored transcriptions by keyword or meaning
search_proximity: find where two keywords appear near each other
batch_search: run keyword search across many files in parallel
chapters: auto-detect topic changes using embedding similarity
identify_speakers: speaker diarization (who spoke when)
highlights: export MP4 clips of the best moments (auto or focused by topic)
take_notes: generate formatted notes with AI
tag: organize transcriptions with broad topic categories
clip_export: export video clips for specific time ranges
text_to_speech: convert text back to spoken audio
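Two of these tools can be sketched in plain Python over word-level timestamps. This is an illustration under stated assumptions, not Augent's implementation: `search_proximity` here measures distance in words (Augent may use seconds), and `chapter_starts` uses one simple rule, a boundary wherever adjacent segment embeddings fall below a cosine-similarity threshold. The embedding model, the threshold, and all names are hypothetical.

```python
import math
import re
from typing import Sequence

TimedWord = tuple[float, float, str]  # (start_s, end_s, text)


def _norm(token: str) -> str:
    # Lowercase and strip punctuation so "Mining," matches "mining".
    return re.sub(r"[^\w']", "", token.lower())


def search_proximity(words: Sequence[TimedWord], a: str, b: str,
                     max_gap: int = 30) -> list[tuple[float, float]]:
    """(start, end) spans where `a` and `b` occur within `max_gap` words of each other."""
    a, b = _norm(a), _norm(b)
    tokens = [_norm(w[2]) for w in words]
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    hits = set()
    for i in pos_a:
        for j in pos_b:
            if i != j and abs(i - j) <= max_gap:
                lo, hi = min(i, j), max(i, j)
                hits.add((words[lo][0], words[hi][1]))
    return sorted(hits)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def chapter_starts(segment_starts: Sequence[float],
                   embeddings: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> list[float]:
    """Open a new chapter wherever consecutive segments are less similar than `threshold`."""
    if not segment_starts:
        return []
    starts = [segment_starts[0]]
    for k in range(1, len(embeddings)):
        if cosine(embeddings[k - 1], embeddings[k]) < threshold:
            starts.append(segment_starts[k])
    return starts
```

Comparing only neighbouring segments keeps chapter detection linear in transcript length; a real implementation might smooth over a window of segments to avoid splitting on a single off-topic sentence.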

Memory Layer

All stored data lives under ~/.augent/: the database and transcript files in ~/.augent/memory/, and separated stems in ~/.augent/separated/.

| Data | Storage | Shared between |
|---|---|---|
| Transcriptions | SQLite DB + `.md` files | All tools |
| Embeddings | SQLite DB | `deep_search`, `chapters`, `search_memory`, `tag` |
| Tags | SQLite DB | `tag`, Web UI Memory Explorer |
| Speaker diarization | SQLite DB | `identify_speakers` |
| Source URLs | SQLite DB (by file hash) | All search tools, Web UI |
| Separated stems | WAV files (`~/.augent/separated/`) | `separate_audio` |

Source URLs from any platform (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1000+ other sites) are stored permanently, keyed by audio file hash, whenever a file is downloaded via download_audio, the CLI, or the Web UI. Any later operation on that file inherits the source URL automatically, so results can link back to the original content.

To manage memory, use memory_stats to see how much is stored, list_memories to browse entries, and clear_memory to wipe everything, or open the Web UI Memory Explorer to browse and delete individual entries.

Key Design Decisions

Local-only: no API calls, no cloud. faster-whisper runs on your own CPU/GPU.
Audio-only downloads: skipping the video stream makes downloads up to 200x faster.
No format conversion: files stay in their native format (.webm, .m4a, etc.) to avoid slow transcoding.
Cache everything: the first run on a file is slow (transcription); every subsequent operation on that file is instant.
Composable tools: each tool does one thing. Claude chains them together based on your prompt.
