Pipeline
Download
download_audio grabs audio from any URL using yt-dlp + aria2c (16 parallel connections, 4 concurrent fragments). Only audio is downloaded. No video, no conversion. The raw file lands in ~/Downloads by default.
Separate (optional)
separate_audio runs Meta’s Demucs v4 to isolate vocals from music, background noise, and other sounds. The clean vocal stem feeds into transcription for accurate results on noisy recordings. Cached by file hash at ~/.augent/separated/.
Transcribe
transcribe_audio runs the file through faster-whisper locally. The full transcript with word-level timestamps is stored in a SQLite memory at ~/.augent/memory/. A human-readable .md copy is saved to ~/.augent/memory/transcriptions/. Nothing leaves your machine.
Memory
Every transcription is keyed by file hash. If you search, analyze, or re-transcribe the same file, the stored result is returned instantly. Embeddings computed by deep_search and chapters are also stored and shared between tools.
Search & Analyze
Multiple tools operate on the stored transcript:
- search_audio: exact keyword matching with timestamps and context
- deep_search: semantic (meaning-based) search using embeddings
- search_memory: search across all stored transcriptions by keyword or meaning
- search_proximity: find where two keywords appear near each other
- batch_search: run keyword search across many files in parallel
- chapters: auto-detect topic changes using embedding similarity
- identify_speakers: speaker diarization (who spoke when)
- highlights: export MP4 clips of the best moments (auto or focused by topic)
- take_notes: generate formatted notes with AI
- tag: organize transcriptions with broad topic categories
- clip_export: export video clips for specific time ranges
- text_to_speech: convert text back to spoken audio
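As an illustration of how a tool like search_proximity can work on a transcript with word-level timestamps, here is a minimal sketch. The `Word` type, the `find_proximity` function, and the `max_distance` parameter are hypothetical names chosen for this example; Augent's actual data model and matching rules are not described here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the audio


def _normalize(token: str) -> str:
    # Compare case-insensitively and ignore surrounding punctuation.
    return token.strip(".,!?;:\"'").lower()


def find_proximity(
    words: list[Word], first: str, second: str, max_distance: int = 10
) -> list[tuple[float, float]]:
    """Return (start_of_first, start_of_second) pairs where the two
    keywords occur within `max_distance` words of each other."""
    a, b = _normalize(first), _normalize(second)
    positions_b = [i for i, w in enumerate(words) if _normalize(w.text) == b]
    hits = []
    for i, word in enumerate(words):
        if _normalize(word.text) != a:
            continue
        for j in positions_b:
            if i != j and abs(i - j) <= max_distance:
                hits.append((word.start, words[j].start))
    return hits
```

Measuring distance in words rather than seconds keeps the result independent of speaking speed; a time-based window would be an equally reasonable choice.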
Memory Layer
All stored data lives under ~/.augent/memory/:
| Data | Storage | Shared between |
|---|---|---|
| Transcriptions | SQLite DB + .md files | All tools |
| Embeddings | SQLite DB | deep_search, chapters, search_memory, tag |
| Tags | SQLite DB | tag, Web UI Memory Explorer |
| Speaker diarization | SQLite DB | identify_speakers |
| Source URLs | SQLite DB (by file hash) | All search tools, Web UI |
| Separated stems | WAV files (~/.augent/separated/) | separate_audio |
Source URLs are saved when a file is downloaded through download_audio, the CLI, or the Web UI. Any future operation on that file automatically inherits the source URL for linking back to the original content.
Use memory_stats to see how much is stored, list_memories to browse entries, clear_memory to wipe everything, or the Web UI Memory Explorer to browse and delete individual entries.
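The hash-keyed caching described in this section can be sketched as follows. This is an illustration under stated assumptions, not Augent's implementation: the SHA-256 hash, the `transcriptions` table layout, and the `file_hash`, `open_memory`, and `cached_transcribe` helpers are all hypothetical.

```python
import hashlib
import json
import sqlite3
from pathlib import Path
from typing import Callable, Optional


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_memory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions ("
        "file_hash TEXT PRIMARY KEY, transcript TEXT NOT NULL, source_url TEXT)"
    )
    return conn


def cached_transcribe(
    conn: sqlite3.Connection,
    path: Path,
    transcribe: Callable[[Path], dict],
    source_url: Optional[str] = None,
) -> dict:
    """Return the stored transcript for this file, transcribing only on a miss."""
    key = file_hash(path)
    row = conn.execute(
        "SELECT transcript FROM transcriptions WHERE file_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no transcription work
    result = transcribe(path)  # slow path, runs once per unique file
    conn.execute(
        "INSERT INTO transcriptions (file_hash, transcript, source_url) "
        "VALUES (?, ?, ?)",
        (key, json.dumps(result), source_url),
    )
    conn.commit()
    return result
```

Keying on the content hash rather than the file name means a renamed or moved file still hits the cache, and the source URL stored alongside it stays attached to the same content.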
Key Design Decisions
- Local-only: no API calls, no cloud. Whisper runs on your CPU/GPU.
- Audio-only downloads: skipping video makes downloads up to 200x faster.
- No format conversion: files stay in their native format (.webm, .m4a, etc.) to avoid slow transcoding.
- Cache everything: the first run is slow (transcription); every subsequent operation on that file is instant.
- Composable tools: each tool does one thing. Claude chains them together based on your prompt.
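To make the audio-only, no-conversion download decision concrete, here is a rough sketch using yt-dlp's Python API with aria2c as the external downloader. The exact options Augent passes are not documented here, so the option values and the `build_audio_only_options` and `fetch_audio` helpers are assumptions for illustration. Running `fetch_audio` requires the yt-dlp package and an aria2c binary on the PATH.

```python
from pathlib import Path


def build_audio_only_options(output_dir: Path) -> dict:
    """Build yt-dlp options for an audio-only download with no transcoding."""
    return {
        # Request the best audio-only stream; no video is fetched.
        "format": "bestaudio",
        # No postprocessors are configured, so the file keeps its native
        # container (.webm, .m4a, ...).
        "outtmpl": str(output_dir / "%(title)s.%(ext)s"),
        # Hand transfers to aria2c with 16 connections per server.
        "external_downloader": {"default": "aria2c"},
        "external_downloader_args": {"aria2c": ["-x", "16", "-s", "16"]},
        # Fetch up to 4 fragments at once for fragmented streams.
        "concurrent_fragment_downloads": 4,
    }


def fetch_audio(url: str, output_dir: Path = Path.home() / "Downloads") -> None:
    # Imported lazily so the options can be inspected without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_audio_only_options(output_dir)) as ydl:
        ydl.download([url])
```

Because no post-processing step is requested, the downloaded file can be passed directly to the transcription step without waiting for a conversion.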

