Augent is a pipeline. Each stage does one thing, stores the result in memory, and passes it forward.

Pipeline

1

Download

download_audio grabs audio from any URL using yt-dlp + aria2c (16 parallel connections, 4 concurrent fragments). Only audio is downloaded. No video, no conversion. The raw file lands in ~/Downloads by default.
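
As a rough illustration, an audio-only download like this could be configured through yt-dlp's Python API. This is a sketch under stated assumptions, not Augent's actual code: the helper names `build_audio_opts` and `fetch_audio` are hypothetical, and the specific aria2c flags are one plausible way to get 16 connections. Only the numbers (16 connections, 4 fragments) and the ~/Downloads default come from the description above.

```python
from pathlib import Path


def build_audio_opts(out_dir: Path = Path.home() / "Downloads") -> dict:
    """Build yt-dlp options for an audio-only download (hypothetical helper)."""
    return {
        # Select the best audio-only stream; no video stream is requested.
        "format": "bestaudio",
        # No postprocessors are configured, so the file keeps its native
        # container (.webm, .m4a, ...) and is never transcoded.
        "outtmpl": str(out_dir / "%(title)s.%(ext)s"),
        # Hand downloads to aria2c (assumed flags: 16 connections, 16 splits).
        "external_downloader": {"default": "aria2c"},
        "external_downloader_args": {"aria2c": ["-x", "16", "-s", "16", "-k", "1M"]},
        # Fragment concurrency for yt-dlp's native HLS/DASH downloader.
        "concurrent_fragment_downloads": 4,
    }


def fetch_audio(url: str) -> str:
    """Download the audio for `url` and return the path of the saved file."""
    import yt_dlp  # third-party dependency, imported lazily

    with yt_dlp.YoutubeDL(build_audio_opts()) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

Keeping the option dictionary in its own function makes the download settings easy to inspect without touching the network.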
2

Separate (optional)

separate_audio runs Meta’s Demucs v4 to isolate vocals from music, background noise, and other sounds. The clean vocal stem then feeds into transcription, which gives more accurate results on noisy recordings. Separated stems are cached by file hash at ~/.augent/separated/.
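
The hash-keyed stem cache might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the SHA-256 hash, the one-directory-per-hash layout, the `htdemucs` model name, and the helper names `file_hash` and `separate_vocals`. The source only states that stems are cached by file hash under ~/.augent/separated/.

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_ROOT = Path.home() / ".augent" / "separated"


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def separate_vocals(path: Path, cache_root: Path = CACHE_ROOT) -> Path:
    """Return the vocal stem for `path`, running Demucs only on a cache miss."""
    out_dir = cache_root / file_hash(path)  # assumed layout: one folder per hash
    # Demucs writes stems to <out>/<model name>/<track name>/<stem>.wav
    vocals = out_dir / "htdemucs" / path.stem / "vocals.wav"
    if vocals.exists():
        return vocals  # cache hit: no separation needed
    subprocess.run(
        ["demucs", "--two-stems=vocals", "-n", "htdemucs",
         "-o", str(out_dir), str(path)],
        check=True,
    )
    return vocals
```

Because the cache key is derived from the file contents rather than the file name, a renamed or moved copy of the same recording would still hit the cache.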
3

Transcribe

transcribe_audio runs the file through faster-whisper locally. The full transcript, with word-level timestamps, is stored in a SQLite database at ~/.augent/memory/. A human-readable .md copy is saved to ~/.augent/memory/transcriptions/. Nothing leaves your machine.
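
For orientation, word-level timestamps can be obtained from faster-whisper roughly as follows. This is a minimal sketch, not Augent's implementation: the helper names `flatten_words` and `transcribe`, the `base` model size, and the CPU/int8 settings are assumptions chosen for the example.

```python
from typing import Iterable


def flatten_words(segments: Iterable) -> list[dict]:
    """Flatten transcription segments into a list of timestamped words."""
    words = []
    for segment in segments:
        # `segment.words` is None unless word timestamps were requested.
        for w in segment.words or []:
            words.append({"word": w.word.strip(), "start": w.start, "end": w.end})
    return words


def transcribe(path: str, model_size: str = "base") -> list[dict]:
    """Transcribe `path` locally and return one entry per spoken word."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # `segments` is a lazy generator; transcription runs as it is consumed.
    segments, _info = model.transcribe(path, word_timestamps=True)
    return flatten_words(segments)
```

A flat list of `{"word", "start", "end"}` entries is a convenient shape for the search tools described later, since every match can be mapped straight back to a timestamp.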
4

Memory

Every transcription is keyed by file hash. If you search, analyze, or re-transcribe the same file, the stored result is returned instantly. Embeddings computed by deep_search and chapters are also stored and shared between tools.
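
The "transcribe once, reuse everywhere" behaviour can be sketched with a small SQLite table keyed by file hash. The schema, the table name `transcriptions`, and the helpers `open_memory` and `get_or_transcribe` are hypothetical; Augent's real database layout is not described in this document.

```python
import json
import sqlite3
from typing import Callable


def open_memory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and make sure the (assumed) table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions ("
        "file_hash TEXT PRIMARY KEY, words_json TEXT NOT NULL)"
    )
    return conn


def get_or_transcribe(
    conn: sqlite3.Connection, file_hash: str, transcribe: Callable[[], list]
) -> list:
    """Return the stored transcript, or compute and store it on a miss."""
    row = conn.execute(
        "SELECT words_json FROM transcriptions WHERE file_hash = ?", (file_hash,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no transcription run
    words = transcribe()  # slow path, runs only once per file hash
    conn.execute(
        "INSERT INTO transcriptions (file_hash, words_json) VALUES (?, ?)",
        (file_hash, json.dumps(words)),
    )
    conn.commit()
    return words
```

Passing the transcription step in as a callable keeps the cache logic independent of the model, so the same pattern could hold embeddings or speaker labels.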
5

Search & Analyze

Multiple tools operate on the stored transcript:
  • search_audio: exact keyword matching with timestamps and context
  • deep_search: semantic (meaning-based) search using embeddings
  • search_memory: search across all stored transcriptions by keyword or meaning
  • search_proximity: find where two keywords appear near each other
  • batch_search: run keyword search across many files in parallel
  • chapters: auto-detect topic changes using embedding similarity
  • identify_speakers: speaker diarization (who spoke when)
  • highlights: export MP4 clips of the best moments (auto or focused by topic)
  • take_notes: generate formatted notes with AI
  • tag: organize transcriptions with broad topic categories
  • clip_export: export video clips for specific time ranges
  • text_to_speech: convert text back to spoken audio
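
To make the first and fourth tools concrete, here is a minimal sketch of keyword and proximity search over a list of timestamped words. It assumes each word is a dict with `word` and `start` keys and handles single-word keywords only. The function bodies, parameters (`context`, `max_gap`), and result shapes are illustrative and may differ from what search_audio and search_proximity actually do.

```python
PUNCT = ".,!?;:\"'"


def _norm(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(PUNCT)


def search_keyword(words: list[dict], keyword: str, context: int = 3) -> list[dict]:
    """Return each exact match with its timestamp and surrounding words."""
    key = _norm(keyword)
    hits = []
    for i, w in enumerate(words):
        if _norm(w["word"]) == key:
            lo, hi = max(0, i - context), i + context + 1
            snippet = " ".join(x["word"] for x in words[lo:hi])
            hits.append({"start": w["start"], "snippet": snippet})
    return hits


def search_proximity(
    words: list[dict], first: str, second: str, max_gap: int = 5
) -> list[dict]:
    """Find places where `first` and `second` occur within `max_gap` words."""
    tokens = [_norm(w["word"]) for w in words]
    a_idx = [i for i, t in enumerate(tokens) if t == _norm(first)]
    b_idx = [i for i, t in enumerate(tokens) if t == _norm(second)]
    return [
        {"start": words[min(a, b)]["start"], "gap": abs(a - b)}
        for a in a_idx
        for b in b_idx
        if 0 < abs(a - b) <= max_gap
    ]
```

Both functions work on the stored transcript alone, which is why searches of this kind need no further access to the audio once a file has been transcribed.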

Memory Layer

Stored data lives under ~/.augent/memory/, except separated stems, which are kept in ~/.augent/separated/:
Data                | Storage                          | Shared between
Transcriptions      | SQLite DB + .md files            | All tools
Embeddings          | SQLite DB                        | deep_search, chapters, search_memory, tag
Tags                | SQLite DB                        | tag, Web UI Memory Explorer
Speaker diarization | SQLite DB                        | identify_speakers
Source URLs         | SQLite DB (by file hash)         | All search tools, Web UI
Separated stems     | WAV files (~/.augent/separated/) | separate_audio

Source URLs from any platform (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1000+ sites) are stored permanently by audio file hash when downloaded via download_audio, the CLI, or the Web UI. Any future operation on that file automatically inherits the source URL for linking back to the original content.

Use memory_stats to see how much is stored, list_memories to browse entries, clear_memory to wipe everything, or the Web UI Memory Explorer to browse and delete individual entries.

Key Design Decisions

  • Local-only: no API calls, no cloud. Whisper runs on your CPU/GPU.
  • Audio-only downloads: skipping video makes downloads up to 200x faster.
  • No format conversion: files stay in their native format (.webm, .m4a, etc.) to avoid slow transcoding.
  • Cache everything: the first run on a file is slow because it has to transcribe; every subsequent operation on that file reuses the stored result and returns instantly.
  • Composable tools: each tool does one thing. Claude chains them together based on your prompt.