Stages
1. Download
When you give Augent a URL, it downloads the audio track only, never the video. It uses yt-dlp with aria2c for speed (16 parallel connections, concurrent fragments) and supports YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1000+ other sites.
The source URL is stored permanently by file hash. Any future operation on that file — even weeks later, from a different path — automatically links back to the original source.
2. Separate (optional)
If the audio has music, intros, or background noise, Augent can isolate the vocals using Meta’s Demucs v4 before transcription. This dramatically improves transcription accuracy on noisy audio. See Audio Separation.
3. Transcribe
The file is transcribed locally using faster-whisper (a CTranslate2-optimized build of OpenAI’s Whisper). It produces word-level timestamps, detects the language automatically, and applies VAD filtering to skip silence. The result is stored in memory immediately, keyed by file hash + model size. See Memory & Caching.
4. Search & Analyze
Once a file is in memory, every tool works on the stored transcript instantly, with no re-transcription:
- Keyword search: literal string matching with timestamps and context
- Semantic search: find content by meaning using sentence-transformer embeddings. See Semantic Search
- Chapters: auto-detect topic boundaries using embedding similarity
- Speaker ID: identify who said what using pyannote diarization. See Speaker Diarization
- Highlights: find the best moments automatically or by topic
- Notes: formatted notes in multiple styles
- Batch search: search dozens of files in parallel
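The keyword-search tool above can be sketched against a stored transcript. This is a hedged illustration under assumed data shapes, not Augent's implementation: segments are taken to be (start, end, text) tuples as produced by the transcription stage, and `keyword_search` is an invented name.

```python
def keyword_search(segments, query, context=1):
    """Literal substring match over stored segments; returns
    timestamped hits plus neighboring segments for context."""
    hits = []
    q = query.lower()
    for i, (start, end, text) in enumerate(segments):
        if q in text.lower():
            lo = max(0, i - context)
            hi = min(len(segments), i + context + 1)
            hits.append({
                "start": start,
                "end": end,
                "text": text,
                "context": [s[2] for s in segments[lo:hi]],
            })
    return hits
```

Because the transcript is already in memory, a search like this is a pure in-process scan, which is why repeated queries cost nothing extra.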
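The chapter tool's idea of detecting topic boundaries from embedding similarity can also be sketched. This is a toy version with plain cosine similarity and an assumed threshold; real sentence-transformer embeddings and Augent's actual boundary logic are not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def chapter_boundaries(embeddings, threshold=0.5):
    """Indices where similarity between consecutive segment
    embeddings drops below the threshold, suggesting a topic change."""
    return [
        i + 1
        for i in range(len(embeddings) - 1)
        if cosine(embeddings[i], embeddings[i + 1]) < threshold
    ]
```

Semantic search works on the same embeddings: rank stored segments by cosine similarity to an embedded query instead of comparing neighbors.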
5. Export
Results can be exported as CSV, XLSX, SRT, VTT, or JSON. Video clips can be extracted around matches via clip_export. Notes are saved as .txt files formatted for Obsidian.
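Of the export formats above, SRT is the most mechanical to produce from timestamped segments. A minimal sketch, assuming the same (start, end, text) segment tuples as earlier stages; helper names are illustrative:

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) segments as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

VTT differs mainly in using a `WEBVTT` header and `.` instead of `,` in timestamps, so the same segment data drives both exports.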

