Table of contents
General
Transcription
- How does transcription work?
- Which model size should I use?
- Do I need a GPU?
- What languages are supported?
- What audio formats can I use?
- What sites can I download audio from?
Search
- What is the difference between keyword search and deep search?
- How does proximity search work?
- Can I search multiple files at once?
- Can I search across everything I have ever transcribed?
Memory
- How does memory work?
- Where is memory stored?
- Does memory persist between sessions?
- How do I clear memory?
- Is there a storage limit?
Compatibility
- What operating systems are supported?
- What Python version do I need?
- What MCP clients work with Augent?
- Does Augent have a CLI?
- Does Augent have a web UI?
General
What is Augent?
Augent is a complete audio processing pipeline exposed as MCP tools. It downloads, transcribes, indexes, and searches audio and video content entirely on your machine. Give it URLs or files, get structured answers back.
Who is Augent for?
Anyone whose work touches audio or video. Researchers, developers, legal teams, educators, analysts, content creators, journalists. If you need answers from content without sitting through it, Augent handles it.
Is Augent free?
Yes. Augent is open source under the MIT license. No API keys, no subscriptions, no usage limits.
Does anything get sent to an external server?
No. Everything runs locally. Transcription, search, and embeddings all run on your machine. The only network calls Augent makes are when you ask it to download audio from a URL.
Transcription
How does transcription work?
Augent uses faster-whisper, a fast local implementation of OpenAI’s Whisper model. Everything runs on your machine.
Which model size should I use?
tiny is the default and handles almost everything: tutorials, interviews, lectures, podcasts, even audio with background music. Use small or above for heavy accents, very poor audio quality, or song lyrics.
| Model | Speed | Accuracy |
|---|---|---|
| tiny | Fastest | Excellent (default) |
| base | Fast | Excellent |
| small | Medium | Superior |
| medium | Slow | Outstanding |
| large | Slowest | Maximum |
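For readers curious what a local faster-whisper call looks like, here is a minimal sketch. It uses the faster-whisper library directly and is not Augent's internal code; the helper names `format_timestamp` and `transcribe_file` are made up for illustration. The model size is passed by name, matching the table above.

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as MM:SS (or H:MM:SS past one hour)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def transcribe_file(path: str, model_size: str = "tiny") -> list[tuple[str, str]]:
    """Transcribe one audio file and return (timestamp, text) pairs.

    Hypothetical helper: it shows a direct faster-whisper call,
    not how Augent wires things up internally.
    """
    # Imported lazily so this module still loads where faster-whisper is absent.
    from faster_whisper import WhisperModel

    # device="auto" lets the library use CUDA when available, otherwise CPU.
    model = WhisperModel(model_size, device="auto", compute_type="default")

    # transcribe() returns a lazy generator of segments plus detection info.
    segments, info = model.transcribe(path)
    print(f"Detected language: {info.language}")

    return [(format_timestamp(seg.start), seg.text.strip()) for seg in segments]
```

Calling `transcribe_file("interview.mp3", model_size="small")` would load the small model instead of the default; the file name here is only a placeholder.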
Do I need a GPU?
No. Augent runs on CPU by default. If you have a CUDA-compatible GPU, it will use it automatically for faster transcription.
What languages are supported?
Whisper supports 99+ languages. Augent auto-detects the language from the audio.
What audio formats can I use?
MP3, WAV, M4A, FLAC, OGG, WebM, and any other format FFmpeg can handle.
What sites can I download audio from?
1,000+ sites. YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, Twitch, and anything else yt-dlp supports.
Search
What is the difference between keyword search and deep search?
Keyword search finds exact word matches. Fast and precise. Deep search finds matches by meaning. It uses embeddings to understand what was said, even when the exact words don’t match your query.
How does proximity search work?
It finds where two keywords appear near each other. For example, “pricing” near “competitor” returns only the moments where both concepts come up together.
Can I search multiple files at once?
Yes. batch_search searches multiple audio files in parallel. No file limit.
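To make the keyword and proximity answers above concrete, here is a minimal sketch of both ideas over a plain transcript string. It is an illustration only: the function names and the `max_distance` parameter (counted in words) are assumptions, and Augent's own tools may measure distance differently.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_positions(words: list[str], keyword: str) -> list[int]:
    """Exact-match keyword search: indexes of every occurrence."""
    target = keyword.lower()
    return [i for i, word in enumerate(words) if word == target]


def proximity_matches(
    text: str, first: str, second: str, max_distance: int = 10
) -> list[tuple[int, int]]:
    """Return (first_index, second_index) pairs at most max_distance words apart."""
    words = tokenize(text)
    first_hits = keyword_positions(words, first)
    second_hits = keyword_positions(words, second)
    return [
        (i, j)
        for i in first_hits
        for j in second_hits
        if abs(i - j) <= max_distance
    ]


TRANSCRIPT = (
    "Our pricing is lower than any competitor today. "
    "Later we talked about hiring. Pricing came up again at the end."
)

# Keyword search finds both mentions of "pricing"; proximity search keeps
# only the one that sits within five words of "competitor".
all_pricing = keyword_positions(tokenize(TRANSCRIPT), "pricing")
near_competitor = proximity_matches(TRANSCRIPT, "pricing", "competitor", max_distance=5)
```

In a real transcript the word indexes would be mapped back to segment timestamps so each match points at a moment in the audio.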
Can I search across everything I have ever transcribed?
Yes. search_memory queries all your stored transcriptions at once. No file path needed, no limit on how many files it searches.
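Deep search, described earlier in this section, ranks transcript segments by how close their embeddings are to the query's embedding. The sketch below shows only that ranking step, using cosine similarity. The three-dimensional vectors are hand-made toy values, not output from a real embedding model, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_meaning(
    query_vector: list[float],
    segments: list[tuple[str, list[float]]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Score every (text, vector) segment against the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: in practice an embedding model produces these vectors.
SEGMENTS = [
    ("We should lower what we charge customers", [0.9, 0.1, 0.0]),
    ("The weather was great at the offsite", [0.0, 0.2, 0.9]),
    ("Hiring is slowing down", [0.1, 0.9, 0.1]),
]
QUERY_VECTOR = [1.0, 0.0, 0.1]  # stand-in for the embedding of "pricing"

results = rank_by_meaning(QUERY_VECTOR, SEGMENTS, top_k=2)
```

The top result is the segment about what to charge customers, even though it never contains the word "pricing". That is the difference from keyword search.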
Memory
How does memory work?
Every transcription is stored by file hash in a local SQLite database. The first time you process a file, it transcribes. Every time after that, results are instant.
Where is memory stored?
~/.augent/memory/. Transcriptions in transcriptions.db, markdown copies in transcriptions/.
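The hash-keyed memory described above can be sketched with Python's standard library. This is an illustration under assumptions, not Augent's actual schema: the table layout, the choice of SHA-256, and the function names are all hypothetical, and the example uses an in-memory database rather than transcriptions.db.

```python
import hashlib
import sqlite3
from typing import Callable


def file_hash(data: bytes) -> str:
    """Content hash used as the cache key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def open_memory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcriptions ("
        "file_hash TEXT PRIMARY KEY, transcript TEXT NOT NULL)"
    )
    return conn


def get_or_transcribe(
    conn: sqlite3.Connection,
    audio_bytes: bytes,
    transcribe: Callable[[bytes], str],
) -> str:
    """Return the stored transcript, transcribing only on a cache miss."""
    key = file_hash(audio_bytes)
    row = conn.execute(
        "SELECT transcript FROM transcriptions WHERE file_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # seen before: no transcription needed

    transcript = transcribe(audio_bytes)  # slow path, runs once per file
    conn.execute(
        "INSERT INTO transcriptions (file_hash, transcript) VALUES (?, ?)",
        (key, transcript),
    )
    conn.commit()
    return transcript
```

Keying on the file's content rather than its path means a renamed or moved file still hits the stored transcript.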
Does memory persist between sessions?
Yes. Memory is permanent until you clear it.
How do I clear memory?
Use the clear_memory tool or run augent memory clear from the CLI.
Is there a storage limit?
No. Memory grows as you transcribe. A typical transcription takes a few KB.
Privacy and security
Is my data private?
Yes. Audio stays local, transcriptions stay local, search stays local. Nothing leaves your device. The only network activity is when you ask Augent to download audio from a URL you provide.
Can I use Augent fully offline?
Yes, for everything except downloading new audio from URLs. Transcription, search, and all analysis tools work without an internet connection.
Performance
How fast is transcription?
With the tiny model on a modern machine, Augent transcribes faster than real-time. A 1-hour file typically takes a few minutes.
Why is the first search slow but after that it is instant?
The first search triggers transcription if the file hasn’t been processed before. Once it is in memory, every search after that queries the stored transcript instantly.
Configuration
Can I customize default settings?
Yes. Create ~/.augent/config.yaml to set defaults for model size, output directories, clip padding, TTS voice, and more. Per-call arguments always override config values. No config file is required; all values have sensible defaults. See Configuration.
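The precedence described above (built-in defaults, then the config file, then per-call arguments) can be sketched as a layered dictionary merge. The key names `model_size` and `clip_padding` and the padding value are placeholders chosen for illustration; see Configuration for the real keys. The config is assumed to be already loaded into a dict.

```python
# Built-in defaults. "tiny" matches the documented default model;
# the key names and the clip_padding value are hypothetical.
DEFAULTS = {"model_size": "tiny", "clip_padding": 0.0}


def resolve_settings(config: dict, call_args: dict) -> dict:
    """Merge settings so that call arguments beat config, which beats defaults."""
    settings = dict(DEFAULTS)  # lowest priority
    settings.update(config)  # values from the config file
    # Only arguments the caller actually supplied should override anything.
    settings.update({k: v for k, v in call_args.items() if v is not None})
    return settings


# No config file at all: defaults apply.
no_config = resolve_settings({}, {})

# Config sets a new default, and one call overrides it.
from_config = resolve_settings({"model_size": "small"}, {})
from_call = resolve_settings({"model_size": "small"}, {"model_size": "medium"})
```

Treating `None` as "not supplied" lets a tool accept optional arguments without accidentally erasing a configured default.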
Can I hide tools I don’t need?
Yes. Add tool names to disabled_tools in your config file. They are removed from the tool list entirely and cannot be called. See Configuration.
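As a rough picture of what hiding tools means, the sketch below filters a tool registry against a `disabled_tools` list and refuses calls to anything filtered out. The registry shape and function names are assumptions for illustration; only the tool names come from this page.

```python
from typing import Callable


def visible_tools(all_tools: list[str], disabled_tools: list[str]) -> list[str]:
    """Tool names that remain listed after removing disabled ones."""
    blocked = set(disabled_tools)
    return [name for name in all_tools if name not in blocked]


def call_tool(
    name: str,
    registry: dict[str, Callable[[], str]],
    disabled_tools: list[str],
) -> str:
    """Invoke a tool by name unless it is disabled or unknown."""
    if name in disabled_tools or name not in registry:
        raise KeyError(f"tool not available: {name}")
    return registry[name]()


# Hypothetical registry with stand-in implementations.
REGISTRY = {
    "batch_search": lambda: "searched",
    "search_memory": lambda: "queried",
    "clear_memory": lambda: "cleared",
}
DISABLED = ["clear_memory"]

listed = visible_tools(list(REGISTRY), DISABLED)
```

Here `clear_memory` disappears from the listing, and any attempt to call it raises an error instead of running.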
Compatibility
What operating systems are supported?
macOS and Linux natively. Windows via WSL2 or pip install.
What Python version do I need?
Python 3.10 or above.
What MCP clients work with Augent?
Any MCP client. Claude Code, Codex, and OpenClaw are tested and documented. Any other MCP-compatible client works the same way.
Does Augent have a CLI?
Yes. Full CLI for terminal workflows. Run augent --help to see all commands.
Does Augent have a web UI?
Yes. Run augent-web and open http://127.0.0.1:8282. Runs 100% locally.
