Before you can transcribe, search, or analyze anything, you need the audio on your machine. Most tools treat downloading as an afterthought: shell out to a downloader, wait, hope it works. Fine for one file. For a hundred, it's a bottleneck. Standard video downloaders pull the full video stream, gigabytes of visual data you'll never use, then transcode it into a different format, burning more time. A 2-hour YouTube video is roughly 4 GB as video; the same content is about 30 MB as audio. Downloading the video and converting it means moving over a hundred times more data than grabbing the audio directly.

Built for speed

download_audio exists to get audio onto your machine as fast as your connection allows. Every design choice optimizes for speed:
  • 16 parallel connections: aria2c opens 16 simultaneous connections to the source server, saturating your bandwidth instead of downloading through a single stream.
  • 4 concurrent fragments: Large files are split into fragments downloaded simultaneously, then reassembled locally. The download finishes in a fraction of the time a single-stream download would take.
  • Audio-only extraction: No video is ever downloaded. yt-dlp extracts only the audio stream directly from the source. For a typical YouTube video, this means downloading 30 MB instead of 4 GB, up to 200x less data.
  • No format conversion: The audio stays in its native format: .webm from YouTube, .m4a from SoundCloud, .mp4 from Twitter. Skipping transcoding saves minutes per file and avoids quality loss. faster-whisper handles every format natively, so there's no reason to convert.
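The actual implementation of download_audio isn't shown here, but the settings above map directly onto standard yt-dlp and aria2c flags. A minimal sketch of an equivalent invocation (the output directory and URL are placeholders):

```python
import shlex

def build_download_command(url: str, out_dir: str = ".") -> list[str]:
    """Assemble a yt-dlp command mirroring the design choices above."""
    return [
        "yt-dlp",
        "-f", "bestaudio",                          # audio-only: never fetch the video stream
        "--concurrent-fragments", "4",              # 4 fragments downloaded simultaneously
        "--downloader", "aria2c",                   # hand the transfer to aria2c
        "--downloader-args", "aria2c:-x 16 -s 16",  # 16 connections, 16-way split
        "-o", f"{out_dir}/%(title)s.%(ext)s",       # keep the native extension: no transcoding
        url,
    ]

cmd = build_download_command("https://www.youtube.com/watch?v=example")
print(shlex.join(cmd))
```

Note that no postprocessing flag (like --extract-audio with a target format) appears: leaving the stream in its native container is what makes the conversion step disappear.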

1000+ sites

Every site that yt-dlp supports, Augent supports. YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, Twitch, Dailymotion, Spotify podcasts, Reddit, Instagram, Facebook, BBC, CNN, and hundreds more. If it has audio, download_audio grabs it. Real workflows span multiple platforms. Competitive research pulls from YouTube demos, Twitter Spaces, and podcast platforms. Content repurposing spans TikTok, Instagram, and long-form video. Due diligence means scanning earnings calls, interviews, and conference talks hosted across dozens of sites. One tool handles all of it.

The first stage of most workflows

Whether you’re pulling content from the web or working with files you already have on disk, download_audio is how new content enters the pipeline. It feeds directly into every other tool in the suite:
  • Download → Transcribe → Search: The standard pipeline. Audio lands on disk, gets transcribed into memory, becomes instantly searchable.
  • Download → Take Notes: take_notes calls download_audio internally. One URL in, formatted notes out.
  • Download → Batch Search: Download a library of files, then batch_search processes them all in parallel.
  • Download → Identify Speakers: Pull a meeting recording or interview, immediately diarize who said what.
The file path returned by download_audio is ready to pass directly to any downstream tool. No renaming, no moving, no conversion steps in between.
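The key property, that the returned path flows unchanged into the next stage, can be sketched with hypothetical stand-ins for the real tools (the function names, paths, and return shapes here are illustrative assumptions, not the actual Augent API):

```python
# Stand-in stubs for illustration only; the real tools do the actual work.
def download_audio(url: str) -> str:
    """Pretend download: the real tool saves the file and returns its path."""
    return "/downloads/interview.webm"

def transcribe(path: str) -> list[str]:
    """Pretend transcription: the real tool runs faster-whisper on the file."""
    return [f"[00:00] intro ({path})", "[00:15] pricing discussion"]

def search(segments: list[str], query: str) -> list[str]:
    """Pretend search over transcript segments."""
    return [s for s in segments if query in s]

# The path returned by download_audio is handed straight to the next stage:
# no renaming, no moving, no conversion in between.
path = download_audio("https://example.com/episode")
hits = search(transcribe(path), "pricing")
print(hits)  # → ['[00:15] pricing discussion']
```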

Scale without limits

No file limit, no queue, no throttling. Download one file or a hundred — each one at full speed. Combine with batch_search to process entire podcast libraries, full conference lineups, or months of recordings in a single prompt. Every downloaded file that gets transcribed is stored in memory. Process a library once, search it forever.
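Because each download is independent, a hundred files parallelize trivially. A minimal sketch with a thread pool, using a stubbed download function in place of the real tool (the URLs and paths are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def download_audio(url: str) -> str:
    """Stub standing in for the real downloader; returns the saved path."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return f"/downloads/{name}.webm"

urls = [f"https://example.com/episode-{i}" for i in range(1, 101)]

# Each worker downloads at full speed; there is no shared queue or throttle.
with ThreadPoolExecutor(max_workers=8) as pool:
    paths = list(pool.map(download_audio, urls))

print(len(paths))  # → 100
```

The resulting paths are exactly what a batch tool like batch_search would consume next.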

Tool Reference

Parameters, response format, and technical details