# Source Separation

> Isolate vocals from music, background noise, and other sounds. Clean audio in, clean transcription out.

Augent's transcription is powered by Whisper, which is built for clean speech. When the audio has music, background noise, podcast intros, or overlapping sounds, Whisper does its best, but the results suffer. Words get missed. Sentences get mangled. Timestamps drift.

`separate_audio` fixes this. It runs Meta's Demucs v4 on the recording and isolates the vocal track from everything else. Music, drums, bass, ambient noise: all stripped out. What remains is clean speech that Whisper can transcribe far more accurately.

***

## When to use it

* **Podcast episodes with intro/outro music.** Many podcasts open with 30 seconds or so of music. Whisper tries to transcribe the lyrics or hallucinates words. Separation strips the music out before transcription.
* **Twitter/X Spaces with background noise.** Spaces are recorded from phones in noisy environments. Separation isolates the speakers.
* **Conference talks and seminars.** Venue acoustics, audience noise, and background music between segments all degrade transcription quality.
* **Interviews recorded in public.** Coffee shops, street noise, other conversations bleeding in.
* **Any recording where someone is talking over music.** Demucs was built for exactly this. It separates the voice from the music even when they overlap completely.

***

## How it works

One tool call before transcription:

**Step 1: Separate**

```
separate_audio
  audio_path: "/path/to/noisy-podcast.mp3"
```

Returns the path to the clean vocal stem.

**Step 2: Transcribe the vocal stem**

```
transcribe_audio
  audio_path: "/path/to/.augent/separated/.../vocals.wav"
```

Clean transcription. No background noise. Accurate timestamps.

The vocal stem works with every tool in Augent: `search_audio`, `deep_search`, `chapters`, `identify_speakers`, `batch_search`, `take_notes`, and `search_proximity`.
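The two steps above can be chained in a few lines. The sketch below is illustrative only: `call_tool` is a hypothetical stand-in for however your client invokes Augent tools, and the `vocals_path` result key is an assumption, since this page only says that `separate_audio` returns the path to the vocal stem.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (tool name, arguments) -> result dictionary.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def separate_then_transcribe(call_tool: ToolCaller, audio_path: str) -> Dict[str, Any]:
    """Run separation first, then transcribe the resulting vocal stem."""
    # Step 1: isolate the vocals from the original recording.
    separated = call_tool("separate_audio", {"audio_path": audio_path})

    # Assumption: the result exposes the stem path under "vocals_path".
    vocals_path = separated["vocals_path"]

    # Step 2: transcribe the clean stem instead of the noisy original.
    return call_tool("transcribe_audio", {"audio_path": vocals_path})
```

Only the second call's input changes; every downstream tool receives the vocal stem path in place of the original file.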

***

## Caching

Separation results are cached by file hash at `~/.augent/separated/`. The first run processes the audio through Demucs. Every run after returns the cached stems instantly.

Same caching behavior as transcriptions. Process once, use forever.
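To make "cached by file hash" concrete, here is a minimal sketch of a content-addressed lookup. It is not Augent's implementation: the SHA-256 digest, the `<cache_root>/<hash>/vocals.wav` layout, and all function names are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks so large recordings fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_vocals_path(audio_path: Path, cache_root: Path) -> Path:
    """Where the vocal stem would live (hypothetical layout)."""
    return cache_root / file_sha256(audio_path) / "vocals.wav"


def lookup_cached_vocals(audio_path: Path, cache_root: Path) -> Optional[Path]:
    """Return the cached stem if it exists, otherwise None (cache miss)."""
    candidate = cached_vocals_path(audio_path, cache_root)
    return candidate if candidate.is_file() else None
```

Because the key is derived from the file contents rather than the file name, a renamed or moved copy of the same recording would hit the same cache entry under this scheme.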

***

## Models

| Model         | Speed  | Quality | Best for                                                    |
| ------------- | ------ | ------- | ----------------------------------------------------------- |
| `htdemucs`    | Fast   | Great   | Default. Handles most recordings well.                      |
| `htdemucs_ft` | Slower | Best    | Difficult audio with heavy overlap between voice and music. |

Stick with `htdemucs` unless the default output still has audible music bleed in the vocal stem.

***

## Vocals-only vs full separation

By default, `separate_audio` runs in **vocals-only mode**: it produces two stems (vocals and no\_vocals). This is faster than full separation and produces the same vocal quality.

Set `vocals_only: false` to get all four stems: **vocals, drums, bass, other**. This is useful if you need the individual instrument tracks for other purposes, but for transcription, vocals-only is all you need.
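For readers who installed Demucs directly, the two modes correspond roughly to running the Demucs command-line tool with or without its `--two-stems` option. The sketch below builds such a command. How Augent invokes Demucs internally is not documented here, so treat this as an approximation; `build_demucs_command` and `expected_stems` are hypothetical helper names.

```python
import sys
from typing import List


def build_demucs_command(
    audio_path: str,
    out_dir: str,
    model: str = "htdemucs",
    vocals_only: bool = True,
) -> List[str]:
    """Build a Demucs CLI invocation without running it."""
    # -n selects the model, -o sets the output directory.
    cmd = [sys.executable, "-m", "demucs", "-n", model, "-o", str(out_dir)]
    if vocals_only:
        # Two-stem mode: Demucs writes vocals plus a single no_vocals mix.
        cmd += ["--two-stems", "vocals"]
    cmd.append(str(audio_path))
    return cmd


def expected_stems(vocals_only: bool = True) -> List[str]:
    """Stem names to expect in the output folder for each mode."""
    if vocals_only:
        return ["vocals", "no_vocals"]
    return ["vocals", "drums", "bass", "other"]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` runs the separation, provided Demucs is installed in the same Python environment. Demucs typically writes the stems to a folder named after the model and the track inside the output directory.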

***

## Installation

Source separation is included in the standard Augent install:

```bash
curl -fsSL https://augent.app/install.sh | bash
```

If you installed Augent before this feature was added, install the separator package:

```bash
pip install "augent[separator]"
```

Or install demucs directly:

```bash
pip install demucs
```