Uses Meta’s HTDemucs model to split audio into individual stems. The vocal stem can be passed directly to any other Augent tool, which gives much cleaner results on recordings with music or background noise.
Requires: pip install augent[separator] (included in augent[all])
Example
Request:
{
  "audio_path": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "vocals_only": true
}
Response:
{
  "stems": {
    "vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "no_vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/no_vocals.wav"
  },
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "source_file": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "cached": false,
  "output_dir": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals",
  "hint": "Use the vocals_path as the audio_path in transcribe_audio, search_audio, deep_search, or any other tool for clean results."
}
Then transcribe the clean vocals:
{
  "audio_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model_size": "tiny"
}
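The hand-off from separation to transcription can also be scripted. The following is a minimal sketch, assuming the separation response has already been parsed into a Python dict; `build_transcribe_request` is a hypothetical helper, not part of Augent, and it only copies `vocals_path` into a new request.

```python
import json


def build_transcribe_request(separation_response: dict, model_size: str = "tiny") -> dict:
    """Build a transcription request from a separation response (hypothetical helper)."""
    vocals_path = separation_response.get("vocals_path")
    if not vocals_path:
        raise ValueError("separation response has no vocals_path")
    # The vocal stem becomes the audio_path of the follow-up request.
    return {"audio_path": vocals_path, "model_size": model_size}


# Trimmed copy of the example response shown above.
response = {
    "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "model": "htdemucs",
    "cached": False,
}

request = build_transcribe_request(response)
print(json.dumps(request, indent=2))
```

How the resulting request is sent depends on your client and is not shown here.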
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| audio_path | Yes | | Path to the audio/video file |
| vocals_only | No | true | If true, separates into vocals + no_vocals (faster). If false, separates into all 4 stems: vocals, drums, bass, other. |
| model | No | htdemucs | Demucs model. htdemucs is fast with great quality. htdemucs_ft is fine-tuned for best quality but slower. |
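If you assemble requests in code, the defaults in the table can be captured in a small builder. This is only a sketch: `build_separation_request` and `KNOWN_MODELS` are hypothetical names, and restricting `model` to the two values listed in the table is an assumption, since the tool may accept other Demucs models.

```python
# Assumption: only the two models named in the table are checked here.
KNOWN_MODELS = {"htdemucs", "htdemucs_ft"}


def build_separation_request(audio_path: str,
                             vocals_only: bool = True,
                             model: str = "htdemucs") -> dict:
    """Build a separation request using the documented defaults (hypothetical helper)."""
    if not audio_path:
        raise ValueError("audio_path is required")
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return {"audio_path": audio_path, "vocals_only": vocals_only, "model": model}


# Only audio_path is required; the other fields fall back to the defaults.
default_request = build_separation_request("/Users/you/Downloads/song.mp3")

# Full 4-stem separation with the fine-tuned model.
full_request = build_separation_request(
    "/Users/you/Downloads/song.mp3", vocals_only=False, model="htdemucs_ft"
)
```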
Full 4-Stem Separation
Set vocals_only to false to get all four stems:
{
  "audio_path": "/Users/you/Downloads/song.mp3",
  "vocals_only": false
}
Response:
{
  "stems": {
    "vocals": "/path/to/vocals.wav",
    "drums": "/path/to/drums.wav",
    "bass": "/path/to/bass.wav",
    "other": "/path/to/other.wav"
  },
  "vocals_path": "/path/to/vocals.wav",
  "model": "htdemucs",
  "cached": false
}
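A script that consumes the response may want to confirm that every expected stem is present before using the paths. The sketch below assumes the response has been parsed into a dict; `missing_stems` is a hypothetical helper, and the stem names are taken from the examples on this page.

```python
# Stem names as they appear in the example responses.
TWO_STEM = {"vocals", "no_vocals"}
FOUR_STEM = {"vocals", "drums", "bass", "other"}


def missing_stems(response: dict, vocals_only: bool) -> set:
    """Return the expected stem names that are absent from the response."""
    expected = TWO_STEM if vocals_only else FOUR_STEM
    return expected - set(response.get("stems", {}))


response = {
    "stems": {
        "vocals": "/path/to/vocals.wav",
        "drums": "/path/to/drums.wav",
        "bass": "/path/to/bass.wav",
        "other": "/path/to/other.wav",
    },
    "vocals_path": "/path/to/vocals.wav",
    "model": "htdemucs",
    "cached": False,
}

missing = missing_stems(response, vocals_only=False)
if missing:
    raise RuntimeError(f"missing stems: {sorted(missing)}")
```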
Notes
Results are cached by file hash. The first run separates the audio; later runs on the same file return the cached stems almost immediately.
Use vocals_only: true (the default) when your goal is transcription. It is faster than full 4-stem separation and produces the same vocal quality.
Separated stems are stored at ~/.augent/separated/. Each file gets its own directory named by hash, so the same file is never processed twice.
The vocals_path from the response can be used as audio_path in any Augent tool: transcribe_audio, search_audio, deep_search, chapters, identify_speakers, batch_search, and more.
For best quality on difficult audio (voices that overlap heavily with music), use model: "htdemucs_ft". It is slower but produces cleaner separation.
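To illustrate why caching by file hash means the same file is never processed twice, here is a rough sketch of content-based cache keys. It is not Augent's implementation: the use of SHA-256, the 8-character prefix, and the `full` label for 4-stem mode are all assumptions, and the `<hash>_<model>_<mode>` directory pattern is only inferred from the example path `a1b2c3d4_htdemucs_vocals`.

```python
import hashlib
import tempfile
from pathlib import Path

CACHE_ROOT = Path.home() / ".augent" / "separated"  # location stated in the notes


def file_hash(path: Path, length: int = 8) -> str:
    """Hash file contents in chunks. SHA-256 and the prefix length are assumptions."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def cache_dir_for(path: Path, model: str = "htdemucs", vocals_only: bool = True) -> Path:
    """Derive a cache directory from content hash, model, and mode (hypothetical)."""
    mode = "vocals" if vocals_only else "full"  # "full" is a placeholder label
    return CACHE_ROOT / f"{file_hash(path)}_{model}_{mode}"


# Two files with identical bytes map to the same directory, whatever their names.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "episode.mp3"
    renamed = Path(tmp) / "episode-copy.mp3"
    first.write_bytes(b"same audio bytes")
    renamed.write_bytes(b"same audio bytes")
    same_key = cache_dir_for(first) == cache_dir_for(renamed)
```

Because the key depends on the file's bytes rather than its name or location, a renamed or moved copy would still hit the cache under this scheme.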