Uses Meta’s HTDemucs model to split audio into individual stems. The vocal stem feeds directly into any other Augent tool for dramatically cleaner results on noisy recordings.

Requires: pip install augent[separator] (included in augent[all])

Example

Request:
{
  "audio_path": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "vocals_only": true
}
Response:
{
  "stems": {
    "vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "no_vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/no_vocals.wav"
  },
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "source_file": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "cached": false,
  "output_dir": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals",
  "hint": "Use the vocals_path as the audio_path in transcribe_audio, search_audio, deep_search, or any other tool for clean results."
}
Then transcribe the clean vocals:
{
  "audio_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model_size": "tiny"
}
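The hand-off from the separation response to the transcription request can be scripted. The following is a minimal sketch that works only on the JSON shown above; `build_transcribe_request` is a hypothetical helper name and not part of Augent, and the fallback to `stems["vocals"]` is an assumption based on both fields carrying the same path in the example.

```python
import json


def build_transcribe_request(separation_response: dict, model_size: str = "tiny") -> dict:
    """Build a transcription request from a separation response (hypothetical helper)."""
    # Prefer the top-level vocals_path; fall back to stems["vocals"] (assumption:
    # both point at the same file, as in the example response).
    vocals = separation_response.get("vocals_path") or separation_response.get("stems", {}).get("vocals")
    if not vocals:
        raise ValueError("separation response contains no vocals stem")
    return {"audio_path": vocals, "model_size": model_size}


# Trimmed copy of the example response above.
separation_response = json.loads("""
{
  "stems": {
    "vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "no_vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/no_vocals.wav"
  },
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "cached": false
}
""")

transcribe_request = build_transcribe_request(separation_response)
print(json.dumps(transcribe_request, indent=2))
```

The resulting dictionary matches the transcription request shown above and can be sent through whatever client you use to call Augent tools.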

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| audio_path | Yes | | Path to the audio/video file |
| vocals_only | No | true | If true, separates into vocals + no_vocals (faster). If false, separates into all 4 stems: vocals, drums, bass, other. |
| model | No | htdemucs | Demucs model. htdemucs is fast with great quality. htdemucs_ft is fine-tuned for best quality but slower. |
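As an illustration of how the defaults combine, here is a small sketch that assembles a request dictionary from the three parameters. `build_separation_request` and `DOCUMENTED_MODELS` are hypothetical names introduced for this example; rejecting any model other than the two listed here is an assumption, since this page does not say whether the tool accepts others.

```python
import json

# The two models named on this page. Whether the tool accepts other Demucs
# models is not stated, so treating these as the full set is an assumption.
DOCUMENTED_MODELS = ("htdemucs", "htdemucs_ft")


def build_separation_request(audio_path: str, vocals_only: bool = True,
                             model: str = "htdemucs") -> dict:
    """Return a request dict with the documented defaults filled in (hypothetical helper)."""
    if not audio_path:
        raise ValueError("audio_path is required")
    if model not in DOCUMENTED_MODELS:
        raise ValueError(f"unknown model {model!r}; documented: {DOCUMENTED_MODELS}")
    return {"audio_path": audio_path, "vocals_only": vocals_only, "model": model}


# Defaults: vocals + no_vocals with the standard model.
default_request = build_separation_request("/Users/you/Downloads/podcast-with-intro-music.mp3")

# All four stems with the fine-tuned model.
full_request = build_separation_request("/Users/you/Downloads/song.mp3",
                                        vocals_only=False, model="htdemucs_ft")

print(json.dumps(default_request, indent=2))
print(json.dumps(full_request, indent=2))
```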

Full 4-Stem Separation

Set vocals_only to false to get all four stems:
{
  "audio_path": "/Users/you/Downloads/song.mp3",
  "vocals_only": false
}
Response:
{
  "stems": {
    "vocals": "/path/to/vocals.wav",
    "drums": "/path/to/drums.wav",
    "bass": "/path/to/bass.wav",
    "other": "/path/to/other.wav"
  },
  "vocals_path": "/path/to/vocals.wav",
  "model": "htdemucs",
  "cached": false
}
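A caller may want to confirm that a response contains the stems it asked for before using them. The sketch below checks the `stems` object against the stem names documented for each mode; `expected_stems` and `missing_stems` are hypothetical helpers, not Augent functions.

```python
def expected_stems(vocals_only: bool) -> list:
    """Stem names documented for each mode."""
    if vocals_only:
        return ["vocals", "no_vocals"]
    return ["vocals", "drums", "bass", "other"]


def missing_stems(response: dict, vocals_only: bool) -> list:
    """Return the expected stem names that are absent from the response."""
    stems = response.get("stems", {})
    return [name for name in expected_stems(vocals_only) if name not in stems]


# Copy of the 4-stem example response above.
full_response = {
    "stems": {
        "vocals": "/path/to/vocals.wav",
        "drums": "/path/to/drums.wav",
        "bass": "/path/to/bass.wav",
        "other": "/path/to/other.wav",
    },
    "vocals_path": "/path/to/vocals.wav",
    "model": "htdemucs",
    "cached": False,
}

missing_in_full_mode = missing_stems(full_response, vocals_only=False)
missing_in_vocals_mode = missing_stems(full_response, vocals_only=True)
print(missing_in_full_mode)    # nothing is missing for a 4-stem request
print(missing_in_vocals_mode)  # a 4-stem response has no "no_vocals" stem
```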

Notes

Results are cached by file hash. The first run separates the audio; later runs on the same file return the cached stems without reprocessing.
Use vocals_only: true (the default) when your goal is transcription. It is faster than full 4-stem separation and produces the same vocal quality.
Separated stems are stored at ~/.augent/separated/. Each file gets its own directory named by hash, so the same file is never processed twice.
The vocals_path from the response can be used as audio_path in any Augent tool: transcribe_audio, search_audio, deep_search, chapters, identify_speakers, batch_search, and more.
For best quality on difficult audio (heavy overlapping voices and music), use model: "htdemucs_ft". It is slower but produces cleaner separation.
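To make the caching note concrete, here is a rough sketch of content-hash caching in general. It is not Augent's implementation: the hash function (SHA-256), the 8-character digest length, and the helper names are all assumptions, and the directory name only mirrors the `<hash>_<model>_vocals` shape seen in the example `output_dir`.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, length: int = 8) -> str:
    """Short hex digest of the file's bytes (SHA-256 and the length are assumptions)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:length]


def cache_dir(path: Path, model: str, root: Path) -> Path:
    """Directory for a vocals-only separation, shaped like the example output_dir."""
    return root / f"{file_digest(path)}_{model}_vocals"


def is_cached(path: Path, model: str, root: Path) -> bool:
    """True if a vocals stem already exists for this file content and model."""
    return (cache_dir(path, model, root) / "vocals.wav").exists()


# Demo in a temporary directory, standing in for ~/.augent/separated/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "separated"
    original = Path(tmp) / "episode.mp3"
    renamed_copy = Path(tmp) / "episode-copy.mp3"
    original.write_bytes(b"fake audio bytes")
    renamed_copy.write_bytes(b"fake audio bytes")

    # Same content gives the same digest, regardless of file name.
    same_digest = file_digest(original) == file_digest(renamed_copy)

    cached_before = is_cached(original, "htdemucs", root)
    out = cache_dir(original, "htdemucs", root)
    out.mkdir(parents=True)
    (out / "vocals.wav").write_bytes(b"")  # stand-in for a separated stem
    cached_after = is_cached(renamed_copy, "htdemucs", root)

print(same_digest, cached_before, cached_after)
```

Keying the cache on file content rather than file name is what lets a renamed or moved copy reuse earlier results in this sketch.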