separate_audio

Uses Meta’s HTDemucs model to split an audio or video file into individual stems. Pass the resulting vocal stem to any other Augent tool to get much cleaner results on noisy recordings, such as podcasts with intro music. Requires `pip install augent[separator]` (also included in `augent[all]`).

Example

Request:

{
  "audio_path": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "vocals_only": true
}

Response:

{
  "stems": {
    "vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
    "no_vocals": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/no_vocals.wav"
  },
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "source_file": "/Users/you/Downloads/podcast-with-intro-music.mp3",
  "cached": false,
  "output_dir": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals",
  "hint": "Use the vocals_path as the audio_path in transcribe_audio, search_audio, deep_search, or any other tool for clean results."
}

Then transcribe the clean vocals by passing vocals_path to transcribe_audio as its audio_path:

{
  "audio_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model_size": "tiny"
}
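If you script this two-step flow yourself, the only wiring needed is copying vocals_path from the separate_audio response into the next request. The sketch below is plain Python and assumes the response has already been parsed from JSON into a dict. The helper name `next_request` is hypothetical, and no Augent client is called here.

```python
import json


def next_request(separate_response: dict, **extra_params) -> dict:
    """Build arguments for a follow-up tool call (e.g. transcribe_audio).

    Hypothetical helper, not part of Augent: it only copies vocals_path
    into audio_path and merges any extra parameters.
    """
    vocals = separate_response.get("vocals_path")
    if not vocals:
        raise ValueError("response has no vocals_path")
    # The vocal stem replaces the original recording as audio_path.
    return {"audio_path": vocals, **extra_params}


response = json.loads("""{
  "vocals_path": "/Users/you/.augent/separated/a1b2c3d4_htdemucs_vocals/vocals.wav",
  "model": "htdemucs",
  "cached": false
}""")

print(json.dumps(next_request(response, model_size="tiny"), indent=2))
```

The printed request matches the transcribe_audio example above. The same helper works for search_audio, deep_search, or any other tool that takes audio_path.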

Parameters

Parameter	Required	Default	Description
`audio_path`	Yes		Path to the audio/video file
`vocals_only`	No	`true`	If true, separates into vocals and no_vocals (faster). If false, separates into all four stems: vocals, drums, bass, and other.
`model`	No	`htdemucs`	Demucs model to use. `htdemucs` is fast and gives high quality. `htdemucs_ft` is a fine-tuned variant that gives the best quality but runs slower.
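The defaults in this table can also be applied on the client side before a request is sent. The sketch below assumes you want to reject any model other than the two documented here. The tool itself may accept other Demucs models. `build_separate_request` is a hypothetical name.

```python
# Only the two models documented on this page; the tool may accept more.
DOCUMENTED_MODELS = {"htdemucs", "htdemucs_ft"}


def build_separate_request(audio_path: str, vocals_only: bool = True,
                           model: str = "htdemucs") -> dict:
    """Return separate_audio arguments with the documented defaults filled in."""
    if not audio_path:
        raise ValueError("audio_path is required")
    if model not in DOCUMENTED_MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(DOCUMENTED_MODELS)}"
        )
    return {"audio_path": audio_path, "vocals_only": bool(vocals_only), "model": model}


# Default request: vocals + no_vocals with htdemucs.
print(build_separate_request("/Users/you/Downloads/podcast-with-intro-music.mp3"))

# Full 4-stem separation with the slower fine-tuned model.
print(build_separate_request("/Users/you/Downloads/song.mp3",
                             vocals_only=False, model="htdemucs_ft"))
```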

Full 4-Stem Separation

Set vocals_only to false to get all four stems:

{
  "audio_path": "/Users/you/Downloads/song.mp3",
  "vocals_only": false
}

Response (paths shortened; source_file, output_dir, and hint omitted):

{
  "stems": {
    "vocals": "/path/to/vocals.wav",
    "drums": "/path/to/drums.wav",
    "bass": "/path/to/bass.wav",
    "other": "/path/to/other.wav"
  },
  "vocals_path": "/path/to/vocals.wav",
  "model": "htdemucs",
  "cached": false
}
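The keys you can expect in stems depend on vocals_only. The small sketch below checks a response against the stem names shown in the two examples on this page. `expected_stems` and `check_stems` are hypothetical helpers.

```python
def expected_stems(vocals_only: bool) -> set[str]:
    """Stem names shown in this page's examples for each mode."""
    if vocals_only:
        return {"vocals", "no_vocals"}
    return {"vocals", "drums", "bass", "other"}


def check_stems(response: dict, vocals_only: bool) -> list[str]:
    """Return the stem names missing from response["stems"] (empty if all present)."""
    present = set(response.get("stems", {}))
    return sorted(expected_stems(vocals_only) - present)


four_stem = {
    "stems": {
        "vocals": "/path/to/vocals.wav",
        "drums": "/path/to/drums.wav",
        "bass": "/path/to/bass.wav",
        "other": "/path/to/other.wav",
    },
    "vocals_path": "/path/to/vocals.wav",
}

print(check_stems(four_stem, vocals_only=False))  # → []
print(check_stems(four_stem, vocals_only=True))   # → ['no_vocals']
```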

Notes

Results are cached by file hash. The first run on a file performs the separation. Later runs on the same file return the stored stems almost instantly, with cached set to true in the response.

Use vocals_only: true (the default) when your goal is transcription. It is faster than full 4-stem separation and produces the same vocal quality.

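Separated stems are stored under ~/.augent/separated/. Each result gets its own directory named by the file hash. The example directory a1b2c3d4_htdemucs_vocals also encodes the model and mode, so the same file is never separated twice with the same settings.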
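The caching described above follows a common pattern: hash the file contents, derive a directory name, and skip the work if that directory already exists. The sketch below shows that pattern under several assumptions:

* The hash is SHA-256 truncated to 8 hex characters.
* Directories follow a `<hash>_<model>_<mode>` layout, inferred from the example name a1b2c3d4_htdemucs_vocals.
* Four-stem mode uses the label `all`.

Augent's real hash function and naming may differ. `file_hash`, `cache_dir_for`, and `is_cached` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumed default location, matching the directory named in the note above.
CACHE_ROOT = Path.home() / ".augent" / "separated"


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large recordings need not fit in memory.

    SHA-256 truncated to 8 hex characters is an assumption, not Augent's scheme.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:8]


def cache_dir_for(path: Path, model: str = "htdemucs", vocals_only: bool = True,
                  root: Path = CACHE_ROOT) -> Path:
    """Derive the output directory; the <hash>_<model>_<mode> layout is assumed."""
    mode = "vocals" if vocals_only else "all"  # "all" is a hypothetical label
    return root / f"{file_hash(path)}_{model}_{mode}"


def is_cached(path: Path, **kwargs) -> bool:
    """Treat a run as cached when its output directory already exists."""
    return cache_dir_for(path, **kwargs).is_dir()


# Demo in a throwaway directory so nothing under ~/.augent is touched.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    audio = tmp / "clip.mp3"
    audio.write_bytes(b"not really audio")
    out = cache_dir_for(audio, root=tmp)
    print(is_cached(audio, root=tmp))  # → False (first run would separate)
    out.mkdir()
    print(is_cached(audio, root=tmp))  # → True (later runs reuse the stems)
```

In this sketch, the key depends on the file contents rather than its path, so a renamed copy still hits the cache. The model and mode are also part of the key, so switching to htdemucs_ft or vocals_only: false creates a new directory.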

The vocals_path from the response can be used as audio_path in any Augent tool: transcribe_audio, search_audio, deep_search, chapters, identify_speakers, batch_search, and more.

For the best quality on difficult audio, such as voices heavily overlapped by music, use model: "htdemucs_ft". It is slower but produces cleaner separation.
