Saves an MP3 file to your Desktop (or a custom directory). No account required.
Example
Request:
{
  "text": "Augent lets Claude transcribe, search, and analyze audio files locally.",
  "voice": "af_heart"
}
Response:
{
  "file_path": "/Users/you/Desktop/tts_20260208_143256.mp3",
  "voice": "af_heart",
  "language": "American English",
  "duration": 5.82,
  "duration_formatted": "0:05",
  "sample_rate": 24000,
  "text_length": 89
}
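The response fields can be read and sanity-checked on the client side. The following is a minimal sketch, not the tool's own code: it assumes `duration_formatted` is the duration truncated to whole seconds in `M:SS` form, and that auto-generated filenames follow a `tts_YYYYMMDD_HHMMSS.mp3` pattern. Both assumptions are inferred from the example response above, and the helper names are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def format_duration(seconds: float) -> str:
    """Format seconds as M:SS, dropping the fractional part (assumed rule)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def parse_output_timestamp(file_path: str) -> datetime:
    """Extract the timestamp from an auto-generated filename (assumed pattern)."""
    stem = PurePosixPath(file_path).stem  # e.g. "tts_20260208_143256"
    return datetime.strptime(stem, "tts_%Y%m%d_%H%M%S")


# Values copied from the example response.
response = {
    "file_path": "/Users/you/Desktop/tts_20260208_143256.mp3",
    "duration": 5.82,
    "duration_formatted": "0:05",
}

# The assumed formatting rule reproduces the documented value.
print(format_duration(response["duration"]) == response["duration_formatted"])
print(parse_output_timestamp(response["file_path"]))
```

A custom `output_filename` would not match the assumed pattern, so `parse_output_timestamp` raises `ValueError` in that case.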
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| text | No | (none) | Text to convert to speech. Either text or file_path is required. |
| file_path | No | (none) | Path to a notes file to read aloud. Strips markdown formatting, skips metadata, generates MP3, and embeds an audio player in the file. |
| job_id | No | (none) | Check status of a running TTS job. Pass the job_id returned from a previous call. |
| voice | No | af_heart | Voice ID (see voices below) |
| output_dir | No | ~/Desktop | Directory to save the MP3 file |
| output_filename | No | auto-generated | Custom filename for the output |
| speed | No | 1.0 | Speech speed multiplier |
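The parameter rules in the table above can be checked before a request is sent. This is a minimal sketch with a hypothetical helper, `build_tts_request`. It assumes that a status check needs only `job_id`, that `speed` must be positive, and that optional fields can simply be omitted when unset; none of these is confirmed by the tool itself.

```python
from typing import Any, Dict, Optional


def build_tts_request(
    text: Optional[str] = None,
    file_path: Optional[str] = None,
    job_id: Optional[str] = None,
    voice: str = "af_heart",
    output_dir: Optional[str] = None,
    output_filename: Optional[str] = None,
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Build a request payload, enforcing the documented input rule."""
    # Assumption: checking a running job needs only the job_id.
    if job_id is not None:
        return {"job_id": job_id}

    # Documented rule: either text or file_path is required.
    if text is None and file_path is None:
        raise ValueError("Either text or file_path is required")

    # Assumption: a zero or negative speed multiplier is meaningless.
    if speed <= 0:
        raise ValueError("speed must be a positive multiplier")

    payload: Dict[str, Any] = {"voice": voice, "speed": speed}
    optional = {
        "text": text,
        "file_path": file_path,
        "output_dir": output_dir,
        "output_filename": output_filename,
    }
    # Leave unset fields out so the tool applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_tts_request(text="Hello from Augent.", voice="bf_emma"))
```

The table does not say what happens when both `text` and `file_path` are supplied, so the sketch passes both through unchanged rather than guessing.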
Voices
American English
| Voice | Gender |
|---|---|
| af_heart | Female (default) |
| af_alloy | Female |
| af_aoede | Female |
| af_bella | Female |
| af_jessica | Female |
| af_kore | Female |
| af_nicole | Female |
| af_nova | Female |
| af_river | Female |
| af_sarah | Female |
| af_sky | Female |
| am_adam | Male |
| am_echo | Male |
| am_eric | Male |
| am_fenrir | Male |
| am_liam | Male |
| am_michael | Male |
| am_onyx | Male |
| am_puck | Male |
British English
| Voice | Gender |
|---|---|
| bf_emma | Female |
| bf_isabella | Female |
| bf_lily | Female |
| bm_daniel | Male |
| bm_fable | Male |
| bm_george | Male |
| bm_lewis | Male |
Other Languages
| Language | Voices |
|---|---|
| Spanish | ef_dora, em_alex |
| French | ff_siwis |
| Hindi | hf_alpha, hf_beta, hm_omega, hm_psi |
| Italian | if_sara, im_nicola |
| Japanese | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo |
| Brazilian Portuguese | pf_dora, pm_alex |
| Mandarin Chinese | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang |
Notes
Voice ID format: the first letter is the language (a = American English, b = British English, e = Spanish, and so on), the second letter is the gender (f = female, m = male), and the part after the underscore is the voice name.
The Kokoro model (~350MB) downloads automatically on first use and is cached locally. After that, it works offline.
Generate notes from a video with take_notes, then read the summary back as audio. One prompt, two tools.
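The voice ID format described in the notes above can be decoded mechanically. Here is a minimal sketch with a hypothetical helper, `parse_voice_id`. The language letters are taken from the voice tables on this page; the function is illustrative and not part of Augent or Kokoro.

```python
from typing import Dict

# Language prefixes as they appear in the voice tables on this page.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "Female", "m": "Male"}


def parse_voice_id(voice_id: str) -> Dict[str, str]:
    """Split a voice ID such as 'af_heart' into language, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"Unexpected voice ID format: {voice_id!r}")

    lang_code, gender_code = prefix  # two single-character codes
    if lang_code not in LANGUAGES:
        raise ValueError(f"Unknown language code: {lang_code!r}")
    if gender_code not in GENDERS:
        raise ValueError(f"Unknown gender code: {gender_code!r}")

    return {
        "language": LANGUAGES[lang_code],
        "gender": GENDERS[gender_code],
        "name": name,
    }


print(parse_voice_id("af_heart"))
print(parse_voice_id("jm_kumo"))
```

A well-formed ID is not necessarily an available voice, so a real check would still compare against the voice tables.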