text_to_speech

Converts text to speech and saves the result as an MP3 file on your Desktop (or in a custom directory). No account required.

Example

Request:

{
  "text": "Augent lets Claude transcribe, search, and analyze audio files locally.",
  "voice": "af_heart"
}

Response:

{
  "file_path": "/Users/you/Desktop/tts_20260208_143256.mp3",
  "voice": "af_heart",
  "language": "American English",
  "duration": 5.82,
  "duration_formatted": "0:05",
  "sample_rate": 24000,
  "text_length": 89
}
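
A client can read these fields directly. The sketch below is illustrative only: it assumes `duration` is in seconds and that `duration_formatted` is the same value rendered as `m:ss` with fractional seconds dropped, which matches the example above but is not confirmed by the source. The helper name `format_duration` is hypothetical.

```python
import json

# Subset of the example response shown above.
response = json.loads("""
{
  "file_path": "/Users/you/Desktop/tts_20260208_143256.mp3",
  "duration": 5.82,
  "duration_formatted": "0:05"
}
""")


def format_duration(seconds: float) -> str:
    """Render seconds as m:ss, dropping fractional seconds (assumed behavior)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# The assumed formatting rule reproduces the value in the example response.
assert format_duration(response["duration"]) == response["duration_formatted"]
```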

Parameters

Parameter	Required	Default	Description
`text`	No	—	Text to convert to speech. Either `text` or `file_path` is required.
`file_path`	No	—	Path to a notes file to read aloud. Strips Markdown formatting, skips metadata, generates an MP3, and embeds an audio player in the file. Either `text` or `file_path` is required.
`job_id`	No	—	Check status of a running TTS job. Pass the `job_id` returned from a previous call.
`voice`	No	`af_heart`	Voice ID (see Voices below)
`output_dir`	No	`~/Desktop`	Directory to save the MP3 file
`output_filename`	No	auto-generated	Custom filename for the output
`speed`	No	`1.0`	Speech speed multiplier
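
The rules in this table can be sketched as a small client-side helper. This is an illustration, not part of Augent: the function name `build_tts_arguments` is hypothetical, and it assumes that a status check with `job_id` needs no other fields and that omitted options fall back to the documented defaults.

```python
from typing import Any, Optional


def build_tts_arguments(
    text: Optional[str] = None,
    file_path: Optional[str] = None,
    job_id: Optional[str] = None,
    voice: Optional[str] = None,
    output_dir: Optional[str] = None,
    output_filename: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble an arguments object for text_to_speech (hypothetical helper)."""
    # Assumption: checking a running job only needs the job_id from a previous call.
    if job_id is not None:
        return {"job_id": job_id}

    # From the table: either `text` or `file_path` is required.
    if text is None and file_path is None:
        raise ValueError("either text or file_path is required")

    candidates = {
        "text": text,
        "file_path": file_path,
        "voice": voice,
        "output_dir": output_dir,
        "output_filename": output_filename,
        "speed": speed,
    }
    # Leave out unset options so the documented defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


print(build_tts_arguments(text="Hello from Augent.", voice="bf_emma", speed=1.2))
```

Leaving unset options out of the request, rather than sending the defaults explicitly, keeps the request valid if a default changes later.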

Voices

American English

Voice	Gender
`af_heart`	Female (default)
`af_alloy`	Female
`af_aoede`	Female
`af_bella`	Female
`af_jessica`	Female
`af_kore`	Female
`af_nicole`	Female
`af_nova`	Female
`af_river`	Female
`af_sarah`	Female
`af_sky`	Female
`am_adam`	Male
`am_echo`	Male
`am_eric`	Male
`am_fenrir`	Male
`am_liam`	Male
`am_michael`	Male
`am_onyx`	Male
`am_puck`	Male

British English

Voice	Gender
`bf_emma`	Female
`bf_isabella`	Female
`bf_lily`	Female
`bm_daniel`	Male
`bm_fable`	Male
`bm_george`	Male
`bm_lewis`	Male

Other Languages

Language	Voices
Spanish	`ef_dora`, `em_alex`
French	`ff_siwis`
Hindi	`hf_alpha`, `hf_beta`, `hm_omega`, `hm_psi`
Italian	`if_sara`, `im_nicola`
Japanese	`jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro`, `jm_kumo`
Brazilian Portuguese	`pf_dora`, `pm_alex`
Mandarin Chinese	`zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`, `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang`

Notes

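Voice IDs follow a fixed format: the first letter is the language (`a` American English, `b` British English, `e` Spanish, and so on), the second letter is the gender (`f` female, `m` male), and the part after the underscore is the voice name.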
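
The naming scheme can be decoded mechanically. The sketch below is illustrative only: the function name `parse_voice_id` is hypothetical, and the prefix letters beyond `a`, `b`, and `e` are inferred from the voice tables above rather than stated in this note.

```python
# Language prefixes, inferred from the voice tables on this page.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "Female", "m": "Male"}


def parse_voice_id(voice_id: str) -> dict[str, str]:
    """Split a voice ID such as 'af_heart' into language, gender, and name."""
    prefix, separator, name = voice_id.partition("_")
    if len(prefix) != 2 or not separator or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")

    language_code, gender_code = prefix
    if language_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")

    return {
        "language": LANGUAGES[language_code],
        "gender": GENDERS[gender_code],
        "name": name,
    }


print(parse_voice_id("bm_george"))
```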

The Kokoro model (~350MB) downloads automatically on first use and is cached locally. After that, it works offline.

Generate notes from a video with `take_notes`, then read the summary back as audio with `text_to_speech`. One prompt, two tools.
