# transcribe_audio

> Transcribe an audio file and return the full text with timestamps. Results are stored in memory automatically.

## Model Sizes

| Model              | Speed   | Accuracy    |
| ------------------ | ------- | ----------- |
| **tiny** (default) | Fastest | Excellent   |
| base               | Fast    | Excellent   |
| small              | Medium  | Superior    |
| medium             | Slow    | Outstanding |
| large              | Slowest | Maximum     |

<Tip>Use `tiny` for nearly everything. Only upgrade for heavy accents, poor audio quality, or lyrics.</Tip>

***

## Example

**Request:**

```json theme={null}
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "model_size": "tiny"
}
```

**Response:**

```json theme={null}
{
  "text": "Full transcription text...",
  "language": "en",
  "duration": 1076.12,
  "duration_formatted": "17:56",
  "segments": [
    {
      "start": 0.0,
      "end": 4.8,
      "timestamp": "0:00",
      "text": "Welcome back to the show. Today we're diving into..."
    },
    {
      "start": 4.8,
      "end": 9.2,
      "timestamp": "0:04",
      "text": "something I've been thinking about for a long time."
    }
  ],
  "segment_count": 430,
  "cached": false,
  "model_used": "tiny"
}
```
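A short sketch of how a client might consume this response. It assumes the response has already been received as JSON text. `format_timestamp` and `transcript_lines` are hypothetical helpers, and the `M:SS` formatting is inferred from the sample values above (`4.8` shown as `"0:04"`, `1076.12` shown as `"17:56"`), not taken from Augent's source.

```python
import json

def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, truncating fractions (inferred from the sample response)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

def transcript_lines(response: dict) -> list[str]:
    """Return one '[timestamp] text' line per segment."""
    return [f"[{seg['timestamp']}] {seg['text']}" for seg in response["segments"]]

# Trimmed copy of the sample response above.
response = json.loads("""
{
  "duration": 1076.12,
  "duration_formatted": "17:56",
  "segments": [
    {"start": 0.0, "end": 4.8, "timestamp": "0:00", "text": "Welcome back to the show."},
    {"start": 4.8, "end": 9.2, "timestamp": "0:04", "text": "something I've been thinking about for a long time."}
  ]
}
""")

for line in transcript_lines(response):
    print(line)
```

Reading the precomputed `timestamp` field avoids re-deriving it on the client; `format_timestamp` is only useful if you need the same style for values the response does not format for you.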

***

## Example: Transcribe a specific section

Use `start` and `duration` (both in seconds) to transcribe only a portion of the file. No manual ffmpeg trimming is needed.

```json theme={null}
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "start": 600,
  "duration": 300
}
```

This transcribes the 5 minutes that begin at the 10-minute mark (600 s to 900 s). Timestamps in the response are relative to the original file, not to the start of the excerpt.
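The timestamp behaviour described above can be pictured with a small sketch. It assumes the excerpt is transcribed with times starting at zero and that `start` is then added to every segment. `offset_segments` is a hypothetical helper for illustration, not Augent's implementation, and the segment values are made up.

```python
def offset_segments(segments: list[dict], start: float) -> list[dict]:
    """Shift excerpt-relative segment times so they refer to the original file."""
    return [
        {**seg, "start": seg["start"] + start, "end": seg["end"] + start}
        for seg in segments
    ]

# Segment times as they would look relative to the excerpt (made-up values).
excerpt_segments = [
    {"start": 0.0, "end": 4.5, "text": "first segment"},
    {"start": 4.5, "end": 9.0, "text": "second segment"},
]

# With "start": 600, the first segment is reported at 600.0 to 604.5 seconds.
shifted = offset_segments(excerpt_segments, start=600)
print(shifted[0]["start"], shifted[0]["end"])
```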

***

## Example: Export to file

```json theme={null}
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "output": "~/Desktop/transcription.xlsx"
}
```

When `output` is provided, the transcription is written to disk and `output_path` is added to the response. Use `.xlsx` for styled spreadsheets with bold headers, or `.csv` for plain data.

***

## Parameters

| Parameter         | Required | Default   | Description                                                                                              |
| ----------------- | -------- | --------- | -------------------------------------------------------------------------------------------------------- |
| `audio_path`      | Yes      | —         | Path to the audio file                                                                                   |
| `model_size`      | No       | `tiny`    | Whisper model size: `tiny`, `base`, `small`, `medium`, or `large`                                        |
| `start`           | No       | `0`       | Start transcription at this many seconds into the audio                                                  |
| `duration`        | No       | full file | Only transcribe this many seconds of audio                                                               |
| `output`          | No       | —         | File path to save transcription (`.csv` or `.xlsx`)                                                      |
| `translated_text` | No       | —         | English translation to store alongside the original. Used after translating a non-English transcription. |
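
As a rough illustration of how these parameters combine, the sketch below builds a request payload and applies client-side checks derived from the table. `build_request` is a hypothetical helper, and the validation rules are assumptions about sensible inputs; the server may accept or reject values differently.

```python
from pathlib import PurePath

MODEL_SIZES = {"tiny", "base", "small", "medium", "large"}
OUTPUT_SUFFIXES = {".csv", ".xlsx"}

def build_request(audio_path, model_size="tiny", start=0, duration=None, output=None):
    """Build a transcribe_audio payload; start, duration, and output are included only when set."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model_size: {model_size!r}")
    if start < 0 or (duration is not None and duration <= 0):
        raise ValueError("start must be >= 0 and duration must be positive")
    if output is not None and PurePath(output).suffix.lower() not in OUTPUT_SUFFIXES:
        raise ValueError("output must end in .csv or .xlsx")

    request = {"audio_path": audio_path, "model_size": model_size}
    if start:
        request["start"] = start
    if duration is not None:
        request["duration"] = duration
    if output is not None:
        request["output"] = output
    return request

print(build_request("/Users/you/Downloads/podcast.webm", start=600, duration=300))
```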

***

## Multilingual

Augent transcribes audio in its **original language**, such as Chinese, French, Spanish, or Japanese. Translation to English is handled by Claude, which produces far better results than any local translation model.

When the transcription language is not English, the response includes:

```json theme={null}
{
  "language": "zh",
  "translation_available": true,
  "translation_hint": "This audio is in Chinese. To store an English translation..."
}
```

**Translation workflow:**

1. `transcribe_audio` returns the original-language transcription with `translation_available: true`
2. Claude translates the text
3. Claude calls `transcribe_audio` again with the same `audio_path` and `translated_text` containing the English translation
4. A sibling `(eng)` markdown file is created in memory alongside the original

Both versions appear in the Web UI Memory Explorer and are searchable via `search_memory`.
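
The workflow above could be driven by code along the lines of the following sketch. `call_tool` and `translate` are hypothetical callables standing in for an MCP client call and for Claude's translation step; neither is part of Augent's documented API, and the stand-in responses are invented so the example runs without a server.

```python
def transcribe_with_translation(call_tool, translate, audio_path):
    """Transcribe, then store an English translation if the tool reports one is available."""
    # Step 1: transcribe in the original language.
    result = call_tool("transcribe_audio", {"audio_path": audio_path})

    if result.get("translation_available"):
        # Step 2: translate the original-language text.
        english = translate(result["text"])
        # Step 3: call the tool again with the same path plus translated_text.
        call_tool(
            "transcribe_audio",
            {"audio_path": audio_path, "translated_text": english},
        )
    return result


# Stand-ins so the sketch runs offline; real responses come from Augent.
calls = []

def fake_call_tool(name, arguments):
    calls.append((name, arguments))
    return {"text": "Bonjour", "language": "fr", "translation_available": True}

def fake_translate(text):
    return "Hello"

transcribe_with_translation(fake_call_tool, fake_translate, "/Users/you/Downloads/clip.webm")
print(len(calls))  # 2: the transcription call, then the call that stores the translation
```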

***

## Memory

* Transcriptions are stored by file content hash + model size
* Same file, same model = instant memory hit
* Same file, different model = new transcription
* Modified file = new transcription (hash changes)
* A markdown file is also saved to `~/.augent/memory/transcriptions/`
* Translated transcriptions get a sibling `(eng)` file (e.g., `My Video.md` + `My Video (eng).md`)
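
The lookup rules above can be sketched as follows. The page only says that entries are keyed by content hash plus model size, so SHA-256, the key format, and the `memory_key` and `eng_sibling` helpers are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def memory_key(audio_path, model_size="tiny"):
    """Derive a lookup key from the file's bytes and the model size (SHA-256 is an assumption)."""
    digest = hashlib.sha256(Path(audio_path).read_bytes()).hexdigest()
    return f"{digest}:{model_size}"

def eng_sibling(markdown_path):
    """Map 'My Video.md' to 'My Video (eng).md'."""
    path = Path(markdown_path)
    return path.with_name(f"{path.stem} (eng){path.suffix}")

with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "clip.webm"
    audio.write_bytes(b"fake audio bytes")

    key_tiny = memory_key(audio, "tiny")
    key_tiny_again = memory_key(audio, "tiny")  # same file, same model
    key_base = memory_key(audio, "base")        # same file, different model

    audio.write_bytes(b"edited audio bytes")
    key_modified = memory_key(audio, "tiny")    # modified file, new hash

print(key_tiny == key_tiny_again, key_tiny == key_base, key_tiny == key_modified)
print(eng_sibling("My Video.md").name)
```

Because the key depends on file contents rather than the path, renaming or moving a file would still hit memory under this scheme, while any edit to the audio would not.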
