Detects where the topic changes in your audio and splits it into chapters with timestamps.

Example

Request:
{
  "audio_path": "/Users/you/Downloads/lecture.webm"
}
Response:
{
  "chapters": [
    {
      "chapter_number": 1,
      "start": 0.0,
      "end": 245.3,
      "start_timestamp": "0:00",
      "end_timestamp": "4:05",
      "text": "Welcome everyone. Today we'll cover three main topics...",
      "segment_count": 12
    },
    {
      "chapter_number": 2,
      "start": 245.3,
      "end": 892.1,
      "start_timestamp": "4:05",
      "end_timestamp": "14:52",
      "text": "Let's start with neural network architectures...",
      "segment_count": 35
    },
    {
      "chapter_number": 3,
      "start": 892.1,
      "end": 1523.7,
      "start_timestamp": "14:52",
      "end_timestamp": "25:23",
      "text": "Now moving on to training techniques...",
      "segment_count": 28
    }
  ],
  "total_chapters": 3,
  "duration": 1523.7,
  "model_used": "tiny"
}
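The `start_timestamp` and `end_timestamp` fields are the `start`/`end` seconds rendered in `M:SS` form. A minimal sketch of that conversion (the helper name and the `H:MM:SS` branch for content over an hour are illustrative assumptions, not the tool's internals):

```python
def to_timestamp(seconds: float) -> str:
    """Render a time in seconds as M:SS, or H:MM:SS past one hour (assumed)."""
    total = int(seconds)          # drop the fractional part, as in the example response
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}:{m:02d}:{s:02d}"
    return f"{m}:{s:02d}"

print(to_timestamp(245.3))   # 4:05
print(to_timestamp(1523.7))  # 25:23
```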

Parameters

Parameter     Required  Default  Description
audio_path    Yes       —        Path to the audio/video file
model_size    No        tiny     Whisper model size for transcription
sensitivity   No        0.4      Chapter detection sensitivity (0.0 = many small chapters, 1.0 = few large chapters)
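A request that overrides the optional parameters might look like this (the values are illustrative):

{
  "audio_path": "/Users/you/Downloads/lecture.webm",
  "model_size": "base",
  "sensitivity": 0.6
}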

Notes

- Chapters are detected by measuring how much the topic shifts between consecutive segments: a large drop in similarity marks the start of a new chapter.
- Adjust sensitivity to control granularity. Lower values produce more chapters; higher values produce fewer.
- Embeddings are shared with deep_search. If either tool has already run on a file, the other reuses the stored embeddings.
- Works best on long-form content such as lectures, podcasts, and meetings. Short clips under 5 minutes rarely produce meaningful chapter splits.
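The detection described above can be sketched as follows: embed each transcript segment, compare consecutive embeddings with cosine similarity, and start a new chapter wherever similarity falls below a threshold derived from the sensitivity. The function names and the sensitivity-to-threshold mapping here are assumptions for illustration, not the tool's actual internals.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def chapter_boundaries(embeddings, sensitivity=0.4):
    """Return segment indices where a new chapter starts.

    Illustrative mapping (an assumption): threshold = 1 - sensitivity,
    so sensitivity 0.0 cuts on almost every shift (many small chapters)
    and sensitivity 1.0 almost never cuts (few large chapters).
    """
    threshold = 1.0 - sensitivity
    boundaries = [0]  # the first segment always opens chapter 1
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries

# Two topically distinct runs of segments: one cut expected at index 2.
segments = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
print(chapter_boundaries(segments, sensitivity=0.4))  # [0, 2]
```

At the default sensitivity this toy input yields two chapters; raising sensitivity toward 1.0 collapses them into one.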