Detects where the topic changes in your audio and splits it into chapters with timestamps.
Example
Request:
```json
{
  "audio_path": "/Users/you/Downloads/lecture.webm"
}
```
Response:
```json
{
  "chapters": [
    {
      "chapter_number": 1,
      "start": 0.0,
      "end": 245.3,
      "start_timestamp": "0:00",
      "end_timestamp": "4:05",
      "text": "Welcome everyone. Today we'll cover three main topics...",
      "segment_count": 12
    },
    {
      "chapter_number": 2,
      "start": 245.3,
      "end": 892.1,
      "start_timestamp": "4:05",
      "end_timestamp": "14:52",
      "text": "Let's start with neural network architectures...",
      "segment_count": 35
    },
    {
      "chapter_number": 3,
      "start": 892.1,
      "end": 1523.7,
      "start_timestamp": "14:52",
      "end_timestamp": "25:23",
      "text": "Now moving on to training techniques...",
      "segment_count": 28
    }
  ],
  "total_chapters": 3,
  "duration": 1523.7,
  "model_used": "tiny"
}
```
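The `start_timestamp` and `end_timestamp` fields are simply the `start`/`end` second counts rendered as `M:SS` (or `H:MM:SS` past an hour). A minimal sketch of that formatting; the helper name is ours, not part of the tool:

```python
def to_timestamp(seconds: float) -> str:
    """Format a second count like the start_timestamp/end_timestamp
    fields in the response: M:SS, or H:MM:SS for durations over an hour."""
    total = int(seconds)  # timestamps drop the fractional part
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

print(to_timestamp(245.3))   # 4:05
print(to_timestamp(1523.7))  # 25:23
```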
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `audio_path` | Yes | — | Path to the audio or video file |
| `model_size` | No | `tiny` | Whisper model size for transcription |
| `sensitivity` | No | `0.4` | Chapter detection sensitivity (0.0 = many small chapters, 1.0 = few large chapters) |
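A request overriding the defaults might look like this (assuming all three parameters go in the same payload; the `base` model size and `0.25` sensitivity are illustrative values):

```json
{
  "audio_path": "/Users/you/Downloads/lecture.webm",
  "model_size": "base",
  "sensitivity": 0.25
}
```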
Notes
Chapters are detected by measuring how much the topic shifts between consecutive transcript segments: a large drop in semantic similarity between two segments marks the start of a new chapter.
Adjust sensitivity to control granularity. Lower values produce more chapters, higher values produce fewer.
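The detection rule can be sketched roughly as follows, assuming each transcript segment has already been embedded as a vector. The helper names and the `1 - sensitivity` threshold are illustrative, not the tool's actual implementation; they just reproduce the stated behavior (lower sensitivity = more boundaries):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def chapter_boundaries(embeddings, sensitivity=0.4):
    """Return indices where a new chapter starts: wherever similarity
    between consecutive segment embeddings falls below 1 - sensitivity
    (hypothetical threshold rule). sensitivity 0.0 splits on almost any
    drop; 1.0 almost never splits."""
    threshold = 1.0 - sensitivity
    return [i for i in range(1, len(embeddings))
            if cosine(embeddings[i - 1], embeddings[i]) < threshold]
```

With two segments about one topic followed by two about another, the single boundary falls between them: `chapter_boundaries([[1, 0], [1, 0], [0, 1], [0, 1]])` returns `[2]`.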
Embeddings are shared with `deep_search`: if either tool has already run on a file, the other reuses the stored embeddings.
Works best on long-form content such as lectures, podcasts, and meetings; clips shorter than about five minutes rarely produce meaningful chapter splits.