Unlike `search_audio`, which matches exact keywords, `deep_search` finds content by meaning: a query like “challenges of raising money” will match segments about fundraising difficulties even when those exact words never appear.
Example
Request:

```json
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "query": "challenges of raising venture capital"
}
```
Response:

```json
{
  "query": "challenges of raising venture capital",
  "results": [
    {
      "start": 342.1,
      "end": 348.5,
      "text": "Getting investors to take us seriously was the hardest part of the whole journey.",
      "timestamp": "5:42",
      "similarity": 0.7823
    },
    {
      "start": 891.0,
      "end": 897.2,
      "text": "We pitched over fifty firms before anyone wrote a check.",
      "timestamp": "14:51",
      "similarity": 0.7104
    }
  ],
  "total_segments": 245,
  "model_used": "tiny"
}
```
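The `timestamp` field in each result is the segment's `start` time, truncated to whole seconds and rendered as minutes:seconds. A minimal sketch of that formatting (the tool's exact rounding rule is an assumption):

```python
def to_timestamp(seconds: float) -> str:
    """Format a start time in seconds as M:SS, e.g. 342.1 -> "5:42"."""
    total = int(seconds)  # drop the fractional part
    return f"{total // 60}:{total % 60:02d}"

print(to_timestamp(342.1))  # 5:42
print(to_timestamp(891.0))  # 14:51
```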
Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| `audio_path` | Yes | — | Path to the audio/video file |
| `query` | Yes | — | Natural-language search query |
| `model_size` | No | `tiny` | Whisper model size for transcription |
| `top_k` | No | 5 | Number of results to return |
| `output` | No | — | File path to save results (`.csv` or `.xlsx`) |
| `context_words` | No | 25 | Words of context per result; use 150 for full evidence blocks when answering questions |
| `dedup_seconds` | No | 0 | Merge matches within this many seconds of each other; use 60 for Q&A to avoid redundant results |
| `clip` | No | `false` | Export video clips around each match; requires the audio to have been downloaded from a URL |
| `clip_padding` | No | 15 | Seconds of padding before and after each match for clip export |
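One plausible reading of `dedup_seconds` is that matches starting within the window collapse to the strongest one. A sketch under that assumption (the tool's actual merge rule may differ):

```python
def dedup(matches, dedup_seconds=60):
    """Collapse matches whose start times fall within dedup_seconds of the
    last kept match, keeping the higher-similarity one per cluster."""
    merged = []
    for m in sorted(matches, key=lambda m: m["start"]):
        if merged and m["start"] - merged[-1]["start"] <= dedup_seconds:
            if m["similarity"] > merged[-1]["similarity"]:
                merged[-1] = m  # same passage, stronger match: replace
        else:
            merged.append(m)
    return merged

matches = [
    {"start": 342.1, "similarity": 0.78},
    {"start": 348.5, "similarity": 0.74},  # 6 s later: merged away
    {"start": 891.0, "similarity": 0.71},
]
print(len(dedup(matches)))  # 2
```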
Notes
Embeddings are stored in memory. The first search on a file computes embeddings for all segments; subsequent searches on the same file are instant.
Results are ranked by cosine similarity (0 to 1); a higher score means a closer semantic match.
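Cosine similarity measures the angle between two embedding vectors, independent of their magnitude. A self-contained sketch with toy 3-dimensional vectors (real embedding models use hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means identical
    direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

query = [0.9, 0.1, 0.0]
segment_a = [0.8, 0.2, 0.1]  # points roughly the same way: high score
segment_b = [0.0, 0.1, 0.9]  # points elsewhere: low score
print(cosine_similarity(query, segment_a) > cosine_similarity(query, segment_b))  # True
```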