Unlike search_audio, which matches exact keywords, deep_search finds content by meaning. A query like “challenges of raising money” will match segments about fundraising difficulties even if those exact words aren’t used.
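Conceptually, meaning-based search embeds the query and every transcript segment as vectors, then ranks segments by cosine similarity. A toy sketch of that ranking step, with hand-made vectors standing in for real embeddings (the vectors and helper names below are illustrative, not part of the tool):

```python
import math

def cosine(a, b):
    # cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the query vector sits close to the fundraising
# segment even though no keywords overlap.
query_vec = [0.9, 0.1, 0.3]
segments = {
    "Getting investors to take us seriously was the hardest part.": [0.8, 0.2, 0.4],
    "We launched the product in March.": [0.1, 0.9, 0.2],
}

ranked = sorted(segments, key=lambda s: cosine(query_vec, segments[s]), reverse=True)
```

With these vectors the fundraising segment ranks first, which is the behavior the keyword-free match above relies on.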

Example

Request:
{
  "audio_path": "/Users/you/Downloads/podcast.webm",
  "query": "challenges of raising venture capital"
}
Response:
{
  "query": "challenges of raising venture capital",
  "results": [
    {
      "start": 342.1,
      "end": 348.5,
      "text": "Getting investors to take us seriously was the hardest part of the whole journey.",
      "timestamp": "5:42",
      "similarity": 0.7823
    },
    {
      "start": 891.0,
      "end": 897.2,
      "text": "We pitched over fifty firms before anyone wrote a check.",
      "timestamp": "14:51",
      "similarity": 0.7104
    }
  ],
  "total_segments": 245,
  "model_used": "tiny"
}
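Each result carries both raw second offsets (start, end) and a human-readable timestamp. A minimal sketch of how start maps to the timestamp string; the M:SS formatting is inferred from the example values (342.1 → "5:42"), not from documented behavior:

```python
def to_timestamp(seconds: float) -> str:
    # Convert a start offset in seconds to the M:SS form seen in results.
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"
```

For example, to_timestamp(342.1) yields "5:42" and to_timestamp(891.0) yields "14:51", matching the response above.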

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| audio_path | Yes | — | Path to the audio/video file |
| query | Yes | — | Natural language search query |
| model_size | No | tiny | Whisper model size for transcription |
| top_k | No | 5 | Number of results to return |
| output | No | — | File path to save results (.csv or .xlsx) |
| context_words | No | 25 | Words of context per result. Use 150 for full evidence blocks when answering questions |
| dedup_seconds | No | 0 | Merge matches within this many seconds of each other. Use 60 for Q&A to avoid redundant results |
| clip | No | false | Export video clips around each match. Requires the audio to have been downloaded from a URL |
| clip_padding | No | 15 | Seconds of padding before and after each match for clip export |
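The exact merge rule behind dedup_seconds isn't documented; one plausible interpretation, sketched below, is that results are walked in similarity order and a match is dropped if an already-kept match starts within the window (the function name and rule are assumptions, not the tool's internals):

```python
def dedup(results, window_seconds):
    # `results` is assumed sorted by similarity, highest first.
    # Keep a match only if no already-kept match starts within the window.
    kept = []
    for r in results:
        if all(abs(r["start"] - k["start"]) > window_seconds for k in kept):
            kept.append(r)
    return kept
```

With dedup_seconds=60, two matches at 342.1s and 350.0s would collapse to the stronger one, which is the redundancy the Q&A recommendation above is meant to avoid.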

Notes

Embeddings are stored in memory. The first search on a file computes embeddings for all segments. Subsequent searches on the same file are instant.
Results are ranked by cosine similarity (0 to 1). Higher similarity = closer match.
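The compute-once-then-reuse behavior described above can be sketched as a dictionary keyed by file path; the names here are hypothetical and stand in for the tool's actual in-memory cache:

```python
_embedding_cache = {}

def get_embeddings(audio_path, compute_fn):
    # First call for a path pays the full embedding cost; later calls
    # for the same path return the cached vectors immediately.
    if audio_path not in _embedding_cache:
        _embedding_cache[audio_path] = compute_fn(audio_path)
    return _embedding_cache[audio_path]
```

Note the cache lives in process memory, so embeddings are recomputed after a restart, consistent with "stored in memory" above.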