Search entire podcast libraries, interview collections, or any batch of files in parallel.
## Example
Request:
```json
{
  "audio_paths": [
    "/Users/you/Downloads/episode1.webm",
    "/Users/you/Downloads/episode2.webm",
    "/Users/you/Downloads/episode3.webm"
  ],
  "keywords": ["AI", "automation"],
  "workers": 3
}
```
Response:
```json
{
  "files_processed": 3,
  "files_with_errors": 0,
  "total_matches": 12,
  "results": {
    "/Users/you/Downloads/episode1.webm": {
      "AI": [{ "timestamp": "1:30", "snippet": "..." }]
    }
  },
  "model_used": "tiny"
}
```
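The `results` object is nested per file and per keyword. A minimal sketch of flattening it into rows for display, assuming the response has already been parsed into a Python dict (the field names come from the example above, not from a formal schema):

```python
def flatten_matches(response):
    """Flatten a batch_search-style response into (path, keyword, timestamp) rows."""
    rows = []
    for path, keyword_hits in response["results"].items():
        for keyword, hits in keyword_hits.items():
            for hit in hits:
                rows.append((path, keyword, hit["timestamp"]))
    return rows
```

This keeps match counts easy to cross-check: the length of the flattened list should equal `total_matches`.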
## Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `audio_paths` | Yes | — | List of paths to audio files |
| `keywords` | Yes | — | List of keywords or phrases to search for |
| `model_size` | No | `tiny` | Whisper model size for transcription |
| `workers` | No | `2` | Number of parallel workers |
## Notes
- Each file is transcribed and stored independently. Files already in memory are skipped during transcription, so re-running a batch with new files added is efficient: only the new ones are transcribed.
- Use `list_files` first to discover files in a directory, then pass the resulting paths to `batch_search`.
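The skip-then-transcribe behavior described above can be sketched as a worker pool that only processes paths missing from an in-memory store. The `store` dict and `transcribe` callable here are hypothetical stand-ins, not the tool's actual internals:

```python
from concurrent.futures import ThreadPoolExecutor

def batch_transcribe(audio_paths, store, transcribe, workers=2):
    """Transcribe only the paths not already in `store`, using `workers` threads.

    Returns the list of paths that were actually transcribed this run.
    """
    new_paths = [p for p in audio_paths if p not in store]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results pair up with new_paths.
        for path, text in zip(new_paths, pool.map(transcribe, new_paths)):
            store[path] = text
    return new_paths
```

Re-running with an extended path list then costs only the transcription time of the added files.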