How it works
- Every segment of the transcript is encoded into a 384-dimensional vector using `all-MiniLM-L6-v2` (a sentence-transformer model)
- Your search query is encoded with the same model
- Augent computes cosine similarity between the query vector and every segment vector
- Results are ranked by similarity score
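The steps above can be sketched in a few lines. This is a toy illustration, not Augent's actual code: the hand-made 3-dimensional vectors stand in for the 384-dimensional embeddings the real model produces, and the segment texts are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical segment embeddings (real ones come from the model's encoder).
segments = {
    "we closed our round": [0.9, 0.1, 0.1],
    "the weather in March": [0.0, 0.9, 0.2],
}
query = [1.0, 0.1, 0.0]  # stand-in embedding for the query "fundraising"

# Score every segment against the query and rank, best first.
ranked = sorted(segments, key=lambda s: cosine_similarity(segments[s], query), reverse=True)
print(ranked[0])  # highest-similarity segment
```

Note that "fundraising" and "we closed our round" share no words at all; the match comes entirely from the vectors being close, which is exactly what separates this from keyword search.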
Keyword search vs. semantic search
| | Keyword (search_audio) | Semantic (deep_search) |
|---|---|---|
| Matching | Exact string match | Meaning-based similarity |
| Speed | Instant (text scan) | Fast (vector comparison) |
| Best for | Finding specific terms, names, numbers | Finding discussions about a topic |
| Example | "Series A" finds "Series A" | "fundraising" finds "we closed our round" |
Embeddings are cached
The first semantic search on a file computes embeddings for all segments and stores them in SQLite. Every subsequent semantic query on that file reuses the cached embeddings — only the query itself needs to be encoded, which takes milliseconds.
Deduplication
When `dedup_seconds` is set (e.g., 60), results that fall within that many seconds of each other are merged. This prevents getting five results from the same 2-minute discussion. Augent overcollects candidates internally to compensate for results removed by the merge.
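One plausible way to implement this merge is a greedy pass over results in score order, dropping any hit that lands within the window of an already-kept hit. This is an assumed behavior sketched for illustration, not Augent's actual implementation:

```python
def dedup(results, dedup_seconds):
    """results: list of (timestamp_seconds, score) pairs.

    Greedy time-window dedup (assumed logic): walk hits best-score-first
    and keep a hit only if it is at least dedup_seconds away from every
    hit already kept.
    """
    kept = []
    for ts, score in sorted(results, key=lambda r: r[1], reverse=True):
        if all(abs(ts - kept_ts) >= dedup_seconds for kept_ts, _ in kept):
            kept.append((ts, score))
    return kept

# Three hits from the same discussion around t=100s, one far away at t=400s.
hits = [(100, 0.91), (130, 0.88), (145, 0.85), (400, 0.80)]
print(dedup(hits, 60))  # → [(100, 0.91), (400, 0.80)]
```

The two lower-scored hits at 130s and 145s fall inside the 60-second window of the kept hit at 100s, so one result represents the whole discussion.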
Context words
The `context_words` parameter controls how much text surrounds each result:
- 25 (default): a sentence or two, enough to see the match in context
- 150: a full paragraph, enough for Claude to answer questions from the evidence
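A word-window like this can be sketched as follows. The function name and the exact centering logic are hypothetical; only the `context_words` parameter comes from the docs above:

```python
def context_window(text, match_index, context_words=25):
    """Return roughly context_words words centered on the matched word.

    match_index is the position of the matched word in the transcript's
    word list (illustrative interface, not Augent's API).
    """
    words = text.split()
    lo = max(0, match_index - context_words // 2)
    hi = match_index + context_words // 2 + 1
    return " ".join(words[lo:hi])

# Toy transcript of numbered words to make the window visible.
transcript = " ".join(f"w{i}" for i in range(100))
print(context_window(transcript, 50, context_words=4))  # → w48 w49 w50 w51 w52
```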
Cross-memory search
`search_memory` uses the same engine but searches across all stored transcriptions — no file path needed. One query, hundreds of hours of audio.
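Conceptually, this is the same scoring loop run over every stored transcription's cached embeddings instead of one file's. The storage layout, file names, and function signature below are illustrative assumptions, not Augent's API:

```python
def search_memory(query_vec, store, top_k=3):
    """store: {filename: [(segment_text, embedding), ...]} — a stand-in
    for the SQLite embedding cache described above."""
    hits = []
    for fname, segments in store.items():
        for text, vec in segments:
            # Dot product equals cosine similarity if embeddings are unit-length.
            score = sum(q * v for q, v in zip(query_vec, vec))
            hits.append((score, fname, text))
    return sorted(hits, reverse=True)[:top_k]

# Two hypothetical transcriptions with toy 2-dim embeddings.
store = {
    "call_a.wav": [("we closed our round", [1.0, 0.0])],
    "call_b.wav": [("weather report", [0.0, 1.0])],
}
print(search_memory([1.0, 0.0], store, top_k=1))
```

Because the query is encoded once and every cached segment vector is reused, scaling from one file to the whole library only adds dot products, which stay fast even across hundreds of hours.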
