How it works
- Every segment of the transcript is encoded into a 384-dimensional vector using `all-MiniLM-L6-v2` (a sentence-transformer model)
- Your search query is encoded with the same model
- Augent computes cosine similarity between the query vector and every segment vector
- Results are ranked by similarity score
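The steps above can be sketched in a few lines. This is a toy illustration, not Augent's actual code: the hand-made 3-dimensional vectors stand in for the 384-dimensional embeddings the real model produces, and the segment texts are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical segment embeddings (real ones come from the model's encoder).
segments = {
    "we closed our round": [0.9, 0.1, 0.1],
    "the weather in March": [0.0, 0.9, 0.2],
}
query = [1.0, 0.1, 0.0]  # stand-in embedding for the query "fundraising"

# Score every segment against the query and rank, best first.
ranked = sorted(segments, key=lambda s: cosine_similarity(segments[s], query), reverse=True)
print(ranked[0])  # highest-similarity segment
```

Note that "fundraising" and "we closed our round" share no words at all; the match comes entirely from the vectors being close, which is exactly what separates this from keyword search.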
Keyword search vs. semantic search
| | Keyword (search_audio) | Semantic (deep_search) |
|---|---|---|
| Matching | Exact string match | Meaning-based similarity |
| Speed | Instant (text scan) | Fast (vector comparison) |
| Best for | Finding specific terms, names, numbers | Finding discussions about a topic |
| Example | "Series A" finds "Series A" | "fundraising" finds "we closed our round" |
Embeddings are cached
The first semantic search on a file computes embeddings for all segments and stores them in SQLite. Every subsequent semantic query on that file reuses the cached embeddings — only the query itself needs to be encoded, which takes milliseconds.
Deduplication
When `dedup_seconds` is set (e.g., 60), results that fall within that many seconds of each other are merged. This prevents getting five results from the same 2-minute discussion. Augent overcollects candidates internally to compensate for results removed by the merge.
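One plausible way to implement this merge is a greedy pass over results in score order, dropping any hit that lands within the window of an already-kept hit. This is an assumed behavior sketched for illustration, not Augent's actual implementation:

```python
def dedup(results, dedup_seconds):
    """results: list of (timestamp_seconds, score) pairs.

    Greedy time-window dedup (assumed logic): walk hits best-score-first
    and keep a hit only if it is at least dedup_seconds away from every
    hit already kept.
    """
    kept = []
    for ts, score in sorted(results, key=lambda r: r[1], reverse=True):
        if all(abs(ts - kept_ts) >= dedup_seconds for kept_ts, _ in kept):
            kept.append((ts, score))
    return kept

# Three hits from the same discussion around t=100s, one far away at t=400s.
hits = [(100, 0.91), (130, 0.88), (145, 0.85), (400, 0.80)]
print(dedup(hits, 60))  # → [(100, 0.91), (400, 0.80)]
```

The two lower-scored hits at 130s and 145s fall inside the 60-second window of the kept hit at 100s, so one result represents the whole discussion.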
Context words
The `context_words` parameter controls how much text surrounds each result:
- 25 (default): a sentence or two, enough to see the match in context
- 150: a full paragraph, enough for Claude to answer questions from the evidence
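A word-window like this can be sketched as follows. The function name and the exact centering logic are hypothetical; only the `context_words` parameter comes from the docs above:

```python
def context_window(text, match_index, context_words=25):
    """Return roughly context_words words centered on the matched word.

    match_index is the position of the matched word in the transcript's
    word list (illustrative interface, not Augent's API).
    """
    words = text.split()
    lo = max(0, match_index - context_words // 2)
    hi = match_index + context_words // 2 + 1
    return " ".join(words[lo:hi])

# Toy transcript of numbered words to make the window visible.
transcript = " ".join(f"w{i}" for i in range(100))
print(context_window(transcript, 50, context_words=4))  # → w48 w49 w50 w51 w52
```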
Cross-memory search
`search_memory` uses the same engine but searches across all stored transcriptions — no file path needed. One query, hundreds of hours of audio.
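Conceptually, this is the same scoring loop run over every stored transcription's cached embeddings instead of one file's. The storage layout, file names, and function signature below are illustrative assumptions, not Augent's API:

```python
def search_memory(query_vec, store, top_k=3):
    """store: {filename: [(segment_text, embedding), ...]} — a stand-in
    for the SQLite embedding cache described above."""
    hits = []
    for fname, segments in store.items():
        for text, vec in segments:
            # Dot product equals cosine similarity if embeddings are unit-length.
            score = sum(q * v for q, v in zip(query_vec, vec))
            hits.append((score, fname, text))
    return sorted(hits, reverse=True)[:top_k]

# Two hypothetical transcriptions with toy 2-dim embeddings.
store = {
    "call_a.wav": [("we closed our round", [1.0, 0.0])],
    "call_b.wav": [("weather report", [0.0, 1.0])],
}
print(search_memory([1.0, 0.0], store, top_k=1))
```

Because the query is encoded once and every cached segment vector is reused, scaling from one file to the whole library only adds dot products, which stay fast even across hundreds of hours.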
