# Semantic Search

> How deep_search finds content by meaning using sentence-transformer embeddings.

Augent has two search modes. **Keyword search** finds exact words. **Semantic search** finds content by meaning — even when the exact words don't match.

***

## How it works

1. Every segment of the transcript is encoded into a 384-dimensional vector using `all-MiniLM-L6-v2` (a sentence-transformer model)
2. Your search query is encoded with the same model
3. Augent computes cosine similarity between the query vector and every segment vector
4. Results are ranked by similarity score

This means searching for "funding challenges" will find segments where someone says "we were running out of money" — even though none of those words overlap.
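The ranking in steps 3 and 4 can be sketched in a few lines. This is a minimal illustration, not Augent's implementation: the function names are hypothetical, and toy 3-dimensional vectors stand in for the 384-dimensional embeddings that `all-MiniLM-L6-v2` would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_segments(query_vec, segment_vecs):
    """Return (segment_index, score) pairs, highest similarity first."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(segment_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors (assumption: real embeddings would have 384 dimensions).
segments = [
    [0.9, 0.1, 0.0],  # segment 0: points almost the same way as the query
    [0.0, 1.0, 0.0],  # segment 1: orthogonal to the query
    [0.7, 0.7, 0.0],  # segment 2: partially aligned with the query
]
query = [1.0, 0.0, 0.0]

ranking = rank_segments(query, segments)
for index, score in ranking:
    print(f"segment {index}: {score:.3f}")
```

Cosine similarity depends only on the angle between vectors, not their length, which is why a short query can score highly against a longer segment.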

***

## Keyword search vs. semantic search

|              | Keyword (`search_audio`)               | Semantic (`deep_search`)                  |
| ------------ | -------------------------------------- | ----------------------------------------- |
| **Matching** | Exact string match                     | Meaning-based similarity                  |
| **Speed**    | Instant (text scan)                    | Fast (vector comparison)                  |
| **Best for** | Finding specific terms, names, numbers | Finding discussions about a topic         |
| **Example**  | "Series A" finds "Series A"            | "fundraising" finds "we closed our round" |

Use keyword search when you know the exact words. Use semantic search when you know the concept but not the phrasing.

***

## Embeddings are cached

The first semantic search on a file computes embeddings for all segments and stores them in SQLite. Every subsequent semantic query on that file reuses the cached embeddings — only the query itself needs to be encoded, which takes milliseconds.
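A compute-once, reuse-later cache of this kind could look roughly like the sketch below. The table layout, the `EmbeddingCache` class, and the `encode` callback are assumptions made for illustration; Augent's actual schema is not described here.

```python
import sqlite3
import struct

def pack(vec):
    """Serialize a list of floats as a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    """Inverse of pack(): a float32 BLOB back to a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

class EmbeddingCache:
    """Hypothetical per-file embedding cache backed by SQLite."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "file_key TEXT, segment_idx INTEGER, vector BLOB, "
            "PRIMARY KEY (file_key, segment_idx))"
        )

    def get_or_compute(self, file_key, segments, encode):
        rows = self.conn.execute(
            "SELECT vector FROM embeddings WHERE file_key = ? "
            "ORDER BY segment_idx",
            (file_key,),
        ).fetchall()
        if rows and len(rows) == len(segments):
            return [unpack(r[0]) for r in rows]  # cache hit: no encoding
        vectors = [encode(text) for text in segments]  # cache miss
        self.conn.executemany(
            "INSERT OR REPLACE INTO embeddings VALUES (?, ?, ?)",
            [(file_key, i, pack(v)) for i, v in enumerate(vectors)],
        )
        self.conn.commit()
        return vectors

# Usage with a stand-in encoder that records how often it is called.
calls = []
def fake_encode(text):
    calls.append(text)
    return [float(len(text)), 1.0]

cache = EmbeddingCache(sqlite3.connect(":memory:"))
first = cache.get_or_compute("talk.mp3", ["hello", "world!"], fake_encode)
second = cache.get_or_compute("talk.mp3", ["hello", "world!"], fake_encode)
print(len(calls))
```

In this sketch the second call reads the stored vectors and never invokes the encoder, which mirrors the behavior described above: after the first search, only the query needs encoding.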

***

## Deduplication

When `dedup_seconds` is set (e.g., `60`), results within that many seconds of each other are merged. This avoids returning five results from the same 2-minute discussion. Because merging shrinks the result list, Augent over-collects candidates internally to compensate.
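One plausible way to implement this is a greedy pass over an over-collected candidate pool, sketched below. The result shape (`start`, `score`), the `top_k` parameter, and the choice to keep the highest-scoring result from each cluster are all assumptions for illustration, not Augent's documented behavior.

```python
def dedup_results(candidates, dedup_seconds, top_k):
    """Keep the best-scoring result per time neighborhood, up to top_k.

    candidates: dicts with a "start" time in seconds and a "score".
    A candidate is dropped if an already-kept result starts within
    dedup_seconds of it.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if all(abs(cand["start"] - k["start"]) > dedup_seconds for k in kept):
            kept.append(cand)
        if len(kept) == top_k:
            break
    return kept

# The pool holds more candidates than requested so that top_k results
# survive after near-duplicates are dropped.
pool = [
    {"start": 100, "score": 0.90},
    {"start": 130, "score": 0.85},  # within 60 s of the result at 100
    {"start": 155, "score": 0.80},  # within 60 s of the result at 100
    {"start": 400, "score": 0.70},
    {"start": 900, "score": 0.60},
]
results = dedup_results(pool, dedup_seconds=60, top_k=3)
print([r["start"] for r in results])
```

Without the larger pool, asking for three results here would return only one distinct moment, since the top three candidates all come from the same stretch of audio.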

***

## Context words

The `context_words` parameter controls how much text surrounds each result:

* `25` (default): a sentence or two, enough to see the match in context
* `150`: a full paragraph, enough for Claude to answer questions from the evidence

Query words longer than 4 characters are highlighted in **bold** in the snippet.
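The snippet window and highlighting rule can be sketched as follows. The helper names are hypothetical, and the sketch assumes `context_words` is counted on each side of the match and that matching is case-insensitive; the source does not specify either detail.

```python
import re

def build_snippet(words, match_start, match_end, context_words=25):
    """Join the matched words plus up to context_words on each side.

    words: the transcript split into words; the match covers the
    half-open index range [match_start, match_end).
    """
    lo = max(0, match_start - context_words)
    hi = min(len(words), match_end + context_words)
    return " ".join(words[lo:hi])

def highlight(snippet, query, min_chars=5):
    """Wrap snippet words in ** when they equal a query word of 5+ chars."""
    terms = {w.lower() for w in re.findall(r"\w+", query)
             if len(w) >= min_chars}

    def bold(match):
        word = match.group(0)
        return f"**{word}**" if word.lower() in terms else word

    return re.sub(r"\w+", bold, snippet)

words = "a b c d e f g".split()
window = build_snippet(words, match_start=3, match_end=4, context_words=2)
marked = highlight("We had funding problems and money ran out",
                   "funding money")
print(window)
print(marked)
```

Skipping short words keeps common terms such as "the" or "with" from being bolded throughout the snippet.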

***

## Cross-memory search

`search_memory` uses the same engine but searches across **all** stored transcriptions — no file path needed. One query, hundreds of hours of audio.