[2026.3.27] - 2026-03-27
Added
visualMCP tool: extract visual context from video at moments where audio alone isn’t enough. Four modes: query (semantic search for specific moments), auto (autonomous detection of UI actions, demonstrations, spatial references), manual (user-specified timestamps), and assist (flags visual gaps for manual screenshots, ideal for talking-head videos). Frames are saved directly into the Obsidian vault with![[filename.png]]wikilink embeds.take_notesvisual integration: passvisual="the workflow setup"totake_notesfor one-shot notes plus screenshots in a single call.- Smart frame selection: extracts three candidate frames per timestamp and picks the one with highest edge density, capturing the sharpest UI or screen content.
- Duplicate frame detection: perceptual hashing drops near-identical frames automatically.
- Intro B-roll suppression: first 30 seconds of a video require much higher confidence to prevent stock footage false positives.
- Obsidian vault auto-detection: reads vault path from Obsidian’s own config. No hardcoded paths.
[2026.3.24] - 2026-03-24
Added
- Obsidian graph view integration: YAML frontmatter on all memory
.mdfiles,[[wikilinks]]between related transcriptions, and MOC hub files for tag clusters. rebuild_graphMCP tool: migrate files, compute related links, and generate MOCs in one command.- Chronological neighbor links: every new transcription links to the previous one, preventing orphan nodes.
Changed
take_notesoutput switched from.txtto.md: notes files now include auto-generated YAML frontmatter with tags, source links, and style metadata.- Tag changes sync to
.mdfrontmatter:add_tagsandremove_tagsupdate the file in real time. - Multi-word tags hyphenated: prevents Obsidian from splitting them into separate tags.
[2026.3.21] - 2026-03-21
Added
- User configuration (
~/.augent/config.yaml): set defaults for model_size, output_dir, notes_output_dir, clip_padding, context_words, tts_voice, and tts_speed. Per-call arguments always override config values. Falls back to JSON if PyYAML is not installed. No config file required — all values have sensible defaults. disabled_toolsconfig key: list tool names to hide from MCP clients.
[2026.3.20] - 2026-03-20
Added
tagMCP tool: add, remove, or list tags on any transcription for organizing and filtering memories by topic.- Semantic auto-tagging: new transcriptions are automatically tagged using sentence-transformer embeddings: no LLM required. Works on both Web UI and MCP flows.
- Web UI tag filtering: clickable tag pills above the Memory list with inline counts. Memory cards display up to 5 tag pills each.
- Tags API:
GET /api/memory/tagsreturns all tags with usage counts.
Changed
- Clip export improvements: highlights and deep_search clips now use full time ranges instead of symmetric padding around a single point. Default padding increased from 5s to 10s.
- Tagging flow:
transcribe_audioandtake_notesresponses include the existing tag library so Claude reuses categories instead of creating duplicates.
Removed
- Regex auto-tagger: replaced by semantic tagging.
[2026.3.19] - 2026-03-19
Added
highlightsMCP tool: export MP4 clips of specific moments, auto-pick the best or target exactly what you want. Two modes: auto (AI picks top moments by content density) or focused (find moments matching a specific topic, person, or concept). Supports optional clip export with configurable padding.- Auto-tagging: transcriptions created via
take_notesnow automatically extract and store tags for faster discovery. search_audioanddeep_searchclip export: both tools now supportclipandclip_paddingparameters to export MP4 clips around keyword matches.- Expanded multilingual support: multilingual section updated with 99 supported languages.
Changed
- Tool descriptions: “extract” → “export” across highlights tool descriptions for consistency. Em dashes replaced with commas in tool table descriptions.
- CI dependency bumps:
actions/labelerv5 → v6,actions/stalev9 → v10,actions/checkoutv4 → v6,actions/setup-pythonv5 → v6,huggingface-hubupper bound updated.
Fixed
- CodeQL security fix: validated clip reveal path against tracked clips to prevent path traversal.
take_notesfrom memory: newoutput_pathparameter allows saving notes from memory transcripts without requiring a URL.- Test coverage: added tests for highlights tool and auto-tagging.
- Formatting: black formatting applied to all modified files, ruff lint fixes for unused variables.
[2026.3.15] - 2026-03-15
Added
- Web UI visual clip export: click the film icon on any search result to create a region on the waveform, or drag-select any range manually. A clip toolbar appears below the waveform with ±1s/±5s nudge buttons for precise boundary adjustment, a preview button to hear exactly what will be exported, and an Export MP4 button. Keyboard shortcuts:
Spaceto preview,Enterto export,Escto close. - WaveSurfer Regions plugin: interactive regions on the audio waveform with draggable edges and body repositioning. Only one clip region active at a time.
- Expanded Web UI tips: 8 tips (up from 3) covering clip export workflow, Memory tab re-search, parallel tabs, YouTube timestamps, and cross-memory search.
[2026.3.12] - 2026-03-12
Added
clip_exportMCP tool: export a video clip from any URL for a specific time range. Downloads only the requested segment: not the full video. Supports YouTube, Instagram, TikTok, Twitter/X, and 1000+ sites via yt-dlp.- Multilingual translation flow: when
transcribe_audioortake_notesdetects non-English audio, a translation offer is returned. Accepting stores a clean English(eng)sibling file in memory alongside the original transcription. - Web UI “Show Transcript” button: new document icon on memory cards reveals the
.mdtranscript file in Finder, separate from the audio file reveal. - Web UI “Show Audio” / “Show Transcript” in detail view: two distinct buttons to reveal either the audio source or the transcript file.
- Web UI re-search from memory: search a previously transcribed file again without re-uploading.
Changed
- Download filenames now include video ID:
%(title)s [%(id)s].%(ext)sprevents overwrites when multiple videos share the same title (e.g., multiple Instagram reels from the same account). - Download filenames sanitized to ASCII:
--restrict-filenamesreplaces Unicode characters (smart quotes, accented characters, etc.) with safe ASCII equivalents, preventing broken file paths in downstream tools. - Translation
(eng)files are now clean English text: no timestamps, no per-segment mapping: just the full translated text with metadata header. - Translation offer moved to
transcribe_audiolevel: works for any transcription call, not justtake_notes. Removed fragile global state. - Clip export now overwrites existing files:
--force-overwritesensures re-exporting a clip with different padding replaces the previous file instead of silently skipping. - Default clip padding increased to 15 seconds: up from 10s (MCP) and 5s (CLI) for better context around keyword matches.
- Web UI default port changed to 8282: moved from 9797 to avoid the crowded 9000s range.
Fixed
- CI workflow auth failures: downgraded
actions/checkout@v6→@v4andactions/setup-python@v6→@v5; added explicitpermissions: contents: readto Tests workflow. - Test count mismatch: updated expected tool count from 16 → 17 after
clip_exportwas added. - Ruff lint errors: removed f-strings without placeholders, unused variables,
dict()calls. - Black formatting: reformatted all modified files to pass CI.
- Handler test coverage: added 59 tests for
download_audio,clip_export,transcribe_audio,chapters,batch_search, andseparate_audio: covering command construction, flag verification, parameter defaults, and error handling.
[2026.3.11] - 2026-03-11
Added
- Web UI waveform for URL downloads: downloading audio from a URL now renders an interactive waveform with full playback controls (play/pause, skip to start/end, volume slider) — same experience as file uploads
- Web UI download spinner: animated wormhole spinner replaces the progress bar during audio downloads for a cleaner look
- Web UI transcription progress bar: real-time progress bar shown in the results panel during transcription
- Web UI Show in Finder button: folder icon on memory cards to reveal the audio file (or
.mdfallback) in Finder - Web UI URL-based memory caching: pasting the same URL again skips re-downloading: instant cache hit by source URL + model size
Changed
- Web UI memory stats: transcription count and placeholder text are now larger and use the green accent color for better readability
- Web UI keyword placeholder: updated to “wormhole, open source, workflow”
- Web UI progress/spinner location: moved from the log box to the results panel where it’s more visible
Fixed
augent-webstartup: server no longer silently exits on launch; prints a clean “WebUI is live at” message_kill_portkilling own process: added PID check so the server doesn’t terminate itself on startup- Memory not saving from Web UI downloads: macOS
tempfile.mkdtemp()created dirs in/var/folders/which failed the/tmpsecurity check — now forces/tmpas the base directory - Memory card title overlapping trash icon: added padding so long titles no longer clip behind the delete button
- Memory list clipping at bottom: added bottom padding so the last card isn’t cut off by the panel edge
- Audio endpoint 404 on macOS: resolved
/tmp→/private/tmpsymlink issue and URL-encoded audio paths - CI build failure: fixed black formatting issues
[2026.3.10] - 2026-03-10
Added
- Persistent source URLs: Source URLs from any platform (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1000+ sites) are now stored permanently by audio file hash in a dedicated
source_urlstable. Any future search or transcription of the same file — even after restarts, from a different path — automatically links back to the original source. No manual URL entry needed. - Web UI Memory Explorer: browse, search, view, and delete stored transcriptions from the browser. Each entry shows title, duration, model size, and date. YouTube-sourced entries are marked with a YT badge.
- Web UI memory deletion: trash icon on each memory card with confirmation dialog. Removes the transcription, embeddings, and
.mdfile. - Web UI YouTube timestamp linking: after uploading a file, paste a YouTube URL inline in the results to retroactively add clickable timestamp links to all matches.
- Web UI URL paste: paste a YouTube or video URL directly in the search view to download and search audio without leaving the browser.
- Web UI Share as HTML: download any transcript as a self-contained HTML page with all styling inline.
- Web UI Show in Finder: reveal the original audio file in Finder (macOS), file manager (Linux), or Explorer (Windows). Uses AppleScript on macOS for reliable handling of special characters in paths.
- Web UI favicon: custom Augent favicon embedded as base64, ships with every install.
- Web UI PNG banner: Augent logo displayed in the log area, served from the package.
- Source URL in
.mdfiles: transcription markdown files now include a**URL:**line in the metadata header when a source URL is available.
Changed
- Web UI title: browser tab now reads “Augent Web UI” instead of “Augent”
- Web UI tagline: displays “Web UI v” dynamically from package version
- Web UI keywords: changed from single-line input to resizable textarea
- Web UI results: row hover animation changed from border-left (caused text jitter) to clean
translateY(-1px)on tbody rows only - Web UI snippets: markdown
**bold markers stripped from keyword matches (the UI already bolds keywords with CSS) - File upload naming: uploaded files now preserve their original filename in memory instead of storing as temp file names
Fixed
- YouTube timestamp linking in Web UI:
lastGroupedwas not being set in the file upload SSE path, causing the “Link timestamps” button to silently fail - Results header hover:
Time | Contextheader row no longer animates on hover (scoped totbody tronly) - Show in Finder reliability: switched from
open -Rto AppleScript on macOS for paths with special characters
[2026.3.8] - 2026-03-08
Added
separate_audioMCP tool: audio source separation using Meta’s Demucs v4. Isolates vocals from music, background noise, and other sounds. Feeds directly into the transcription pipeline for clean results on noisy recordings.separatoroptional dependency group:pip install augent[separator]installs Demucs. Also included inaugent[all].- Hash-based separation caching: separated stems are stored at
~/.augent/separated/by file hash. First run processes, every run after is instant. - Vocals-only mode: default two-stem separation (vocals + no_vocals) is faster than full 4-stem. Set
vocals_only: falsefor all 4 stems (vocals, drums, bass, other). - Installer support for Demucs:
install.shnow includes separator in the extras fallback chain and verifies demucs during package verification. - 78 new tests: CLI, TTS, speakers, and clips modules now fully tested (133 to 211 total)
- CodeQL security scanning: weekly + on every push/PR
- Dependabot: daily automated dependency and GitHub Actions updates
- Makefile:
make test,make lint,make fmt,make checkfor developer convenience - Discord badge and link in README
Changed
- Speaker diarization upgraded to pyannote-audio: replaced the abandoned simple-diarizer (155 stars, last release 2022) with pyannote-audio (9.3k stars, actively maintained). Same tool interface, same caching, dramatically better accuracy. Requires a free Hugging Face token.
Improved
- CI pipeline: added ruff + black enforcement, pip caching, Python 3.13 to test matrix
- Ruff config: migrated to
[tool.ruff.lint]namespace, added per-file ignores for intentional patterns - Coverage reporting: pytest-cov on Python 3.12 in CI with
[tool.coverage]config
Fixed
- Insecure temp file in
mcp.py: replacedtempfile.mktempwithNamedTemporaryFile - Path validation in
memory.py:realpath+isfilecheck before file access - Lint violations: 66 issues resolved across all modules (unused imports, raise-from, set comprehensions)
- Formatting: entire codebase now passes black
[2026.2.28] - 2026-02-28
Added
context_wordsparameter ondeep_searchandsearch_memory: control how much context each result returns. Default 25 words (unchanged). Set to 150 for full evidence blocks when Claude needs to answer a question, not just find a momentdedup_secondsparameter ondeep_searchandsearch_memory: merge matches from the same time range to avoid redundant results. Default 0 (off). Set to 60 for Q&A workflows- File output on all search and transcription tools:
transcribe_audio,search_audio,deep_search,search_proximitynow accept anoutputparameter for saving results directly to disk - XLSX export:
.xlsxfor styled spreadsheets with bold headers and formatted timestamps,.csvfor plain data. Auto-detected from file extension - Per-segment timestamps on
transcribe_audio: responses now include asegmentsarray withstart,end,timestamp, andtextper segment instead of one flat text string - Audio trimming on
transcribe_audio:startanddurationparameters (in seconds) to transcribe specific sections without manual ffmpeg
Improved
- Consistent
outputparameter: all search and transcription tools now follow the same export patternsearch_memoryintroduced in 2026.2.26
[2026.2.26] - 2026-02-26
Added
search_memorytool: search across ALL stored transcriptions in one query, no audio_path needed- Keyword and semantic modes:
search_memorydefaults to literal keyword matching; opt into meaning-based search withmode: "semantic" - CSV export: optional
outputparameter onsearch_memorysaves results as a CSV file - 25-word snippets: all search tools now return consistent ~25-word context snippets with keyword highlighting
Improved
- Keyword highlighting: matched keywords shown in bold across all search results (search_audio, deep_search, search_memory, search_proximity)
- CLI:
augent memory search "query"with--semanticand--top-kflags
[2026.2.21] - 2026-02-21
Changed
- Cache rebranded to Memory: all
cachecommands, paths, and APIs renamed tomemory(~/.augent/cache/→~/.augent/memory/)
Improved
- Installer UX: animated spinners, paced output, and race condition fix for
curl|bashpiped installs - ASCII banner for CLI and installer using pyfiglet
[2026.2.16] - 2026-02-16
Added
- OpenClaw integration: skill package for ClawHub +
augent setup openclawone-liner - Installer auto-detects OpenClaw and configures MCP alongside Claude
- MCP protocol tests: 33 tests covering routing, tool listing, and error handling
[2026.2.15] - 2026-02-15
Fixed
- TTS no longer blocks MCP: runs in background subprocess with job polling
- Installer correctly selects framework Python for MCP config on macOS
[2026.2.14] - 2026-02-14
Improved
- Quiz checkbox syntax: answer options render as Obsidian checkboxes
- Answer key formatting enforced (bold number + letter, em dash, explanation)
- Claude always routes video URLs through
take_notes, never WebFetch
[2026.2.13] - 2026-02-13
Added
save_contentmode fortake_notes: bypasses Write tool, ensures post-processing runs- Installer auto-installs Python 3.12 when only 3.13 is available
Fixed
- Installer eliminates silent failures and verifies all packages
take_notesembeds absolute file paths in Claude instructions- Skip re-downloading dependencies on reinstall
[2026.2.12] - 2026-02-12
Improved
- Lazy imports for optional dependencies: installing mid-session works without restart
- Preserve WAV file when ffmpeg conversion fails in TTS
[2026.2.9] - 2026-02-09
Added
- Text-to-speech: Kokoro TTS with 54 voices across 9 languages
read_aloudoption fortake_notes: generates spoken MP3 and embeds in Obsidian
[2026.2.8] - 2026-02-08
Added
identify_speakers: speaker diarization using pyannote-audiodeep_search: semantic search using sentence-transformers (find by meaning, not keywords)chapters: auto-detect topic boundaries with embedding similarity- 5 note styles for
take_notes: tldr, notes, highlight, eye-candy, quiz - Obsidian .txt integration guide: full setup for live-synced notes
Changed
- Renamed
list_audio_files→list_files, defaults to all common media formats - Enforced
tinyas default model across all tool schemas
[2026.2.7] - 2026-02-07
Added
take_notestool: one-click URL to formatted notes pipeline (download + transcribe + save .txt)
[2026.1.31] - 2026-01-31
Added
- Title-based cache lookups: search cached transcriptions by name
- Markdown transcription files: each cached transcription also saved as readable
.md
[2026.1.29] - 2026-01-29
Changed
- Python 3.10+ required (dropped 3.9 support)
Fixed
- Homebrew Python compatibility (PATH, PEP 668, absolute paths)
- Pinned yt-dlp to stable version for reliable downloads
- Installer handles Homebrew permission issues gracefully
[2026.1.26] - 2026-01-26
Added
audio-downloaderCLI tool: speed-optimized with aria2c (16 parallel connections)download_audioMCP tool: Claude can download audio directly- Model size warnings for medium/large (resource-intensive)
Fixed
- aria2c downloader-args format causing download failures
- Web UI default port changed from 8888 to 8282
[2026.1.24] - 2026-01-24
Added
- Web UI v1: local web interface with failproof startup
- CI/CD: GitHub Actions testing on Python 3.10, 3.11, 3.12
- Professional installer: one-liner
curl | bashsetup - Logo and branding
[2026.1.23] - 2026-01-23
Added
- Initial release
- MCP server exposing tools for Claude Code and Claude Desktop
- Transcription engine powered by faster-whisper with word-level timestamps
- Keyword search with timestamped matches and context snippets
- Proximity search: find where keywords appear near each other
- Batch processing: search multiple files in parallel
- Three-layer caching: transcriptions, embeddings, and diarization in SQLite
- CLI with search, transcribe, proximity, and cache management commands
- Export formats: JSON, CSV, SRT, VTT, Markdown
- Cross-platform support: macOS, Linux, Windows

