Format follows Keep a Changelog.

[2026.3.27] - 2026-03-27

Added

  • visual MCP tool: extract visual context from video at moments where audio alone isn’t enough. Four modes: query (semantic search for specific moments), auto (autonomous detection of UI actions, demonstrations, spatial references), manual (user-specified timestamps), and assist (flags visual gaps for manual screenshots, ideal for talking-head videos). Frames are saved directly into the Obsidian vault with ![[filename.png]] wikilink embeds.
  • take_notes visual integration: pass visual="the workflow setup" to take_notes for one-shot notes plus screenshots in a single call.
  • Smart frame selection: extracts three candidate frames per timestamp and picks the one with highest edge density, capturing the sharpest UI or screen content.
  • Duplicate frame detection: perceptual hashing drops near-identical frames automatically.
  • Intro B-roll suppression: first 30 seconds of a video require much higher confidence to prevent stock footage false positives.
  • Obsidian vault auto-detection: reads vault path from Obsidian’s own config. No hardcoded paths.
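
The smart frame selection entry above can be illustrated with a minimal sketch. It assumes frames are small grayscale grids and measures edge density as the mean absolute difference between neighboring pixels; the function names and the metric itself are illustrative assumptions, not the tool's actual implementation.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame as rows of 0-255 values.
Frame = Sequence[Sequence[int]]


def edge_density(frame: Frame) -> float:
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    total = 0
    count = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += abs(value - row[x + 1])
                count += 1
            if y + 1 < len(frame):
                total += abs(value - frame[y + 1][x])
                count += 1
    return total / count if count else 0.0


def pick_sharpest(candidates: List[Frame]) -> int:
    """Return the index of the candidate frame with the highest edge density."""
    return max(range(len(candidates)), key=lambda i: edge_density(candidates[i]))


# Three candidate frames for one timestamp: blank, soft gradient, high contrast.
flat = [[128, 128, 128], [128, 128, 128]]
soft = [[100, 120, 140], [100, 120, 140]]
crisp = [[0, 255, 0], [255, 0, 255]]
best = pick_sharpest([flat, soft, crisp])
```

In this toy example the high-contrast frame wins, which mirrors the idea that sharp UI or screen content produces more edges than a blurred or blank frame.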

[2026.3.24] - 2026-03-24

Added

  • Obsidian graph view integration: YAML frontmatter on all memory .md files, [[wikilinks]] between related transcriptions, and MOC hub files for tag clusters.
  • rebuild_graph MCP tool: migrate files, compute related links, and generate MOCs in one command.
  • Chronological neighbor links: every new transcription links to the previous one, preventing orphan nodes.

Changed

  • take_notes output switched from .txt to .md: notes files now include auto-generated YAML frontmatter with tags, source links, and style metadata.
  • Tag changes sync to .md frontmatter: add_tags and remove_tags update the file in real time.
  • Multi-word tags hyphenated: prevents Obsidian from splitting them into separate tags.
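
As a rough illustration of the hyphenation and frontmatter entries above, the sketch below normalizes tags and renders a small YAML frontmatter block. The helper names and the frontmatter fields (`tags`, `source`) are assumptions chosen for the example, not the project's actual schema.

```python
import re
from typing import Iterable, List


def normalize_tag(tag: str) -> str:
    """Collapse internal whitespace to hyphens so the tag stays a single token."""
    return re.sub(r"\s+", "-", tag.strip())


def frontmatter(tags: Iterable[str], source: str) -> str:
    """Render a minimal YAML frontmatter block (field names are illustrative)."""
    lines: List[str] = ["---", "tags:"]
    lines += [f"  - {normalize_tag(tag)}" for tag in tags]
    lines += [f'source: "{source}"', "---"]
    return "\n".join(lines)


rendered = frontmatter(["machine learning", "obsidian"], "https://example.com/video")
print(rendered)
```

Without the hyphen, a tag such as "machine learning" would be read by Obsidian as two separate tags, which is the problem the changelog entry describes.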

[2026.3.21] - 2026-03-21

Added

  • User configuration (~/.augent/config.yaml): set defaults for model_size, output_dir, notes_output_dir, clip_padding, context_words, tts_voice, and tts_speed. Per-call arguments always override config values. Falls back to JSON if PyYAML is not installed. No config file required — all values have sensible defaults.
  • disabled_tools config key: list tool names to hide from MCP clients.
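
The precedence described above (defaults, then config file, then per-call arguments) can be sketched as follows. The default values, the `resolve_options` and `visible_tools` helpers, and the treatment of `None` as "not provided" are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional

# Illustrative defaults only; the real values live in the package.
DEFAULTS: Dict[str, Any] = {"model_size": "tiny", "clip_padding": 10, "context_words": 25}


def resolve_options(config: Optional[Dict[str, Any]] = None, **call_args: Any) -> Dict[str, Any]:
    """Layer settings: built-in defaults < config file < per-call arguments."""
    resolved = dict(DEFAULTS)
    resolved.update(config or {})
    # Arguments left as None are treated as "not provided" and do not override.
    resolved.update({key: value for key, value in call_args.items() if value is not None})
    return resolved


def visible_tools(all_tools: Iterable[str], disabled_tools: Iterable[str]) -> List[str]:
    """Hide any tool whose name appears in the disabled_tools config key."""
    hidden = set(disabled_tools)
    return [name for name in all_tools if name not in hidden]


options = resolve_options({"model_size": "small"}, clip_padding=5, context_words=None)
tools = visible_tools(["tag", "visual", "highlights"], ["visual"])
```

Here `model_size` comes from the config, `clip_padding` from the call, and `context_words` falls back to the default because no value was passed.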

[2026.3.20] - 2026-03-20

Added

  • tag MCP tool: add, remove, or list tags on any transcription for organizing and filtering memories by topic.
  • Semantic auto-tagging: new transcriptions are automatically tagged using sentence-transformer embeddings, with no LLM required. Works on both Web UI and MCP flows.
  • Web UI tag filtering: clickable tag pills above the Memory list with inline counts. Memory cards display up to 5 tag pills each.
  • Tags API: GET /api/memory/tags returns all tags with usage counts.

Changed

  • Clip export improvements: highlights and deep_search clips now use full time ranges instead of symmetric padding around a single point. Default padding increased from 5s to 10s.
  • Tagging flow: transcribe_audio and take_notes responses include the existing tag library so Claude reuses categories instead of creating duplicates.

Removed

  • Regex auto-tagger: replaced by semantic tagging.

[2026.3.19] - 2026-03-19

Added

  • highlights MCP tool: export MP4 clips of specific moments, either auto-picking the best or targeting exactly what you want. Two modes: auto (AI picks top moments by content density) or focused (find moments matching a specific topic, person, or concept). Supports optional clip export with configurable padding.
  • Auto-tagging: transcriptions created via take_notes now automatically extract and store tags for faster discovery.
  • search_audio and deep_search clip export: both tools now support clip and clip_padding parameters to export MP4 clips around keyword matches.
  • Expanded multilingual support: multilingual section updated with 99 supported languages.

Changed

  • Tool descriptions: “extract” → “export” across highlights tool descriptions for consistency. Em dashes replaced with commas in tool table descriptions.
  • CI dependency bumps: actions/labeler v5 → v6, actions/stale v9 → v10, actions/checkout v4 → v6, actions/setup-python v5 → v6, huggingface-hub upper bound updated.

Fixed

  • CodeQL security fix: validated clip reveal path against tracked clips to prevent path traversal.
  • take_notes from memory: new output_path parameter allows saving notes from memory transcripts without requiring a URL.
  • Test coverage: added tests for highlights tool and auto-tagging.
  • Formatting: black formatting applied to all modified files, ruff lint fixes for unused variables.
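
The CodeQL fix above validates a requested path against the set of tracked clips. A minimal sketch of that kind of allow-list check, assuming a hypothetical `is_tracked_clip` helper and that paths are compared after resolving symlinks and `..` segments:

```python
import os
import tempfile
from typing import Iterable


def is_tracked_clip(requested: str, tracked: Iterable[str]) -> bool:
    """Accept a path only if it resolves to one of the known clip paths."""
    allowed = {os.path.realpath(path) for path in tracked}
    return os.path.realpath(requested) in allowed


# Hypothetical clip directory used only for this example.
clips_dir = os.path.join(tempfile.gettempdir(), "clips")
tracked_clips = [os.path.join(clips_dir, "intro.mp4")]

ok = is_tracked_clip(os.path.join(clips_dir, "..", "clips", "intro.mp4"), tracked_clips)
escaped = is_tracked_clip(os.path.join(clips_dir, "..", "..", "etc", "passwd"), tracked_clips)
```

Comparing against an explicit allow-list is stricter than checking a directory prefix, since only files the application itself created can ever be revealed.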

[2026.3.15] - 2026-03-15

Added

  • Web UI visual clip export: click the film icon on any search result to create a region on the waveform, or drag-select any range manually. A clip toolbar appears below the waveform with ±1s/±5s nudge buttons for precise boundary adjustment, a preview button to hear exactly what will be exported, and an Export MP4 button. Keyboard shortcuts: Space to preview, Enter to export, Esc to close.
  • WaveSurfer Regions plugin: interactive regions on the audio waveform with draggable edges and body repositioning. Only one clip region active at a time.
  • Expanded Web UI tips: 8 tips (up from 3) covering clip export workflow, Memory tab re-search, parallel tabs, YouTube timestamps, and cross-memory search.

[2026.3.12] - 2026-03-12

Added

  • clip_export MCP tool: export a video clip from any URL for a specific time range. Downloads only the requested segment, not the full video. Supports YouTube, Instagram, TikTok, Twitter/X, and 1000+ sites via yt-dlp.
  • Multilingual translation flow: when transcribe_audio or take_notes detects non-English audio, a translation offer is returned. Accepting stores a clean English (eng) sibling file in memory alongside the original transcription.
  • Web UI “Show Transcript” button: new document icon on memory cards reveals the .md transcript file in Finder, separate from the audio file reveal.
  • Web UI “Show Audio” / “Show Transcript” in detail view: two distinct buttons to reveal either the audio source or the transcript file.
  • Web UI re-search from memory: search a previously transcribed file again without re-uploading.

Changed

  • Download filenames now include video ID: %(title)s [%(id)s].%(ext)s prevents overwrites when multiple videos share the same title (e.g., multiple Instagram reels from the same account).
  • Download filenames sanitized to ASCII: --restrict-filenames replaces Unicode characters (smart quotes, accented characters, etc.) with safe ASCII equivalents, preventing broken file paths in downstream tools.
  • Translation (eng) files are now clean English text: no timestamps and no per-segment mapping, just the full translated text with a metadata header.
  • Translation offer moved to transcribe_audio level: works for any transcription call, not just take_notes. Removed fragile global state.
  • Clip export now overwrites existing files: --force-overwrites ensures re-exporting a clip with different padding replaces the previous file instead of silently skipping.
  • Default clip padding increased to 15 seconds: up from 10s (MCP) and 5s (CLI) for better context around keyword matches.
  • Web UI default port changed to 8282: moved from 9797 to avoid the crowded 9000s range.
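
The filename changes above map onto yt-dlp options named in the entries themselves. The sketch below builds such a command as an argument list; the `build_download_command` helper is hypothetical, and combining all three options in a single invocation is an assumption (the changelog attaches `--force-overwrites` to clip export specifically).

```python
from typing import List

# Output template quoted in the changelog: title plus video ID, then extension.
OUTPUT_TEMPLATE = "%(title)s [%(id)s].%(ext)s"


def build_download_command(url: str, output_dir: str) -> List[str]:
    """Assemble a yt-dlp argument list; nothing is executed here."""
    return [
        "yt-dlp",
        "--restrict-filenames",  # keep filenames ASCII-safe
        "--force-overwrites",    # replace an existing file instead of skipping
        "-o", f"{output_dir}/{OUTPUT_TEMPLATE}",
        url,
    ]


command = build_download_command("https://example.com/watch?v=abc123", "downloads")
print(" ".join(command))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems with the percent signs and brackets in the template.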

Fixed

  • CI workflow auth failures: downgraded actions/checkout@v6 → @v4 and actions/setup-python@v6 → @v5; added explicit permissions: contents: read to Tests workflow.
  • Test count mismatch: updated expected tool count from 16 → 17 after clip_export was added.
  • Ruff lint errors: removed f-strings without placeholders, unused variables, dict() calls.
  • Black formatting: reformatted all modified files to pass CI.
  • Handler test coverage: added 59 tests for download_audio, clip_export, transcribe_audio, chapters, batch_search, and separate_audio, covering command construction, flag verification, parameter defaults, and error handling.

[2026.3.11] - 2026-03-11

Added

  • Web UI waveform for URL downloads: downloading audio from a URL now renders an interactive waveform with full playback controls (play/pause, skip to start/end, volume slider) — same experience as file uploads
  • Web UI download spinner: animated wormhole spinner replaces the progress bar during audio downloads for a cleaner look
  • Web UI transcription progress bar: real-time progress bar shown in the results panel during transcription
  • Web UI Show in Finder button: folder icon on memory cards to reveal the audio file (or .md fallback) in Finder
  • Web UI URL-based memory caching: pasting the same URL again skips re-downloading, with an instant cache hit keyed by source URL + model size

Changed

  • Web UI memory stats: transcription count and placeholder text are now larger and use the green accent color for better readability
  • Web UI keyword placeholder: updated to “wormhole, open source, workflow”
  • Web UI progress/spinner location: moved from the log box to the results panel where it’s more visible

Fixed

  • augent-web startup: server no longer silently exits on launch; prints a clean “WebUI is live at” message
  • _kill_port killing own process: added PID check so the server doesn’t terminate itself on startup
  • Memory not saving from Web UI downloads: macOS tempfile.mkdtemp() created dirs in /var/folders/ which failed the /tmp security check — now forces /tmp as the base directory
  • Memory card title overlapping trash icon: added padding so long titles no longer clip behind the delete button
  • Memory list clipping at bottom: added bottom padding so the last card isn’t cut off by the panel edge
  • Audio endpoint 404 on macOS: resolved /tmp → /private/tmp symlink issue and URL-encoded audio paths
  • CI build failure: fixed black formatting issues

[2026.3.10] - 2026-03-10

Added

  • Persistent source URLs: Source URLs from any platform (YouTube, Twitter/X, TikTok, Instagram, SoundCloud, and 1000+ sites) are now stored permanently by audio file hash in a dedicated source_urls table. Any future search or transcription of the same file — even after restarts, from a different path — automatically links back to the original source. No manual URL entry needed.
  • Web UI Memory Explorer: browse, search, view, and delete stored transcriptions from the browser. Each entry shows title, duration, model size, and date. YouTube-sourced entries are marked with a YT badge.
  • Web UI memory deletion: trash icon on each memory card with confirmation dialog. Removes the transcription, embeddings, and .md file.
  • Web UI YouTube timestamp linking: after uploading a file, paste a YouTube URL inline in the results to retroactively add clickable timestamp links to all matches.
  • Web UI URL paste: paste a YouTube or video URL directly in the search view to download and search audio without leaving the browser.
  • Web UI Share as HTML: download any transcript as a self-contained HTML page with all styling inline.
  • Web UI Show in Finder: reveal the original audio file in Finder (macOS), file manager (Linux), or Explorer (Windows). Uses AppleScript on macOS for reliable handling of special characters in paths.
  • Web UI favicon: custom Augent favicon embedded as base64, ships with every install.
  • Web UI PNG banner: Augent logo displayed in the log area, served from the package.
  • Source URL in .md files: transcription markdown files now include a **URL:** line in the metadata header when a source URL is available.

Changed

  • Web UI title: browser tab now reads “Augent Web UI” instead of “Augent”
  • Web UI tagline: displays “Web UI v” dynamically from package version
  • Web UI keywords: changed from single-line input to resizable textarea
  • Web UI results: row hover animation changed from border-left (caused text jitter) to clean translateY(-1px) on tbody rows only
  • Web UI snippets: markdown ** bold markers stripped from keyword matches (the UI already bolds keywords with CSS)
  • File upload naming: uploaded files now preserve their original filename in memory instead of storing as temp file names

Fixed

  • YouTube timestamp linking in Web UI: lastGrouped was not being set in the file upload SSE path, causing the “Link timestamps” button to silently fail
  • Results header hover: Time | Context header row no longer animates on hover (scoped to tbody tr only)
  • Show in Finder reliability: switched from open -R to AppleScript on macOS for paths with special characters

[2026.3.8] - 2026-03-08

Added

  • separate_audio MCP tool: audio source separation using Meta’s Demucs v4. Isolates vocals from music, background noise, and other sounds. Feeds directly into the transcription pipeline for clean results on noisy recordings.
  • separator optional dependency group: pip install augent[separator] installs Demucs. Also included in augent[all].
  • Hash-based separation caching: separated stems are stored at ~/.augent/separated/ by file hash. The first run processes the file; every later run is served from the cache.
  • Vocals-only mode: default two-stem separation (vocals + no_vocals) is faster than full 4-stem. Set vocals_only: false for all 4 stems (vocals, drums, bass, other).
  • Installer support for Demucs: install.sh now includes separator in the extras fallback chain and verifies demucs during package verification.
  • 78 new tests: CLI, TTS, speakers, and clips modules now fully tested (133 to 211 total)
  • CodeQL security scanning: weekly + on every push/PR
  • Dependabot: daily automated dependency and GitHub Actions updates
  • Makefile: make test, make lint, make fmt, make check for developer convenience
  • Discord badge and link in README
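
The hash-based separation caching entry above can be sketched as a content-addressed directory lookup. SHA-256, the helper names, and the `separate` callback are assumptions for illustration; the changelog only states that stems are stored by file hash.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large audio files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def separate_cached(
    audio: Path, cache_root: Path, separate: Callable[[Path, Path], None]
) -> Tuple[Path, bool]:
    """Return (stems_dir, cache_hit); run the separator only on a cache miss."""
    target = cache_root / file_hash(audio)
    if target.is_dir():
        return target, True
    target.mkdir(parents=True)
    separate(audio, target)
    return target, False


# Demo with a stand-in separator that only records its calls.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    audio = root / "song.wav"
    audio.write_bytes(b"fake audio bytes")
    calls: List[Path] = []
    _, first_hit = separate_cached(audio, root / "separated", lambda src, dst: calls.append(src))
    _, second_hit = separate_cached(audio, root / "separated", lambda src, dst: calls.append(src))
```

Because the key is derived from file contents rather than the path, a renamed or moved copy of the same audio would reuse the cached stems.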

Changed

  • Speaker diarization upgraded to pyannote-audio: replaced the abandoned simple-diarizer (155 stars, last release 2022) with pyannote-audio (9.3k stars, actively maintained). Same tool interface, same caching, dramatically better accuracy. Requires a free Hugging Face token.

Improved

  • CI pipeline: added ruff + black enforcement, pip caching, Python 3.13 to test matrix
  • Ruff config: migrated to [tool.ruff.lint] namespace, added per-file ignores for intentional patterns
  • Coverage reporting: pytest-cov on Python 3.12 in CI with [tool.coverage] config

Fixed

  • Insecure temp file in mcp.py: replaced tempfile.mktemp with NamedTemporaryFile
  • Path validation in memory.py: realpath + isfile check before file access
  • Lint violations: 66 issues resolved across all modules (unused imports, raise-from, set comprehensions)
  • Formatting: entire codebase now passes black

[2026.2.28] - 2026-02-28

Added

  • context_words parameter on deep_search and search_memory: control how much context each result returns. Default 25 words (unchanged). Set to 150 for full evidence blocks when Claude needs to answer a question, not just find a moment
  • dedup_seconds parameter on deep_search and search_memory: merge matches from the same time range to avoid redundant results. Default 0 (off). Set to 60 for Q&A workflows
  • File output on all search and transcription tools: transcribe_audio, search_audio, deep_search, search_proximity now accept an output parameter for saving results directly to disk
  • XLSX export: .xlsx for styled spreadsheets with bold headers and formatted timestamps, .csv for plain data. Auto-detected from file extension
  • Per-segment timestamps on transcribe_audio: responses now include a segments array with start, end, timestamp, and text per segment instead of one flat text string
  • Audio trimming on transcribe_audio: start and duration parameters (in seconds) to transcribe specific sections without manual ffmpeg
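
One simplified reading of dedup_seconds, sketched below: keep a match only if it starts at least dedup_seconds after the previously kept match. The match shape (a dict with a `start` key in seconds) and the drop-instead-of-merge behavior are assumptions; the actual tool may combine overlapping results differently.

```python
from typing import Dict, List


def dedup_matches(matches: List[Dict], dedup_seconds: float = 0) -> List[Dict]:
    """Drop matches that start within dedup_seconds of the last kept match."""
    if dedup_seconds <= 0:
        return list(matches)  # default 0 means deduplication is off
    kept: List[Dict] = []
    for match in sorted(matches, key=lambda m: m["start"]):
        if not kept or match["start"] - kept[-1]["start"] >= dedup_seconds:
            kept.append(match)
    return kept


hits = [{"start": 10}, {"start": 30}, {"start": 75}, {"start": 200}]
deduped = dedup_matches(hits, dedup_seconds=60)
```

With a 60-second window, the match at 30s is folded into the one at 10s, while the matches at 75s and 200s survive as separate results.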

Improved

  • Consistent output parameter: all search and transcription tools now follow the same export pattern search_memory introduced in 2026.2.26

[2026.2.26] - 2026-02-26

Added

  • search_memory tool: search across ALL stored transcriptions in one query, no audio_path needed
  • Keyword and semantic modes: search_memory defaults to literal keyword matching; opt into meaning-based search with mode: "semantic"
  • CSV export: optional output parameter on search_memory saves results as a CSV file
  • 25-word snippets: all search tools now return consistent ~25-word context snippets with keyword highlighting
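
A context snippet with keyword highlighting, as described above, might look like the following sketch. The `snippet` helper, its centering rule, and the use of Markdown bold markers are assumptions for illustration.

```python
import re
from typing import List


def snippet(words: List[str], hit_index: int, keyword: str, context_words: int = 25) -> str:
    """Return roughly context_words words around the hit, with the keyword in bold."""
    half = context_words // 2
    start = max(0, hit_index - half)
    text = " ".join(words[start:start + context_words])
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.sub(pattern, lambda m: f"**{m.group(0)}**", text, flags=re.IGNORECASE)


transcript = "the quick brown fox jumps over the lazy dog".split()
short = snippet(transcript, hit_index=3, keyword="fox", context_words=5)
print(short)
```

Raising `context_words` widens the window on both sides of the hit, which is the same knob the context_words parameter exposes on the search tools.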

Improved

  • Keyword highlighting: matched keywords shown in bold across all search results (search_audio, deep_search, search_memory, search_proximity)
  • CLI: augent memory search "query" with --semantic and --top-k flags

[2026.2.21] - 2026-02-21

Changed

  • Cache rebranded to Memory: all cache commands, paths, and APIs renamed to memory (~/.augent/cache/ → ~/.augent/memory/)

Improved

  • Installer UX: animated spinners, paced output, and race condition fix for curl|bash piped installs
  • ASCII banner for CLI and installer using pyfiglet

[2026.2.16] - 2026-02-16

Added

  • OpenClaw integration: skill package for ClawHub + augent setup openclaw one-liner
  • Installer auto-detects OpenClaw and configures MCP alongside Claude
  • MCP protocol tests: 33 tests covering routing, tool listing, and error handling

[2026.2.15] - 2026-02-15

Fixed

  • TTS no longer blocks MCP: runs in background subprocess with job polling
  • Installer correctly selects framework Python for MCP config on macOS

[2026.2.14] - 2026-02-14

Improved

  • Quiz checkbox syntax: answer options render as Obsidian checkboxes
  • Answer key formatting enforced (bold number + letter, em dash, explanation)
  • Claude always routes video URLs through take_notes, never WebFetch

[2026.2.13] - 2026-02-13

Added

  • save_content mode for take_notes: bypasses Write tool, ensures post-processing runs
  • Installer auto-installs Python 3.12 when only 3.13 is available

Fixed

  • Installer eliminates silent failures and verifies all packages
  • take_notes embeds absolute file paths in Claude instructions
  • Skip re-downloading dependencies on reinstall

[2026.2.12] - 2026-02-12

Improved

  • Lazy imports for optional dependencies: installing mid-session works without restart
  • Preserve WAV file when ffmpeg conversion fails in TTS

[2026.2.9] - 2026-02-09

Added

  • Text-to-speech: Kokoro TTS with 54 voices across 9 languages
  • read_aloud option for take_notes: generates spoken MP3 and embeds in Obsidian

[2026.2.8] - 2026-02-08

Added

  • identify_speakers: speaker diarization using pyannote-audio
  • deep_search: semantic search using sentence-transformers (find by meaning, not keywords)
  • chapters: auto-detect topic boundaries with embedding similarity
  • 5 note styles for take_notes: tldr, notes, highlight, eye-candy, quiz
  • Obsidian .txt integration guide: full setup for live-synced notes
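
The chapters entry above mentions detecting topic boundaries with embedding similarity. A minimal sketch of that idea, assuming one embedding vector per transcript segment and a fixed cosine-similarity threshold (both the threshold and the function names are illustrative):

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chapter_boundaries(embeddings: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Indices where a segment is dissimilar to the previous one, i.e. a new topic starts."""
    return [
        i
        for i in range(1, len(embeddings))
        if cosine(embeddings[i - 1], embeddings[i]) < threshold
    ]


# Toy 2-D embeddings: two segments on one topic, then two on another.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
boundaries = chapter_boundaries(segments)
```

In practice the vectors would come from a sentence-transformers model rather than hand-written numbers, but the boundary rule is the same.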

Changed

  • Renamed list_audio_files → list_files, defaults to all common media formats
  • Enforced tiny as default model across all tool schemas

[2026.2.7] - 2026-02-07

Added

  • take_notes tool: one-click URL to formatted notes pipeline (download + transcribe + save .txt)

[2026.1.31] - 2026-01-31

Added

  • Title-based cache lookups: search cached transcriptions by name
  • Markdown transcription files: each cached transcription also saved as readable .md

[2026.1.29] - 2026-01-29

Changed

  • Python 3.10+ required (dropped 3.9 support)

Fixed

  • Homebrew Python compatibility (PATH, PEP 668, absolute paths)
  • Pinned yt-dlp to stable version for reliable downloads
  • Installer handles Homebrew permission issues gracefully

[2026.1.26] - 2026-01-26

Added

  • audio-downloader CLI tool: speed-optimized with aria2c (16 parallel connections)
  • download_audio MCP tool: Claude can download audio directly
  • Model size warnings for medium/large (resource-intensive)

Fixed

  • aria2c downloader-args format causing download failures
  • Web UI default port changed from 8888 to 8282

[2026.1.24] - 2026-01-24

Added

  • Web UI v1: local web interface with failproof startup
  • CI/CD: GitHub Actions testing on Python 3.10, 3.11, 3.12
  • Professional installer: one-liner curl | bash setup
  • Logo and branding

[2026.1.23] - 2026-01-23

Added

  • Initial release
  • MCP server exposing tools for Claude Code and Claude Desktop
  • Transcription engine powered by faster-whisper with word-level timestamps
  • Keyword search with timestamped matches and context snippets
  • Proximity search: find where keywords appear near each other
  • Batch processing: search multiple files in parallel
  • Three-layer caching: transcriptions, embeddings, and diarization in SQLite
  • CLI with search, transcribe, proximity, and cache management commands
  • Export formats: JSON, CSV, SRT, VTT, Markdown
  • Cross-platform support: macOS, Linux, Windows
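
Proximity search, listed above, finds places where two keywords occur near each other. The sketch below works on word positions; the `proximity_matches` helper and the `max_distance` parameter (measured in words) are assumptions, not the tool's actual interface.

```python
from typing import List, Tuple


def proximity_matches(
    words: List[str], first: str, second: str, max_distance: int = 10
) -> List[Tuple[int, int]]:
    """Return (first_index, second_index) pairs at most max_distance words apart."""
    first_positions = [i for i, w in enumerate(words) if w.lower() == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w.lower() == second.lower()]
    return [
        (a, b)
        for a in first_positions
        for b in second_positions
        if abs(a - b) <= max_distance
    ]


text = "we tested the startup funding model before the next funding round".split()
near = proximity_matches(text, "startup", "funding", max_distance=2)
wide = proximity_matches(text, "startup", "funding", max_distance=10)
```

Since the transcription engine provides word-level timestamps, each index pair could be mapped back to a time range in the audio.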