The workflow
A single video explanation becomes a fully structured, replicable system:
- Transcribe the audio, extracting every word with timestamps
- Build workflow files from the explanation: the sequencing, the steps, the tools mentioned, and the order of operations
- Detect visual gaps where the speaker describes something that needs to be seen (“this screen”, “click here”, “the layout looks like”)
- Extract screenshots at those moments, picking the sharpest frame with the most visual information
- Assemble the package with transcription + structured workflow + visual context, embedded inline in the notes with Obsidian wikilinks
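
The gap-detection step can be illustrated with a small sketch. This is not the tool's actual implementation: it assumes the transcript arrives as timestamped segments, and the `Segment` type, the cue-phrase list, and `find_visual_gaps` are hypothetical names chosen for illustration. It uses plain pattern matching on phrases like the ones quoted above.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical shape, not the tool's real schema)."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Assumed cue phrases that suggest the speaker is pointing at something on screen.
CUES = re.compile(
    r"\b(this screen|click here|looks like|as you can see|right here)\b",
    re.IGNORECASE,
)


def find_visual_gaps(segments):
    """Return (timestamp, cue) pairs for segments that contain a visual cue.

    The timestamp is the segment midpoint, a simple stand-in for
    "the moment a screenshot should be taken".
    """
    gaps = []
    for seg in segments:
        match = CUES.search(seg.text)
        if match:
            midpoint = (seg.start + seg.end) / 2
            gaps.append((midpoint, match.group(0).lower()))
    return gaps


segments = [
    Segment(0.0, 4.0, "First, open the dashboard."),
    Segment(4.0, 9.0, "On this screen you pick the template."),
    Segment(9.0, 12.0, "Then click here to export."),
]
print(find_visual_gaps(segments))
# → [(6.5, 'this screen'), (10.5, 'click here')]
```

A fixed phrase list is the simplest possible detector; a real system would likely need something more flexible, since speakers refer to the screen in many ways.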
How it works
The visual tool analyzes the transcript to identify moments where visual context is needed, then extracts frames only at those timestamps. Four modes:
Query mode (primary): describe what you need visual context for. The tool searches the transcript semantically and extracts frames at matching moments.
Pass visual directly to take_notes and get notes plus screenshots in a single call.
The notes contain ![[frame.png]] embeds at the relevant sections. Open the file in Obsidian and every screenshot renders inline, right where it belongs.
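
As a rough illustration of how such embeds could be placed, the sketch below inserts an Obsidian `![[...]]` embed directly under each matching heading in a Markdown note. The function name `embed_screenshots`, the heading-to-filename mapping, and the filename `frame_0010.png` are assumptions made for this example, not part of the tool.

```python
def embed_screenshots(notes: str, frames: dict) -> str:
    """Insert an Obsidian embed after each heading listed in `frames`.

    `frames` maps a heading's text (without the leading '#') to an
    image filename. Headings not in the mapping are left untouched.
    """
    out = []
    for line in notes.splitlines():
        out.append(line)
        if line.startswith("#"):
            heading = line.lstrip("#").strip()
            if heading in frames:
                # Obsidian renders ![[file]] as an inline embed.
                out.append(f"![[{frames[heading]}]]")
    return "\n".join(out)


notes = "# Setup\nOpen the dashboard.\n## Export\nClick the export button."
print(embed_screenshots(notes, {"Export": "frame_0010.png"}))
```

Keying the embeds by heading text keeps each screenshot next to the step it illustrates, which matches the "right where it belongs" behavior described above.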
The pipeline
| Step | Tool | What it does |
|---|---|---|
| Download | download_audio | Pulls audio from any URL at maximum speed |
| Transcribe | transcribe_audio | Full transcription with per-segment timestamps |
| Structure | take_notes | Builds formatted workflow notes with sections, steps, and sequencing |
| Visual context | visual | Extracts screenshots at moments where audio alone isn’t enough |
| Find moments | highlights | Identifies the most important moments by content density |
| Find topics | deep_search | Searches by meaning to find specific workflow steps |
| Find context | search_proximity | Finds where two concepts appear near each other |
| Extract clips | clip_export | Exports video segments around specific timestamps |
| Detect chapters | chapters | Auto-detects topic boundaries and transitions |
| Identify speakers | identify_speakers | Labels who is explaining which part of the workflow |
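
To make the "Find context" row more concrete, here is a minimal sketch of a proximity search over timestamped transcript segments. It is an assumption-laden stand-in: it uses plain substring matching rather than any semantic matching, and `find_proximity`, its parameters, and the 30-second default window are hypothetical, not the signature of search_proximity.

```python
def find_proximity(segments, term_a, term_b, window=30.0):
    """Return (time_a, time_b) pairs where the two terms occur close together.

    `segments` is a list of (start_seconds, text) tuples. Two hits count
    as "near" when their start times differ by at most `window` seconds.
    """
    a, b = term_a.lower(), term_b.lower()
    hits_a = [t for t, text in segments if a in text.lower()]
    hits_b = [t for t, text in segments if b in text.lower()]
    return [(ta, tb) for ta in hits_a for tb in hits_b if abs(ta - tb) <= window]


segments = [
    (5.0, "Open the export dialog."),
    (20.0, "Pick the PDF template."),
    (140.0, "The export finishes in a minute."),
]
print(find_proximity(segments, "export", "template"))
# → [(5.0, 20.0)]
```

The second mention of "export" at 140 seconds is ignored because it falls well outside the window around "template".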

