Video editors spend hours manually scrubbing through raw footage to find specific scenes, remove unwanted content, or assemble rough cuts.
A plugin for major NLEs that indexes project footage with multimodal embeddings, accepts natural language commands like 'remove all scenes containing cats' or 'find all close-up shots', and outputs EDLs for manual refinement.
Subscription - $29/mo for individual editors, $99/mo for teams, with per-hour indexing costs passed through.
Editors universally hate scrubbing through hours of raw footage. This is the single most tedious part of the editing workflow. The Reddit thread with 425 upvotes and 107 comments confirms real demand. Documentary editors, wedding videographers, and YouTubers all cite this as their #1 time sink. A 10-hour shoot might yield 30 minutes of final product.
TAM: ~35M active video editors globally (Adobe claims 23M+ Creative Cloud subscribers, many edit video). SAM: ~5M who regularly work with large amounts of raw footage (professionals + serious YouTubers). SOM for Year 1: capturing even 10K paying users at $29/mo = $3.5M ARR. The per-hour indexing model adds variable revenue. Not a winner-take-all market - niche enough to build a sustainable business.
Professional editors already pay $20-55/mo for NLE subscriptions and routinely buy $50-500 plugins. $29/mo is well within the plugin budget for anyone billing clients $50-150/hr. If this saves 2 hours per project, it pays for itself on a single job. YouTubers are more price-sensitive but the time savings are compelling. The 425 upvotes suggest genuine pull, not just casual interest.
This is the hard part. Building multimodal embeddings for video scene search requires significant ML infrastructure. You need: (1) a video indexing pipeline using models like CLIP, InternVideo, or similar, (2) a vector search layer, (3) NLE plugin development for Premiere (CEP/UXP SDK) and DaVinci Resolve (its Python/Lua scripting API), (4) EDL/XML export, (5) natural language query parsing. A solo dev with ML experience could build a rough MVP for ONE NLE in 8-12 weeks, not 4-8. The plugin SDKs for Premiere and DaVinci are notoriously painful. GPU costs for indexing are non-trivial. Leveraging existing APIs (Google Video Intelligence, OpenAI vision) could speed this up but adds latency and cost.
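The vector search layer in (2) is conceptually simple: store one unit-normalized embedding per sampled frame, then rank frames by cosine similarity against an embedded text query. A minimal sketch, with hand-made 3-d stand-in vectors instead of real CLIP embeddings (the `FrameIndex` class and all the demo values are illustrative, not any particular library's API):

```python
import numpy as np

class FrameIndex:
    """Minimal in-memory vector index over per-frame embeddings.
    In production the vectors would come from a multimodal model
    (CLIP, InternVideo, etc.); here they are tiny placeholders."""

    def __init__(self):
        self.ids, self.vecs = [], []

    def add(self, frame_id, vec):
        v = np.asarray(vec, dtype=np.float32)
        self.vecs.append(v / np.linalg.norm(v))  # store unit vectors
        self.ids.append(frame_id)

    def query(self, vec, top_k=3):
        q = np.asarray(vec, dtype=np.float32)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.vecs) @ q           # cosine similarity
        order = np.argsort(-sims)[:top_k]
        return [(self.ids[i], float(sims[i])) for i in order]

# Toy demo: 3-d vectors standing in for CLIP image embeddings
idx = FrameIndex()
idx.add("frame_0001", [1.0, 0.0, 0.1])   # pretend: cat close-up
idx.add("frame_0050", [0.0, 1.0, 0.0])   # pretend: wide landscape
idx.add("frame_0120", [0.9, 0.1, 0.0])   # pretend: another cat shot

# Pretend text embedding of the query "cat"
hits = idx.query([1.0, 0.0, 0.0], top_k=2)
print(hits)  # the two "cat" frames rank highest
```

At MVP scale a brute-force matrix product like this is fine; a dedicated vector DB only becomes necessary once indexed libraries reach millions of frames.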
This is the strongest signal. Every competitor either: (a) only understands audio/transcript (Simon Says, Descript, Timebolt), (b) is a standalone platform, not a plugin (Descript, Runway), or (c) is a raw API with no editor integration (Google, Muse.ai). NOBODY has built the visual-semantic search layer inside an NLE. Adobe will eventually add this, but their AI roadmap is focused on generation (Firefly) not search. This is a clear 12-24 month window.
Natural subscription model. Editors have ongoing footage to process. Per-hour indexing creates usage-based revenue on top of base subscription. Teams need collaborative features. As editors build reliance on the tool, switching costs increase. Could add premium tiers for advanced features (auto rough-cut assembly, face-based scene grouping, brand kit detection).
- +Clear gap in the market - no one has built visual-semantic search as an NLE plugin
- +Strong pain signal validated by community engagement (425 upvotes is significant for a niche tool)
- +Natural monetization path with subscription + usage-based pricing
- +Defensible through NLE integration depth and indexed footage lock-in
- +Timing is right - multimodal AI models are now good enough and cheap enough to make this viable
- !Adobe could ship this natively in Premiere within 12-18 months - they have the AI team and the distribution
- !GPU/API costs for video indexing could eat margins, especially at the $29/mo price point - need to validate unit economics early
- !NLE plugin development is notoriously brittle - SDK changes with each update can break your product
- !Accuracy needs to be very high for professional workflows - if 'remove all scenes with cats' misses one cat scene, trust collapses
- !Two-NLE support (Premiere + DaVinci) effectively doubles your maintenance burden
Automatically detects and removes silences, dead air, and filler from video/audio. Provides timeline-based editing with jump cut automation.
AI-powered transcription and editing tool that integrates with Premiere, FCPX, DaVinci. Transcribes footage and lets editors search/edit via text.
Full editing suite where video is edited like a document via transcript. AI features include filler word removal, eye contact correction, and Studio Sound.
AI-native video platform with tools for generation, inpainting, motion tracking, and some scene analysis capabilities.
Video understanding APIs that can detect objects, scenes, faces, text, and activities in video content. Muse.ai offers a hosted platform; Google offers raw API.
Start with a Premiere-only plugin (largest market share). Use OpenAI GPT-4o or Google Gemini vision API to index footage frame-by-frame at reduced resolution. Store embeddings locally with a lightweight vector DB (ChromaDB). Support 3 core commands: 'find scenes with X', 'remove scenes with X', and 'show all shots of type Y' (close-up, wide, etc.). Output as Premiere-compatible XML markers or EDL. Skip DaVinci, skip teams, skip fancy UI. Get it into 50 beta editors' hands and measure: (1) accuracy of scene detection, (2) time saved per project, (3) willingness to pay.
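The 'remove scenes with X' output path reduces to: take the kept source ranges, lay them back-to-back on the record timeline, and serialize as timecoded events. A simplified CMX3600-style sketch, assuming 24 fps non-drop-frame and a generic `AX` reel (real Premiere import needs proper reel names and source metadata):

```python
def tc(frame, fps=24):
    """Frame number -> HH:MM:SS:FF timecode (non-drop-frame)."""
    s, f = divmod(frame, fps)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"

def build_edl(title, keep_ranges, fps=24):
    """Emit a minimal CMX3600-style EDL that keeps only the given
    (in_frame, out_frame) source ranges, placed back-to-back on the
    record timeline. Illustrative only: omits reel names, audio
    tracks, and source file comments a real NLE import would want."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    rec = 0
    for n, (src_in, src_out) in enumerate(keep_ranges, start=1):
        dur = src_out - src_in
        lines.append(
            f"{n:03d}  AX       V     C        "
            f"{tc(src_in, fps)} {tc(src_out, fps)} "
            f"{tc(rec, fps)} {tc(rec + dur, fps)}"
        )
        rec += dur
    return "\n".join(lines)

# Suppose the search layer flagged frames 240-480 of a 960-frame clip
# as a "cat" scene; keeping everything else yields two events:
edl = build_edl("no_cats_rough_cut", [(0, 240), (480, 960)], fps=24)
print(edl)
```

Emitting the complement ("remove X") as kept ranges, rather than deleting clips directly in the timeline, is what keeps the output non-destructive and editable, which matters for the 'manual refinement' promise.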
Free beta (invite-only, 50-100 editors) -> $29/mo individual launch with first 2 hours of indexing included -> Add $0.50/hr overage for heavy users -> $99/mo team tier with shared indexed libraries -> Enterprise tier for post-production houses with on-prem indexing and priority support -> Expand to DaVinci/FCPX after Premiere PMF is proven
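For validating unit economics, the individual-tier bill above (base $29, first 2 indexing hours included, $0.50/hr overage) reduces to one line of arithmetic; a sketch using the plan's assumed numbers:

```python
def monthly_charge(hours_indexed, base=29.0, included_hours=2.0,
                   overage_per_hr=0.50):
    """Individual-tier monthly bill: flat base plus metered overage.
    Prices are the assumed numbers from the pricing ladder above."""
    extra = max(0.0, hours_indexed - included_hours)
    return base + extra * overage_per_hr

print(monthly_charge(1.5))   # under the 2 included hours -> base only
print(monthly_charge(10.0))  # 8 overage hours on top of base
```

Comparing this revenue curve against measured per-hour GPU/API indexing cost is the early gut-check on whether $29/mo survives heavy users.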
10-14 weeks to MVP with beta users, 16-20 weeks to first paying customer. The bottleneck is not building it but tuning accuracy to the point where professional editors trust it. Expect a 4-6 week beta iteration period between 'it works' and 'it works well enough to charge for'.
- “Imagine a Premiere plugin where you could say 'remove all scenes containing cats' and it'll spit out an EDL”
- “you can still manually adjust”