Using a single expensive model for all agent steps wastes money; using a single cheap model risks failures on hard steps.
Classify incoming agent tasks by complexity and route each to the optimal model — cheap models for simple decisions, expensive models only when needed. The router learns from benchmark data and production outcomes.
Subscription priced on routing volume, with an optional savings-share pricing model.
The pain signals are concrete and quantified: '180x more expensive', '$0.20 vs $36/run'. Companies running high-volume agent workflows are hemorrhaging money using frontier models for trivial subtasks. A customer support agent doing 10K runs/day at $36/run vs $0.20/run is the difference between $360K/day and $2K/day. This is a hair-on-fire problem for anyone at scale.
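The arithmetic behind that spread is worth making explicit. A quick sketch using the figures quoted above (run counts and per-run prices are from the quote; nothing else is assumed):

```python
# Daily spend for a 10K-run/day agent workload at the quoted per-run prices.
RUNS_PER_DAY = 10_000
COST_FRONTIER = 36.00   # $/run with a frontier model on every step
COST_ROUTED = 0.20      # $/run with cheap models on trivial steps

frontier_daily = RUNS_PER_DAY * COST_FRONTIER  # $360,000/day
routed_daily = RUNS_PER_DAY * COST_ROUTED      # $2,000/day

print(f"frontier: ${frontier_daily:,.0f}/day")
print(f"routed:   ${routed_daily:,.0f}/day")
print(f"ratio:    {COST_FRONTIER / COST_ROUTED:.0f}x")  # the quoted 180x
```

At that gap, routing does not need to be perfect: even sending a minority of hard steps to the frontier model leaves an order-of-magnitude saving.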
TAM for LLM routing/orchestration is estimated at $500M-$1B today, growing to $3-5B by 2028. The agent-specific subsegment is smaller but growing fastest — agentic AI is the hottest category in enterprise AI. However, the target audience (companies running high-volume agentic workflows) is still relatively small today, maybe 5-10K companies worldwide. This will expand rapidly but you're early.
Strong WTP because the value proposition is direct cost savings with measurable ROI. If you save a customer $100K/month on LLM spend, charging $5-10K/month is trivial. Savings-share pricing (take 10-20% of savings) is especially compelling — the customer only pays when they save. Martian raised $32M, proving VCs believe in this WTP. Enterprise budgets for AI infrastructure are expanding.
A basic complexity classifier + routing layer is buildable in 4-8 weeks. BUT the hard part is the learning loop — building a classifier that accurately predicts which model handles which agent task requires substantial benchmark data and continuous tuning. RouteLLM shows the research is there, but productionizing it at low latency with high reliability is non-trivial. The agent-specific angle (understanding multi-step workflows, not just individual prompts) adds significant complexity. Solo dev can build MVP, but a competitive product needs more.
The specific gap is clear: existing routers (Martian, Not Diamond, RouteLLM) route individual prompts but don't understand agentic workflows. They can't reason about step dependencies, accumulated context, or which steps in a pipeline are critical vs trivial. Gateway products (Portkey, LiteLLM) have no ML intelligence at all. Nobody has combined ML-driven routing + agent workflow awareness + production learning loop. However, Martian ($32M funded) could add agent features quickly.
Perfect subscription fit. Agent workflows run continuously, generating ongoing routing volume. Usage-based pricing (per routed request) or savings-share naturally recurs. Switching costs increase as the router learns from a customer's specific workflow patterns. The more data it sees, the better it routes, creating a compounding moat. This is inherently a recurring infrastructure cost, not a one-time purchase.
- +Quantifiable, direct-ROI value prop — customers save measurable dollars from day one, making sales easy
- +Agent-specific routing is an unoccupied niche — existing routers treat prompts as independent, missing the multi-step workflow intelligence
- +Strong network effects: more routing data → better classifier → more savings → more customers
- +Savings-share pricing model eliminates buyer friction — customer only pays when they provably save money
- +Timing is ideal: agentic AI is exploding while cost pressure is mounting, and no incumbent owns this intersection
- !Martian ($32M, ex-DeepMind team) could pivot to agent-aware routing within months and crush you with resources
- !Cloud providers (AWS Bedrock, Azure AI, Google Vertex) adding native routing could commoditize the space
- !Cold start problem: classifier needs substantial data to route well, but early customers generate little data — chicken-and-egg
- !Model landscape shifts rapidly (new models weekly) — your classifier must continuously retrain or routing decisions go stale
- !LiteLLM's open-source dominance as the abstraction layer means you may need to integrate with it rather than replace it, limiting your surface area
ML-driven model router that uses a trained classifier to predict the best LLM for each request, optimizing for cost, latency, and quality. Founded by ex-Google Brain/DeepMind researchers. Raised $32M Series A.
Full AI gateway platform with routing, fallbacks, load balancing, caching, guardrails, and observability for LLM applications. Supports 200+ LLMs. SOC2 compliant.
Open-source proxy server and Python SDK providing a unified OpenAI-compatible interface to 100+ LLMs. Includes basic routing, load balancing, and budget management. YC S23.
ML-powered model routing service using a trained classifier to select the best LLM per query. Open-source router available. Similar approach to Martian but earlier stage.
Open-source research framework for binary routing between a strong model and a weak model based on query complexity. Built by the Chatbot Arena team with world-class preference data.
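RouteLLM's core mechanism — binary routing between a strong and a weak model based on a complexity score — is simple to sketch. The scorer below is a crude placeholder heuristic, not RouteLLM's actual trained win-rate predictor, and the model names and threshold are illustrative:

```python
from dataclasses import dataclass

@dataclass
class BinaryRouter:
    """Route a query to a strong or a weak model based on a complexity score.

    `score` is a stand-in for a trained win-rate predictor; here it is a
    crude heuristic (keyword hits + query length) purely for illustration.
    """
    strong_model: str = "gpt-4o"       # illustrative model names
    weak_model: str = "gpt-4o-mini"
    threshold: float = 0.5             # tune to trade cost against quality

    HARD_HINTS = ("prove", "debug", "multi-step", "analyze", "refactor")

    def score(self, query: str) -> float:
        hits = sum(h in query.lower() for h in self.HARD_HINTS)
        length_signal = min(len(query) / 500, 1.0)
        return min(1.0, 0.3 * hits + 0.7 * length_signal)

    def route(self, query: str) -> str:
        if self.score(query) >= self.threshold:
            return self.strong_model
        return self.weak_model

router = BinaryRouter()
print(router.route("What is 2 + 2?"))  # low score -> weak model
print(router.route("Debug and refactor this multi-step pipeline, then analyze " * 5))
```

The threshold is the whole cost/quality dial: lowering it sends more traffic to the strong model. What RouteLLM adds over this toy version is a classifier trained on Chatbot Arena preference data instead of a heuristic.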
Build an open-source middleware (LiteLLM plugin or LangChain integration) that classifies agent task steps as simple/medium/complex using a lightweight BERT classifier trained on public benchmark data. Route to 3 tiers: cheap (Haiku/GPT-4o-mini), mid (Sonnet/GPT-4o), expensive (Opus/o3). Ship with a dashboard showing cost savings per workflow. Target LangGraph and CrewAI users first — they already have multi-step agents and feel the cost pain. Offer a hosted version with a learning loop that improves routing from production outcomes.
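The routing core of that MVP might look like the sketch below. The complexity classifier is a keyword stub standing in for the proposed BERT model, and the tier-to-model mapping and per-token prices are illustrative assumptions, not quoted rates:

```python
from dataclasses import dataclass

# Illustrative tier -> (model, $ per 1M input tokens); prices are made up.
TIERS = {
    "simple":  ("gpt-4o-mini", 0.15),
    "medium":  ("gpt-4o",      2.50),
    "complex": ("o3",          10.00),
}
FRONTIER_PRICE = 10.00  # baseline: every step sent to the expensive tier

def classify_step(step: str) -> str:
    """Stub for the BERT complexity classifier: a heuristic on step text."""
    text = step.lower()
    if any(k in text for k in ("plan", "verify", "reason", "synthesize")):
        return "complex"
    if any(k in text for k in ("summarize", "rewrite", "classify")):
        return "medium"
    return "simple"

@dataclass
class WorkflowRouter:
    """Routes agent workflow steps and tracks savings vs an all-frontier baseline."""
    routed_cost: float = 0.0
    baseline_cost: float = 0.0

    def route(self, step: str, est_tokens: int) -> str:
        model, price = TIERS[classify_step(step)]
        self.routed_cost += est_tokens / 1e6 * price
        self.baseline_cost += est_tokens / 1e6 * FRONTIER_PRICE
        return model

    @property
    def savings_pct(self) -> float:
        return 100 * (1 - self.routed_cost / self.baseline_cost)

router = WorkflowRouter()
for step, tokens in [
    ("extract the order id from the email", 400),
    ("summarize the ticket history", 2_000),
    ("plan a multi-step refund workflow and verify policy", 3_000),
]:
    print(router.route(step, tokens))
print(f"savings vs all-frontier: {router.savings_pct:.0f}%")
```

The `savings_pct` counter is the seed of the proposed dashboard: tracking routed cost against the all-frontier baseline per workflow is exactly the "cost savings per workflow" number the MVP would show.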
Free open-source plugin (adoption + data flywheel) → Hosted pro tier at $99-499/month with learning loop, analytics, and team features → Enterprise tier with savings-share pricing (take 15-20% of documented savings) and custom model routing → Platform play where you become the routing intelligence layer embedded in every agent framework
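The savings-share tier is easy to reason about with concrete numbers. A quick sketch, where the spend figures are illustrative and the 15% share is picked from the 15-20% range above:

```python
def savings_share_invoice(baseline_spend: float, routed_spend: float,
                          share: float = 0.15) -> float:
    """Vendor fee as a share of documented savings; zero if nothing was saved."""
    savings = max(baseline_spend - routed_spend, 0.0)
    return share * savings

# Customer spending $100K/month on frontier models, cut to $20K by routing:
fee = savings_share_invoice(100_000, 20_000, share=0.15)
print(f"monthly fee: ${fee:,.0f}; customer keeps ${80_000 - fee:,.0f} of savings")
```

The `max(..., 0.0)` clause is the buyer-friction killer: in a month where routing saves nothing, the invoice is zero, which is what makes the pricing self-justifying in procurement.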
4-6 weeks to open-source MVP with basic routing. 8-12 weeks to hosted pro tier with dashboard. First paying customers at week 10-14 if you target LangChain/CrewAI Discord communities and AI Twitter. Savings-share enterprise deals take 3-6 months due to procurement cycles. Expect $1-5K MRR by month 4, $10-30K MRR by month 8 if execution is strong.
- “That's 180× more expensive”
- “$0.20/run vs $4.43/run vs $36/run”
- “best cost-to-performance ratio”