Overall Score: 7.8 (High). Verdict: GO

SQLDoc AI

AI-powered documentation generator that reverse-engineers intent and dependencies from undocumented SQL code

Category: DevTools
Target: Engineering teams onboarding new developers onto legacy SQL systems, data eng...
The Gap

Legacy SQL codebases ship with no documentation, misleading comments, and hidden business logic baked into procedural code — onboarding new engineers takes weeks of manual tracing

Solution

Feed in SQL procedures and the tool uses LLM analysis combined with static analysis to generate accurate documentation: what each section does, business rules encoded, dependency chains, known side effects, and a plain-English summary of the procedure's purpose

Revenue Model

Freemium — free for small procedures, $19-79/mo for teams with batch processing and integration with wikis/Confluence

Feasibility Scores
Pain Intensity: 9/10

This is a top-3 complaint in every data engineering community. Legacy SQL with no docs and misleading comments is universal at enterprises. The Reddit thread itself captures it perfectly — engineers spend weeks tracing procedures line by line. The pain is acute, recurring (every new hire), and currently has no automated solution. People are already doing this manually with ChatGPT, proving the need.

Market Size: 7/10

TAM is narrower than general code documentation but deep. Target: enterprises with legacy SQL Server, Oracle, PostgreSQL stored procedure codebases — estimated 50K-200K such teams globally. At $50/mo avg revenue per team, that's $30M-$120M addressable. Not a billion-dollar market on its own, but large enough for a strong indie/SMB SaaS business. Could expand into general database documentation over time.
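The back-of-envelope sizing above can be checked mechanically. A minimal sketch (the $50/mo figure and the 50K-200K team estimate are the text's own assumptions, not data):

```python
# Illustrative check of the TAM arithmetic above (assumption: $50/mo
# average revenue per team, billed for a full year).
def tam_annual(teams: int, monthly_rev: float) -> float:
    """Annual addressable revenue for a given number of paying teams."""
    return teams * monthly_rev * 12

low = tam_annual(50_000, 50)
high = tam_annual(200_000, 50)
print(f"${low / 1e6:.0f}M-${high / 1e6:.0f}M addressable")  # $30M-$120M addressable
```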

Willingness to Pay: 7/10

Engineering teams already pay $19-40/mo/seat for dev tools (Copilot, Linear, etc.). The ROI story is compelling: if this saves even 1 week of onboarding per new hire, that's $3K-5K saved vs. a $79/mo subscription. Enterprise buyers have budget for developer productivity. However, some teams will just use ChatGPT manually and call it good enough — the convenience premium needs to be clearly demonstrated through batch processing, dependency mapping, and wiki integration.

Technical Feasibility: 8/10

Core loop is achievable for a solo dev in 4-8 weeks: SQL parser (use existing libraries like sqlglot/sqlparse) + LLM API calls + markdown/HTML output. Static analysis for dependency chains is well-understood. The hard parts — handling dialect-specific SQL, massive procedures that exceed context windows, and hallucination detection — are solvable but will need iteration. MVP can start with T-SQL or PL/pgSQL and expand. The combination of static analysis + LLM is the defensible moat and is technically sound.
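The static-analysis half of that combination is straightforward to prototype. Below is a minimal sketch of dependency extraction across procedures; it uses a regex purely as a self-contained stand-in for the sqlglot/sqlparse AST walk the text recommends, and the procedure names and SQL bodies are invented examples:

```python
import re
from collections import defaultdict

# Crude stand-in for the static-analysis pass: pull table/procedure
# references out of each procedure body and build a dependency map.
# A real implementation would walk sqlglot's AST instead of using a
# regex; this sketch stays stdlib-only.
REF_PATTERN = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE|EXEC)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)

def dependencies(procedures: dict[str, str]) -> dict[str, set[str]]:
    """Map each procedure name to the database objects its body references."""
    deps = defaultdict(set)
    for name, body in procedures.items():
        for ref in REF_PATTERN.findall(body):
            deps[name].add(ref)
    return dict(deps)

procs = {
    "usp_SettleInvoices": """
        UPDATE dbo.Invoices SET Status = 'Settled'
        FROM dbo.Invoices i JOIN dbo.Payments p ON p.InvoiceId = i.Id;
        EXEC dbo.usp_WriteAuditLog;
    """,
}
print(sorted(dependencies(procs)["usp_SettleInvoices"]))
# ['dbo.Invoices', 'dbo.Payments', 'dbo.usp_WriteAuditLog']
```

Even this toy version shows why static analysis complements the LLM: the reference list is deterministic and cannot be hallucinated, so it can anchor and fact-check the generated prose.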

Competition Gap: 8/10

The gap is remarkably clear. Existing tools either (a) document schema but ignore procedural logic (Dataedo, dbdocs, SchemaCrawler), (b) understand code but aren't purpose-built for SQL documentation workflows (ChatGPT/Copilot), or (c) are enterprise platforms priced at $100K+ that still don't deeply analyze stored procedure bodies (Alation, Atlan). Nobody is doing 'feed in your stored procedures, get back structured documentation with business rules, dependencies, and plain-English summaries' as a focused product.

Recurring Potential: 7/10

Moderate-strong. Initial documentation is a one-time event per codebase, which is a churn risk. Recurring value comes from: (1) re-running as procedures change, (2) onboarding new hires triggers re-engagement, (3) expanding to new databases/schemas, (4) keeping docs in sync with code changes via CI integration. A 'living documentation' angle with git-triggered re-analysis makes this sticky. Without that, it risks being a one-and-done tool.
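The git-triggered re-analysis in point (4) is only a few lines of glue. A sketch, assuming a git checkout and a hypothetical downstream re-documentation step (`changed_sql_files` and `sql_paths` are invented names):

```python
import subprocess

def sql_paths(diff_output: str) -> list[str]:
    """Filter `git diff --name-only` output down to SQL files."""
    return [line for line in diff_output.splitlines() if line.endswith(".sql")]

def changed_sql_files(since_ref: str = "HEAD~1") -> list[str]:
    """SQL files modified since the given ref. Re-documenting only these
    keeps LLM cost proportional to churn, not to codebase size."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since_ref],
        capture_output=True, text=True, check=True,
    )
    return sql_paths(out.stdout)
```

Run from CI on each push, the returned list becomes the work queue for re-analysis — the mechanism behind the "living documentation" stickiness described above.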

Strengths
  • +Validated pain point with clear signal — engineers are already doing this manually with ChatGPT, proving demand
  • +Wide-open competitive gap — no purpose-built tool combines static analysis + LLM for procedural SQL documentation
  • +Strong ROI story for enterprise buyers — weeks of onboarding time saved per new hire vs. low monthly cost
  • +Technical moat from combining static SQL parsing with LLM analysis — pure-LLM or pure-static approaches are both inferior alone
  • +Natural expansion path from SQL docs into broader legacy code documentation
Risks
  • !One-and-done usage risk: teams document their codebase once and churn — must build 'living docs' / CI integration early to create recurring value
  • !ChatGPT/Copilot 'good enough' risk: some teams will just paste procedures into ChatGPT manually and not pay for a dedicated tool — need to clearly differentiate with batch processing, dependency graphs, and wiki export
  • !Enterprise sales cycles are slow — the teams that need this most (large legacy codebases) are often at orgs with procurement processes that take months
  • !SQL dialect fragmentation: T-SQL, PL/SQL, PL/pgSQL, MySQL procedures all have different syntax — supporting all of them well takes significant effort
Competition
Dataedo

Database documentation tool that catalogs schemas, tables, stored procedures, and lets teams add descriptions, business glossaries, and data lineage diagrams. Supports 20+ database engines.

Pricing: Starts ~$199/user/year for Teams; Enterprise pricing on request
Gap: No AI-powered intent extraction — documentation is still manual. Cannot reverse-engineer business logic from procedural SQL. Users must write all descriptions themselves. No plain-English summaries of what a procedure actually does.
dbdocs.io (by DBML/Holistics)

Lightweight, developer-friendly database documentation tool. Generates interactive docs from DBML schema definitions. Free tier available.

Pricing: Free for public docs; Pro ~$9/mo per user
Gap: Schema-only — completely ignores stored procedures, business logic, and procedural SQL. No AI analysis. Useless for legacy codebases where the problem lives inside the procedures, not the schema.
SchemaCrawler

Open-source CLI tool that generates database schema documentation, ER diagrams, and lint reports. Outputs HTML, text, or diagram formats.

Pricing: Free / open-source
Gap: Zero procedural SQL understanding. Cannot parse stored procedure bodies. No AI, no intent extraction, no business logic documentation. Output is structural metadata only — the hard problem (understanding what the code does) is untouched.
GitHub Copilot / ChatGPT (manual prompting)

Engineers manually paste SQL procedures into ChatGPT or use Copilot inline to generate ad-hoc documentation and explanations of SQL code.

Pricing: ChatGPT Plus $20/mo; Copilot $19/mo individual
Gap: No batch processing — one procedure at a time. No dependency mapping across procedures. No persistent documentation output. No integration with wikis/Confluence. No static analysis layer to catch what LLMs hallucinate. Context window limits choke on large procedures. Not purpose-built — requires prompt engineering each time.
Acryl Data (DataHub) / Atlan / Alation

Enterprise data catalog platforms that provide metadata management, data lineage, governance, and search across data assets. Some have added AI features for auto-descriptions.

Pricing: Enterprise only — typically $50K-$500K+/year
Gap: Overkill and unaffordable for the specific problem of documenting stored procedures. Lineage is typically at the table/column level, not inside procedural logic. AI features describe tables, not business rules inside SQL code. 6+ month implementation cycles. Not accessible to small/mid-size engineering teams.
MVP Suggestion

CLI tool + simple web UI. User uploads or points to a directory of .sql files (or connects to a database to extract procedures). Tool runs sqlglot/sqlparse for static analysis (dependency graph, table references, parameter tracking) then sends structured chunks to an LLM API to generate: (1) plain-English summary, (2) business rules detected, (3) side effects and dependencies, (4) section-by-section breakdown. Output: Markdown files per procedure + a dependency graph visualization. Start with T-SQL (SQL Server) — that's where the worst legacy pain lives. Skip Confluence integration for MVP; Markdown export is enough.

Monetization Path

  • Free: analyze up to 5 procedures, basic summaries
  • Individual ($19/mo): unlimited procedures, dependency graphs, Markdown/HTML export
  • Team ($49/mo): batch processing, Confluence/Notion integration, shared documentation portal
  • Enterprise ($79+/mo): SSO, on-prem/VPC deployment (procedures contain sensitive business logic), custom SQL dialect support, API access

Time to Revenue

4-6 weeks to MVP, 8-10 weeks to first paying customer. The target users are actively in pain today — post the MVP on r/dataengineering, r/SQL, and Hacker News with a demo video of a gnarly undocumented stored procedure being documented in seconds. Early adopters will convert fast because they're already doing this manually.

What people are saying
  • there is no documentation
  • comments will lie/mislead
  • This is what AI was made for
  • build a dependency map offline
  • newbie programmer with my first job — having trouble understanding