7.7 · high · GO

SRE Training Platform

Gamified incident response training platform for DevOps teams with team management, custom scenarios, and progress tracking.

DevTools

Engineering managers, SRE team leads, and DevOps training programs at mid-to-...
The Gap

Onboarding junior SREs on Kubernetes troubleshooting is hard — you can't break production for training, and lab environments lack urgency. Teams have no structured way to build incident response muscle memory.

Solution

A managed platform built on the open-source K8sGames concept, adding team dashboards, custom scenario builders, LMS integrations, skill assessments, and multiplayer incident drills. Sell to engineering managers who need to onboard and upskill SRE teams.

Revenue Model

Freemium — free tier with basic scenarios (like the open-source version), paid team plans ($15-30/seat/month) with admin dashboards, custom scenario editor, progress analytics, SSO, and compliance reporting

Feasibility Scores
Pain Intensity: 8/10

The Reddit thread validates this directly: 83 upvotes on a DevOps subreddit for a free game is a strong signal. Onboarding junior SREs is a known, acute pain. You can't train on production, lab environments feel fake, and the cost of a junior SRE making mistakes in a real incident is measured in downtime dollars ($5K-$100K+/hour for mid-to-large companies). Engineering managers are desperate for structured onboarding beyond 'shadow someone for 6 months.'

Market Size: 7/10

TAM: ~$2-5B segment of DevOps training focused on hands-on SRE/incident response skills. SAM: ~$200-500M for Kubernetes-specific incident training at companies with 50+ engineers. SOM: realistically $5-20M in first 3 years targeting mid-market. Not a massive standalone market, but large enough for a strong business. The buyer (engineering manager with L&D budget) typically has $500-2000/seat/year discretionary spend for training tools.

Willingness to Pay: 7/10

Engineering managers already pay $20-50/seat/month for tools like KodeKloud, Pluralsight, LinkedIn Learning. $15-30/seat/month is well within range. The ROI story is compelling: if training prevents even one P1 incident per quarter, it pays for itself 100x over. Enterprise compliance/audit requirements for incident response training create budget line items. However, open-source alternatives (including your own base project) create price pressure — the paid tier needs clear differentiation.
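To make the ROI claim concrete, here is a rough break-even sketch. The seat prices and hourly downtime costs are the ranges quoted in this report; the 20-seat team size is a made-up assumption, and the function name is illustrative only.

```python
def breakeven_downtime_hours(seats: int, price_per_seat_month: float,
                             downtime_cost_per_hour: float) -> float:
    """Hours of downtime that must be avoided per year to cover the annual training bill."""
    annual_cost = seats * price_per_seat_month * 12
    return annual_cost / downtime_cost_per_hour


# Assumption: a 20-seat team. Prices and downtime costs are the report's ranges.
SEATS = 20
for price in (15, 30):
    for hourly in (5_000, 100_000):
        hours = breakeven_downtime_hours(SEATS, price, hourly)
        print(f"${price}/seat/month, ${hourly:,}/hour downtime: "
              f"break-even at {hours:.3f} hours avoided per year")
```

On these assumptions the annual bill is $3,600-$7,200, so avoiding somewhere between a couple of minutes and roughly an hour and a half of downtime per year covers it. How close the return gets to 100x depends on incident length and on where in the downtime range a given company sits.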

Technical Feasibility: 8/10

You already have the open-source K8sGames as a foundation — that's a massive head start. The core technical challenge (browser-based K8s environments) is solved. Adding team dashboards, progress tracking, scenario builder, and auth (SSO) is standard SaaS engineering. The hard part — sandboxed K8s environments at scale — can be solved with existing tools (vcluster, kind, ephemeral namespaces). A solo dev with strong K8s knowledge can build a viable MVP in 6-8 weeks. The multiplayer/real-time incident drill feature is the stretch goal.
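As one illustration of the "ephemeral namespaces" option, the sketch below builds the Kubernetes manifests for a per-user sandbox: a Namespace plus a ResourceQuota that caps what a trainee can consume. The naming scheme, the `sre-training/...` label and annotation keys, and the quota values are all assumptions made up for this example, not part of K8sGames. Nothing here deletes the namespace; a separate cleanup job that reads the TTL annotation is assumed.

```python
import json


def sandbox_manifests(user_id: str, scenario: str, ttl_minutes: int = 30) -> dict:
    """Return a v1 List holding a Namespace and a ResourceQuota for one drill session."""
    # Hypothetical naming scheme; inputs are assumed to be valid DNS-label fragments.
    name = f"drill-{scenario}-{user_id}".lower()
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"sre-training/scenario": scenario},  # hypothetical key
            "annotations": {"sre-training/ttl-minutes": str(ttl_minutes)},  # hypothetical key
        },
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "drill-quota", "namespace": name},
        # Illustrative limits only; real values depend on the scenario.
        "spec": {"hard": {"pods": "10", "requests.cpu": "1", "requests.memory": "1Gi"}},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, quota]}


if __name__ == "__main__":
    print(json.dumps(sandbox_manifests("u42", "crashloop"), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`. A namespace-per-session design is the cheapest of the three options named above but gives the weakest isolation; vcluster or kind would trade cost for a more realistic cluster.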

Competition Gap: 8/10

This is the strongest signal. Nobody owns the 'gamified SRE incident response training' niche. Gremlin is chaos engineering (testing, not training). KodeKloud is courseware (passive, not urgent). Killercoda is individual (no teams). Instruqt is infrastructure (build-your-own). PagerDuty is incident management (no hands-on training). The specific combination of gamification, incident urgency, Kubernetes focus, team management, and progress tracking does not exist as a product today. This is a genuine gap.

Recurring Potential: 9/10

Textbook SaaS subscription model. Teams need ongoing training because (1) new hires join continuously, (2) Kubernetes evolves with new failure modes, (3) compliance requires periodic incident response drills, and (4) skills degrade without practice. The scenario library is a content moat that grows over time. Team seats naturally expand as companies grow. Annual enterprise contracts are very achievable given the compliance angle.

Strengths
  • Clear competition gap: no one owns gamified SRE incident training
  • Open-source base (K8sGames) provides credibility, community, and a funnel
  • Strong pain signal validated by Reddit engagement and real-world SRE complaints
  • Compelling ROI story: training cost << cost of one preventable P1 incident
  • Natural enterprise expansion: compliance, audit trails, and team growth drive seats
  • Content moat: custom scenarios become a switching cost over time
Risks
  • Open-source cannibalization: if the free version is good enough, conversion to paid will be low. Must clearly gate team/enterprise features.
  • Infrastructure costs: running sandboxed K8s environments per user is expensive (~$0.50-2/session). Unit economics need careful management.
  • Long enterprise sales cycles: selling to engineering managers at 50+ engineer companies means 1-6 month sales cycles, not self-serve signups.
  • Content treadmill: Kubernetes evolves fast, and scenarios need constant updating to stay relevant and realistic.
  • Platform risk: if a major player (PagerDuty, Datadog, or KodeKloud) adds gamified incident training, they have a distribution advantage.
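The infrastructure-cost risk can be sanity-checked with quick arithmetic. The sketch below uses the $0.50-2 per-session cost from the list above and the $20 Team plan price from the Monetization Path section; the figure of 8 sessions per seat per month is an assumption, and all other costs are ignored.

```python
def gross_margin(price_per_seat: float, sessions_per_month: int,
                 cost_per_session: float) -> float:
    """Fraction of seat revenue left after sandbox costs (all other costs ignored)."""
    infra = sessions_per_month * cost_per_session
    return (price_per_seat - infra) / price_per_seat


def breakeven_sessions(price_per_seat: float, cost_per_session: float) -> float:
    """Sessions per seat per month at which sandbox cost equals seat revenue."""
    return price_per_seat / cost_per_session


PRICE = 20.0   # Team plan price quoted in this report
SESSIONS = 8   # assumption: about two drills per week per seat
for cost in (0.50, 2.00):
    margin = gross_margin(PRICE, SESSIONS, cost)
    limit = breakeven_sessions(PRICE, cost)
    print(f"${cost:.2f}/session: margin {margin:.0%} at {SESSIONS} sessions, "
          f"zero margin at {limit:.0f} sessions/month")
```

At the low end of the cost range, margins look like ordinary SaaS; at the high end, a seat that runs ten sessions a month consumes its entire $20. Session caps, shorter sandbox lifetimes, or cheaper isolation are the obvious levers.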
Competition
Gremlin

Chaos engineering platform that lets teams inject failures into production/staging systems to test resilience. Includes GameDay orchestration for team-based incident drills.

Pricing: Enterprise pricing, typically $30K-100K+/year. No self-serve plans for small teams.
Gap: Not a training/learning platform — it's a testing tool. No gamification, no skill progression, no onboarding workflows, no scenario builder for junior engineers. Requires real infrastructure to operate. Overkill and too risky for training juniors.
KodeKloud

DevOps and Kubernetes training platform with hands-on labs, courses, and certification prep.

Pricing: Individual: $15-25/month. Teams: ~$20-30/seat/month. Enterprise: custom pricing.
Gap: No gamification or urgency simulation. No incident response drills — it's courseware, not crisis simulation. No multiplayer/team-based exercises. No custom scenario builder for company-specific infrastructure. Passive learning, not muscle-memory building.
Instruqt

Platform for building and hosting interactive, hands-on technical labs. Used by companies like HashiCorp, Red Hat, and GitLab for product demos and training.

Pricing: Enterprise-only, ~$30K-100K+/year based on usage. No self-serve tier.
Gap: It's an infrastructure platform, not a training product: you build your own content. No gamification, no scoring, no incident simulation engine, no team progress dashboards. Expensive and enterprise-only. Building SRE training on Instruqt is like buying land: you still have to build the house yourself.
Killercoda (formerly Katacoda)

Free interactive browser-based learning environments for Kubernetes, Docker, Linux, and cloud-native tech. Community-contributed scenarios.

Pricing: Free for individuals. Pro plans exist but are limited. No real team/enterprise offering.
Gap: No team management, no admin dashboards, no progress tracking, no gamification, no incident urgency simulation, no custom scenario builder for orgs, no multiplayer. Content quality varies. No enterprise features (SSO, compliance). Essentially a solo learning tool.
PagerDuty Incident Response (+ Process Automation)

Incident management platform with runbooks, on-call scheduling, and post-incident learning. Offers 'Failure Friday' culture guides and incident response documentation, but not hands-on training.

Pricing: Starts at $21/user/month for incident management. Enterprise plans $41+/user/month.
Gap: Zero hands-on training capability. No simulated environments, no Kubernetes troubleshooting labs, no gamification. Their 'training' is documentation and process guides, not experiential learning. Complementary to an SRE training platform, not competitive.
MVP Suggestion

Extend K8sGames with: (1) user accounts and basic team creation, (2) a team admin dashboard showing member progress/scores across scenarios, (3) 10-15 curated Kubernetes incident scenarios with escalating difficulty and time pressure, (4) a leaderboard with scoring based on resolution time and accuracy, (5) basic SSO (Google/GitHub OAuth). Skip the custom scenario builder and multiplayer for MVP — those are v2 features. Focus on making the core loop addictive: break → diagnose → fix → score → compete.
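One way the "resolution time and accuracy" scoring in item (4) might look is sketched below. The formula, the 50/50 split between accuracy and speed, and the idea of measuring accuracy as the share of post-fix health checks that pass are all assumptions for illustration; the report does not specify a scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    player: str
    resolve_seconds: float      # time taken to resolve the incident
    time_limit_seconds: float   # scenario time budget
    checks_passed: int          # hypothetical post-fix health checks that passed
    checks_total: int


def score(a: Attempt, max_points: int = 1000) -> int:
    """Half the points reward accuracy alone; the rest scale with time left on the clock."""
    accuracy = a.checks_passed / a.checks_total
    time_left = max(0.0, 1 - a.resolve_seconds / a.time_limit_seconds)
    return round(max_points * accuracy * (0.5 + 0.5 * time_left))


def leaderboard(attempts: list[Attempt]) -> list[tuple[str, int]]:
    """Players ranked by score, highest first."""
    return sorted(((a.player, score(a)) for a in attempts),
                  key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    runs = [
        Attempt("ana", 300, 600, 4, 4),   # fast and fully correct
        Attempt("ben", 600, 600, 2, 4),   # used all the time, half correct
        Attempt("cy", 900, 600, 4, 4),    # correct but over the limit
    ]
    for player, points in leaderboard(runs):
        print(player, points)
```

Multiplying speed by accuracy means a fast but wrong fix cannot outscore a slow, correct one, which keeps the leaderboard from rewarding guesswork.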

Monetization Path

Free tier (5 scenarios, individual use, open-source parity) → Team plan at $20/seat/month (admin dashboard, all scenarios, progress tracking, SSO) → Enterprise at $35/seat/month (custom scenarios, compliance reporting, LMS integration, dedicated support, SLA) → Scale via content marketplace where senior SREs sell custom scenario packs, taking a 20-30% platform cut.

Time to Revenue

8-12 weeks to MVP with paying design partners. Week 1-6: build team features on top of K8sGames. Week 6-8: private beta with 3-5 engineering teams from the Reddit community and DevOps Slack groups. Week 8-12: iterate based on feedback, launch paid tier. First revenue likely month 3. $10K MRR achievable by month 6-9 with 15-20 team accounts. The open-source community is your built-in distribution — leverage it aggressively.
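A quick consistency check on the $10K MRR target: the sketch below works out how many paid seats each team account must average. The target, the 15-20 account range, and the $20 and $35 seat prices all come from this report; the function name is illustrative.

```python
def seats_per_team_needed(target_mrr: float, teams: int, price_per_seat: float) -> float:
    """Average paid seats per team account required to reach the MRR target."""
    return target_mrr / (teams * price_per_seat)


TARGET_MRR = 10_000
for price in (20, 35):          # Team and Enterprise prices from Monetization Path
    for teams in (15, 20):      # account range from the paragraph above
        seats = seats_per_team_needed(TARGET_MRR, teams, price)
        print(f"{teams} teams at ${price}/seat: {seats:.1f} seats per team")
```

At the $20 Team price the target implies roughly 25 to 33 seats per account, or about 500 paid seats in total, which suggests those 15-20 accounts need to be full SRE or platform organizations and not small pilot groups.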

What people are saying
  • hardest parts of onboarding junior SREs
  • can't exactly break production for training purposes
  • lab environments never feel urgent enough to build real instincts
  • Zero setup cost for new hires - send them a URL on day one