Introducing OpenSRE — AI-Powered Incident Investigation
I'm excited to announce OpenSRE, an open-source AI SRE platform that automatically investigates production incidents using episodic memory and a knowledge graph.
After years of being on-call and manually investigating incidents, I built OpenSRE to automate the repetitive parts of incident response. The platform combines LLM agents with two supporting components: a Neo4j knowledge graph that models service topology, and an episodic memory system that learns from every investigation.
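To make the episodic memory idea concrete, here is a minimal sketch of "store a past investigation, then recall it for a similar incident." It is not OpenSRE's actual implementation: the `Episode` and `EpisodicMemory` names, the word-overlap (Jaccard) similarity, and the `min_score` threshold are all assumptions chosen for illustration. A real system would more likely use embeddings and a persistent store.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One past investigation (hypothetical structure, for illustration)."""
    summary: str
    resolution: str


def _tokens(text: str) -> set[str]:
    # Naive tokenizer: lowercase and split on whitespace.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class EpisodicMemory:
    """In-memory episode store (assumed design, not the OpenSRE API)."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, summary: str, resolution: str) -> None:
        self._episodes.append(Episode(summary, resolution))

    def recall(self, summary: str, k: int = 1, min_score: float = 0.2) -> list[Episode]:
        # Score every stored episode, drop weak matches, return the top k.
        scored = [(jaccard(summary, e.summary), e) for e in self._episodes]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [episode for _, episode in scored[:k]]


memory = EpisodicMemory()
memory.record("checkout api high latency after deploy", "rolled back deploy")
memory.record("disk full on logging node", "expanded volume")

matches = memory.recall("high latency on checkout api")
for episode in matches:
    print(episode.resolution)
```

The threshold keeps unrelated episodes from being surfaced just because they share a common word; in this sketch the disk-full episode is filtered out for that reason.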
Key Features
- Episodic Memory — learns from every investigation, surfaces past solutions for similar incidents
- Knowledge Graph — Neo4j-powered service topology awareness and blast radius analysis
- 46 Production Skills — integrations with Elasticsearch, Datadog, Grafana, PagerDuty, Kubernetes, AWS, and more
- Multi-provider LLM — works with Claude, OpenAI, Gemini, DeepSeek, and 14+ more providers
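As a rough illustration of what blast radius analysis over a service topology involves, the sketch below walks a dependency graph to find every service that directly or transitively depends on a failed one. It uses a plain Python dictionary instead of Neo4j so that it runs on its own; the `blast_radius` function and the example service names are hypothetical and are not taken from the OpenSRE codebase.

```python
from collections import deque


def blast_radius(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges: map each service to the services that depend on it.
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first search upstream from the failed service.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service != failed and service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


# Hypothetical topology: each key depends on the services in its list.
topology = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "db"],
    "search": ["db"],
    "payments": [],
}

print(sorted(blast_radius(topology, "db")))
```

In a graph database the same question is typically a variable-length path query; the traversal above is the equivalent logic written out by hand.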
Check it out at opensre.in or on GitHub.