
Category Direction - Incident Management

Introduction and how you can help

Thanks for visiting this category page on Incident Management in GitLab. This page belongs to the Health group of the Monitor stage, and is maintained by Sarah Waldner, who can be contacted directly via email. This vision is a work in progress and everyone can contribute. Sharing your feedback directly on issues and epics at GitLab.com is the best way to contribute to our vision. If you’re a GitLab user and have direct knowledge of your need for incident management, we’d especially love to hear from you.

Overview

Downtime costs companies an average of $5,600/minute, according to Gartner. This number, though an estimate based on a wide range of companies, communicates that downtime is expensive for organizations. This is especially true for those who have not invested in cultivating process and culture around managing these outages and resolving them quickly. The larger an organization becomes, the more distributed its systems and teams tend to be. This distribution leads to longer response times and more money lost for the business. Investing in the right tools and fostering a culture of autonomy, feedback, quality, and automation leads to more time spent innovating and building software and less time spent reacting to outages and racing to restore services. The tools your DevOps teams use to respond during incidents critically affect MTTR (Mean Time To Resolve, also known as Mean Time To Repair) as well as the happiness and morale of team members responsible for the IT services your business depends on.
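To make the relationship between MTTR and downtime cost concrete, here is a minimal Python sketch. It assumes MTTR is measured as the average time from detection to resolution, and it reuses the $5,600/minute Gartner average quoted above as a default. The function names and the sample incident timestamps are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta

# Gartner average quoted above; an estimate, not a figure for any one company.
COST_PER_MINUTE_USD = 5600


def mean_time_to_resolve(incidents):
    """Average duration between detection and resolution.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    """
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def estimated_downtime_cost(incidents, cost_per_minute=COST_PER_MINUTE_USD):
    """Total downtime in minutes multiplied by an assumed per-minute cost."""
    total_seconds = sum((r - d).total_seconds() for d, r in incidents)
    return (total_seconds / 60) * cost_per_minute


# Hypothetical sample data: one 30-minute and one 90-minute outage.
incidents = [
    (datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30)),
    (datetime(2020, 1, 2, 14, 0), datetime(2020, 1, 2, 15, 30)),
]

mttr = mean_time_to_resolve(incidents)
cost = estimated_downtime_cost(incidents)
print(f"MTTR: {mttr}, estimated cost: ${cost:,.0f}")
```

Under these assumptions, halving the time to resolve each incident halves the estimated cost, which is why the response tooling matters as much as detection.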

A robust incident management platform consumes inputs from all sources, transforms those inputs into actionable incidents, routes them to the responsible party, and then empowers the response team to quickly understand and remediate the problem at hand. Moreover, this platform should guide Post Incident Reviews following the fire-fight, making it easy for the team to create after-action items and feed them back into the Plan stage for continuous improvement.
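The consume, transform, and route flow described above can be sketched roughly as follows. This is an illustrative Python outline, not GitLab's implementation: the `Alert` and `Incident` types, the severity threshold, and the `ON_CALL` routing table are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    source: str
    service: str
    severity: str
    message: str


@dataclass
class Incident:
    title: str
    service: str
    severity: str
    assignee: str


# Hypothetical routing table mapping a service to its responsible team.
ON_CALL = {"checkout": "team-payments", "search": "team-discovery"}
DEFAULT_ASSIGNEE = "team-sre"
# Assumed rule: only these severities are promoted to incidents.
ACTIONABLE = {"critical", "high"}


def normalize(raw: dict, source: str) -> Alert:
    """Consume: map a source-specific payload onto one common alert shape."""
    return Alert(
        source=source,
        service=raw.get("service", "unknown"),
        severity=raw.get("severity", "low").lower(),
        message=raw.get("message", ""),
    )


def to_incident(alert: Alert) -> Optional[Incident]:
    """Transform and route: promote actionable alerts and pick an assignee."""
    if alert.severity not in ACTIONABLE:
        return None
    return Incident(
        title=f"[{alert.severity}] {alert.service}: {alert.message}",
        service=alert.service,
        severity=alert.severity,
        assignee=ON_CALL.get(alert.service, DEFAULT_ASSIGNEE),
    )


def process(raw_alerts):
    """Run every (source, payload) pair through the pipeline."""
    incidents = []
    for source, raw in raw_alerts:
        incident = to_incident(normalize(raw, source))
        if incident is not None:
            incidents.append(incident)
    return incidents
```

For example, passing one critical alert for `checkout` and one low-severity alert for `search` to `process` would yield a single incident assigned to the hypothetical `team-payments`.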

Mission

Our mission is to help DevOps teams reduce MTTR by streamlining the triage and resolve workflows via tools that provide access to observability resources (metrics, logs, errors, runbooks, and traces), that foster easy collaboration across response teams, and that support continuous improvement via Post Incident Reviews and system recommendations.

Challenges

As we invest R&D in building out Incident Management at GitLab, we are faced with the following challenges:

Opportunities

We are uniquely positioned to take advantage of the following opportunities:

Target Audience and Experience

Our current Incident Management tools have been built for users who align with our Allison (Application Ops) and Devon (DevOps Engineer) personas. The experience targets DevOps teams at smaller companies where it is common for the engineers to be on-call and responding to alerts for the software that they also write code for. As we mature this category, we will evolve the experience to appeal to and serve the enterprise customer.

Strategy

Maturity Plan

We are currently working to mature the Incident Management category from viable to complete. Definitions of these maturity levels can be found on GitLab's Maturity page. The following epics group the functionality we have planned to mature Incident Management.

What is Next & Why?

Processing alerts during a fire-fight requires responders to coordinate across multiple tools to evaluate different data sources. This is time-consuming: every time a responder switches to a new tool, they are confronted with a new interface and different interactions, which is disorienting and slows down investigation, collaboration, and the sharing of findings with teammates. Actionable alerts and incidents accelerate the fire-fight by enabling efficient knowledge sharing, providing guidelines for resolution, and minimizing the number of tools you need to check before finding the problem. In support of this, we are pursuing the following functionality to move Incident Management to complete:

…and much more! Please follow along in this epic to contribute to our plan.

What is not planned right now

These features are currently out of scope for Incident Management and are not planned for any maturity levels at this time. This does not exclude them from future considerations.

Competitive Landscape

Atlassian Opsgenie
Splunk VictorOps
PagerDuty
ServiceNow
xMatters

Analyst Landscape

Not yet, but accepting merge requests to this document.

Top Customer Success/Sales Issue(s)

Not yet, but accepting merge requests to this document.

Top Customer Issue(s)

Not yet, but accepting merge requests to this document.

Top Internal Customer Issue(s)

Not yet, but accepting merge requests to this document.

Top Vision Item(s)

Not yet, but accepting merge requests to this document.
