Product Vision - Monitor

This is the product vision for Monitor. If you'd like to discuss this vision directly with the product managers for Monitor, feel free to reach out to Dov Hershkovitch (GitLab, Email) or Sarah Waldner (GitLab, Email, or by scheduling a video call).

Landscape

The IT monitoring and management market is well-established and crowded, but also fast-changing in terms of the technologies used in IT and the expectations of the IT user base. For instance, the trend to move infrastructure to public cloud introduced a new category of technologies to monitor that traditional vendors did not address. The accompanying SaaS delivery model disrupted the on-prem delivery model of many existing vendors. More recently, a transition from virtualization to container-based technologies caused another wave of adjustment to what it means to monitor infrastructure. This further challenged existing vendors. These and other market trends allow new entrants (like Sentry and Datadog) to quickly capture mindshare and eclipse existing vendors (like New Relic and Splunk).

Future Vision

Put bluntly, we are building an integrated package of observability and operations tools that will, in three years, displace today's front-runner in modern observability, Datadog. We'll do that by focusing on the four core workflows of Instrument, Triage, Resolve and Improve. The most critical of these workflows is Triage, and our approach to it will follow this general flow:

graph LR;
  subgraph Alert on Top Level Metrics
    A(Alert Generated)-->B[Business Metric out of SLO];
    A-->C(Tracing error rates out of SLO);
  end
  subgraph Metrics
    B-->D(High level dashboard view of critical metrics);
    C-->D;
    D-->E[View Per Tech including App];
    D-->F[Metrics view across infra];
    E-->G[Ad Hoc Metrics Explorer];
    F-->G;
  end
  subgraph Logs and Traces
    G-->H[Log Explorer];
    G-->I[Traces Explorer];
  end

Note - a user might stop at any point during this workflow, and the intermediate steps (explorers, high-level dashboards) should be easy to bypass when the user already knows where to look.

The following links describe our future vision for each individual workflow:

Our long-term vision is ambitious.

In the first year, we will focus our efforts on our current target application types (cloud native applications, web-apps, static sites). As a result, during that time we will not strive to be a fully turn-key experience that can be used to monitor legacy applications. Wholesale removal of a monitoring solution is painful, so a land-and-expand strategy is prudent here. As a customer recently explained, "Every greenfield application that we can deploy with your monitoring tools saves us money on New Relic licenses."

As curated solutions mature, we can increasingly target new application types. In subsequent years we will compete with incumbent players as a holistic Monitoring solution.

Overview

The Monitor stage comes after you've configured your production infrastructure and deployed your application to it. As part of the verification and release process you've done some performance validation - but you need to ensure your service(s) maintain the expected service-level objectives (SLOs) for your users.

GitLab's Monitor stage product offering makes instrumentation of your service easy, giving you the right tools to prevent, respond to, and recover from SLO degradation. Current DevOps teams either lack exposure to operational tools or utilize ones that put them in a reactive position when complex systems fail inexplicably. Our mission is to empower your DevOps teams by finding operational issues before they hit production and enabling them to respond like pros by leveraging default SLOs and responses they proactively instrumented. GitLab Monitoring allows you to successfully complete the DevOps loop, not just for the features in your product, but for its performance and user experience as well.

Using GitLab observability solutions, users will have an easy way to gain a holistic understanding of the state of production services across multiple groups and projects. When you are deploying a suite of services, it's critical that you can drill into each individual service's SLO attainment as well as troubleshoot issues that span multiple services.

We track epics for all the major deliverables associated with the north stars, and category maturity levels. You can view them on our Monitor Roadmap.

North Stars

We're pursuing a few key principles within the Monitor Stage.

Instrument with ease

Your team's service(s), first and foremost, need to be observable before you are able to evaluate production performance characteristics. We believe that observability should be easy. GitLab will ship with smart conventions that set up your applications with generic observability. We will also make it simple to instrument your service, so that custom metrics, the ones you'd like to build your own SLOs around, can be added with a few lines of code.
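
To make "a few lines of code" concrete, here is a minimal sketch of a custom counter rendered in the Prometheus text exposition format. It uses only the Python standard library; the `Counter` class, the metric name `checkout_completed_total`, and the `status` label are hypothetical and are not part of any GitLab API. In practice you would more likely reach for an existing Prometheus client library.

```python
from collections import defaultdict


class Counter:
    """A tiny, illustrative counter; not a replacement for a real client library."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)  # label tuple -> running total

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        # Sort labels so the same label set always maps to the same key.
        self._values[tuple(sorted(labels.items()))] += amount

    def render(self) -> str:
        """Render the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for labels, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical custom metric that an SLO could later be built around.
checkouts = Counter("checkout_completed_total", "Completed checkouts.")
checkouts.inc(status="ok")
checkouts.inc(status="ok")
checkouts.inc(status="failed")
print(checkouts.render())
```

The instrumentation itself is the three `inc` calls; everything else is plumbing that a client library would normally provide.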

Detect what's important

Alerting and notification services are a table-stakes expectation of APM and Metrics solutions. GitLab will build a great experience for setting thresholds and metrics, including setting smart defaults for known metrics. We'll lean heavily on our early integration with Prometheus scheduling, notification, and alerting services. Beyond alerting, integration with chatops and incident management is also going to be important.

Visualize and triage

Visually working with time-series data is an important expectation of an observability solution. Our dashboarding solutions will include ad hoc data visualization that allows you to quickly build time-series visualizations from metrics, chart them against related metrics, and break them down by the field of your choice. A dashboarding system should also provide a curated UI experience for the established vendors that are clearly in the lead.

The most effective way to bootstrap usage of a new feature / solution is to expose existing users to it in the context of what they are already doing. All 3 solution areas (Logs, Metrics and APM) should incorporate integrations of each solution and a guide on how to get started. In addition to cross-linking between observability apps, a number of broader GitLab initiatives

Resolve like a pro

We want to help teams resolve outages faster, accelerating both the troubleshooting and resolution of incidents. GitLab's single platform can correlate the incoming observability data with known CI/CD events and source code information, to automatically suggest potential root causes.
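
As a rough illustration of what correlating an alert with CI/CD events could look like, the sketch below lists the deployments that finished shortly before an alert fired. The `Deployment` shape, the `suspect_deployments` helper, and the one-hour window are assumptions made for this example; they do not describe how GitLab implements root-cause suggestions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    sha: str               # commit that was deployed (hypothetical field)
    environment: str
    finished_at: datetime


def suspect_deployments(alert_time, environment, deployments,
                        window=timedelta(hours=1)):
    """Return deployments to `environment` that finished within `window`
    before the alert, most recent first."""
    candidates = [
        d for d in deployments
        if d.environment == environment
        and timedelta(0) <= alert_time - d.finished_at <= window
    ]
    return sorted(candidates, key=lambda d: d.finished_at, reverse=True)


# Made-up sample data for illustration only.
deployments = [
    Deployment("a1b2c3", "production", datetime(2019, 11, 1, 9, 0)),
    Deployment("d4e5f6", "production", datetime(2019, 11, 1, 11, 40)),
    Deployment("0a9b8c", "staging", datetime(2019, 11, 1, 11, 50)),
    Deployment("f7e6d5", "production", datetime(2019, 11, 1, 12, 30)),
]
alert_time = datetime(2019, 11, 1, 12, 5)

suspects = suspect_deployments(alert_time, "production", deployments)
print([d.sha for d in suspects])  # ['d4e5f6']
```

Only the production deployment that finished 25 minutes before the alert is returned; the earlier one falls outside the window, the staging one targets a different environment, and the last one happened after the alert.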

Gain insights seamlessly

Continuously learning and driving those insights back into your development cycle is a critical part of the DevOps loop. The tools in the Monitor stage make it possible to gain insights about production SLOs, incidents and observability sources across the multi-project systems that make up a complete application.

Container-based deployments have rapidly expanded the amount of observability information available. It is no longer possible to collate and visualize this information without automation and without distilling it into valuable insights, which GitLab can do for you.

We'll also provide views across a suite of applications so that managers of a large number of DevOps or Operations teams can get a quick view of the health of their application suite and their teams.

Principles

Our north stars are the guideposts for where we are headed. Our principles inform how we will get there. First and foremost we abide by GitLab's universal Product Principles. There are also a few principles unique to the Monitor stage itself.

Complete the Loop First

As part of our general principle of Flow One, the Monitor stage will seek to complete the full observability feedback loop for limited use cases first, before moving on to support others. As a starting point this will mean support for modern, cloud-native developers first.

Observability for those who operate

In modern DevOps organizations developers are expected to also operate the services they develop. In many cases this expectation isn't met. Whether a developer is the one operating an application or not, we will build tools that work for those doing the operator job. This means forgoing preferences, like developers' preference to avoid deep production troubleshooting, and instead building tools that allow those who operate to be best-in-class operators, regardless of their title.

Dogfooding

Our users can't expect a complete set of Monitoring tools if we don't use them ourselves for instrumenting and operating GitLab. That's why we will dogfood everything.

We will start with GitLab Self-Monitoring and our own Infrastructure teams. We want self-managed administrator users to utilize the same tools to observe and respond to health alerts about their GitLab instance as they would to monitor their own services. We'll also complete our own DevOps loop by having our Infrastructure teams for GitLab.com utilize our incident management feature.

Performance Indicators (PIs)

Our Key Performance Indicator for the Monitor stage is the Monitor SMAU (stage monthly active users).

Monitor SMAU is determined by tracking how users configure, interact, and view the features contained within the stage. The following features are considered:

| Configure | Interact | View |
| --- | --- | --- |
| Install Prometheus | Add/Update/Delete Metric Chart | View Metrics Dashboard |
| Enable external Prometheus instance integration | Download CSV data from a Metric chart | View Kubernetes pod logs |
| Enable Jaeger for Tracing | Generate a link to a Metric chart | View Environments |
| Enable Sentry integration for Error Tracking | Add/remove an alert | View Tracing |
| Enable auto-creation of issues on alerts | Change the environment when looking at pod logs | View operations settings |
| Enable Generic Alert endpoint | Select issue template for auto-creation | View Prometheus Integration page |
| Enable email notifications for auto-creation of issues | Use /zoom and /remove_zoom quick actions | View error list |
| | Click on metrics dashboard links in issues | |
| | Click View in Sentry button in errors list | |

See the corresponding Periscope dashboard (internal).
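
For readers who want a feel for the arithmetic behind SMAU, here is a minimal sketch that counts distinct users who performed at least one tracked action in a given month. The event tuple shape `(user_id, action, day)` and the action names (loosely paraphrased from a few rows of the table above) are assumptions for illustration; this is not the query behind the internal dashboard.

```python
from datetime import date

# Hypothetical action names, loosely based on a few rows of the feature table.
TRACKED_ACTIONS = {
    "install_prometheus",
    "add_alert",
    "view_metrics_dashboard",
    "view_pod_logs",
}


def monthly_active_users(events, year, month):
    """Count distinct users with at least one tracked action in the month."""
    return len({
        user_id
        for user_id, action, day in events
        if action in TRACKED_ACTIONS and (day.year, day.month) == (year, month)
    })


# Made-up sample events: (user_id, action, day).
EVENTS = [
    (1, "view_metrics_dashboard", date(2019, 11, 3)),
    (1, "add_alert", date(2019, 11, 9)),           # same user, counted once
    (2, "view_pod_logs", date(2019, 11, 20)),
    (3, "install_prometheus", date(2019, 10, 30)),  # previous month
    (4, "open_merge_request", date(2019, 11, 5)),   # not a Monitor action
]

print(monthly_active_users(EVENTS, 2019, 11))  # 2
```

A user is counted once per month regardless of how many Configure, Interact, or View actions they perform.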

Categories

There are a few product categories that are critical for success here; each one is intended to represent what you might find as an entire product out in the market. We want our single application to solve the important problems solved by other tools in this space - if you see an opportunity where we can deliver a specific solution that would be enough for you to switch over to GitLab, please reach out to the PM for this stage and let us know.

Each of these categories has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.

Metrics

GitLab collects and displays performance metrics for deployed apps, leveraging Prometheus. Developers can determine the impact of a merge and keep an eye on their production systems, without leaving GitLab. This category is at the "viable" level of maturity.

Documentation | Strategy

Logging

GitLab makes it easy to view the logs of running pods in connected Kubernetes clusters. By displaying the logs directly in GitLab, developers can avoid having to manage console tools or jump to a different interface. This category is at the "minimal" level of maturity.

Documentation | Strategy

Tracing

Tracing provides insight into the performance and health of a deployed application, tracking each function or microservice which handles a given request. This makes it easy to understand the end-to-end flow of a request, regardless of whether you are using a monolithic or distributed system. This category is at the "minimal" level of maturity.

Documentation | Strategy

GitLab Self-Monitoring

Self-managed GitLab instances come out of the box with great observability tools, reducing the time and effort required to maintain a GitLab instance.

Documentation | Strategy

Cluster Monitoring

Out-of-the-box Kubernetes cluster monitoring lets you know the health of your deployment environments, with traceability back to every issue and code change, as part of a single application for end-to-end DevOps. This category is at the "viable" level of maturity.

Documentation | Strategy

Error Tracking

Error tracking allows developers to easily discover and view the errors that their application may be generating. By surfacing error information where the code is being developed, efficiency and awareness can be increased. This category is at the "minimal" level of maturity.

Documentation | Strategy

Synthetic Monitoring

Simulate user activity within your application, to detect problems in end-to-end workflows and understand real-world performance. This category is planned, but not yet available.

Strategy

Incident Management

Track incidents within GitLab, providing a consolidated location to understand the who, what, when, and where of the incident. Define service level objectives and error budgets, to achieve the desired balance of velocity and stability. This category is at the "viable" level of maturity.

Documentation | Strategy
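
To show how a service level objective translates into an error budget, the sketch below works through the arithmetic for a request-based SLO. The `error_budget` function, its parameters, and the sample numbers are hypothetical; they are one common way to frame the calculation, not a description of GitLab's Incident Management feature.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for a request-based SLO.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    # Failures the SLO tolerates over the measured period.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures if allowed_failures else float("inf")
    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,        # 1.0 means the budget is exhausted
        "budget_remaining": 1.0 - consumed,
    }


# Made-up numbers: a 99.9% SLO over one million requests with 250 failures.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(report)
```

With these numbers the SLO tolerates roughly 1,000 failed requests, so 250 failures consume about a quarter of the budget. A team with plenty of budget left can favor velocity, while a team close to exhausting it can favor stability.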

Status Page

Easily communicate the status of your services to users and customers. This category is planned, but not yet available.

Strategy

Prioritization Process

We follow the same prioritization guidelines as the product team at large.

As noted above, in the short term the Monitor stage will be prioritizing (video discussion) the following:

You can see our entire public backlog for Monitor at this link; filtering by labels or milestones will allow you to explore. If you find something you're interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!

Issues with the "direction" label have been flagged as being particularly interesting, and are listed in the section below.

Upcoming Releases

12.5 (2019-11-22)

12.6 (2019-12-22)

12.7 (2020-01-22)

13.0 (2020-05-22)

Other Interesting Items

There are a number of other issues that we've identified as interesting and are considering, but have not yet scheduled with a delivery milestone. Some are good ideas we want to do, but don't yet know when; some we may never get around to; some may be replaced by another idea; and some are just waiting for that right spark of inspiration to turn them into something special.

Remember that at GitLab, everyone can contribute! This is one of our fundamental values and something we truly believe in, so if you have feedback on any of these items you're more than welcome to jump into the discussion. Our vision and product are truly something we build together!