Sinister Light | Implementing SL{A, I, O}s

Fri, Mar 19, 2021

This document was written by me, Deba and Prashant for the observability whitepaper produced by CNCF's sig-observability group.

Purpose

Objective measure: Lets you measure quality of service and customer happiness
Common grammar: Provides a common language between business, engineering and product folks
Solves prioritisation: Provides a quantitative measure of when to prioritise feature delivery vs engineering reliability tasks
Accountability: Everyone understands the business consequences of breaching the SLO

Service Level Indicator (SLI): a carefully defined quantitative measure of some aspect of the level of service that is provided.
Service Level Objective (SLO): a target value or range of values for a service level that is measured by an SLI. In effect, it states how often the service can afford to fail.
Service Level Agreement (SLA): a business contract that includes the consequences of violating the SLO. Like the SLO, it is expressed as a target percentage.
Error budget: the tolerance for failed events over a period of time, as determined by the SLO. It equals 100% minus the SLO; for example, a 99.9% SLO leaves an error budget of 0.1%.
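The arithmetic behind these definitions can be sketched in a few lines of Python. This is a minimal, illustrative sketch that assumes a request-based SLI computed as good events divided by total events over a single window; the post does not prescribe a formula, and the names `evaluate_slo`, `SloReport` and `allowed_downtime` are hypothetical. Exact fractions keep the budget arithmetic free of float residue, so a budget spent down to the last event reads as zero.

```python
from dataclasses import dataclass
from datetime import timedelta
from fractions import Fraction


@dataclass
class SloReport:
    sli: float               # fraction of good events in the window
    error_budget: float      # 1 - SLO, i.e. "100% minus the SLO"
    allowed_failures: float  # failed events the budget tolerates in this window
    budget_remaining: float  # share of the budget still unspent (negative if overspent)
    slo_met: bool


def evaluate_slo(good_events: int, total_events: int, slo: float) -> SloReport:
    """Evaluate a hypothetical event-based SLI (good / total) against an SLO."""
    if total_events <= 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events and total_events > 0")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    target = Fraction(str(slo))          # exact decimal, e.g. 0.999 -> 999/1000
    sli = Fraction(good_events, total_events)
    error_budget = 1 - target
    allowed_failures = error_budget * total_events
    failed = total_events - good_events
    budget_remaining = 1 - failed / allowed_failures
    return SloReport(float(sli), float(error_budget), float(allowed_failures),
                     float(budget_remaining), sli >= target)


def allowed_downtime(slo: float, window: timedelta) -> timedelta:
    """Time-based view of the same budget: permitted downtime per window."""
    return window * float(1 - Fraction(str(slo)))


report = evaluate_slo(good_events=999_600, total_events=1_000_000, slo=0.999)
print(f"SLI {report.sli:.2%}, budget remaining {report.budget_remaining:.0%}")
# SLI 99.96%, budget remaining 60%
print(allowed_downtime(0.999, timedelta(days=30)))  # 0:43:12
```

With a 99.9% SLO over a million requests, the error budget tolerates 1,000 failures; 400 failures leave 60% of it unspent. The same 0.1% budget over a 30-day window corresponds to about 43 minutes of downtime.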

For a proposed SLO to be useful and effective, all stakeholders must agree to it.
The product managers have to agree that this threshold is good enough for users: performance below this value is unacceptably low and worth spending engineering time to fix.
The product developers need to agree that if the error budget has been exhausted, they will take steps to reduce risk to users until the service is back within budget.
The team responsible for the production environment, who are tasked with defending this SLO, has to agree that it is defensible without Herculean effort, excessive toil, or burnout, all of which are damaging to the long-term health of the team and service.
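The developers' commitment is often written down as an error budget policy that decides between feature delivery and reliability work. A minimal sketch of such a policy as a simple gate follows, assuming the same good/total event SLI; `next_priority`, `remaining_budget` and the `freeze_threshold` parameter are hypothetical names, and the real trigger and the steps taken are whatever the stakeholders agree on.

```python
from enum import Enum
from fractions import Fraction


class Priority(Enum):
    FEATURES = "feature delivery"
    RELIABILITY = "reliability work"


def remaining_budget(good_events: int, total_events: int, slo: float) -> Fraction:
    """Share of the window's error budget still unspent; negative once overspent."""
    if total_events <= 0 or not 0 < slo < 1:
        raise ValueError("need total_events > 0 and 0 < slo < 1")
    allowed_failures = (1 - Fraction(str(slo))) * total_events
    failed = total_events - good_events
    return 1 - failed / allowed_failures


def next_priority(good_events: int, total_events: int, slo: float,
                  freeze_threshold: float = 0.0) -> Priority:
    """Hypothetical policy: at or below freeze_threshold, reliability work wins
    until the service is back within budget; otherwise features can ship."""
    if remaining_budget(good_events, total_events, slo) <= freeze_threshold:
        return Priority.RELIABILITY
    return Priority.FEATURES


print(next_priority(999_600, 1_000_000, slo=0.999).value)  # feature delivery
print(next_priority(998_500, 1_000_000, slo=0.999).value)  # reliability work
```

Raising `freeze_threshold` above zero is one way a team might start shifting effort to reliability before the budget is fully spent.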