The Site Reliability Glossary
DevOps – DevOps is a cultural movement and set of principles and practices that informs how development and operations teams organize and collaborate in order to build, test, and deploy software faster and more reliably—objectives that drive customer value and result in a better user experience.
To achieve those objectives, DevOps practices incorporate automation and focus on continuous improvement, continuous integration (CI), and continuous delivery (CD).
Error Budget – Error budgets rely on the notion that no service is perfect, and users will tolerate some degree of error. An error budget represents the amount of error an aspect of a given service, like latency or availability, can experience before the user experience is affected to a degree that customers become unhappy. For example, if an organization has an SLA guaranteeing 99.99% availability, the error budget is 0.01%. Error budgets help organizations determine whether they can continue to focus on new features and enhancements, or whether they need to redirect effort to reliability work in order to remain within their error budget.
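The arithmetic behind the 99.99% example can be sketched in a few lines of Python. This is a minimal illustration, not a standard formula from any particular tool: the function names and the 30-day window are assumptions chosen for the example.

```python
def error_budget_fraction(target: float) -> float:
    """Return the share of time (or requests) allowed to fail for a target such as 0.9999."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be greater than 0 and at most 1")
    return 1.0 - target


def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Convert the error budget into minutes of downtime over a window (30 days is an assumption)."""
    window_minutes = window_days * 24 * 60
    return error_budget_fraction(target) * window_minutes


def budget_remaining_minutes(target: float, window_days: int, downtime_so_far: float) -> float:
    """Minutes of budget left; a negative value means the budget is exhausted."""
    return allowed_downtime_minutes(target, window_days) - downtime_so_far


if __name__ == "__main__":
    # 99.99% availability over 30 days leaves roughly 4.32 minutes of downtime.
    print(round(allowed_downtime_minutes(0.9999, 30), 2))
    # After 5 minutes of downtime the budget is overspent.
    print(budget_remaining_minutes(0.9999, 30, 5.0) < 0)
```

A negative remaining budget is the signal described above: effort shifts from new features to reliability work until the service is back within budget.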
Service Level Agreement (SLA) – Service Level Agreements are commitments that organizations make to their customers around system performance, and are often tied to key metrics, such as uptime or how quickly issues will be resolved. SLAs are often written into contracts and are sometimes even financially backed.
Service Level Indicator (SLI) – If SLAs are commitments to customers for system performance, and SLOs are internal goals for system performance, then Service Level Indicators (SLIs) are actual measurements of how the system is performing. Organizations can compare SLIs (actual measurements) against SLOs (internal objectives) to determine whether they need to focus work on improving performance and reliability or whether they can continue to focus on new features and enhancements.
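As a rough illustration of comparing a measurement against an objective, the sketch below computes an availability SLI as the ratio of successful events to total events and checks it against an SLO. The function names, the ratio-based definition of the SLI, and the sample numbers are all assumptions made for the example; real SLIs may be defined differently (for instance, as latency percentiles).

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Measured availability: the fraction of events that succeeded."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    return good_events / total_events


def meets_slo(sli: float, slo: float) -> bool:
    """True when the measured SLI is at or above the internal objective."""
    return sli >= slo


if __name__ == "__main__":
    # Hypothetical sample: 9,995 successful requests out of 10,000.
    sli = availability_sli(9_995, 10_000)
    print(meets_slo(sli, 0.999))   # objective of 99.9%
    print(meets_slo(sli, 0.9999))  # stricter objective of 99.99%
```

With these sample numbers the service meets a 99.9% objective but misses a 99.99% one, which is the kind of result that would push a team toward reliability work.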
Service Level Objective (SLO) – While SLAs reflect a commitment to customers around a system’s performance, Service Level Objectives (SLOs) are internal thresholds that organizations set for system performance to make sure that they can meet the parameters outlined in the SLA. In an organization that practices DevOps, SLOs serve as goals or commitments around system reliability that development and operations agree to. SLOs should be focused on the user experience and reflect the minimum performance necessary for a positive user experience.
Site Reliability Engineering (SRE) – Site Reliability Engineering is an outgrowth of DevOps that applies software engineering thinking to the operational aspects of site reliability. In practice, site reliability engineers collect data and use mathematical formulas to guide decisions about what to work on, in order to create a balance between releasing new features and maintaining and enhancing site reliability.
Shift Left – Shift left is a DevOps concept that refers to the software development practice of focusing on testing early in the development process in order to prevent issues and ensure a better customer experience when the software is initially deployed. Shift left also prioritizes continuous integration and continuous delivery (CI/CD). In CI/CD, building, testing, and deployment are automated so testing can be done quickly, early, and often.
Shift Right – Shift right refers to the practice of testing thoroughly in the later stages of the development process (i.e., the post-production phase). The goal of shift right is to focus on user experience and production scenarios as important metrics. Issues found in this post-release testing directly affect customer satisfaction, and they inform developers about what types of changes need to be made to the software.
