
Error Budget

Error Budget:

In short, an error budget is a "budget for failure": an acceptable level of failure that has been agreed on in advance.

The key to SRE is balancing the error budget against engineering time, development velocity, and money.

The error budget is the gap between perfect reliability and our SLO.

This is a budget to be spent.

Given an uptime SLO of 99.9%, the budget is about 43 minutes of total downtime per month (assuming a 30-day month), so after a 20-minute outage you still have about 23 minutes of budget remaining for the month.

Sample uptime references:
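As a rough sketch of how such reference figures can be derived, the arithmetic behind the 99.9% example can be written out in a few lines of Python. The function names and the 30-day month are assumptions made for illustration, not part of any standard tooling.

```python
# Assumption: a 30-day month, which is where the "43 minutes" figure comes from.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def error_budget_minutes(slo_percent, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Allowed downtime (in minutes) for an uptime SLO over the period."""
    return period_minutes * (100 - slo_percent) / 100


def remaining_budget_minutes(slo_percent, downtime_minutes,
                             period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Budget left after some downtime; negative means the budget is exceeded."""
    return error_budget_minutes(slo_percent, period_minutes) - downtime_minutes


if __name__ == "__main__":
    # Print the monthly downtime budget for a few common SLO targets.
    for slo in (99.0, 99.9, 99.99):
        print(f"{slo}% uptime: {error_budget_minutes(slo):.1f} minutes per month")
    # The example from the text: 99.9% SLO after a 20-minute outage.
    print(f"Remaining: {remaining_budget_minutes(99.9, 20):.1f} minutes")
```

For a 99.9% SLO this gives about 43.2 minutes per month, and about 23.2 minutes remaining after a 20-minute outage, which matches the rounded figures above.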


Error Budget Policy:

  • The Error Budget Policy is what you agree to do when the application exceeds its error budget.
  • It is not a financial penalty ("pay $$$").
  • It must be something that will visibly improve reliability.

Error Budget Policy Examples:

Until the application is again meeting its SLO and has some Error Budget:

  • No new feature launches are allowed.
  • Sprint planning may only pull postmortem action items from the backlog.
  • The software development team must meet with the SRE team daily to outline their improvements.
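One way to picture the policy is as a simple check on the remaining budget. The sketch below is illustrative only: `BudgetStatus` and `policy_actions` are hypothetical names, and it assumes the restrictions apply whenever no budget remains.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    """Hypothetical snapshot of the error budget for the current period."""
    budget_minutes: float    # total error budget for the period
    downtime_minutes: float  # downtime consumed so far

    @property
    def remaining_minutes(self) -> float:
        return self.budget_minutes - self.downtime_minutes


def policy_actions(status: BudgetStatus) -> list:
    """Return the restrictions in force; an empty list while budget remains."""
    if status.remaining_minutes > 0:
        return []
    # Assumption: the example policy items above apply once the budget is spent.
    return [
        "No new feature launches",
        "Sprint planning pulls only postmortem action items",
        "Daily meeting between the development and SRE teams",
    ]


if __name__ == "__main__":
    print(policy_actions(BudgetStatus(budget_minutes=43.2, downtime_minutes=20)))
    print(policy_actions(BudgetStatus(budget_minutes=43.2, downtime_minutes=50)))
```

The first call returns an empty list because budget remains; the second returns all three restrictions because the budget has been exceeded.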

