Skip to main content

What is SRE

What is SRE?

Site Reliability Engineering originated at Google in 2003.
A framework for operating large scale systems reliably.
SRE own running the system in production at an operational level.

 Fundamentally, it's what happens when you ask a software engineer to design an operations function. So SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and have the ability to, substitute automation for human labor.


What is day-to-day work and responsibilities of an SRE team?
 In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, oncall support, and capacity planning.

Many operations teams today have a similar role, but the way that SRE does it is quite different. 


What is CRE?
Customer Reliability Engineering, is the concept of taking SRE principles and helping the Cloud Customers operate Reliably at scale.

Comments