What on earth is the difference between DevOps and SRE?

Misnad Haque
7 min read · Sep 5, 2021

How often have you heard the term “DevOps” and wondered what on earth it truly meant? After hearing it from different people and organizations, you have probably ended up with several definitions of the concept: Operations and Development teams working together, Developers doing Operations, or simply the plethora of tools available to enable it. Now add SRE into the mix and you’re in for a confetti.

About SRE

SRE, or Site Reliability Engineering, is what happens when you ask a software engineer to design an operations team (Benjamin Treynor Sloss & Betsy Beyer, 2016). Remember the SysAdmin? Yes, the administrator who looks after everything a production system entails. SRE branches from that and aims to improve upon the practice through prescriptive means by adopting the pillars of DevOps. The core focus of SRE is evident in its name: “Reliability”. No, this does not mean you aim for 100% reliable systems, as that is unrealistic; failure does happen. The question is: how do we build our systems to be reliable in the face of failure? But I digress.

The purpose of this article isn’t to detail SRE and its concepts. For that, you’re best off reading the golden book by Google themselves (Site Reliability Engineering: How Google Runs Production Systems, O’Reilly Media). We’re here to clear up the mess between DevOps and SRE.

Varying opinions

I’ve heard many colleagues and experts from the software development industry clamor about SRE being the “next level” of DevOps. Some claim to have been running SRE all along when, in truth, they still operate with support teams that have no access to the code. Then you have a few who claim that SRE is something for developers and an extension of their DevOps teams. Bear in mind, these are folk who have separate Developer, DevOps and Ops teams in practice. Finally, I also encountered people who disregarded Google’s definition of SRE and claimed that it is nothing beyond enhanced SysAdmin work; something that should be easy to implement and boast about.

This is what prompted me to dig deeper, so I spent my own time researching the topic. No, it wasn’t just by watching a few videos about SRE; I actually bought and read Google’s book. Besides, I wanted to learn about it from the people who actually coined the term, not from those who interpreted it in their own way.

Let’s hear it from the professionals

So to prove a point, I ran a survey on LinkedIn hoping to view the distribution of the various ways one could perceive SRE. The question was intended to challenge each individual’s understanding of the topic.

LinkedIn poll results on SRE
Look at that distribution from LinkedIn

Responses were received from people already engaged in development, quality assurance, IT project management, SysAdmin work, DevOps, cloud services, and from SREs themselves. At the time of writing, the poll had received 42 votes and the results are more or less in line with what I expected: a confetti.

Let’s dissect the responses received and compare them with simpler software engineering concepts.

Response A: SRE is the next level of DevOps — versioning

If something is to be taken to the next level, the general idea is that you’ll use the original’s baseline as a platform to improve upon and replace the former version. This would mean that SRE is in fact DevOps, albeit a newer or enhanced version of it.

Incorrect depiction of SRE being the next version or level of DevOps
Incorrect: SRE is the next level of DevOps

As per AWS, DevOps is by definition the combination of cultural philosophies, practices and tools that increases an organization’s ability to deliver applications and services at high velocity. Its primary goal is to enable quicker change delivery by eliminating the boundary between development and operations teams. SRE, though, is almost laser-focused on improving the reliability of systems.

Sudip Sengupta & Muhammad Raza (BMC, 2021) summarize it best with regard to team composition and disciplines: “Site reliability engineering mainly focuses on enhancing system availability and reliability while DevOps focuses on speed of development and delivery while enforcing continuity.” SRE can be viewed as implementing (not instantiating) DevOps with several extensions to it. SRE takes up elements from the broad concept of DevOps and applies them in a prescriptive manner to operational maintenance for improved reliability. It co-exists with DevOps rather than replacing it as a better version. Ergo, SRE isn’t the next level of DevOps.

Response B: SRE has nothing to do with DevOps (disjoint set)

This view is most prevalent among SysAdmins of the past who are coming to terms with the changing world of DevOps and SRE. If SRE is viewed in this manner, one would assume that SRE can be implemented in an environment where DevOps does not exist.

Venn diagram showing DevOps and SRE as a disjoint set
Incorrect: DevOps and SRE have nothing in common

DevOps is based on the following 5 pillars:

  • Reduce organization silos
  • Accept failure as normal
  • Implement gradual change
  • Leverage tools and automation
  • Measure everything

SRE embraces and implements each of these principles. Refer to this video by Google for details. As can be seen, SRE and DevOps go hand in hand, so thinking they’re completely disjoint is incorrect.
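One way to picture "SRE implements each of these principles" is to treat the five pillars as an abstract interface and SRE as a class that fills in every method. This is only a sketch: the method names are made up from the pillar list above, and the returned practices are illustrative placeholders rather than anything prescribed by the article.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five pillars, modeled as an abstract interface (hypothetical names)."""

    @abstractmethod
    def reduce_organization_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self) -> str: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...

    @abstractmethod
    def leverage_tools_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...


class SRE(DevOps):
    """Implements every pillar. The practices below are placeholders only."""

    def reduce_organization_silos(self) -> str:
        return "share ownership of production with developers"

    def accept_failure_as_normal(self) -> str:
        return "plan for failure instead of chasing 100% reliability"

    def implement_gradual_change(self) -> str:
        return "roll out changes in small increments"

    def leverage_tools_and_automation(self) -> str:
        return "automate repetitive operational work"

    def measure_everything(self) -> str:
        return "track reliability with measurable indicators"


# Instantiation succeeds only because all five pillars are implemented.
sre = SRE()
for pillar in sorted(DevOps.__abstractmethods__):
    print(f"{pillar}: {getattr(sre, pillar)()}")
```

If any one of the five methods were removed from `SRE`, Python would refuse to create `SRE()` at all, which mirrors the idea that the pillars come as a package.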

Response C: SRE has similarities with DevOps (left / right join)

As highlighted already, SRE and DevOps share the same principles, such as the need for developers and SREs to work without boundaries, continuous delivery, leveraging automation, etc.

Incorrect representation of DevOps and SRE in an inner join Venn diagram
Incorrect: DevOps and SRE have similarities, but SRE doesn’t implement everything

According to Google, “Class SRE implements DevOps” and they also mention that it is a “concrete class” of the latter. This means that SRE implements all the concepts of DevOps and doesn’t leave anything unutilized. If two things are termed “similar”, it implies that each also has parts the other lacks, including DevOps concepts that SRE would leave out. This is analogous to a left or right join with DevOps and is incorrect.
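The join analogy can be checked with plain sets. The sketch below reuses the five pillars from the list earlier in the article; the single SRE extension and the "merely similar" variant are hypothetical entries added purely for illustration.

```python
# The five DevOps pillars as listed in the article.
DEVOPS_PILLARS = {
    "reduce organization silos",
    "accept failure as normal",
    "implement gradual change",
    "leverage tools and automation",
    "measure everything",
}

# Hypothetical SRE-specific extension, for illustration only.
SRE_EXTENSIONS = {"prescriptive reliability practices"}

# What the article argues for: SRE covers every pillar, plus extensions.
SRE_PRACTICES = DEVOPS_PILLARS | SRE_EXTENSIONS

# What "merely similar" would look like: one pillar is left out (hypothetical).
MERELY_SIMILAR = (DEVOPS_PILLARS - {"measure everything"}) | SRE_EXTENSIONS


def pillars_left_out(devops: set, candidate: set) -> set:
    """DevOps pillars that the candidate practice set does not cover."""
    return devops - candidate


print(pillars_left_out(DEVOPS_PILLARS, SRE_PRACTICES))   # set()
print(pillars_left_out(DEVOPS_PILLARS, MERELY_SIMILAR))  # {'measure everything'}
print(DEVOPS_PILLARS <= SRE_PRACTICES)                   # True
```

An empty "left out" set is what separates implementing DevOps from overlapping with it.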

Response D: SRE is a way to run DevOps (instantiation)

While the majority of responses favored this option, sadly it can imply that SRE is an instance (object) of DevOps (class), wherein SRE becomes a running version of the concept across both development and operations. Confusion arises when referring to articles that highlight SRE as being more about the “how” while DevOps refers to the “what”. Going purely by this, one could say that SRE is a means of executing DevOps philosophies.

Incorrect depiction of SRE as a child or instance of DevOps
Incorrect: SRE is not an instance or way to run DevOps

While this is partially true, one must understand that SRE is merely one side of the coin and you cannot replace an entire project team’s existing implementation of DevOps with SRE. Instead, SRE merely adds to it. Remember, DevOps is mostly silent with regard to how operations must be handled and instead focuses more on breaking organizational silos, CI/CD, etc. SRE, on the other hand, is all about operations while adapting to DevOps’ pillars.
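The difference between implementing DevOps and being an instance of it can be sketched in a few lines. The single-method interface and the method names below are hypothetical simplifications, chosen only to keep the example short.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """A deliberately tiny stand-in for the DevOps principles."""

    @abstractmethod
    def deliver_change(self) -> str: ...


class SRE(DevOps):
    """Implements the DevOps interface and adds an operations concern."""

    def deliver_change(self) -> str:
        return "gradual, automated rollouts"

    def run_operations(self) -> str:
        # Hypothetical extension: DevOps itself says little about operations.
        return "operate production with a reliability focus"


# SRE is a class that implements DevOps...
print(issubclass(SRE, DevOps))  # True
# ...but SRE itself is not an object created from DevOps.
print(isinstance(SRE, DevOps))  # False

# DevOps is abstract, so there is nothing to "run" directly.
try:
    DevOps()
    can_instantiate_devops = True
except TypeError:
    can_instantiate_devops = False
print(can_instantiate_devops)  # False

# What actually runs is a team built from the SRE class.
team = SRE()
print(team.run_operations())
```

In this picture, the thing that runs is an SRE team, and `run_operations` is where SRE adds to DevOps rather than replacing it.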

Another way to look at it is via team composition. “An SRE team is composed of site reliability engineers who have a background in both operations and development. DevOps teams include a variety of roles, including QA experts, developers, engineers, SREs and many others” (Sudip Sengupta, Muhammad Raza, BMC). As DevOps is the broader concept, SRE cannot be implemented on its own while the non-operational parts of DevOps are ignored.

Conclusion

Yes, none of the options provided in the poll were correct, and no, leaving out a “none of the above” option was not an oversight; I omitted it on purpose, lest it bias the responses. So what really is SRE in relation to DevOps? Perhaps this Venn diagram, together with the text that follows, describes it best.

Correct: SRE implements the concepts of DevOps in their entirety

DevOps, as you know, is a holistic set of principles that covers software development all the way through to delivery into production. SRE focuses on operations management and reliability while adhering to these guiding pillars of DevOps. Think of SRE as an operations enabler for the concept of DevOps. For an operations engagement to be called SRE, it must implement these pillars in their entirety.

The ITIL4 circle depicts a purely operations-centric framework of best practices that can also help with an SRE implementation, as it complies with DevOps tenets. Guidance from ITIL4 may be leveraged, though not necessarily in its entirety; it is optional after all.

Understanding what SRE prescribes is fundamental to implementing it successfully in any organization. My advice is to avoid skating over the definitions and processes, since preconceived notions are the enemy in this case. Judging by the results of the survey, it is clear that even amongst professionals there is significant variance in understanding what SRE is truly about. One can only imagine the chaos that would ensue if plans to implement SRE were taken lightly with these misconceptions. I hope that this article has helped clear up some of your doubts regarding the topic.

I’m still learning SRE, and this was, in fact, one of the first questions I had to ask myself: “what on earth is the difference between DevOps and SRE?”

Misnad Haque

INFP. TechOps enthusiast and passionate Mediterranean home-chef (see JustBaklava LK) with a keen eye for macrophotography. Avid RPG gamer.