What is site reliability engineering (SRE) and how is it different from DevOps?

Site reliability engineering (SRE) is Google’s model of service management, in which software engineers run production systems using a software engineering approach. Google operates at an unusual scale, so it often has to tackle software bugs and errors in unconventional ways. But having software engineers doing a …

Continue reading »

Why Transformational Leadership Is Needed in Every DevOps Initiative

Every DevOps initiative needs support from leadership. Without it, teams won’t be able to move forward smoothly. Leaders have a significant impact on results, not just with DevOps but with every change initiative in an organization.

This time we won’t talk about DevOps as a tool …

Continue reading »

Pitfalls with DevOps at Scale

Let's get started by defining what DevOps is.

I know, I know; there are tons of definitions. But the one I like most is from Gene Kim:

DevOps is those set of cultural norms and technology practices that enable the fast flow of planned work from, among others, development, through …

Continue reading »

Getting Started Quickly With Go Logging

It's time to talk about how to get started with logging again. The languages we've covered so far are C#, Java, Python, Ruby, Node.js, and JavaScript. Today, we're going to be talking about the Go programming language, also known as Golang. Go is a statically compiled, open-source programming language …

Continue reading »

Deployment Smells: The 5 Most Common Deployment Mistakes

Deployment shouldn't be the most nerve-racking task for sysadmins when releasing software, but many times it is.

If we’re deploying application changes to a beta system, we don’t care that much if the system goes down. Neither do our users; they know that beta means “expect problems from …

Continue reading »

A Detailed Guide to Canary Deployments

Every time we need to deploy to production, we worry about how changes will affect the user experience. No matter what technique or strategy you use to make deployments, there are going to be times when the things that can go wrong will go wrong. It’s Murphy’s law …

Continue reading »

Amplify Feedback with Continuous Performance

As the cloud becomes the norm, we’re letting others manage much of our infrastructure for us. Cloud providers offer common metrics like CPU, memory, storage, and networking so that you can stay up to date on the health of your system. And we worry less and less about those metrics …

Continue reading »

Why You Need an Error Budget and How to Make It Work

How many times have you seen Google go down? Not many, I bet. You might not even notice it if it happened. If you did, you’d probably assume it’s an internet connection problem.

But Google isn’t perfect. As Werner Vogels says, “Everything fails, all the time.” If …

Continue reading »

Choosing a Deployment Strategy: A Manager’s Guide

Some time ago, I was in charge of deploying to the main system at the company where I worked. Back in those days, I had the bad habit of doing deployments manually by copying and pasting assemblies and then RDPing into the servers to update them with the latest changes …

Continue reading »

Which DevOps Metrics Matter?

Some time ago, I decided to start dieting for the millionth time. Every previous attempt had been a complete failure. But this last time was different.

It was different because not only was I going to a nutritionist, but also every time I visited her, she took my measurements. That way …

Continue reading »