As a practitioner, I spend a significant amount of time learning about DevOps practices and applying them at my day-to-day job. It wasn't always easy, though: I was learning a lot of disparate techniques, platforms and workflows that felt unrelated at times. Eventually, I realized that I was missing the underlying foundation and essential concepts of the discipline. Building this strong base has allowed me to better connect and leverage my existing knowledge. As a result, I feel that I have been improving not just the work I do, but also my understanding of why I do it.
This post, then, is my attempt to explain some of the essential DevOps concepts. First, we are going to define DevOps itself, followed by a brief analysis of its main components. Then, I will proceed to explain The Three Ways of DevOps and The Four Keys concepts. Finally, I'll conclude with some extra resources in case you want to dig deeper.
Let's start by unpacking what DevOps is all about. If you ask around, you'll likely get different answers, since we can't seem to agree on a single definition. That is not a bad thing, though; it just shows the breadth of areas in which DevOps impacts organizations and people.
If we analyze the definitions above, we can see that they focus on the outcomes of implementing DevOps and/or the foundations of the discipline.
- Designed for high velocity
- It is an effort to move the two disciplines closer
- Increased speed and stability while delivering value to organizations
- A software development and delivery approach
- A way of developing software, culture and a mixture of processes and tooling
Now that we have defined DevOps, it is time to discuss its main components, also known as CALMS: Culture, Automation, Lean, Measurement and Sharing.
The Culture component focuses on aligning incentives and interests among the different parts of the organization by:
- Integrating mutual trust, willingness to learn and continuous improvement
- Constant flow of information
- Being open to change
- Encouraging experimentation between developers and operations
The Automation component seeks to free developers and operations teams from spending too much time doing manual toil and improve the confidence in their processes with:
- Deployment pipelines with a high level of automation
- Comprehensive test automation
- Scalable processes and workflows
As another core aspect, Lean addresses the need for quick iterations and amplification of feedback loops to increase efficiency and minimize value flow breaks through:
- Minimization of Work in Progress (WIP)
- Fixing errors earlier in the pipeline rather than at the end
- Shifting left on security inputs
The ability to measure and observe the state of your infrastructure and applications is essential to make better business decisions by:
- Collecting data from key areas throughout the value chain, including application performance and infrastructure
- Monitoring key system metrics, such as business and transaction metrics and other KPIs
The sharing of knowledge and information enables DevOps organizations to amplify success and reduce the chance that failures are repeated. Strategies for sharing information include:
- Centralizing common knowledge in the organization
- Sharing across organizational boundaries
- Having proactive communication
This leads to benefits for both developers (reduced burnout and better job satisfaction) and companies (improved performance, customer satisfaction and developer engagement).
The Three Ways of DevOps #
The Three Ways represent the set of underpinning principles from which all DevOps practices and behavior can be derived.
The First Way: Principles of Flow #
The First Way is concerned with accelerating the flow of work throughout a process.
Its key points are:
- Make work “visible”
- Limit Work in Progress (WIP)
- Reduce batch sizes
- Reduce hand-offs between teams
- Continually identify and elevate our constraints
- Eliminate hardships and waste in the value stream
The Second Way: Principles of Feedback #
The Second Way works to enable fast and constant feedback cycles throughout all stages of a development cycle.
Its key points are:
- Work safely within complex systems
- See problems as they occur
- Swarm and solve problems to build new knowledge
- Push quality closer to source
- Optimize for downstream work centers
The Third Way: Principles of Continuous Learning #
The Third Way seeks to create a culture of continual learning and experimentation within the development organization.
Its key points are:
- Enable organizational learning and a safety culture
- Institutionalize the improvement of daily work
- Transform local discoveries into global improvements
- Inject resilience patterns into daily work
- Leaders reinforce a learning culture
Four Keys #
The Four Keys refers to the main metrics that indicate the performance of a software development team. At a high level, we can classify these metrics based on what they measure:
- Deployment Frequency: How often successful releases are made to production
- Lead Time for Changes: The amount of time it takes a commit to get into production
- Mean Time to Restore: How long it takes to recover from a failure
- Change Failure Rate: The percentage of deployments causing a failure in production
By measuring these metrics consistently, it is possible to establish trends in how teams are doing and, based on their scores, classify them as high, medium or low performers.
Deployment Frequency #
Deployment Frequency is used as a proxy to measure batch size, since ideally, the more frequent your deployments are, the smaller your batch size is likely to be. By "deployment", the metric refers to a new software release to a production environment, an app store or similar. This metric helps to measure how often the business is delivering value to customers.
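As a rough illustration, deployment frequency can be computed by counting production deployments inside a time window. The sketch below assumes we already have a list of deployment timestamps; the function name `deployments_per_week` and the sample data are hypothetical and only meant to show the arithmetic.

```python
from datetime import datetime, timedelta


def deployments_per_week(deploy_times, window_start, window_end):
    """Average number of production deployments per week in [window_start, window_end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    # Keep only the deployments that happened inside the window.
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    # Dividing two timedeltas yields the window length in weeks as a float.
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks


# Hypothetical sample data: three deployments fall inside a two-week window.
deploys = [
    datetime(2024, 1, 2, 10, 0),
    datetime(2024, 1, 4, 15, 30),
    datetime(2024, 1, 9, 9, 0),
    datetime(2024, 1, 20, 9, 0),  # outside the window, ignored
]
frequency = deployments_per_week(
    deploys, datetime(2024, 1, 1), datetime(2024, 1, 15)
)
print(f"{frequency} deployments per week")
```

A higher number here suggests smaller batches, but it is only a proxy: the window length and what counts as a "deployment" are choices each team has to make.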
Lead Time for Changes #
Lead Time for Changes measures how fast the team can deliver value to their customers. There are several nuances to account for here:
- Changes must be serving the customer, not hidden under a feature toggle
- It requires insights fed from the live system back to the monitoring system, since deployment pipelines cannot provide this data with 100% accuracy
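To make the measurement concrete, here is a minimal sketch that computes the median lead time from pairs of commit and release timestamps. It assumes the second timestamp marks the moment the change actually started serving customers (for example, when its feature toggle was enabled), not just when the pipeline finished. The function name `median_lead_time` and the data are hypothetical, and using the median rather than the mean is my own choice to dampen outliers.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median time between a commit and the moment it serves customers.

    `changes` is an iterable of (committed_at, live_at) pairs. Changes that
    are not live yet (live_at is None) are skipped.
    """
    seconds = [
        (live_at - committed_at).total_seconds()
        for committed_at, live_at in changes
        if live_at is not None
    ]
    if not seconds:
        return None
    return timedelta(seconds=median(seconds))


# Hypothetical sample data: lead times of 2h, 5h and 26h, plus one pending change.
changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 14, 0)),
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 4, 11, 0)),
    (datetime(2024, 1, 5, 9, 0), None),  # still behind a feature toggle
]
lead_time = median_lead_time(changes)
print(lead_time)
```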
Mean Time to Restore (MTTR) #
Successfully measuring MTTR requires knowing when the incident was created and when a deployment resolved said incident. This means that the systems in place must be capable of obtaining data from the incident management system and linking it with the deployment systems through the use of labels, tags, etc.
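The linking step described above can be sketched as a simple join between incidents and deployments. This example assumes each incident carries a `resolved_by` label holding the identifier of the deployment that fixed it; that field name, the function `mean_time_to_restore` and the sample records are all hypothetical, since real incident management systems expose this data in their own ways.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents, deployments):
    """Mean time between incident creation and the deployment that resolved it.

    incidents:   list of dicts with 'created_at' and 'resolved_by'
                 (a deployment id, or None while the incident is open)
    deployments: dict mapping deployment id -> completion datetime
    """
    durations = []
    for incident in incidents:
        deploy_id = incident.get("resolved_by")
        # Skip incidents that are still open or not linked to a known deployment.
        if deploy_id in deployments:
            durations.append(deployments[deploy_id] - incident["created_at"])
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data: restore times of 1h and 3h, plus one open incident.
deployments = {
    "deploy-101": datetime(2024, 1, 1, 11, 0),
    "deploy-102": datetime(2024, 1, 2, 15, 0),
}
incidents = [
    {"created_at": datetime(2024, 1, 1, 10, 0), "resolved_by": "deploy-101"},
    {"created_at": datetime(2024, 1, 2, 12, 0), "resolved_by": "deploy-102"},
    {"created_at": datetime(2024, 1, 3, 8, 0), "resolved_by": None},
]
mttr = mean_time_to_restore(incidents, deployments)
print(mttr)
```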
Change Failure Rate #
Measuring Change Failure Rate depends on two things: knowing the total number of deployments and how many resulted in a failure in production. A particular failure in production can be expressed in different ways, such as:
- Deployment pipeline failure
- Bugs detected after the deployment
- Severe performance impact after deployment
Then, to determine the metric's value, we need a system that can build the relationship between builds and releases, as well as bugs and failures from the incident management system.
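Once that relationship exists, the calculation itself is a simple ratio. The sketch below assumes we already have the identifiers of all production deployments and of those linked to at least one failure; the function name `change_failure_rate` and the identifiers are hypothetical.

```python
def change_failure_rate(deployment_ids, failed_deployment_ids):
    """Percentage of deployments linked to at least one failure in production."""
    all_deployments = set(deployment_ids)
    if not all_deployments:
        return 0.0
    # Only count failures that map back to a known deployment.
    failed = all_deployments & set(failed_deployment_ids)
    return 100.0 * len(failed) / len(all_deployments)


# Hypothetical sample data: 8 deployments, 2 of which caused a failure.
deployment_ids = [f"deploy-{n}" for n in range(1, 9)]
failed_ids = ["deploy-3", "deploy-7"]
rate = change_failure_rate(deployment_ids, failed_ids)
print(f"{rate}% of deployments caused a failure")
```

Which events count as a "failure" (pipeline failures, bugs, performance regressions) is a decision to make up front, because it changes the numerator and therefore the trend.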
If you want to dive deeper and learn more about DevOps, I recommend the following resources, which I used as the basis for this post:
- Accelerate (Book)
- The DevOps Handbook (Book)
- The Three Ways of DevOps
- Three Ways: A Principle-based DevOps Framework
- Are you an Elite DevOps performer? Find out with the Four Keys Project
- Measuring DevOps Success with Four Key Metrics
- The DevOps Phenomenon