Batching
We tend to think it is more efficient to make big changes rather than small ones, but only because we think purely in terms of development.
Bigger changes make it harder to deploy to production, and something is not really done until you have feedback from clients.
A batch is the unit of work that passes from one stage to the next stage in a process. The batch size is the scale of that work product.
http://dev2ops.org/2012/03/devops-lessons-from-lean-small-batches-improve-flow/
The book “Accelerate: Building and Scaling High Performing Technology Organizations” finds that there is a relation between how teams perform and the different DORA metrics:
Multiple deploys per day are related to low lead times and fast reactions when a problem happens in production, while change failure rate is more related to our development practices.
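As a rough illustration of what these metrics measure (not taken from the book), here is a minimal sketch that computes deployment frequency, average lead time and change failure rate from a list of deployment records; the record fields and numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # did it cause a failure in production?


def dora_metrics(deployments: list[Deployment], days: int) -> dict:
    # Lead time: time from commit to production for each deployment.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    return {
        "deploys_per_day": len(deployments) / days,
        "avg_lead_time_hours": mean(lt.total_seconds() / 3600 for lt in lead_times),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }


# Made-up example: three deployments over one week, one of them failed.
now = datetime(2024, 1, 8)
sample = [
    Deployment(now - timedelta(hours=30), now - timedelta(hours=2), False),
    Deployment(now - timedelta(days=3, hours=5), now - timedelta(days=3), False),
    Deployment(now - timedelta(days=6, hours=8), now - timedelta(days=6), True),
]
print(dora_metrics(sample, days=7))
```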
But why is this related to small batch sizes? We can explain it by looking at the problems that appear when your batch size is big.
There is an interdependency between batch size, deployment frequency and how hard it is to deploy things in your system.
If you are deploying to production every six months, you are probably in this situation:
Big batch size, a lot of features to deploy.
Every deployment will be part of a big plan; that plan includes the people who will manually check in production that the deployment went well.
The deployment will be done outside working hours, probably at night or during a time window that does not affect users.
The people who deploy to production are not the same people who developed the features being deployed.
The deployment window lasts several hours and usually requires communication between teams to coordinate the effort of Ops and testers.
All these symptoms of big batches are also reflected in the organization, with a lot of different roles, each in charge of one of those steps:
You will have a DevOps team (in fact the old Ops team with a fancy new name); they will be the ones deploying to production.
The DevOps team will be in charge of monitoring production, with tons of alerts that nobody on that team understands.
You will have a QA department (in fact a lot of testers checking things manually in their own environment).
There will be a DBA department in charge of changes to the database.
You will have a lot of preproduction environments, so that each of the previous roles can test their part of the work in isolation.
From the business point of view, a deployment frequency of six months is usually fine if no competitor is doing things faster than us with enough quality. Because if others are able to do what we do while also giving users better features more frequently, we can only compete with them on price, by being cheaper, and sometimes that is a problem. Delivering with high quality and getting to prod faster helps your company compete on another field: it allows you to design your features based on client data and, if needed, change direction faster using your own product (hypothesis-driven development).
If you are feeling this pain, you are deploying numerous features to prod at once on that deployment day. You follow a big-bang approach: either everything is in production or nothing is.
Your batch size is a set of features.
Increase frequency
When the business starts thinking that we are very slow at putting new features in the hands of users, we start thinking about changing.
The next stage is usually to do the same thing (using the same process to develop, test and deploy features to production), but with a higher frequency.
So we say that we need to do the same as before, but every month. At that moment you realize that your processes, culture and people are against that change.
The status quo resists the change, because the change requires changing the company: finding better processes that allow us to go from six months to one. If you don't change those processes and the behavior of those roles, it is going to be difficult to achieve your goal. Some signals that you are at this stage:
Devs will have a backlog full of bugs for the next iterations because the last deployment is not working as well as everyone thought.
QAs will create tons of tickets for defects or bugs that nobody cares about.
DBAs will try to do all database changes themselves, turning themselves into gatekeepers.
POs, PMs or Scrum Masters will complain about the velocity of the team and the number of bugs created in each iteration.
All these effects appear because we are trying to do a very difficult thing: keep a process designed for deploying every six months and use it to deploy every month.
In fact, people in the company are feeling the same pain they felt when the big-bang release was done, only more often. There are two possibilities here: the company goes back, or it changes because of all the pain generated.
Changes
To be able to increase our deployment frequency we need to change; to reduce times we need to improve the process.
But what should we change? The Theory of Constraints can help us here. It uses a process known as the Five Focusing Steps to identify and eliminate constraints:
Identify: Identify the current constraint (the single part of the process that prevents us from deploying to prod every month); see the sketch after this list.
Exploit: Make quick improvements to the throughput of the constraint using existing resources (i.e., make the most of what you have).
Subordinate: Review all other activities in the path to production to ensure that they are aligned with and truly support the needs of the constraint.
Elevate: If the constraint still exists (i.e., it has not moved), consider what further actions can be taken to eliminate it from being the constraint. Normally, actions are continued at this step until the constraint has been “broken” (until it has moved somewhere else).
Repeat: The Five Focusing Steps are a continuous improvement cycle. Therefore, once a constraint is resolved, the next constraint should immediately be addressed. This step is a reminder to never become complacent — aggressively improve the current constraint…and then immediately move on to the next constraint.
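As a toy illustration of the Identify step (this is not from the Theory of Constraints literature, just a sketch under the assumption that you can measure each stage's throughput), the constraint is simply the slowest stage in your path to production; the stage names and numbers below are invented.

```python
# Toy model of a path to production: each stage with its measured
# throughput (work items it can process per week). All names and
# numbers are invented for this illustration.
stages = {
    "development": 12,
    "code review": 10,
    "manual QA": 3,       # the bottleneck in this made-up example
    "DBA approval": 6,
    "deployment": 8,
}

# Identify: the constraint is the stage with the lowest throughput;
# the whole process cannot deliver faster than this stage.
constraint = min(stages, key=stages.get)
print(f"Current constraint: {constraint} ({stages[constraint]} items/week)")
```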
Checking our path to production frequently to remove constraints puts us in a Continuous Improvement scenario, which is great for finding better solutions for ourselves.
One feature
If you are deploying to production every two weeks, for example, you will realize that to go to the next step you need to start thinking about deploying a feature as soon as it is done, instead of waiting for others in order to deploy.
This has a lot of benefits:
Your time to market for that story is reduced to the minimum: as soon as we think something is ready, it is used by clients.
To allow this, we need to put all the roles needed to develop, test and deploy things inside the team, so the team has more autonomy and can decide more things.
Less risk than before, because it is easier to understand what is going to be inside the release.
A lot of companies stop reducing batch size here, because they don't even realize there is another step that can be taken. Some pains you can feel at this stage:
High levels of tech debt, because there is never time to fix it.
We are only focused on features; everything else is difficult to achieve.
Big features are very hard to implement: they live in a branch for ages, and the longer a branch diverges from master, the harder it is to merge.
The process inside the team is divided into several parts (for example analysis, development and manual checks) done by different roles inside the team. Coordination between the roles is better, but work still goes back and forth between them.
One commit
The next change requires a big shift in mentality, but it allows you to decouple release from deployment. You can deploy every commit to production, so your deployment frequency becomes many times per day, which lets you experiment with your product in production with your clients and react faster if something is wrong.
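A common way to get that decoupling (the text does not prescribe a specific mechanism, so this is just an illustrative sketch) is a feature flag: every commit is deployed, but the new behaviour is only released when the flag is switched on. The flag name and functions below are invented for the example.

```python
import os


def new_checkout_flow(cart):     # placeholder for the new behaviour
    return f"new checkout for {cart}"


def legacy_checkout_flow(cart):  # placeholder for the current behaviour
    return f"legacy checkout for {cart}"


# Minimal feature-flag check. In a real system the flag value would come from
# a configuration service or database so it can be flipped without redeploying;
# reading an environment variable here is a simplification for the sketch.
def is_enabled(flag: str) -> bool:
    return os.environ.get(f"FEATURE_{flag.upper()}", "off") == "on"


def checkout(cart):
    if is_enabled("new_checkout"):       # invented flag name
        return new_checkout_flow(cart)   # released only when the flag is on
    return legacy_checkout_flow(cart)    # the code is deployed either way


print(checkout("cart-42"))
```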
This also lets you use feedback from production faster: for example, your QAs can do exploratory tests in production, find defects, talk with devs and fix them on the spot. To fix a bug in production you just commit and push the fix to main, and after a few minutes it is in production.
A zero-bug policy is easy to achieve in this scenario.
Hypothesis-driven development is also easy to achieve in this scenario.
To solve tech debt you just need to refactor continuously; those changes will be shared with the rest of the team faster.
To allow all of this you will need to shift quality left (checking quality when the commit is written) instead of checking the whole thing manually later, because checking commit by commit is too hard for a person. Every commit needs to be tested, and to keep a good rhythm of changes you will need to rely on automated tests. You will need some kind of automated tool that tells you your commit is working (continuous integration). Manual tests are impossible in this scenario, so you have to rely on your automated tests: your pipeline will check that your commit is working, nothing else.
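As a rough sketch of that idea (the commands and the deploy script are placeholders, not a real pipeline definition), a per-commit pipeline can be as simple as: run every automated test and deploy only if everything is green.

```python
import subprocess
import sys


# Minimal per-commit pipeline sketch: every push runs the automated tests and,
# only if they pass, the deployment step. The concrete commands ("pytest",
# "./deploy.sh") are placeholders for whatever your project actually uses.
def run(step: str, command: list[str]) -> None:
    print(f"--- {step} ---")
    result = subprocess.run(command)
    if result.returncode != 0:
        print(f"{step} failed; the commit does not reach production.")
        sys.exit(result.returncode)


def pipeline() -> None:
    run("unit and integration tests", ["pytest"])
    run("deploy to production", ["./deploy.sh"])
    print("Commit deployed to production.")


if __name__ == "__main__":
    pipeline()
```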
Manual tests are useless for finding bugs; they are a waste of time. Exploratory tests, on the other hand, are a great tool and will help us receive feedback from production and use that feedback to improve. But exploratory tests will no longer be gatekeepers in this scenario.
If you are able to deploy each commit to production, you will be doing Continuous Deployment.
The commit is the smallest batch size you can imagine.