Don’t bite off more than you can chew

Multitasking in complex systems usually means increased lead time.

Dec 28, 2020

https://pixabay.com/illustrations/angry-businesswoman-conflict-3233158/

Software organizations tend to think that putting more people to work on a feature will get that feature ready sooner. That assumption is wrong on two counts:

The unit of work in an organization is not the person but the team (at least in organizations with a certain level of complexity).
Teams do not share the same priorities, because they do not have the same stakeholders.

Team as the unit of work

In every organization I have worked in, it has been natural to create groups of people in charge of some set of business requirements. Approaches differ, but the idea of teams working on software is everywhere.
I like to think of this as a multicore processor: a single processor whose multiple cores allow it to do different things at the same time.

In this analogy the team is the processor and the people in the team are its cores.
A core can do only one thing at any given moment, and it is handed a new task whenever it is idle. This happens, for example, when a core is waiting for an I/O operation: it switches context and starts another task. Interrupts were created to signal when the I/O operation has finished. When an interrupt arrives, the processor handles it and resumes the interrupted process, which usually means another context switch between tasks.

In real-world processes there are many steps, and each step takes time. The time spent in a step is its residence time, which consists of some wait time plus some service time.
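To make the split between wait time and service time concrete, here is a minimal sketch. The step names and the hours are invented purely for illustration; only the relation residence time = wait time + service time comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    wait_time: float     # hours the work sits queued or blocked (hypothetical)
    service_time: float  # hours someone is actually working on it (hypothetical)

    @property
    def residence_time(self) -> float:
        # Residence time of a step = wait time + service time.
        return self.wait_time + self.service_time


def total_residence_time(steps):
    """Time the work spends in the whole process."""
    return sum(step.residence_time for step in steps)


def waiting_share(steps):
    """Fraction of the total time in which nobody is working on the item."""
    total = total_residence_time(steps)
    return sum(step.wait_time for step in steps) / total if total else 0.0


steps = [
    Step("analysis", wait_time=8, service_time=4),
    Step("development", wait_time=16, service_time=24),
    Step("review", wait_time=24, service_time=2),
    Step("release", wait_time=40, service_time=2),
]

print(total_residence_time(steps))       # hours from start to finish
print(round(waiting_share(steps), 2))    # share of that time spent waiting
```

With numbers like these, most of the elapsed time is waiting rather than work, which is the part the rest of this post is concerned with.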

Developers usually work in the same way, but we are worse than processors at noticing when a task that was blocked by another team has been unblocked. We are also very bad at switching context. These two problems make us very bad at multitasking: working this way usually creates a lot of waiting time (working sequentially is often faster). And the longer a task waits, the more time we spend trying to remember what had already been done on it, which makes each context switch even more expensive.
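A toy simulation can show why sequential work tends to finish things sooner. It is only a sketch under simplifying assumptions: one developer, all tasks available from day zero, fixed-size work slices, and a constant `switch_cost` paid every time the developer moves to a different task. None of these numbers come from a real team.

```python
def sequential(tasks):
    """Finish each task completely before starting the next one."""
    clock = 0.0
    finished_at = []
    for work in tasks:
        clock += work
        finished_at.append(clock)
    return finished_at


def round_robin(tasks, slice_size=1.0, switch_cost=0.0):
    """Work a slice on each unfinished task in turn, paying for each switch."""
    remaining = list(tasks)
    finished_at = [0.0] * len(tasks)
    clock = 0.0
    current = None
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r <= 0:
                continue
            if current is not None and current != i:
                clock += switch_cost  # time lost recovering the context
            current = i
            worked = min(slice_size, r)
            clock += worked
            remaining[i] -= worked
            if remaining[i] <= 0:
                finished_at[i] = clock
    return finished_at


def average(values):
    return sum(values) / len(values)


tasks = [3, 3, 3]  # three tasks of three days each (hypothetical)
print(average(sequential(tasks)))
print(average(round_robin(tasks)))
print(average(round_robin(tasks, switch_cost=0.5)))
```

Even with a switch cost of zero, interleaving delays the first deliveries, so the average completion time goes up. Adding a switch cost makes every task later, including the last one.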

We are worse than processors at multitasking, but processors also have limits when working this way. They have constraints, as any system does. If you want to know more about this, take a look at the Theory of Constraints.

Inside the team, the effect of too much work shows up in the number of tasks each core (a developer or a pair) has. If we have a high number of tasks in progress (work in progress, WIP), then, following the idea explained above, it will take us a long time to complete the whole feature: we will have a long lead time. In fact:

Lead Time = WIP / Throughput (detailed explanation)

The easiest way to reduce your lead time is to reduce the WIP inside the team. This is what kanban proposes: put a limit on your WIP and focus on finishing the stories in progress.
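The relation Lead Time = WIP / Throughput is commonly known as Little's Law, and it holds for long-run averages of a stable system. A minimal sketch of what a WIP limit does to lead time, assuming an illustrative throughput of 3 stories per week that stays constant:

```python
def lead_time(wip, throughput):
    """Average lead time from Lead Time = WIP / Throughput.

    wip: average number of stories in progress.
    throughput: average number of stories finished per week.
    Returns the average lead time in weeks.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


THROUGHPUT = 3  # stories per week, a made-up figure for the example

for wip in (12, 6, 3):
    weeks = lead_time(wip, THROUGHPUT)
    print(f"WIP {wip:>2} -> average lead time {weeks:.1f} weeks")
```

Halving the WIP halves the average lead time as long as throughput stays the same, and nobody has to work any faster for that to happen.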

Throughput

The other option for improving lead time is to increase throughput.

Throughput = average departure rate, for example the number of stories completed per week.
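One simple way to measure that departure rate is to count the stories finished in a time window and divide by the length of the window. The sketch below assumes you can export completion dates from your tracker; the dates and the function name are hypothetical.

```python
from datetime import date


def throughput_per_week(completed, start, end):
    """Average stories finished per week in the window [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    in_window = [d for d in completed if start <= d < end]
    return len(in_window) / (days / 7)


# Completion dates of finished stories (invented for the example).
completed = [
    date(2020, 12, 1), date(2020, 12, 3), date(2020, 12, 4),
    date(2020, 12, 9), date(2020, 12, 10), date(2020, 12, 11),
]

print(throughput_per_week(completed, date(2020, 12, 1), date(2020, 12, 15)))
```

A longer window gives a more stable average, which matters because the lead time relation above is about averages, not about any single story.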

Improving throughput is more difficult because it depends on the constraints you have. You might think that throughput can be increased by adding more people, but that does not help while the constraints remain. Those constraints can be:

Processes that force us to work in a certain way because we want to add some quality to our product, without our realizing that they are the constraint of the system.
Architecture that forces us to coordinate teams because of our culture.
Culture, if we prefer maintaining the status quo over making the improvements needed to remove the constraints.
Interaction between teams.

Some ways of working are designed to reduce wasted time. CI/CD, for example, reduces to a minimum the time wasted in releasing something to production.

Team coordination

Depending on our architecture, one feature can require coordinating several teams to get it working. This coordination happens if we don't have autonomous teams (does our architecture allow us to have them?), that is, if we have teams that are responsible for one part of the job but not for the whole job we need done. There are plenty of examples:

Frontend teams
Backend teams
Ops teams
Database teams

In that situation, if one feature requires changing several parts of the system and those parts are owned by different teams, there will be requests from one team to another. In the example above:

Frontend team will wait for backend team to have the API
Backend team will wait for Database team to have the schema
All will wait for ops to have the infrastructure to go live

So our ways of working force us to coordinate between teams to get the feature done. As explained before, this increases the WIP of the first team, and therefore its lead time. On top of the problems already described when a developer has to wait until another team does its job, the waiting time can be even longer because different teams have different priorities. This is normal, because not all teams have the same stakeholders, the same roadmap and so on, and it means a long lead time again.
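The chain of hand-offs can be sketched as a small dependency graph. Everything here is an assumption made for illustration: the `queue_delay` values stand for the days a request waits behind each team's own priorities, the `work` values are the days of actual effort, and the go-live step is modelled as ops work that follows the frontend.

```python
# Hypothetical figures, in days. "deps" lists the teams that must finish first.
TEAMS = {
    "database": {"deps": [], "queue_delay": 5, "work": 2},
    "backend": {"deps": ["database"], "queue_delay": 3, "work": 4},
    "frontend": {"deps": ["backend"], "queue_delay": 2, "work": 3},
    "ops": {"deps": ["frontend"], "queue_delay": 4, "work": 1},
}


def finish_day(team, teams):
    """Day on which a team's part is done, counting from the feature request."""
    spec = teams[team]
    # A team cannot start before every team it depends on has finished.
    ready = max((finish_day(dep, teams) for dep in spec["deps"]), default=0)
    # The request then waits in the team's queue before the work itself.
    return ready + spec["queue_delay"] + spec["work"]


lead_time_days = finish_day("ops", TEAMS)
effort_days = sum(spec["work"] for spec in TEAMS.values())

print(lead_time_days)  # elapsed days until the feature is live
print(effort_days)     # days of actual work contained in that time
```

In this made-up scenario more than half of the elapsed time is requests sitting in other teams' queues. The work is the same, but each hand-off adds a wait that depends on someone else's roadmap.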

The natural approach we took was to try to schedule the tasks with the different POs, so that work on one task starts when the tasks it requires are done. This is a very bad approach: in general, scheduling problems are NP-complete. You will likely need more time to find a schedule that reduces lead time for all the important features than to do the tasks themselves (it is a waste of time; don't try to do it).

Imagine the same situation inside an autonomous team, where there is only one PO. The PO needs to decide how important the whole feature is compared with the others and put the whole team to work on it. The problem has been minimized: it now depends on one team and on how effectively that team works together. For a more detailed explanation, take a look at “It’s the coordination stupid”.

So throughput can only be improved if we improve our processes, reduce the coordination between teams and change our architecture, which means changing our culture. In general, it improves if we work to remove our constraints (our limitations).

Javier López Fernández's Substack