Commit 10043b7

Add Chaos Engineering Post
1 parent 78d49d4 commit 10043b7

2 files changed, 74 additions and 0 deletions

@@ -0,0 +1,74 @@
---
layout: post
title: Chaos Engineering For Humans
description: Can we apply Chaos Engineering principles to teams?
categories: Leadership
social_image: ChaosEngineeringForHumans.png
---

If you came here about Chaos Engineering for your services, “it’s not what it looks like”.

## Chaos Engineering
I just said this post isn’t about Chaos Engineering, and now I’m about to give some kind of definition of it anyway. Yes, I am.

Our products have become, or better said, have progressed into, a sum of multiple services rather than one monolith that tries to do everything: core application, auth service, user service, payments service, prospects service. Name your service here.

The whole system should be resilient.

Unit and end-to-end tests should be in place for each service, but they work in a predictable way. By “predictable way” I mean that we provide a specific input and expect a specific output, and that is what we test.

Don’t get me wrong, this is a great state to be in. But what is the next step?

This is where Chaos Engineering comes in. The essence is that you deliberately create a situation for your whole system, for example, added latency or the absence of a service. You observe what happens, and then you prioritize improvements.

This increases confidence, makes the system more predictable and protects customers from a total failure if your 10th service down the road underperforms.

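To make the “create a situation, observe, prioritize” loop a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Fault` class, the `checkout` request path, the 0.5 second timeout and the choice of which services are critical are all hypothetical, and nothing touches a real network. It only shows the shape of an experiment, not how any particular Chaos Engineering tool works.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fault:
    """A hypothetical fault to inject into one simulated service call."""
    latency_s: float = 0.0  # extra latency added to the call
    absent: bool = False    # True simulates the service being down


def call_service(name: str, fault: Optional[Fault], timeout_s: float = 0.5) -> str:
    """Simulated call: no real network or sleeping, only the failure modes."""
    if fault and fault.absent:
        raise ConnectionError(f"{name} is unavailable")
    if fault and fault.latency_s > timeout_s:
        raise TimeoutError(f"{name} exceeded {timeout_s}s")
    return f"{name}: ok"


def checkout(faults: Dict[str, Fault]) -> Dict[str, str]:
    """A toy request path that touches several services."""
    result = {}
    # Assumption: auth and payments are critical, so their failures propagate.
    for critical in ("auth", "payments"):
        result[critical] = call_service(critical, faults.get(critical))
    # Assumption: prospects is optional, so we degrade instead of failing.
    try:
        result["prospects"] = call_service("prospects", faults.get("prospects"))
    except (ConnectionError, TimeoutError):
        result["prospects"] = "prospects: skipped"
    return result


def run_experiment(faults: Dict[str, Fault]) -> dict:
    """Inject the faults, observe the outcome and report it."""
    try:
        return {"survived": True, "detail": checkout(faults)}
    except (ConnectionError, TimeoutError) as exc:
        return {"survived": False, "detail": str(exc)}


if __name__ == "__main__":
    print(run_experiment({"prospects": Fault(absent=True)}))
    print(run_experiment({"payments": Fault(latency_s=2.0)}))
```

In this toy setup, losing the optional service leaves the request alive in a degraded form, while latency on a critical service takes the whole request down. That second observation is the kind of finding you would put on the improvement list.
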
## Getting Closer to A Point
What is a team? A group of individuals who work together, with an input (a goal), a throughput (their work, processes, etc.) and an output (results).

A team is a system.

When all parts work, we get great results. When something is wrong with one part of the system, the whole system starts to show signs of latency.

Do you see where I’m going with this?

## Teams as Systems
This all began when I started thinking about what would happen if someone from my team left tomorrow, or if a new team was created and we had to restructure internally. What would happen if a manager took a sabbatical? What if someone was on medical leave for two weeks?

Would that absence block the team? What could we do to remove some of the uncertainty? Can we borrow some ideas from Chaos Engineering and apply them to a team?

The idea is to find a way to observe when limitations arise in a team and understand where we can optimize.

Here are some simple metrics you can track; of course, they will vary from one organization to another:
* Can the team groom a feature?
* Can the team deploy a feature?
* If a specialization is missing, can the team find a way around it?
* Are the initial milestones on track?
* Is team morale high?
* What are the main areas the team struggled with?
* Can the team self-organize and communicate with the stakeholders if leadership is absent?

## The Organic Way
There is an easy and organic way to observe and draw some conclusions without changing anything in your team.

Observe what happens when someone is on Paid Time Off or, for some other reason, cannot contribute to the problem at hand. Is there sufficient documentation? Do you have a bus factor of one? Having one person fewer will reduce the team’s capacity, but does it put the team in the danger zone?

Write down the observations, share them with the team and create action items.

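If you like to keep notes in a structured form, the bus factor question can be sketched in a few lines of Python. The knowledge map below is entirely made up (the areas, the names and the idea of recording who can work on an area unaided are assumptions for illustration); the point is only that “who else can cover this?” is something you can write down and check.

```python
from typing import Dict, List, Set

# Hypothetical knowledge map: area -> people who can work on it unaided.
KNOWLEDGE: Dict[str, Set[str]] = {
    "deployments": {"alex", "sam"},
    "payments": {"alex"},
    "onboarding docs": {"sam", "kim", "alex"},
}


def bus_factor_one(knowledge: Dict[str, Set[str]]) -> List[str]:
    """Areas that only a single person can cover."""
    return sorted(area for area, people in knowledge.items() if len(people) == 1)


def blocked_areas(knowledge: Dict[str, Set[str]], absent: str) -> List[str]:
    """Areas nobody can cover while `absent` is away."""
    return sorted(
        area for area, people in knowledge.items() if not (people - {absent})
    )


if __name__ == "__main__":
    print(bus_factor_one(KNOWLEDGE))
    print(blocked_areas(KNOWLEDGE, "alex"))
```

Any area that shows up in either list is a candidate for an action item, for example pairing or better documentation.
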
## The More Creative Way
Set up the circumstances that will create some “planned turbulence” for the team.

Once a quarter, throw all the team’s names into a randomizer. The winner spends one or two weeks working on a project of their choice or joining another team. During that time, they should not interact with the rest of the team.

There are no secrets or sketchy areas here. The team should know the goals and the metrics.

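The randomizer can be anything, including names in a hat. If you prefer a script, here is a minimal Python sketch; the function name `pick_for_rotation`, the example names and the optional seeded generator (handy if you want the draw to be reproducible in front of the team) are assumptions for illustration.

```python
import random
from typing import List, Optional


def pick_for_rotation(team: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one team member uniformly at random for this quarter's rotation."""
    if not team:
        raise ValueError("team must not be empty")
    rng = rng or random.Random()  # unseeded by default, so each draw differs
    return rng.choice(team)


if __name__ == "__main__":
    team = ["Ana", "Ben", "Chloe", "Dimitri"]  # hypothetical names
    print(pick_for_rotation(team))
```

Running the draw where everyone can see it fits the “no secrets” rule above.
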
## Final Notes
Please remember that this is not a way to measure individual impact, create mayhem or stress your team. It’s one more tool that creates the circumstances for a team to observe and improve. Low-impact initiatives might be a good place to start.

Keep your team in the loop if you want to try this out. If they don’t feel comfortable, abort.

I would consider this suitable for an environment with a great culture. You need safety and team maturity. This is a next-level approach. You wouldn’t try Chaos Engineering on your system if you didn’t have a good system in place first, would you?

And one last thing, in case this wasn’t clear: you can apply this to any kind of team, not just Engineering teams.

Happy experimentation!