Where's the undo button? (Part III)
This is the third part of the blog series where we examine the relationship between DevOps and safety. My name is Tuomo Niemelä and I work as a DevOps consultant at Polar Squad, which operates at the intersection of people and technology.
You can read the first part here and the second part here.
Everyday safety
If the previous parts were too technical or academic, don’t worry! There are still plenty of ways to “do DevOps” in everyday situations. I’m going to list a couple of real-life examples relating to the points I listed in the previous part. While some might see these methods and ideas as obvious, I still think they are worth saying out loud.
Getting over fear of failure
One of my all-time favorites happens during daily meetups. You’re going through who is focusing on what today and whether there are any blockers. Imagine a situation where your colleague says in a lower tone that one fix he is implementing isn’t going his way. He might even be hinting that there’s something wrong with his intelligence or skills. Pause here: this is the exact moment when you need to catch the exposed vulnerability, which passes in milliseconds, and embrace it. You could say something like “That’s ok, I struggle with things like that all the time - there was this one time when…” and continue from there with something that expresses your own vulnerability in exchange.
The previous scenario can also happen the other way around. When the time is right, I might poke fun at my own work to give my team members a chance to pick up on my vulnerability and meet me halfway. Sometimes it works, sometimes not. These things take time, and some trust that people want to do the right thing most of the time.
Something about innovation and creativity
I love brainstorming! You know the situations where it’s allowed to throw out ideas no matter how crazy or stupid. Sometimes your colleague catches something essential in one of them and synthesizes a whole new solution by combining ideas! In an ordinary planning or problem-solving session we may not be allowed to be as wild as in brainstorming sessions, since the amount of useless or incorrect information could distract us from the actual solution.
Still, there is something to learn from brainstorming: there is more freedom to fail. When you and your teammates start to jell, this freedom finds its way into any session. Just start with the classics like “I know this sounds stupid but…” or “I know I’m an idiot but…”. Works like a charm! All team members should remember the following: don't get hung up on little details or syntax errors right away. There is always a chance to refine the solution afterwards.
About team learning
Sometimes there is a stigma attached to two or more people doing the same thing. Since time is money, every expensive developer should do only their own thing all the time to maximize throughput, right? Wrong! If we’re going to work as a team - an antifragile team - there needs to be cooperation and knowledge sharing. It can happen even in the most mundane task.
For example, while doing a bigger (small is better, I know) production deployment with database manipulation, I might ask a teammate to join me as an extra pair of eyes. Just in case. Not only does this reduce stress by sharing the burden, it also gives us a chance to share our viewpoints about the whole process and the state of the system or tools.
It starts with communication
Sarcasm: it’s an art form. I’m sure your intention isn’t to hurt anyone. Still, if you don’t know when it’s the right place and time to use it on people, then don’t. There is an unwritten rule about when sarcasm works among close friends or colleagues. Just note that even though you might feel that it is okay to use sarcasm on a person, that person might not feel the same. Sarcasm is dangerous. “But that person just doesn’t understand humor!” No! Now you’re just an asshole.
Lastly, there are a couple of things we need to remember: we need to stop downplaying both the problems at hand and the successes at the end. Firstly, there's no such thing as a trivial problem. If there were, we wouldn't call them problems. Every person is at a different stage of their career path. Something which is trivial to you might not be so trivial to others. Secondly, remember to celebrate even the small wins out loud - together. Software is never going to be fully ready or perfect. It keeps evolving after the production launch. My point being: don’t ever rob people of feeling good about themselves, and don’t get blinded by the continuous improvement cycles.
It comes down to trust
But how do we, as people, build trust? That could be yet another topic on its own. In the meantime, all I can say is that it usually helps if one isn't a complete asshole. Listen to people and build from there. Showing vulnerability is a leap of faith, but around the right people it’s always worth it.
Once the team starts to jell, the details mentioned earlier become more natural and automatic. I would still recommend keeping an eye on them, especially while the team is new or a new team member is introduced to the pack. Also: take care of your juniors, and someday they might be your seniors. If you want to dig deeper into these topics, I highly recommend the book “The Culture Code” by Daniel Coyle. Now let’s end this.
There’s no such thing as 100% safety: the unspeakable happens
In this journey I wanted to share my viewpoint that DevOps represents, first and foremost, safety. We’ve examined some technological practices and solutions which one could implement in systems to gain more safety, possibly making the system more welcoming and manageable for anyone who wishes to learn it.
We went through five key points of the psychological safety framework and ended with real-life examples and practical ideas. Since psychological safety is such a huge part of the whole picture, I could almost call technological safety “everything else” safety just for balance. Now I want you to go through one last thought experiment.
I want you to imagine a situation where you are making some major changes to the production environment. Similar changes have gone well in the test environment earlier, the tests are green, and you feel pretty confident that everything will go just fine. There were processes in place and you followed them. There were some safety measures in place, but somehow, by accident, you went around them. Maybe some critical part of the automation, like the database backup, also failed. The inevitable human error happens, and now you have wrecked the live production environment beyond repair.
Here's where processes and tools end, and culture begins. How do your team and organization react to these kinds of events? Is there redemption? What kind of company are you in? What are your values? Systems will fail, but that doesn't mean the people around you must also fail you.
Bonus tip: If you’re conducting “blameless postmortems” while being tense or with overly mechanical efficiency, you might still be doing just plain postmortems.