You can't hum while holding your nose

It’s a pleasant Tuesday morning as you sit in a meeting room with your team going over the latest round of hires. You’re digging into how Jill is doing in the mobile team, making sure she is settling in as you would expect. Suddenly, in a beautiful symphonic arrangement, the noise level in the room spikes and then drops to an intense silence.

Phones and laptops have burped whatever digital tone means ‘Hey… LOOK AT ME NOW’ and everyone now ponders whether they should look at the shiny notification or keep the discussion going.

You know how these things work at your current gig.

A mass @General message or SMS means one of two things: either the team has big news, like someone being promoted, or something within your team is very wrong. Everyone in the room stares at you, the holder of promotional comings and goings within the organisation.

It’s at that very moment they know that nobody is being promoted.

Making your way to wherever your technical folks reside, you gather as much detail about the incident as possible before jumping in. You are a leader in the engineering structure and you don’t want to go in cold.

Is it our mobile apps? Is it the main website? Is it that cloud provider, again?

The floor is busy with activity; some folks who have nothing to do with the incident are happily working away… glad that for once it’s not their thing that is causing DEFCON1.

You look for the pack of folks huddled to show you where you should be. They could be gathered at a table, at a monitor or even at a whiteboard. The most important thing is they are talking about what’s going on and that is where you need to be.

They are happy to see you as they make space for you in the huddle or someone pulls you a chair to join the party.

‘So… where are we?’

This Will Happen. A Lot.

Software is bound to fail; it’s just a fact of building things for a living. If you have experienced a few DEFCON incidents you will know that for all the panic, turmoil and stress during the incident… it always ends and things ultimately go back to normal.

As an engineering manager, you should aim to stay as calm as you humanly can. If you can remain calm and show your team that you trust them, they will trust that you can lead them out of the crisis.

I know of no case study in history that describes an organisation that has been managed out of a crisis. Every single one of them was led. - Simon Sinek

The Sky Falls Every Day

Inside any engineering team or organisation there are likely four or five issues spinning around every single day. Having a finger on the pulse gives you an overarching viewpoint of what problems your team are dealing with on a regular basis.

It’s your job to help them figure out how to fix them.

Do they need budget or time to sort the problem? Go get it for them. Do they need Robert from the Services team for two days? Go negotiate. Do they need help prioritizing their current work over debt? Go lead them.

This is a Good Thing

It may seem counterintuitive, but bad things happening with your software is a good thing. It may not feel good at the time, but after the incident you will undoubtedly have learned something about how your services behave under a particular set of circumstances.

Use that learning: document the incident, then help your team set actions to go and fix the root cause, so the risk of it happening again is lowered (notice I never said gone).

When things go bad it’s normal to feel down, frustrated or hurt that your professional pride has been dented in some way because at the end of the day you may be to blame.

Don’t.

Just figure it out with your team so that it turns into something good.

Don’t Fix It

So you’ve been here before and you’ve seen this exact outage happen previously with your main application. You could get on your laptop, spend six minutes looking within the backend database and have everyone back in the system before you know it.

Your team would be happy. Your boss would be happy. Your customer would be happy.

Well done… you are a hero.

Nope.

It’s the job of any senior technical person within an organisation not to fix the problem on their own, but to guide the team towards fixing it.

Remember Jill? Do you think she would feel more comfortable in her new gig if she sat and watched someone else fix a problem, or if she fixed the problem herself while being coached by the team?

Bad things happen with software every day. This is why we spend so much time working on our build, test and deploy mechanisms and culture so that when things do go bad, we can recover quickly.

As a leader within a team, it’s your job to help guide and coach a group of humans every day, from estimating stories and working on technical specs to recovering when your system dies at 3am on a Sunday.

As an engineer, you got a buzz from fixing the thing… but as a leader you need to watch others fix the thing.

At the end of the day, you can’t hum while holding your nose.