marcano.io

How to Identify (and Mitigate) Your Single Point of Failure

For our non-technical readers: If I mention a technical word, I’ll do my best to put a simple description in parentheses.

I was in the middle of something last week when my MacBook Pro started experiencing random kernel panics (restarts or crashes without warning). As you can imagine, a number of crude words escaped my vocal cords, accompanied by a face full of frustration.

Even though I’ve worked in technical support for a number of years, I could not identify what was causing these kernel panics. After many tests and diagnostics, I gave up and brought it to the Apple Store. It turned out that my motherboard (a really important part of the computer) had gone bad.

Thankfully, I purchased AppleCare with my laptop, so I am still entitled to free repairs and replacements, but this replacement will take at least a week.

Not being able to run my computer for more than 20 minutes hit my productivity hard, but being without my only computer for a week would hit it even harder.

In the I.T. industry, we call this a single point of failure (SPOF): any system or device that you depend on and that has no redundancy or alternative available. If it crashes or breaks, you are unable to do your job.

Local SPOFs

For the devices you own, you can do a quick audit for SPOFs by pointing to each device plugged into a wall and asking yourself, “If this stopped working, could I finish the work I need to get done today, this week, or for however long it takes to replace this item?”

If the answer is “no”, then you have identified a physical SPOF.

For example, at Rack5, we all work remotely, so the devices that I would point at when I am at home are my wireless router, my iPhone 6, and my MacBook Pro:

If my wireless router stopped working, I could easily pick up my bag and head to the coffee shop on the corner with free wifi, or the coworking space that I typically work out of. Sure, I need to get it replaced, but it won’t keep me from getting work done that day.

If my iPhone 6 dies, I still own my iPhone 5 that I use for product testing and I can easily swap SIM cards to get on with my day.

If my MacBook Pro dies… I… well… then I am totally shit out of luck! I can’t effectively or efficiently work on my business from my iPhone, and I don’t have any other computer to open up Xcode and test new versions of Key.

Maybe if I worked for a larger company and we had a physical office building, I could waltz over to the I.T. department and have it fixed or get a replacement within the hour, but I don’t. This was my SPOF, and I experienced firsthand what the lack of a contingency plan does to my productivity.
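The audit above can be written down as a tiny script. This is only a rough sketch of the idea, not a real tool: the `Device` class and the `find_spofs` function are names I made up for illustration, and the device list is just the three items from my own home setup described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    """One thing you would point at during the audit (hypothetical structure)."""
    name: str
    # Anything that would let you keep working if this device failed.
    fallbacks: List[str] = field(default_factory=list)


def find_spofs(devices: List[Device]) -> List[str]:
    """Return the names of devices that have no fallback at all."""
    return [d.name for d in devices if not d.fallbacks]


devices = [
    Device("wireless router", ["coffee shop wifi", "coworking space"]),
    Device("iPhone 6", ["iPhone 5 with swapped SIM card"]),
    Device("MacBook Pro"),  # no other computer to work on
]

print(find_spofs(devices))  # prints ['MacBook Pro']
```

Anything the script prints is a device where the answer to the audit question is “no”, and that is where a contingency plan is needed first.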

Virtual SPOFs

If you work with a distributed team like we do, SPOFs exist outside of your physical presence as well. For us, they are the online collaboration tools we use to move the business forward.

The companies behind these tools are larger than us, and they have extra servers allocated for redundancy and teams to make sure that their products are up 99.9% of the time. While it is unlikely that any of these tools will crash, I am still going to take time to create contingency plans in case a service we rely on goes down.
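To put that 99.9% figure in perspective, here is a quick back-of-the-envelope calculation. It is a sketch under my own assumptions: that the uptime is measured over a full (non-leap) year or a 30-day month, and that the function name `allowed_downtime_hours` is just something I picked for illustration. Real providers define and measure uptime in their own ways.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours a service can be down in a period and still meet the given uptime."""
    return period_hours * (1 - availability_percent / 100)


# Downtime allowed per year at 99.9% uptime, in hours.
print(round(allowed_downtime_hours(99.9), 2))  # prints 8.76

# Downtime allowed per 30-day month at 99.9% uptime, in minutes.
print(round(allowed_downtime_hours(99.9, 30 * 24) * 60, 1))  # prints 43.2
```

Almost nine hours a year is not much, but if those hours land on a deadline day, you will be glad you had a plan.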

Because my MacBook Pro is in for repairs and I needed to fix my SPOF, I have now purchased a MacBook Air for personal use. Sure, this will be my main computer for Netflix, email, and any other personal work, but it also serves as a backup computer for work in the event that my work computer needs repairs. All of my data is backed up using Time Machine for local backups and CrashPlan for online backups, so with the new computer I can pick up right where I left off.

Point to things in your workspace, make a list of tools you use online, and start making contingency plans for any SPOFs. When you experience a crash, you’ll be happy you thought ahead.