Introduction.  Debugging is more than fixing a problem.  It involves understanding the problem, finding its cause, and then making changes that eliminate the problem for good.

Understand the Customer.  The most basic and important aspect of debugging or troubleshooting is to make sure you understand what the customer is trying to do and why it is failing.  Jumping to conclusions is likely only to aggravate your interactions with the customer, or the lack thereof.  In practice, reaching this understanding can be quite difficult.

For example, the customer may report that their e-mail isn't working.  There are so many possible causes that it is nearly impossible to jump to the right conclusion.  Maybe the customer hasn't correctly connected their network cable, or their profile has been corrupted, or the link to their e-mail client program has been changed, or any of many other possibilities.

It is also important to know the customer, so that you neither frustrate them by expecting too much information from them nor belittle them by talking down to them after they have provided considerable insight into the real problem.

Find the Real Cause.  It is almost always worth searching for the real cause of the problem.  Once you have found the real cause, it is likely to be much easier to recognize the same cause on other occasions, or to reconfigure setups so that this cause doesn't occur elsewhere.

It is important to be methodical or systematic about finding causes.  Form hypotheses based on evidence and investigate.  Two of the most commonly used approaches in debugging or troubleshooting are

  • process of elimination
  • successive refinement

With process of elimination you usually start with the potential causes that are

  • simpler
  • more likely to occur
  • easier to assess
  • less costly to fix
  • related to what was changed on the computer most recently

and then move toward more sophisticated causes.
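
The ordering described above can be sketched in code.  This is only an illustration under stated assumptions: the Candidate fields, the numeric scores, and the particular sort order (recent changes first, then likelihood, then cost to assess, then cost to fix) are all hypothetical choices, not a prescribed method.  The candidate causes are borrowed from a dark office where the lights will not come on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    """One potential cause. All scores are rough, made-up estimates."""
    name: str
    likelihood: float           # assumed scale: 0.0 (unlikely) to 1.0 (likely)
    check_cost: float           # effort to assess, arbitrary units
    fix_cost: float             # cost to fix, arbitrary units
    recently_changed: bool      # was this touched most recently?
    check: Callable[[], bool]   # returns True if this really is the cause


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates so the cheap, likely, recently changed ones come first."""
    return sorted(
        candidates,
        key=lambda c: (not c.recently_changed, -c.likelihood,
                       c.check_cost, c.fix_cost),
    )


def find_cause(candidates: List[Candidate]) -> Tuple[Optional[str], List[str]]:
    """Check candidates in ranked order and stop at the first confirmed cause."""
    tried = []
    for c in rank(candidates):
        tried.append(c.name)
        if c.check():
            return c.name, tried
    return None, tried  # every candidate was eliminated


# Hypothetical candidates; the check functions stand in for real inspections.
candidates = [
    Candidate("wiring", 0.1, 10, 500, False, lambda: False),
    Candidate("light switch", 0.3, 3, 20, False, lambda: False),
    Candidate("light bulb", 0.6, 1, 1, False, lambda: True),
]

cause, tried = find_cause(candidates)
print(cause, tried)
```

A single sort key is the simplest way to combine the criteria, but it makes the first criterion dominate.  A weighted score would be a reasonable alternative if the criteria need to be traded off against each other.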

I sometimes joke about some of the support staff I have encountered over the years at universities.  The story that follows is not entirely dissimilar to experiences I have had at one university in particular.

Imagine you enter your office one morning and there are no lights.  You call up the support staff and they come right over and say, "Well, it must be the light switch!  We'll swap it out real quick."  You try to interrupt to suggest a simpler solution, but you are not allowed to speak because this agent of intelligence has experience at an Ivy League school and must therefore be obeyed after saying, "This is how we always did it at ______."  They swap the light switch and still no lights.

So their next conclusion is that the wiring is bad.  Unfortunately, you are told there is no money at present for the required work and that it will take quite some time to get done.  You again try to suggest a simpler option but are again thwarted by their majesty and power.

Eventually, after even more ludicrous maneuvering, you are able to suggest they replace a light bulb, hopefully without offending their overly sensitive ego, and ideally while getting them to think they thought of it in the first place.  They may even require you to accept blame for their not having come up with this solution earlier.

Anyway, you get the idea.

The Right Tools.  Some debugging tools are hardware oriented, others are more software oriented.  Diagnostic tools can help you examine the inner workings of some devices.

You also need to make sure you stay up to date on patches that the developers and/or vendors provide.  This in itself can be quite involved, considering some firms' proficiency at releasing patches to their products.

The Icing.  It is important to be constantly on the lookout for better tools.  This may best be done by attending conferences such as SAGE.  Whether it is SNMP for unifying the management and monitoring of your networks, or a better firewall to help prevent unwanted intrusions, it is important to keep as up to date as possible with what really works.

Being buzzword compliant is usually a hindrance to truly keeping up to date with what works.  I'm confident we all have our horror stories on this topic.

It is also important to make sure you have appropriate training on the right tools.  Such training

  • is often provided offsite, which helps you focus without distractions
  • makes you aware of the full set of features
  • lets you experiment in another firm's lab
  • gives instructors the chance to reveal additional information or clarify the more important issues
  • adds capabilities that can help your own firm develop and sell its products
  • looks good to prospective employers as well as current employers

It is also important to have people who have an end-to-end understanding of particular systems.  These people aren't likely to be the first ones assigned to debugging tasks, but they can provide support, and perhaps take a larger role, when it is necessary.