When your systems aren’t working as they should, the impact can range from a minor annoyance to a major crisis. Large, company-wide outages can bring customers and operations to a standstill, but even a small issue can push employees toward workarounds that cause more severe problems. At SineWave, we’re proud of our troubleshooting process, and we’re happy to share the steps we use to help our clients address whatever challenge they face.

  1. Define the problem. Many end users have thrown up their hands and said, “It doesn’t work!” To make real progress, though, troubleshooting must begin with a clear understanding of what is not operating as expected, how it is supposed to perform, when the problem began, who is affected, and any other symptoms that might be related. Only by knowing what’s wrong can we make things right.
  2. Reproduce the issue. It might seem counterintuitive, but troubleshooting is easier when the problem can be recreated on demand. This step is essential because watching the issue occur gives insight into why it happens, and it gives you a reliable way to confirm later that the fix actually worked.
  3. Identify the relevant components. Modern IT environments have dozens of layers of abstraction between the user and the hardware, and potentially thousands of systems constantly interacting with one another. Ruling out the irrelevant elements tells the team where to focus.
  4. Develop a hypothesis. In a moment of crisis, it is tempting to click buttons, reset settings, or reboot systems. But good troubleshooting requires a reasoned explanation for the problem before anything is changed.
  5. Break the hypothesis into a sequence of testable steps. You may have a big idea about what is going wrong and how it involves many interlocking parts. But sweeping changes can make a problem worse. Each step should have a predictable effect, so that comparing the actual result with the prediction shows whether you’re on the right path.
  6. Convince the team of your idea. When systems are failing and we’re contemplating surgery on a live patient, it’s essential to talk through the plan. If you can’t get buy-in for your theory, you’re not ready to move ahead.
  7. Get approval to conduct the test. Waiting is the hardest part of troubleshooting, but it may also be the most important. Everyone must be in sync before any change is made, and approval creates shared responsibility for the outcome.
  8. Change only one element at a time. No matter how confident you are in every step of your solution, don’t modify more than one component at once; otherwise you can’t tell which change had which effect. Remember: change, test, certify, and repeat until the issue is fully addressed.
  9. Demonstrate the resolution. A fix only counts if you can prove that it works, which means the person who reported the problem should be the one to run the final test. You can only be satisfied if the user is satisfied.
  10. Document the cause, the fix, and the recommendation. Even when everyone is happy, the work isn’t done. Record why the problem happened, write down the solution that was put in place, and communicate any recommended long-term changes. Do it right away, while the problem is fresh in everyone’s mind.