

Hour 17
Where Do I Start?

You’re not obligated to finish the task; neither are you free to neglect it.

—R. Tarfon, Pirkey Avot (Chapter 2, Mishna 21)

The answer to “where do I start?” could be this: Every day, try to improve things just a little bit. This is known as proactive troubleshooting in geek speak, and you got a good glimpse of what it’s about with documentation in Hour 2, “You Can’t Have Too Much Documentation, Money, or Love” and homogenization in Hour 16, “Beauty Is Consistency Deep: Saving Yourself Trouble.” (We’ll look a little more at proactive troubleshooting in Hour 22, “Who Watches The Watchmen?: Network Management Tools.”)

The best kind of troubleshooting is the kind where you stop problems before they start. However, proactive troubleshooting is a little bit like peace, love, and understanding. (What’s so funny?) This is all great and everything, and it’s definitely worth shooting for (pardon the phrase); however, no matter how much you make things better, you’ll still have to engage in reactive troubleshooting when things don’t go as planned. Just as you’ll continuously work on your proactive troubleshooting by documenting, observing your network, and planning, you’ll also continuously be reactively troubleshooting. It’s inevitable.

We’d all rather avoid problems before they start by learning what causes them and enacting policies and procedures to prevent them. Even as we do this, though, new problems have a way of popping up. Proactive and reactive troubleshooting are simply the yin and the yang of the troubleshooting game. You’ll never get done with either; all you can do is make each one less painful.

Accordingly, in this hour, we’ll reexamine the basics of how to get started with reactive troubleshooting of networking problems, whether those problems are application related, network protocol related, or physical network related. Once you know which of these areas you’re dealing with, you can home in on what the problem might be, based on the theory and composition behind the component.

No book can provide you with a cool head during a network combat situation. However, as this stuff is demystified for you and you start to form your own set of troubleshooting habits, you’ll see that whatever the spooky problem is, the source will eventually rear its ugly head if you keep plugging away. As all of this becomes more familiar to you, your stress level during problem determination will start to drop.

Identifying the Fault Domain: “In the Beginning…”

Unless you have a crystal ball or some sort of network-management software, your first inkling that something is wrong with your network will have nothing to do with your network and everything to do with your telephone. Particularly if your organization has multiple sites or multiple network segments, you’re not always going to personally suffer during a network outage; therefore, you won’t know what’s happening unless someone (or something) lets you know.


I’ll cover network management software in Hour 22. However, for the moment, let’s assume you haven’t invested in such software and are relying on your telephone to explode at the speed of light when the network gets into trouble.

First, you should ask the person at the other end of the phone whether other users have this same problem. It may take some doing to tickle this answer out of the caller; he or she is understandably upset, may have lost work, may have a deadline, and so on. You’ll need to be polite but firm—you can’t provide help if you don’t know the scope of the problem.

If other users are experiencing the same problem, then the problem is systemic—that is, it’s a pretty safe bet that everybody’s PC hasn’t malfunctioned in exactly the same way at exactly the same time. Therefore, the answer can be found in something that all the PCs have in common—their common network “glue.” Here’s a list of items PCs are commonly connected to (in the order of “more local” to “less local”):

  Hubs
  Switches
  Routers
  Servers

If you’re lucky, the first 90 seconds of the trouble call should tell you where the problem lies, which is pretty cool. Once you know what type of problem it is, the troubleshooting takes care of itself. If you know the problem is systemic, you can sometimes practice the techniques we discussed in Hour 4, “The Napoleon Method: Divide and Conquer.” If you know the problem is local (PC related), you can relax—at least you don’t have a lot of people down. What’s more, if you’ve practiced the network consistency techniques we discussed in Hour 16, your likelihood of getting this person back up quickly is quite high.
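If you like to see this kind of reasoning written down, here is a minimal Python sketch of the "who else has the problem?" question. It assumes you keep some record of which hub, switch, router, and server each PC depends on; the inventory, the device names, and the fault_domain function are all made up for illustration and don't come from any real tool. Given the PCs that have called in, it walks the list from "more local" to "less local" and reports the first piece of gear that all of them share. Treat the answer as a first suspect to check, not a diagnosis.

```python
# Layers ordered from "more local" to "less local", as in the list above.
LAYERS = ("hub", "switch", "router", "server")

# Hypothetical inventory: the shared gear each PC depends on.
INVENTORY = {
    "pc-anna":  {"hub": "hub-1", "switch": "sw-A", "router": "rtr-1", "server": "fs-1"},
    "pc-bob":   {"hub": "hub-1", "switch": "sw-A", "router": "rtr-1", "server": "fs-1"},
    "pc-carla": {"hub": "hub-2", "switch": "sw-A", "router": "rtr-1", "server": "fs-1"},
    "pc-dev":   {"hub": "hub-3", "switch": "sw-B", "router": "rtr-1", "server": "fs-1"},
}


def fault_domain(affected, inventory=INVENTORY):
    """Guess the scope of a problem from the PCs that reported it.

    Returns a (kind, layer, device) tuple:
      ("unknown", None, None)     nobody has reported anything
      ("local", "pc", name)       one PC only, so probably PC related
      ("systemic", layer, device) most local device all affected PCs share
      ("systemic", None, None)    several PCs affected, nothing in common
    """
    affected = sorted(set(affected))
    if not affected:
        return ("unknown", None, None)
    if len(affected) == 1:
        return ("local", "pc", affected[0])

    # Walk outward from the PCs; stop at the first layer where every
    # affected PC hangs off the same device.
    for layer in LAYERS:
        devices = {inventory[pc][layer] for pc in affected}
        if len(devices) == 1:
            return ("systemic", layer, devices.pop())
    return ("systemic", None, None)


if __name__ == "__main__":
    print(fault_domain(["pc-anna"]))
    print(fault_domain(["pc-anna", "pc-bob"]))
    print(fault_domain(["pc-anna", "pc-dev"]))
```

Notice that the sketch only knows about the people who have called so far. If two users on the same hub call first, it will point at the hub even when the router is the real culprit, which is exactly why you keep asking each caller whether anyone else is affected.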

Diving into Details

As optimistic as we’d all like to be, you won’t always run into clear-cut situations. Sometimes the network isn’t down for everybody. When the network infrastructure is not totally misbehaving, some of your calls will come from users who have gotten various illegal operation errors. Perhaps a user will report to you that he printed, but nothing came out.


Some network problems aren’t directly related to the network protocol or physical network attributes. These problems are generally thought of as application problems, whether they’re client/server programs or file and print–oriented programs. We’ll do a lot of in-depth application troubleshooting in Hour 18, “Lots of Different People in Your Neighborhood: In-Depth Application Troubleshooting.” Be sure to bring coffee and a hankering to tinker!

