

Routing Problems

Let’s say that you can get to certain Web sites but not others—or your email is having trouble getting to some sites but not others. The best thing to do in this case, after making sure you can resolve the name, is to traceroute to the site’s IP address.

For example, our library system was having trouble reaching a certain Web address. I used traceroute on the IP address and found that my traceroute packet was being chucked back and forth between two routers repeatedly. This routing loop definitely indicated a routing problem, but it was hard to know who was responsible for it.
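A routing loop of this kind shows up in traceroute output as the same router addresses appearing again and again at successive hops. A minimal sketch of spotting that pattern follows, assuming you have already pulled the responding address for each hop into a list. The function name, the sample path, and the 10.0.0.1 gateway address are all made up for illustration.

```python
from collections import Counter


def find_looping_hops(hops):
    """Return the addresses that appear more than once in a traceroute path.

    `hops` lists the responding address for each hop in TTL order.
    A "*" marks a hop that timed out and is ignored.
    Addresses are returned in order of first appearance.
    """
    counts = Counter(h for h in hops if h != "*")
    looping = []
    for h in hops:
        if h != "*" and counts[h] > 1 and h not in looping:
            looping.append(h)
    return looping


# Illustrative path only: a gateway followed by two routers that keep
# handing the packet back to each other.
path = [
    "10.0.0.1",
    "192.168.1.10",
    "192.168.2.5",
    "192.168.1.10",
    "192.168.2.5",
    "*",
    "192.168.1.10",
]
print(find_looping_hops(path))
```

An empty result means no address repeated. Any address in the result is worth looking up to find out who runs it.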

I reported the problem to the library’s ISP, but I also wanted to let the folks who were responsible for the routers know about it. How do you know who’s responsible for a router? Here comes nslookup again!

Each IP address on the Internet should (but does not always) have a corresponding DNS name in a special zone called in-addr.arpa. This lets you resolve an IP address to a DNS name using a special kind of record called a pointer, or PTR. If you check the SOA for the DNS name (or for the network number), you can frequently find out the responsible party for the address. The only catch is that you need to enter the address in reverse. DNS names read from most specific to least specific, so reversing the octets puts the network portion last, which lets reverse zones be delegated network by network. Don’t worry too much about the reason, though; the important part is that you need to enter the addresses backward. For example, let’s say that the two routers that were looping were 192.168.1.10 and 192.168.2.5. During my troubleshooting session, I fired up nslookup, as follows:

$ nslookup
> set type=PTR
> 10.1.168.192.in-addr.arpa
Server: dns.frob.com
Address: 209.52.182.122

10.1.168.192.in-addr.arpa  name = router10.foo.net
> 5.2.168.192.in-addr.arpa
Server: dns.frob.com
Address: 209.52.182.122

5.2.168.192.in-addr.arpa  name = router5.foo.net

> (Ctrl-D)
$ whois foo.net
[rs.internic.net]

  Registrant:
John E. Monster (FOONET-DOM)
P.O. Box 4242
Indianapolis, IN 46219

Domain Name: FOO.NET

Administrative Contact:
Monster, John E. (JEM12) monster@FOO.NET
317-555-1400 ext. 5066 (FAX) 317-555-1800
Technical Contact, Zone Contact:
Monster, Joey (JM48) joey@FOO.NET
317-555-1400 ext. 5067
Billing Contact:
Monster, John E. (JEM12) monster@FOO.NET
317-555-1400 ext. 5066 (FAX) 317-555-1800
Record last updated on 07-Aug-98.
Record created on 25-Sep-97.
Database last updated on 29-Sep-98 08:19:55 EDT.
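The backward names typed into nslookup in the session above can be built mechanically rather than by hand. Here is a small sketch; the helper name reverse_name is made up for illustration, and the result is checked against the reverse_pointer attribute that Python's standard ipaddress module provides.

```python
import ipaddress


def reverse_name(ipv4):
    """Build the in-addr.arpa name for a dotted-quad IPv4 address."""
    octets = ipv4.split(".")
    valid = len(octets) == 4 and all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    )
    if not valid:
        raise ValueError(f"not an IPv4 address: {ipv4!r}")
    # Reverse the octets and append the reverse-lookup zone.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


for addr in ("192.168.1.10", "192.168.2.5"):
    name = reverse_name(addr)
    # The standard library arrives at the same name.
    assert name == ipaddress.ip_address(addr).reverse_pointer
    print(addr, "->", name)
```

This only constructs the name to look up. The PTR query itself still goes through nslookup or whatever resolver you prefer.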


You can find out responsibility information for any zone using the whois utility. If you don’t have UNIX, there are Windows utilities that offer the same functionality—for example, Internet Anywhere Toolkit (www.tnsoft.com).

Okay! I’ve got all the information I need to report these shenanigans! Foo.net owns both of these routers, and the whois for foo.net provides an email address for the technical person responsible for this zone. I emailed both my ISP and joey@foo.net, gave them the traceroute output, and told them what IP address I had done the traceroute from. The problem cleared up in a matter of hours. Sometimes, you’ll get a friendly letter back telling you what the problem was; other times, you’ll be greeted with stony silence. I’ve had it go both ways, but at least I could tell the folks at the library what the problem was and that I had reported it.

Intranet Troubleshooting

If you have a complicated intranet with many DNS zones, you can troubleshoot it much the same way that you troubleshoot an Internet problem: with nslookup, traceroute, and ping. Likewise, the applications that make an intranet run are much like the applications that you’ll run on the Internet, so you can apply the techniques discussed in the following sections to your Internet servers as well.

Your Web Server

My favorite Web server troubleshooting technique (which I mentioned briefly in Hour 18) is to telnet to port 80 of a Web server and see whether I can use the HTTP GET command. (Most times, this will make the server shoot the main index page my way.) If I can, that means the server is responding to HTTP requests, and users should have no trouble accessing the server to get HTML pages. It is also a good indication that the network between you and the Web server is okay.
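If telnet isn't handy, the same check can be scripted. The following is a rough sketch rather than a full HTTP client: it sends one plain HTTP/1.0 GET for the root page and reads back only the status line. The function names and the example host name are assumptions made for illustration.

```python
import socket


def build_get_request(host, path="/"):
    """Return the bytes you would otherwise type into the telnet session."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_status_line(response):
    """Extract (HTTP version, status code) from the start of a response."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"not an HTTP status line: {first_line!r}")
    return parts[0], int(parts[1])


def probe(host, port=80, timeout=5.0):
    """Connect to host:port, send one GET, and return the parsed status line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host))
        # The first chunk is assumed to contain the whole status line.
        response = conn.recv(4096)
    return parse_status_line(response)


# The two pure helpers can be tried without touching the network.
print(build_get_request("www.example.com"))
print(parse_status_line(b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"))
```

Calling probe with your own server's name (for instance, a hypothetical intranet.example.com) either returns a status line or raises an exception. An exception from the connection attempt points at the network or at a server that is not listening, while a parsed status line means the server is answering HTTP requests.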

Most Web server problems fall into three categories:

  Reliability problems
  Third-server problems
  Network problems

We won’t discuss network problems here—they’re addressable using the techniques we discussed in the “Cyber Chaos” section, earlier, as well as in Hour 17, “Where Do I Start?” and Hour 18.

Reliability Problems

Reliability problems fall into two subcategories:

  Capacity problems (server overload)
  Crash problems (server dies with a blue screen, kernel panic, and so on)

If your server is truly so popular as to be overloaded (unlikely unless you provide an incredibly popular service on the Internet), you’ll first want to check your server log files. Are the hits that you’re getting “official” hits? Or has one of your users posted a non–work-related (but very popular) Web page? Overload is especially unlikely on a non–Internet-connected server. Most Web servers can handle hundreds, if not thousands, of users without a problem. It’s when you start to get hundreds of hits per second that you really have a problem.
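A quick pass over the access log can answer both questions: how many hits per second you are really getting, and which pages are drawing them. The sketch below assumes that your server writes Common Log Format, which is an assumption about your setup, so check your own server's log layout first. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed Common Log Format:
#   host ident user [timestamp] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def summarize(lines):
    """Count hits per timestamp (one-second resolution) and per path."""
    per_second = Counter()
    per_path = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        timestamp, _method, path, _status = match.groups()
        per_second[timestamp] += 1
        per_path[path] += 1
    return per_second, per_path


# Invented sample data, not taken from a real server.
sample = [
    '10.1.1.5 - - [29/Sep/1998:08:19:55 -0400] "GET /index.html HTTP/1.0" 200 2326',
    '10.1.1.9 - - [29/Sep/1998:08:19:55 -0400] "GET /~joe/jokes.html HTTP/1.0" 200 51200',
    '10.1.1.7 - - [29/Sep/1998:08:19:56 -0400] "GET /~joe/jokes.html HTTP/1.0" 200 51200',
    "this line is not a log entry",
]
per_second, per_path = summarize(sample)
print("busiest second:", max(per_second.values()), "hits")
print("most requested:", per_path.most_common(1))
```

On a real log, pass the open file object to summarize instead of the sample list. A personal page at the top of the most-requested list is a strong hint that the load is not "official" traffic.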

If you really want to be that popular, you’ll have to collect your log files and involve your server vendor and ISP, and you’ll probably have to upgrade two things: your Web server and your Internet connection. Based on the amount of Web data shown to be transferred in your logs, these two vendors will make recommendations about how you can upgrade.

It’s more likely, particularly for a non–Internet-connected intranet server, that your problems will be crash related. Take heart: most reliability problems are revision related or related to other software on the server. Make sure you have the latest version and patches for whatever Web server you use, and apply divide-and-conquer and rule-out methods to the other services that run on the server.

For example, I ran an NFS server on one NT server in conjunction with a Web server. The server kept dying with the blue screen of death on a regular basis—that is, until I removed the NFS server from it and replaced it with an NFS server from a different vendor.

