As discussed elsewhere, if you don't know what things look like when times are good, you'll have no idea what you're looking for when things go bad. Accordingly, it's worth expending a little effort to create a picture of what your network infrastructure looks like when things are running well.
A couple of words about the length of a baseline: Any good statistical picture needs a sample large enough to make the data valid. In other words, the American Medical Association doesn't set normal lab values from a population sample of alcoholic, anemic computer geeks; they take large samples from healthy people from all walks of life and figure out what the normal range (highs and lows) for cholesterol, iron, white blood cells, and so on should be for most folks. If the doctor finds that your blood values fall outside the normal ranges (and, after all, you've reported to her that you don't feel well), she'll likely investigate what's causing the abnormal values.
The same is true of your baseline. You can't expect to take samples during your busiest time of the year and get normal values. Nor can you take a day's worth of data and consider it to be gospel. Instead, you need to take at least a week's worth of data at a time of year when it's business as usual. You can graph this data and keep it for when you have problems. When you do, you take the same measurements and see which statistics jibe with your baseline numbers and which do not. For example, suppose your network utilization on segment 3 never exceeds 15 percent and never has an error rate of more than 2 percent when things are normal. If you find out that its utilization is 65 percent with an error rate of 12 percent, you would probably investigate the segment some more. This is the magic of baselining.
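The segment 3 comparison can be sketched in a few lines of Python. This is only an illustration: the metric names and the `flag_deviations` helper are hypothetical, and the numbers are the ones from the example above.

```python
# Baseline ceilings seen during a normal week (segment 3 figures from the text).
# The metric names are hypothetical labels chosen for this sketch.
BASELINE_MAX = {"utilization_pct": 15.0, "error_rate_pct": 2.0}


def flag_deviations(baseline_max, current):
    """Return {metric: (current value, baseline ceiling)} for every metric
    whose current value exceeds the ceiling recorded in the baseline."""
    return {
        name: (value, baseline_max[name])
        for name, value in current.items()
        if name in baseline_max and value > baseline_max[name]
    }


if __name__ == "__main__":
    # Measurements taken while the segment is misbehaving.
    today = {"utilization_pct": 65.0, "error_rate_pct": 12.0}
    for name, (now, ceiling) in flag_deviations(BASELINE_MAX, today).items():
        print(f"{name}: {now} exceeds baseline ceiling {ceiling}")
```

A real baseline would probably store a normal range per time of day rather than a single ceiling, but the comparison step stays the same.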
If you have an RMON and SNMP infrastructure, you can use it to baseline your network. Companies such as NetScout and Kaspia can help you out here. Although the initial investment can be steep (you have to make sure that each of your servers, routers, switches, and applications has an SNMP agent, plus you have to expend the effort of configuring the management station), it's only a one-time investment, and you'll be provided with baselines for a long time.
If you're not sure whether you need automatic baselining, try creating your statistics manually. It's a lot of work, but it's doable a couple of times a year. Think of it as closing the store to take inventory: it's a lot of work, but necessary.
Here are the two types of performance monitoring you'll have to perform in order to manually baseline your network:
Why no switch or router baselines? Unfortunately, just about every router and switch is different. What's more, the data gathering mechanisms are either proprietary or (you guessed it) SNMP based.
Server Statistics
Let's say you have a UNIX system you'd like to baseline. On Monday, you set up sar (System Activity Reporter) to collect performance data each day. Some systems already have data collection enabled by default. To see if you get a report, type
sar -A | more
If you don't, data collection probably isn't enabled.
You can enable sar data collection on certain UNIX systems by typing the following command:
sarenable
If that doesn't work, type this:
man sar
This should tell you how to enable data collection.
At the end of each day, you take the sar output and place it in a text file. You then import the text file into Lotus 1-2-3, Quattro, or whatever spreadsheet you like. You'd repeat this every day for the length of the baseline (at least a week, as noted earlier).
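If your spreadsheet has trouble with the whitespace-aligned sar text, a short script can turn it into comma-separated form first. The following is a minimal sketch, not a general sar parser. It assumes the classic CPU report layout, in which the header row and every data row start with an HH:MM:SS timestamp. The function name and sample text are made up for illustration, and your sar may format its output differently.

```python
import csv
import io
import re

# Rows of interest begin with an HH:MM:SS timestamp (an assumption about the layout).
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def sar_text_to_csv(text):
    """Convert sar-style CPU report text into CSV text a spreadsheet can import."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for line in text.splitlines():
        fields = line.split()
        if not fields or not TIME_RE.match(fields[0]):
            continue  # skip banner lines, blank lines, and the "Average" row
        if any(f.startswith("%") for f in fields[1:]):
            # Header row such as "00:00:00 %usr %sys %wio %idle"; keep the first one only.
            if not header_written:
                writer.writerow(["time"] + fields[1:])
                header_written = True
            continue
        if header_written:
            writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample; real values come from your own sar text file.
    sample = """
00:00:00    %usr    %sys    %wio   %idle
01:00:00       5       3       2      90
02:00:00       7       4      20      69
Average        6       3      11      80
"""
    print(sar_text_to_csv(sample), end="")
```

Dropping the Average row is a deliberate choice here: the spreadsheet can recompute it, and leaving it in would add a bogus point to a time-based graph.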
There are two schools of thought on what to do with the individual day data. I personally like to graph the entire week sequentially; after all, what happens on Friday doesn't necessarily happen on Tuesday. However, there are those who like to average all the data for the week.
You'll want to produce the following graphs for visual reference:
Because sar implementations differ, check the sar man page on your system to see which abbreviation corresponds to which statistic. All the graphs should have time as the X axis so that you can see how one graph relates to another. (See the sample graph in Figure 23.3.)
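The difference between the two approaches is easy to see with a toy example. The sketch below uses made-up %wio samples for two days, with three time slots per day. The function names are hypothetical, and the averaged version is one reading of "average all the data": the mean of each time slot across days.

```python
from statistics import mean


def sequential(days):
    """Keep every sample in order: one long series covering the whole week."""
    return [
        (day, slot, value)
        for day, samples in days
        for slot, value in enumerate(samples)
    ]


def averaged(days):
    """Average each time slot across all days: one 'typical day' series."""
    return [mean(slot_values) for slot_values in zip(*(samples for _, samples in days))]


if __name__ == "__main__":
    # Hypothetical %wio samples; Friday has a midday spike that Tuesday lacks.
    week = [("Tue", [10, 20, 15]), ("Fri", [10, 80, 15])]
    print(sequential(week))  # the Friday spike of 80 is still visible
    print(averaged(week))    # the spike is flattened to 50
```

The sequential series preserves day-specific peaks, and the averaged series is more compact but smooths them away, which is the trade-off described above.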
Figure 23.3 A sar report converted into a Quattro Pro graph.
You can see in Figure 23.3 that user activity (%usr), system activity (%sys), waiting for I/O activity (%wio), and idle time (%idle) all add up to 100 percent. Although you're not running out of processor on this graph, you can see that you've got a %wio problem. This becomes evident when you graph the paging activity: it pretty much follows the curve of the %wio. As with the vmstat example earlier, you probably have a memory and swap problem.
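Both observations can be checked numerically as well as by eye. The sketch below is illustrative only, and its sample numbers are invented rather than read from Figure 23.3. It verifies that the four CPU columns in each row sum to roughly 100 percent, then computes a Pearson correlation between %wio and paging activity. A value close to 1 means the two curves move together.

```python
from math import sqrt


def cpu_rows_sum_to_100(rows, tolerance=1.0):
    """True if every (%usr, %sys, %wio, %idle) row sums to 100 within the tolerance."""
    return all(abs(sum(row) - 100) <= tolerance for row in rows)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series.
    Raises ZeroDivisionError if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical samples taken at the same times of day.
    cpu_rows = [(5, 3, 2, 90), (7, 4, 20, 69), (6, 4, 35, 55)]
    wio = [row[2] for row in cpu_rows]
    paging = [4, 41, 68]  # paging activity, in whatever unit your sar reports

    print("rows sum to 100:", cpu_rows_sum_to_100(cpu_rows))
    print("wio/paging correlation:", round(pearson(wio, paging), 3))
```

A high correlation only shows that the curves track each other. Deciding that memory and swap are the cause still takes the kind of investigation described in the text.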
As far as manually gathering statistics from other operating systems is concerned, you need to know the following points:
- NetWare really requires an SNMP management station to deal with its resources; there's no way to extract the server resources manually.
- You've already seen how cool the Windows System Monitor is. You can also use the NT Performance Monitor (PERFMON.EXE) and get it to store reports in comma-separated format, which you can easily import into a spreadsheet.
Commercial packages are also available for tracking the resources of your servers; they're reasonably inexpensive and can save you a lot of work. For example, UNIX users should check out SarCheck by Aurora Software (www.sarcheck.com). A solution like this is a good compromise between performing resource baselining by hand and buying into a full SNMP solution.