"Make frequent backups" has become something of an oft-ignored litany in the world of computer publications. Countless authors have tried to emphasize the importance of frequent and reliable backups. When administering a network, however, backing up your systems becomes less a matter of convenience and more of an absolute requirement, if you intend to keep your job for any length of time.
Unfortunately, many administrators have learned this lesson the way a student driver learns to drive a stick shift: by stalling out in traffic. One day, you find yourself face-to-face with the boss, being told that the new guy in Accounting just deleted the master database file, and would you please restore it from yesterday's backup? Whatever the reason, if you don't have that file, you're about to learn a lesson that will remain with you for the rest of your brief career.
Backing up a stand-alone PC is more often a matter of preserving a carefully tuned environment than of protecting vital data, which can very likely be stored on a couple of floppies. The worst-case scenario would be having to reinstall and reconfigure all of the machine's applications. But multiply this situation by dozens or hundreds of workstations, add several servers full of vital company data, and the picture changes considerably.
There is no greater career pitfall than having to tell your supervisor that work needs to be redone because there is no backup. All of the computer jargon that you may have learned to throw at management up until now will be useless. They don't care why. At this point, they only see the dollar signs of lost production, as compared to the dollar signs of that expensive tape drive.
On a more serious note, it may not even be solely a matter of time or money. As more and more critical industries convert to client/server systems for their computing needs, the loss of data could lead to the interruption of vital services or, in a medical situation, even the loss of life. I can therefore safely add my voice to the chorus and say, with even greater emphasis, "Back up well, and back up often."
This chapter helps you toward that end by discussing the various criteria that should be considered before purchasing a network backup system. We will also discuss the general nature of network backup software, the configuration options available to help you devise a responsible long-term backup rotation scheme, some of the more common problems affecting network backups, and solutions that can help you troubleshoot any problems that may arise. The hardware used for backup systems is covered elsewhere in this book. SCSI and other peripheral interfaces are discussed in chapter 5, "The Server Platform."
Unfortunately, making backups is not only more crucial on a local area network, it is also far more complex. It would obviously be impractical to furnish each workstation and file server on a network with its own means for backup. After all, sharing resources is what networking is all about, right? It is therefore a relatively simple matter to decide that there should be one or more backup drives located somewhere on the network, managed from a central interface. This is the last simple decision in the process.
The questions begin to come thick and fast at this point. "What kind of backup device should I buy?" "Where should the backup device be located?" "When should the backups be done?" "How can I make sure that they are done frequently enough?" And a dozen others. It is important to realize that most network backup systems utilize virtually every system that is part of the network, often stretching most of them to the limits of their capacity. There will be NLMs or services that run on the file server, a client program that runs on a workstation, and a level of data transfer between the two that greatly exceeds that of normal use. In short, there are a great many variables that must be considered in the process of backing up a LAN, and only you can determine which are the most important to the way that you, your department, and your company work.
In this chapter, we will first attempt to identify the right questions to ask in order to design and configure a centralized backup solution tailored to your network's needs, and then discuss several possible network backup strategies in an attempt to answer them.
The first question is the easiest. Whenever possible, the best time to plan a backup strategy is while the network is being designed or upgraded. Unlike many backup systems designed for use with single PCs, which can easily be retrofitted to an existing installation and moved to a different machine if necessary, network backup products are designed to be completely integrated into the network environment, particularly on the hardware level. SCSI has become the de facto standard for network storage subsystems, primarily because of its ability to effectively process simultaneous requests from different sources to multiple devices.
There was a time when tales of SCSI device incompatibility were rampant, and many administrators ended up with file servers containing several SCSI adapters, each addressing a single device, because of difficulties in getting the devices to work together on a single bus. The current generation of SCSI hardware, though, has come a long way in addressing these problems, which, truth to tell, were probably caused as much by improper configuration as by badly designed hardware. There should be no problem in running today's tape drives on the same SCSI bus as hard drives, CD-ROMs, and other devices, as long as all of the devices are purchased with an eye towards interoperability.
The majority of problems that arise in setting up backup systems come from attempts to add hardware to an existing system without fully researching the compatibility of the devices. It is essential that devices that are to reside on the same bus be compatible. Otherwise, a tape drive that has been band-aided onto an existing SCSI bus can cause problems with existing hard drive systems, leading to network interruptions and angry users.
This situation is particularly common when attempts are made to integrate the cutting edge of technology with older, legacy equipment. I encountered one case in which a person bought a brand new Pentium-based file server and a top-of-the-line 4mm DAT tape drive, and then proceeded to stick his old ISA PIO-based SCSI card into the server. He then wondered why his hard drive volumes would frequently dismount and his backups would run so slowly, if at all. For want of a new $300 SCSI adapter, his $15,000 investment was going to waste.
Therefore, when adding a backup system to an existing network or file server, the single most important factor is compatibility. Before you even begin to shop for products, you should have prepared a complete list of all of the hardware that you already possess, and everything about that hardware that might be significant. If you have a properly documented network, this process is already done. If, as in most shops, you do not, you will never regret spending a Saturday afternoon opening up your file servers and identifying everything inside. Gather the model numbers of all of the components and, where applicable, the firmware revisions being used as well. Hard drives, SCSI adapters, tape drives, and motherboards all have firmware or chipset specifications that can be crucial in determining the compatibility of various devices. While you have the case open, take down the serial numbers of everything as well, in case of theft. If you have ever been in the position of taking over for a LAN administrator who has left a company without maintaining such a list, you will appreciate its value and, hopefully, be a little kinder to the next guy in your job, okay?
You may find, as a result of this process, that some of the hardware in the server from which you intend to back up your network is older or less powerful than you thought it was. In cases like this, it may actually be a wiser decision to add another SCSI card to a server for connecting to a tape drive rather than add to the burden of an existing SCSI bus that cannot readily be replaced at this time. It may even be time to think about purchasing an additional machine to augment or replace an aging one, rather than jeopardize the functionality of your existing systems.
The best possible way to create an effective backup system for a local area network, however, is to integrate the backup device into a fully realized storage subsystem that has been designed in consideration of the company's data storage needs. Answering a few vital questions at the outset of the project can save considerable time, money, and aggravation. As you decide how much disk space your network will need, you should also consider the type of data to be stored, how much of it there is, how volatile the data is (that is, how often it changes), and how much time is available to back it up. These are the kinds of questions that should influence your purchasing decisions regarding backup media, hardware, and software.
Where Will Applications and Data Be Stored? Successfully organizing a network is a task that does not conclude with the signing of the purchase orders. One must also plan the way in which the network is going to be used, and creating a network backup strategy must be fully integrated into that process. Storage needs can be broken down into two basic categories: applications and data. Purchasing decisions concerning backups should be made after it has been decided where the applications will reside and where the data will be stored. Backing up one hundred copies of a word processing application installed to individual workstation hard drives is quite a different proposition from backing up one shared network copy of the application. The difference between the two should heavily influence the decision as to what the capacity and speed of the backup device(s) should be.
Remember also that restoring an application that has been lost due to drive failure is not simply a matter of the cost of the software itself. You must consider the amount of time and effort needed to restore the network to its previously functional state. Multiple copies of an application are likely to have been tweaked by their users to a state in which they are most useful to each individual. This may involve a significant amount of time and labor to re-create. Therefore, the nature of the applications (and their users) must also be considered when deciding whether or not to back them up. Devoting an extra hour or two of backup time each night to files that could take dozens of hours to restore manually is well worth the tradeoff.
In the same way, if data is to be stored on the workstation hard drives, as opposed to those of the file server, then the frequency of workstation backups is affected. This may influence the purchasing process towards a software package that has more extensive support for workstations running different operating systems or one whose cost includes a license to back up an unlimited number of workstations. Data is usually far more costly to replace than applications, so policies should be established and enforced that dictate where data files will be stored by users. Backup routines can then be customized to accommodate these policies. It is only by asking questions such as these and carefully planning your network that an intelligent determination can be made as to your backup hardware and software needs.
How Much Disk Space Is There? The next question to ask when considering backup needs is how much total disk space there will be on the network. Even this question, though, gives rise to many other questions. In most businesses, it will not be necessary to back up every byte on every drive throughout the network every day. There is, therefore, usually no need to purchase a tape backup system with the storage capacity to back up the entire network onto one tape. After arriving at a total quantity of disk space for the entire enterprise, an attempt should be made to prioritize the available storage space in order to determine what the daily backup requirements will be. Ideally, the daily backup should run as a single unattended operation. If there is too much data to fit onto one tape, then multiple tape drives or a tape autochanger may be called for.
Data files, of course, should be given first priority and should be backed up at least once a day, depending on the type and amount of data. Within this category, you should also distinguish the data files that change daily from those that may be accessed often but remain unchanged for long periods.
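The prioritization described above amounts to simple arithmetic: total up the enterprise, mark which areas belong in the nightly job, and see how many tapes that job would fill. The following is a minimal sketch, assuming hypothetical storage categories and sizes, and a 2G DAT tape treated as roughly 2,000M; none of these figures come from a real inventory.

```python
import math
from dataclasses import dataclass


@dataclass
class StorageArea:
    """One prioritized slice of network storage (categories are illustrative)."""
    name: str
    size_mb: float
    daily: bool  # True if this area must go into the nightly unattended job


def daily_backup_mb(areas):
    """Total megabytes the daily backup must cover."""
    return sum(a.size_mb for a in areas if a.daily)


def tapes_needed(total_mb, tape_capacity_mb):
    """Tapes one unattended run would fill; more than one suggests
    additional drives or an autochanger."""
    return math.ceil(total_mb / tape_capacity_mb)


# Hypothetical inventory figures, in megabytes.
areas = [
    StorageArea("server data volumes", 3000, daily=True),
    StorageArea("application configuration files", 200, daily=True),
    StorageArea("shared application executables", 1500, daily=False),
    StorageArea("workstation drives", 20000, daily=False),
]

total = sum(a.size_mb for a in areas)  # everything on the network
daily = daily_backup_mb(areas)         # only what the nightly job covers
print(total, daily, tapes_needed(daily, 2000))
```

Here the enterprise holds 24,700M, but the nightly job covers only 3,200M, which still spills onto a second 2G tape: exactly the situation in which a second drive or an autochanger earns its price.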
Applications, as a general rule, do not need to be backed up as often as data files, but while executables do not change, many applications maintain configuration files, properties, and definitions that can be as volatile as the data files themselves. These files should be backed up more often than the applications themselves.
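One way to put this distinction into practice is a selection rule that sorts files into data, configuration, and application categories and assigns each a backup frequency. The sketch below is hypothetical: the extension lists and the weekly schedule for executables are assumptions to be tailored to your own applications, and anything unrecognized is treated as data so that it errs on the side of being backed up.

```python
from pathlib import PureWindowsPath

# Hypothetical extension lists; tailor them to the software on your network.
EXECUTABLE_EXTS = {".exe", ".com", ".dll", ".ovl"}
CONFIG_EXTS = {".ini", ".cfg"}

# Assumed policy: data and configuration nightly, executables weekly.
FREQUENCY = {"data": "daily", "configuration": "daily", "application": "weekly"}


def classify(path):
    """Sort a DOS-style path into application, configuration, or data."""
    ext = PureWindowsPath(path).suffix.lower()
    if ext in EXECUTABLE_EXTS:
        return "application"
    if ext in CONFIG_EXTS:
        return "configuration"
    return "data"  # unknown types default to the most frequent schedule


def backup_frequency(path):
    """Backup frequency implied by the file's category."""
    return FREQUENCY[classify(path)]


for f in ["F:\\APPS\\WP\\WP.EXE", "F:\\APPS\\WP\\WP.INI", "F:\\DATA\\LEDGER.DBF"]:
    print(f, classify(f), backup_frequency(f))
```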
Another consideration is the physical location of the hard drives on the network. The original concept behind the local area network was to have several file servers spread hard drive space throughout the enterprise, thus allowing for system redundancy. Many shops, however, are now adopting the "super-server" concept: setting up a single server with a huge amount of disk space in one storage array instead. Conditions such as these should be accounted for when estimating whether the backups will fit within the backup window, because backup throughput varies considerably from one device and location on the network to another.
The backup window is the amount of low usage time that your company's operating schedule makes available for backup jobs to be conducted. As shown in figure 17.1, the fastest and most efficient backup operations will occur when file server hard drives are backed up to a tape device on the same SCSI bus. Next in overall throughput would be remote server drives being backed up over a high bandwidth backbone such as FDDI, then server drives being backed up over a standard Ethernet or Token Ring LAN connection, and then workstation drives over the LAN connection. Using a tape drive connected to a workstation reduces the backup rate of all targets (except the drives within that workstation) to the slowest of these figures. Obviously, the more data there is to back up and the smaller the backup window, the faster the backup throughput must be.
Another wrinkle in the equation is the increasing use of WAN links to interconnect remotely located networks. Even a fully dedicated T1 link provides only a 1.544 Mbps maximum transfer rate, as opposed to the 10 Mbps rate of standard Ethernet. For this reason, it becomes increasingly difficult to back up significant amounts of data over a WAN link within the limitations of a given backup window. Although a backup of selected data files from a remote office can be more economical than the purchase of another entire backup system, only actual testing will tell just how much throughput can be achieved over a given link. You should not, for example, count on a backup rate of 1.544 Mbps (roughly 190K per second) over a T1. Other network traffic must be considered, and the specifications of the equipment usually yield estimates that are optimistic, to say the least.
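The relationship between data volume, window length, and throughput can be checked with a few lines of arithmetic. This is a sketch only: the per-target rates below are placeholders, with 20M per minute borrowed from the upper end of the figure quoted later for server-attached DAT drives, and the LAN figure simply assuming that a backup over the wire takes about twice as long. Measure your own rates before relying on any such table.

```python
def required_rate_mb_per_min(data_mb, window_hours):
    """Minimum sustained throughput needed to finish inside the backup window."""
    return data_mb / (window_hours * 60)


def fits_in_window(data_mb, window_hours, rate_mb_per_min):
    """True if a job at the given sustained rate completes within the window."""
    return data_mb / rate_mb_per_min <= window_hours * 60


# Placeholder rates (M/min) per target class; not measured values.
RATES = {
    "server drive, same SCSI bus": 20,
    "server drive over Ethernet": 10,
}

need = required_rate_mb_per_min(6000, 6)  # 6,000M in a six-hour window
print(round(need, 1))
for target, rate in RATES.items():
    print(target, fits_in_window(6000, 6, rate))
```

With these assumed figures, 6,000M in six hours needs about 16.7M per minute: a local SCSI backup makes it with time to spare, while the same job pulled across the LAN does not.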
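A rough estimate of a WAN backup's duration follows from the line rate, divided by 8 to convert bits to bytes, and discounted by whatever share of the link the backup can actually claim. The sketch below assumes megabytes of 1,000,000 bytes and an efficiency factor of 0.5; that factor is a placeholder standing in for the testing the text recommends, not a measured value.

```python
T1_BPS = 1_544_000          # nominal T1 line rate, bits per second
ETHERNET_BPS = 10_000_000   # nominal 10 Mbps Ethernet


def transfer_hours(data_mb, link_bps, efficiency=0.5):
    """Hours to move data_mb megabytes (10**6 bytes) across a link.

    efficiency is an assumed fraction of the line rate left for the backup
    after protocol overhead and other traffic; 0.5 is a placeholder.
    """
    bytes_per_sec = link_bps / 8 * efficiency
    return data_mb * 1_000_000 / bytes_per_sec / 3600


print(round(transfer_hours(500, T1_BPS), 2))        # 500M over a T1
print(round(transfer_hours(500, ETHERNET_BPS), 2))  # the same job on the LAN
```

Even under these generous assumptions, 500M of remote data occupies the T1 for well over an hour, more than six times as long as the same job on local Ethernet.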
Figure 17.1
These are backup targets and their relative throughputs.
Where Will Users Store Their Data: Servers or Workstations? One of the important factors in assessing a network's backup needs is whether or not workstations need to be backed up, and if so, how often. As a general rule, it is wise to have users store all of their data on file servers, rather than workstation drives. This helps to protect their files against the possibility of workstation crashes, accidental deletion, or theft. However, this does not necessarily mean that the workstations need not be backed up. In a case such as this, the frequency of workstation backups should depend on the company standards for the workstation environment.
If an organization has standardized on a particular workstation configuration and allows no individual software selection by users, then a master workstation replica can be stored on a server, and workstation backups can be omitted, except perhaps for Windows INI files and other unique configuration elements. Even a company that allows its users complete freedom of choice in software selection and configuration, however, should consider that backing up hundreds of identical DOS directories and useless Windows swap files is a waste of time, bandwidth, and media.
How Much Time Is There To Do It In? Whenever possible, system backups should occur at times when network usage is at its lowest. While in some firms this backup window may stretch from 5 p.m. until 9 a.m. the next morning, allowing plenty of time for backups, flextime hours and multiple shifts can in many cases drastically reduce the amount of time available to back up the network. This is a crucial factor influencing purchasing decisions for both hardware and software. A faster tape drive (or multiple tape drives) may be called for when there is a large amount of data to be backed up in a small amount of time, and a software package's scheduling features should be evaluated to ensure that backup jobs can be configured to begin automatically at the correct time, only on the days that you want them to.
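Most backup packages let you exclude files by pattern, and the idea is easy to illustrate. The following sketch uses a hypothetical exclusion list covering the Windows 3.x swap files and a standard DOS directory; it is not the syntax of any particular backup product. File names are lowercased before matching because DOS names are case-insensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion list for workstation backup jobs (lowercase patterns).
EXCLUDE_PATTERNS = [
    "*\\386spart.par",  # Windows 3.x permanent swap file
    "*\\win386.swp",    # Windows 3.x temporary swap file
    "c:\\dos\\*",       # standard DOS directory, identical on every machine
]


def should_back_up(path):
    """Return False for files that match any exclusion pattern."""
    p = path.lower()
    return not any(fnmatchcase(p, pattern) for pattern in EXCLUDE_PATTERNS)


for f in ["C:\\DATA\\BUDGET.XLS", "C:\\386SPART.PAR",
          "C:\\DOS\\COMMAND.COM", "C:\\WINDOWS\\WIN.INI"]:
    print(f, should_back_up(f))
```

Note that WIN.INI survives the filter: the unique configuration files are kept while the bulk of identical or disposable files are skipped.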
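A scheduler has to handle a window like 5 p.m. to 9 a.m. that runs past midnight, and it has to remember that the early-morning hours belong to the previous evening's window. The sketch below is a hypothetical helper, not any package's scheduling API. It treats the set of days as the weekdays on which the window opens, so a Friday-night window still covers early Saturday morning while no window opens over the weekend.

```python
from datetime import datetime, time


def in_backup_window(now, start, end, open_days):
    """True if `now` falls inside a backup window that may run past midnight.

    open_days holds the weekdays (Monday=0 ... Sunday=6) on which the
    window opens.
    """
    t = now.time()
    day = now.weekday()
    if start <= end:               # window contained within one day
        return day in open_days and start <= t < end
    if t >= start:                 # evening part: the window opened today
        return day in open_days
    if t < end:                    # morning part: the window opened yesterday
        return (day - 1) % 7 in open_days
    return False


WEEKNIGHTS = {0, 1, 2, 3, 4}            # windows open Monday through Friday
START, END = time(17, 0), time(9, 0)    # 5 p.m. until 9 a.m.

# January 1, 2024 was a Monday.
print(in_backup_window(datetime(2024, 1, 1, 22, 0), START, END, WEEKNIGHTS))
print(in_backup_window(datetime(2024, 1, 6, 22, 0), START, END, WEEKNIGHTS))
```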
Where Should the Backup Device Be Located? Once it has been determined how much data there is to be backed up, where that data is located, and when the backups should occur, the next decision concerns the location of the backup device and what system it is connected to. This is a crucial factor in terms of convenience, security, and especially cost. All of the repercussions of each option should be considered in relation to the way in which the network has been designed. Essentially, there are three possible solutions: a workstation-based tape drive, a drive attached to a production file server, and a dedicated backup server. The pros and cons of each method must be considered, as there is no definitive solution that will fit all cases. These methods are discussed in the following sections.
There are several advantages to a workstation-based backup solution, the most prominent one being cost. There are several good workstation-based backup software packages available that are capable of backing up multiple file servers, and even other workstations, when they are made accessible through a peer networking protocol such as Windows for Workgroups' NetBEUI. In fact, these packages can usually back up any drive that can be mapped to a workstation drive letter. Software such as this is usually quite inexpensive, costing anywhere from $100 to $300, approximately 10% of the cost of an average server-based software package, which is usually priced according to the number or user-level of the servers being backed up.
While there are a great many backup software packages designed to be used for individual PCs with attached tape drives, comparatively few of these products address the needs of networks. In the networking context with which we are concerned, a software product should be able to back up a server volume, as well as protect its system files, such as the Windows NT Registry, the NetWare bindery, or the NDS database. Products like these are available from companies that are primarily devoted to developing backup software for the network environment, such as Cheyenne Software, Arcada, Legato, and Palindrome. All of these companies offer a wide array of solutions for different network types and operating systems.
Hardware, too, can be less expensive. While most of the same large capacity SCSI tape drives used on server-based systems can also be used with these lower-end software packages, there are also a large number of low-cost quarter-inch cartridge (QIC) tape drives on the market which are usually not recommended for server use, but which can be quite effective in this environment. Most of these, however, are rapidly becoming obsolete due to their limited capacity and the ever-increasing hard drive capacities being found on workstations and servers alike. When even an entry-level PC comes with a 500M hard drive, a 250M or even a 500M QIC tape drive does not appear to be a wise investment, especially for heavy use.
Another advantage of the workstation-based solution is the fact that, in a disaster recovery situation, a file server can be more easily restored from a remote system than from a drive that is attached to the file server itself. It would only be necessary to re-install enough of the network operating system to create the drive volumes and bind a network protocol. Everything else can be restored from the workstation. For this reason, many larger firms choose to assemble a redundant arrangement of backup hardware and software that can operate both on a workstation and a file server. Cheyenne Software, for example, markets software packages for both the workstation (ARCsolo) and the file server (ARCserve) that utilize the same interchangeable tape format. By purchasing external tape devices, the backup drive can swiftly be moved from file server to workstation in the event of a complete file server failure.
As you might expect, there are several major drawbacks to a workstation-based backup solution. The first and foremost is speed. Sending data from a file server to a workstation-based tape drive over a standard Ethernet or Token Ring IPX/SPX network connection will take far longer (roughly twice the time) than it would take to back it up to a tape drive installed on the file server itself, particularly when the hard drives and the tape drive are on the same SCSI bus. Another limitation is the fact that the workstation can back up only as many sources as can be mapped to its drives at any one time, which leads to the other major concern: security.
A workstation running software that schedules backups to occur during the night must be left logged in to all of the target servers with the appropriate rights to access all of their files. The memory-resident application that is used to schedule the backups can also interfere with normal use of the workstation, so unless a computer is to be dedicated solely to backups and placed in a locked room, this sort of backup solution becomes less and less practical.
For a small business or workgroup consisting of a single server and a handful of workstations, however, this type of backup system is the most cost-efficient way of protecting the network against data loss. It is a mistake to let cost factors outweigh concerns for data integrity, but many small businesses allow themselves to be terrorized into the purchase of expensive, high-powered backup systems costing as much as their file servers, which are completely unnecessary for their current needs.
A good rule of thumb when making any purchasing decision in network computing is to base your purchase on what you need right now, and not on any future plans that extend more than two or three months ahead. Just about the only sure thing in the computer industry is that it continuously changes. No matter what you purchase today, there will be something available a few months from now that will be better, faster, and cheaper. There is nothing to be done about this except to try to keep current, and make the most intelligent decisions that you can right now.
The most common network backup configuration in use today is a NetWare NLM-based software package and one or more 4mm DAT tape drives attached to a file server's SCSI bus. When properly configured, this should allow approximately 15M to 20M of data to be backed up per minute onto tapes holding anywhere from 2G to 8G. This is usually sufficient protection for a medium-sized network. Most of the major server-based backup software packages also allow multiple tape drives to run concurrently on the same SCSI bus, yielding cumulative backup rates of 100-150 M/min or more.
Depending on the number and current configuration of your file servers and your plans for future expansion, you may choose to add a backup system to an existing server or create a dedicated backup server. There are advantages and drawbacks to both alternatives. Adding a backup system to an existing server should only be done when that server has the resources available to support both the software and the hardware. Major NLM-based backup software packages may require up to 4M of memory to operate properly, in addition to what is already needed to support the operating system and the equipment already installed. Many packages, when actually processing a backup or restore job, spawn (or autoload) additional NLMs that are not resident when the software is loaded but idle, so be sure to account for this when evaluating the additional load that such a system will add to your server. Another consideration may be processor utilization. Certain functions of backup software, especially database engines, can at times drastically increase the load on the server's CPU. This can result in delayed access for users and even the temporary loss of the ability to communicate with the server.
In cases where it is felt that backup software places too much strain on an existing system, a file server dedicated to the performance of backups may be in order. A non-production server such as this, dedicated to network maintenance tasks (as opposed to servicing users), can also be the host for other network management products. As long as user access is restricted, other network functions should not be disturbed by the backup process.
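These figures translate directly into job lengths. The sketch below treats an 8G tape as roughly 8,000M and assumes that a job splits evenly across concurrent drives and that the server and SCSI bus keep every drive streaming; in practice the shared bus will cap the total, so the multi-drive numbers are optimistic.

```python
import math


def backup_minutes(data_mb, rate_mb_per_min, drives=1):
    """Elapsed minutes, assuming an even split across concurrent drives."""
    return data_mb / (rate_mb_per_min * drives)


def drives_needed(data_mb, rate_mb_per_min, window_minutes):
    """Smallest number of concurrent drives that finishes inside the window."""
    return math.ceil(data_mb / (rate_mb_per_min * window_minutes))


print(backup_minutes(8000, 20))         # filling an 8G tape at 20 M/min
print(backup_minutes(8000, 15))         # the same job at 15 M/min
print(drives_needed(8000, 15, 4 * 60))  # 8,000M in a four-hour window
```

Filling a single 8G tape takes between about 6.7 and 8.9 hours at these rates, which is why a short backup window quickly pushes an installation toward multiple drives.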
Another advantage to this concept is that, if it is absolutely necessary, backups can be scheduled to run during the workday. The problems caused by open files will still remain, as will a perceptible amount of network performance degradation, but in cases where there is no other choice, this method will minimize the impact of these problems.
The primary disadvantage to a dedicated backup server is the additional expense, not only for the hardware, but for the operating system as well. You must be sure that licensing considerations of the backup software allow you to back up all of your other servers. This may require you to purchase a network operating system license for a user level equivalent to that of your other servers, when there will actually be only a minimum number of users logged in to this server at any one time.
Another disadvantage may be that the speed of your backups will be negatively affected. A backup of a hard drive that is in the same machine as the backup hardware, or better yet, on the same SCSI bus, will always be faster than one that must travel over the network itself. If you have a high-speed network backbone, or if the length of the backup jobs is not critical, then this may not be an issue. In general, if you have a great deal of data to back up, or if you will be running multiple backup devices simultaneously, a dedicated backup server may cause fewer administrative problems and prove more economical in the long run.
The following sections examine the basic components of network backup software, and identify the areas in which the various products may differ. Some packages may be better suited to your network's needs for economic reasons, while some may offer features that you require that others do not yet possess. It should be noted, however, that software of this type is continually developing as the rest of the industry develops. The basic goals of all network backup software packages are essentially identical: to back up and restore all of the data types that may be found on today's heterogeneous networks as quickly and efficiently as possible. New developments in hardware and operating systems will inevitably be accommodated by backup software, and you may wish to choose a software vendor that most responsively and reliably updates its products to accommodate these innovations.
SBACKUP. Novell NetWare ships with a rudimentary file server utility called SBACKUP that will allow file servers and workstations to be backed up to a tape device. However, SBACKUP lacks nearly all of the advanced scheduling and convenience features found in any of the third-party products available. Consistency is everything in a reliable backup system, and SBACKUP, while it can effectively be used for a one-time job, leaves too many repetitive administration tasks to the operator to be a reliable everyday solution. Its use as a primary backup solution is therefore not recommended, but familiarity with its functions can be beneficial for two reasons.
First is the simple fact that every NetWare installation has it. In a situation where no other tools are available, such as that of a consultant visiting a remote site, the ability to perform a simple full backup of a server may be desirable, and SBACKUP can come in very handy at times. The other benefit of knowing SBACKUP is that it relies on the Novell Storage Management Services (SMS) system to perform its backups. Many third-party backup products use SMS for their own backups, to some degree, so familiarity with its modules and concepts can be useful in evaluating these packages.
SMS. Storage Management Services is an open specification developed by Novell for a standard set of Application Programming Interfaces (APIs) designed to provide a reliable interface between backup and storage management products and the various data types found in the modern heterogeneous network environment. When creating this specification, Novell clearly planned for its adoption by third-party developers. While SBACKUP utilizes the specification, it was no more intended to be a comprehensive backup solution than the Windows Terminal program was intended to be a full-featured communications package.
The specification consists of several basic components, each of which may or may not be used by a third-party backup package:
The interaction of the various modules is illustrated in figure 17.2.
Figure 17.2
This is the Novell Storage Management Services Model.
The degree to which various backup software packages utilize SMS is highly variable. Novell has designated Level Two compliance to apply to any SME that fully utilizes the entire SMS specification. Some products, such as Palindrome's Network Archivist, offer such compliance. They are completely reliant on the SMS standards and are, in effect, guaranteeing the continued development of their product as long as Novell continues to support new data formats and operating systems with its TSAs. Palindrome has also written some of its own TSAs that are fully SMS compliant as well and can, therefore, be used with any other SMS-compliant software.
Level One compliance applies to software that can make use of the SMS TSAs to communicate with network targets. Many software developers use this as an optional feature for their products, or use the TSAs to enhance their range of services, employing proprietary communication methods for some file formats and using TSAs for others. It should be noted when evaluating products like these that any tape written using SMS can only be restored using SMS. Whether or not to choose a software package that is fully SMS compliant is a difficult question.
The main drawback to the specification is that it involves more communications overhead than most proprietary solutions, so backup throughput is generally slower with SMS. In my experience, however, the difference in speed is not dramatic, and unless your installation requires the fastest possible backup speeds, it should not be a major factor in your purchasing decision. More attention should be paid to the present and future data types that you will be backing up, and whether or not the package that you choose can support them.
Some backup systems provide the option of not utilizing SMS at all, but care should be taken to note the instances when SMS is positively required to perform a proper backup job in today's networking environments. An interesting case in point is the need to back up the NDS database that is the heart of the NetWare 4.x network operating system. The continued development of NDS since the original NetWare 4.0 release has caused an ongoing problem for the developers of network backup software. At this time, no other backup agent can provide the full range of services to a NetWare 4.x server that Novell's own TSAs can, including complete backup of the NDS database as well as full support for NetWare 4.x's native compression features. This means that all files compressed by the NetWare 4.x operating system can be backed up and restored as compressed files, greatly reducing the amount of data traveling from the target to the backup hardware. However, even adopting these TSAs has caused developers severe problems when modifying their existing products. All of the major developers were in the process of readying native NDS implementations of their software, but in most cases this entailed major code revisions, and a temporary solution was needed to accommodate the needs of their users.
The way in which various manufacturers worked to meet this need provides a good indication of their responsiveness to the market and the capabilities of their programmers to adapt their software to the changing needs of the industry. You may wish to ask how long it was before a particular manufacturer's product could be adapted to the backup of the NDS. You should also note the nature of the resulting product: was it a solution that was integrated into their existing software product, or an extra utility shipped to fill a temporary gap?
The average file server-based backup software package usually consists of a client application, a series of server modules, and a collection of agents. The "back end" or server portion will be packaged as NetWare NLMs or Windows NT services that perform three functions:
The client or "front end" portion consists of a manager program through which backup and restore operations are created, controlled, and scheduled, and a series of agents that allow networked resources on various platforms to be backed up. Examining these in greater depth will allow you to intelligently compare the feature sets available in the various packages and thereby judge which of several admittedly similar products best suits the needs of your network.
The Tape Server. At the heart of any network backup program lie the device interface modules, or the means by which the target data is sent to the tape drive itself. It would seem to be a fairly simple proposition to feed the data that is gathered from the various target drives over the SCSI bus to the tape drive, but the process is actually quite complex. Tape drives are sequential access devices; that is, data is written in a contiguous stream onto the tape and must be accessed in the same manner. This is unlike a random access device such as a hard drive, in which files may be broken into separate sectors depending on the nature of the free space available on the device.
A hard drive's platters spin continuously, and the heads read from or write to the appropriate sectors only when a request is made. This is why a hard drive's access time is measured in milliseconds: the heads simply have to move a tiny distance to the proper position over the platter. With a tape drive, however, the tape is only in motion during the processing of a request, and must be traveling at the correct speed across the heads for data to be read or written correctly.
Every time that a tape drive stops moving the tape across the heads during an operation, there is a period of lag time while the drive spins up to the proper speed for reliable access. Earlier incarnations of magnetic tape storage applications would simply wait for the tape drive to achieve the proper speed before actually performing the read or write operation, slowing down their operational throughput considerably. With modern tape backup systems, however, data is fed to the tape drive at the proper rate to keep the drive streaming, that is, moving the tape across the drive heads at a uniform speed with no starts or stops. In this way, data will be transferred at the best possible speed with the least likelihood of data corruption. Tape servers do this by buffering small amounts of the data read from the backup targets in memory pools on the file server, from which it can be fed smoothly to the tape drive.
Another factor to consider is the wide range of tape devices that are supported by the major network backup software packages. The various QIC formats, 4mm, 8mm, and DLT drives, all function in radically different ways, and the tape server must accommodate all of them. For example, QIC drives pass tape across stationary heads at rates of 25 or 50 inches per second or more. Helical scan devices such as 8mm and 4mm DAT drives use heads rotating at 2,000 rpm while tape is moved across them at approximately one inch every three seconds, allowing nearly 2,000 separate tracks to be written across a single inch of tape! The tape server must recognize the capabilities of the tape drive from an identifying signal sent across the SCSI bus, factor in the capabilities of the SCSI host adapter itself, recognize the resource constraints placed upon it by the configuration of the network operating system and file server hardware, and then come up with a solution that will feed the data arriving from the targets at the highest possible speed. Add to this the fact that most of these products can perform these functions for up to seven storage devices running seven separate jobs simultaneously, and their performance can be seen as no less than phenomenal.
It is important to consider that virtually no other operation stresses the limits of a network's communications, storage, and I/O systems more than backups do. Normal network usage makes regular but intermittent calls to a server's disk drives. Applications and data files may be loaded and unloaded at regular intervals on dozens of workstations, but rarely do applications call for a continuous stream of high-speed data transfer the way that a backup job does. When, for whatever reason, the data stream is slowed or interrupted, and the tape server has no data to feed to the drive, then the drive spins down to an idle state, and must spin up again before the data stream can be resumed. This condition is called data starvation and is one of the most common causes of unusually slow backup speeds and data corruption. For a smooth, continuous stream of data to be delivered to the tape drive at the correct speed, the interaction between the tape server module, the SCSI drivers, and the other devices on the SCSI bus must be consistent and predictable.
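To make the relationship between the memory pool and data starvation concrete, here is a small Python simulation. It is a toy model under stated assumptions, not how any particular tape server works: data arrives from the targets in arbitrary units per time step, the drive consumes a fixed amount per step while streaming, and the function name `count_starvation_events` and its parameters are hypothetical. Anything that would overflow the pool is simply ignored here, whereas a real tape server would hold the targets off instead.

```python
def count_starvation_events(arrivals, drive_rate, pool_size):
    """Count how often a streaming tape drive would run out of data (toy model).

    arrivals:   data units delivered by the backup targets in each time step
    drive_rate: data units the drive writes per step while streaming
    pool_size:  capacity of the server-side memory pool, in the same units
    """
    buffered = 0
    streaming = False
    starvation_events = 0
    for incoming in arrivals:
        # Fill the memory pool; in this toy model any overflow is ignored.
        buffered = min(pool_size, buffered + incoming)
        if not streaming:
            if buffered < drive_rate:
                continue  # still stopped: not enough data to restart the tape
            streaming = True  # enough data buffered to spin up and stream again
        if buffered >= drive_rate:
            buffered -= drive_rate  # the drive keeps streaming for this step
        else:
            starvation_events += 1  # pool ran dry: the drive stops (data starvation)
            streaming = False
    return starvation_events
```

With a bursty arrival pattern such as `[20, 20, 0, 0, 10]` and a drive rate of 10, a pool of 10 units runs dry once during the lull, while a pool of 40 units carries the drive through it without stopping.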
In addition to the transfer of data, a complex series of format conversions also takes place. Data is stored on NetWare volumes in blocks of a size specified during the creation of the volume. Workstation operating systems each have their own file systems that store data in different ways. All of this data, once it has arrived at the tape server, is written to the tape in blocks (of a different size and configuration) specified through negotiation between the tape server software and the tape drive itself. Some or all of the data may also be stored in a proprietary tape format that is specified by the software manufacturer.
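The re-blocking step can be sketched in a few lines of Python. This is a simplified illustration, assuming that data arrives as a stream of byte chunks (for example, one per volume block) and that the negotiated tape block size is simply a parameter; the function `reblock` and the zero-byte padding of the final block are illustrative choices, not any vendor's tape format.

```python
from typing import Iterable, Iterator


def reblock(chunks: Iterable[bytes], tape_block_size: int, pad: bytes = b"\x00") -> Iterator[bytes]:
    """Repack variable-sized source chunks into fixed-size tape blocks.

    The final partial block, if any, is padded out to tape_block_size.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    pending = bytearray()
    for chunk in chunks:
        pending.extend(chunk)
        # Emit every complete tape block that the buffered data can fill.
        while len(pending) >= tape_block_size:
            yield bytes(pending[:tape_block_size])
            del pending[:tape_block_size]
    if pending:
        # Pad the leftover data so the drive always receives whole blocks.
        pending.extend(pad * (tape_block_size - len(pending)))
        yield bytes(pending)
```

Two 4,096-byte volume blocks plus a 1,000-byte tail, repacked into hypothetical 6,000-byte tape blocks, produce two tape blocks, the second of which carries 2,808 bytes of padding.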
Another major factor to consider is the presence of other devices on the SCSI bus. While it is perfectly practical to have a backup device on the same bus as hard drives, CD-ROMs, and other devices, the compatibility and configuration of these devices are vital for smooth concurrent operation. To mix devices such as these, a standard SCSI programming interface must be used, such as the Advanced SCSI Programming Interface (ASPI), which was developed by Adaptec and has since become the de facto standard for integrating different manufacturers' SCSI devices on the same bus.
An ASPI driver is configured to directly address the host adapter and is loaded into memory. Then, all subsequent drivers for the various SCSI devices on the bus, including the tape server, address the ASPI layer instead of the adapter itself. The use of ASPI allows for virtually any modern SCSI device to be placed on a SCSI bus without interference from other devices, despite the overlapping requests that are generated by a network environment. While it is quite possible to attach a tape drive to its own dedicated SCSI adapter and eliminate possible interference with other devices and their drivers, this adds expense and driver overhead to the file server that may not be justified in some cases.
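The layering that ASPI provides can be pictured with the following Python sketch. The class and method names (`AspiLikeLayer`, `register`, `submit`, `service`) are hypothetical and do not reproduce the actual ASPI programming interface; the point is only that every device driver hands its requests to one shared layer, which alone talks to the host adapter. For simplicity the sketch passes requests to the adapter strictly one at a time, which real SCSI hardware does not require.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    target_id: int  # SCSI ID of the device on the bus
    command: str    # hypothetical command label, e.g. "READ" or "WRITE"


class HostAdapter:
    """Stands in for the single adapter that every driver must share."""

    def __init__(self):
        self.executed = []

    def execute(self, request: Request) -> None:
        self.executed.append((request.target_id, request.command))


class AspiLikeLayer:
    """Hypothetical shared layer: device drivers call this, never the adapter directly."""

    def __init__(self, adapter: HostAdapter):
        self._adapter = adapter
        self._queue = deque()
        self._targets = set()

    def register(self, target_id: int) -> None:
        # Each device on the bus must have a unique SCSI ID.
        if target_id in self._targets:
            raise ValueError(f"SCSI ID {target_id} is already in use")
        self._targets.add(target_id)

    def submit(self, request: Request) -> None:
        if request.target_id not in self._targets:
            raise KeyError(f"no driver registered for SCSI ID {request.target_id}")
        self._queue.append(request)

    def service(self) -> None:
        # Requests from all drivers reach the adapter in arrival order.
        while self._queue:
            self._adapter.execute(self._queue.popleft())
```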
Thus, you can see that the operation of the tape server portion of any network backup software is far more complicated than it first appears. Software and hardware modules from three or more manufacturers must be made to interact without interfering in each other's processes. The most important factor in assembling a SCSI installation that will function to its fullest capacity is to gather components that are all designed to work together. This is why most vendors of network backup software conduct rigorous testing and certification procedures for both SCSI adapters and tape drives, most of them going so far as to certify not only specific devices, but specific firmware and driver revisions to be used with these devices.
Before making any backup hardware or software purchase, be sure that the adapter and the tape drive, their firmware revisions, and any accompanying drivers have been certified for use with the software you are considering. Also, if you are going to run multiple devices on one SCSI bus, make sure that all of the hardware involved is ASPI compatible (as nearly all are these days). Avoid locking yourself into a proprietary hardware manufacturer or SCSI protocol, and be particularly skeptical of new trends in hardware development.
CAUTION: Remember, your backups are your safety net for any experimentation with new network products that you may care to conduct in the future. This is not the place to gamble on that slick new bus-mastering, error-correcting, self-caching SCSI wonderbus. Stick to the tried and true here, and you can be fearless anywhere else.
The Scheduler. While the tape server controls the actual transfer of data to and from the NOS and the tape drive, there must be another module responsible for seeing that the correct data is fed to the tape server at the correct time. This is the responsibility of the scheduler or backup manager. While a limited software solution like SBACKUP can initiate a single job at a specified time, all of the major third-party network backup packages can maintain and execute complete backup rotation schedules, allowing the administrator to create jobs that will launch at designated times and automatically reschedule themselves to repeat the next day, week, or month.
On a NetWare file server, this is usually accomplished using a job queue that is similar in nature to a network print queue. Jobs are created by a separate manager program and stored as encrypted files in coded directories located under the SYS:SYSTEM directory. The jobs are then executed at the appropriate time by a scheduling module that remains resident in file server memory. These same functions can be carried out by Windows NT services.
The most powerful of these products can maintain complete backup rotation schedules for up to seven different devices, running completely separate jobs at the same time. A virtually unlimited number of individual jobs can also be scheduled to run at any time, even months or years into the future. Many of these scheduling modules can also perform scheduled copy jobs from one server volume to another. This would allow an administrator to schedule a regular "mirroring" job that maintains an up-to-date replica of crucial files on another server at any interval desired.
When comparing the capabilities of the various software packages available, check the different ways in which jobs can be stored and submitted to the queue, and consider how they fit the layout of your network. While all should be able to submit jobs from a workstation-based manager program, some may also be able to submit them from a workstation's command line or from the file server console. This could be very useful if your servers are kept in a closet that doesn't contain a workstation. You should also be able to save a job configuration as a separate script file to be submitted at a later time. In this way, in a disaster recovery situation, complex backup rotations can easily be resubmitted to the queue.
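A rotation schedule whose jobs reschedule themselves can be modeled as a priority queue ordered by next run time. The following Python sketch is hypothetical (the `Scheduler` class and its methods are not taken from any product) and assumes fixed intervals such as one day or one week; a monthly repeat would need calendar arithmetic, which is omitted here.

```python
import heapq
from datetime import datetime, timedelta
from typing import List, Optional


class Scheduler:
    """Toy job queue: a repeating job re-queues itself after it runs."""

    def __init__(self):
        self._heap = []  # entries: (next_run, sequence, name, interval)
        self._seq = 0    # tie-breaker so jobs due at the same time keep submission order

    def submit(self, name: str, first_run: datetime, interval: Optional[timedelta] = None) -> None:
        heapq.heappush(self._heap, (first_run, self._seq, name, interval))
        self._seq += 1

    def run_due(self, now: datetime) -> List[str]:
        ran = []
        while self._heap and self._heap[0][0] <= now:
            when, _, name, interval = heapq.heappop(self._heap)
            ran.append(name)  # a real scheduler would hand the job to the tape server here
            if interval is not None:
                # Reschedule the job for its next occurrence.
                self.submit(name, when + interval, interval)
        return ran
```

If the scheduler is not polled for several intervals, `run_due` runs each missed occurrence in turn; that is only one of several reasonable policies for missed jobs.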
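Saving a job as a script file amounts to serializing its configuration so that it can be reloaded and resubmitted later. The sketch below assumes a hypothetical `BackupJob` record and uses JSON purely for illustration; each product uses its own script format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BackupJob:
    name: str
    targets: list[str]     # e.g. NetWare volume names such as "SYS:"
    method: str            # "full", "incremental", or "differential"
    start_time: str        # "HH:MM", kept as text so the file stays human-readable
    repeat_days: int = 1


def save_job(job: BackupJob, path: str) -> None:
    # Write the job's configuration as a readable script file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(job), f, indent=2)


def load_job(path: str) -> BackupJob:
    # Rebuild the job so it can be resubmitted to the queue.
    with open(path, encoding="utf-8") as f:
        return BackupJob(**json.load(f))
```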
The Database. All network backup products contain some means of tracking their activities. This is done to maintain a record of information such as when backup jobs were performed, what files were backed up onto which tape, and so on. Due to the sequential nature and slow seek times of magnetic tape devices, it is impractical to "browse" through a tape's contents in real time. It is therefore necessary to maintain a replica of each tape's contents in a database that will allow files to be chosen for restoration in the simplest and quickest possible manner. In addition to the file's existence on a tape, information as to its exact location will also be maintained. This will allow the tape server to utilize a high speed SCSI command to locate a particular file for restoration. An individual file can then be restored in seconds or minutes, rather than the several hours that may be required to read the entire tape, file by file, searching for the correct one.
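The following Python sketch shows the idea of a catalog that records where each version of a file begins on tape, so that a restore can start by looking up the latest version's location instead of reading the tape sequentially. The `BackupCatalog` and `TapeLocation` names and the use of a logical block number are assumptions for illustration; with that block number in hand, a tape server could position the tape with a SCSI LOCATE command on drives that support it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TapeLocation:
    tape_label: str
    block: int  # logical block where the file begins on that tape


class BackupCatalog:
    def __init__(self):
        # path -> list of (time backed up, where that copy is on tape)
        self._entries: Dict[str, List[Tuple[datetime, TapeLocation]]] = {}

    def record(self, path: str, backed_up_at: datetime, location: TapeLocation) -> None:
        self._entries.setdefault(path, []).append((backed_up_at, location))

    def latest(self, path: str) -> Optional[TapeLocation]:
        # Return the location of the most recent copy, or None if never backed up.
        versions = self._entries.get(path)
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]
```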
Many products are also capable of maintaining database entries of other information concerning network backups, such as:
Care should be taken to examine what type of database is used by the various products being evaluated. Some vendors utilize commercial database engines such as Btrieve, while others have arrived at proprietary solutions. The use of a known database type has advantages in that there are likely to be third-party products available for database access and maintenance. Cheyenne Software's ARCserve, for example, ships with Crystal Reports, a reporting engine for Btrieve databases that provides extensive reporting and documentation capabilities as well as the ability to create customized reports. Their use of Btrieve also allows for the use of the maintenance utilities included with the Btrieve engine.
While Btrieve was originally developed by Novell and included as part of the NetWare package, it has since been sold by Novell and is now maintained by its own firm, Btrieve Technologies. This has had a positive effect on the overall product, and has resulted in the recent release of a solution to the primary drawback of the Btrieve client/server engine: BREQUEST.EXE, the 75K DOS TSR requester that had to be run at the workstation in order to access the server engine. A Windows DLL equivalent has recently been made available, which is quite effective and requires far fewer resources.
There are several other important factors to consider. What will the approximate size of the database files be for the amount of data that you will be backing up? The database files will contain the name and location information for every file that is backed up during every job. These files can grow to be quite large, and steps may have to be taken to keep their size under control. A database engine should be configurable to purge database information when it has reached a certain age, shrinking the database files proportionately. Tools should also be available to repair databases that have become damaged or corrupted. Be aware that many database types will require a substantial amount of temporary drive space in order to perform such maintenance functions. Some products may allow these temporary files to be created on another volume, while others may not. Take these factors into account when planning an installation of these products.
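Age-based purging can be sketched as a simple filter over catalog records. The record layout here, a `(path, backed_up_at, tape_label)` tuple, and the function name are hypothetical; a real database engine would delete rows and then compact its files.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Entry = Tuple[str, datetime, str]  # (file path, time backed up, tape label)


def purge_old_entries(entries: List[Entry], now: datetime, max_age: timedelta) -> Tuple[List[Entry], List[Entry]]:
    """Split catalog entries into (kept, purged) according to their age."""
    cutoff = now - max_age
    kept = [e for e in entries if e[1] >= cutoff]    # recent enough to keep
    purged = [e for e in entries if e[1] < cutoff]   # older than the retention period
    return kept, purged
```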
In addition, there should be a means to restore files from a tape that does not exist in the databases. This may be done by addressing the tape directly, in order to perform a sequential search for the desired file, or to read the entire tape and assimilate its contents into the existing databases. A good database engine should have both of these capabilities, so that tapes made at an earlier time or at another installation can still be restored at will.
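The two approaches described above might look like this in outline. In the sketch, a tape is modeled as a hypothetical sequence of `(path, data)` records in the order they were written, and the catalog is a plain dictionary; both function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, bytes]  # (file path, file contents) as written to tape


def find_on_tape(records: Iterable[Record], wanted: str) -> Optional[bytes]:
    """Sequential search: read records in tape order until the wanted file appears."""
    for path, data in records:
        if path == wanted:
            return data
    return None  # reached the end of the tape without finding it


def merge_tape_into_catalog(records: Iterable[Record], tape_label: str,
                            catalog: Dict[str, List[Tuple[str, int]]]) -> Dict[str, List[Tuple[str, int]]]:
    """Read the whole tape once and add each file's position to the catalog."""
    for position, (path, _data) in enumerate(records):
        catalog.setdefault(path, []).append((tape_label, position))
    return catalog
```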
The use of a commercial database engine such as Btrieve provides extensive features and capabilities such as these to the application that utilizes it, but a trade-off must be expected in terms of both client and server resources as well as database size. If you are going to be backing up a relatively simple network and performing restores only in cases of the occasional mistakenly deleted file or disaster recovery situation, then such capabilities might fall into the realm of overkill. A product might therefore be called for that maintains a simple file and media index in which there is less possibility of corruption problems or system resource shortages.
Agents. Backup agents are software modules that run on the backup targets (that is, the devices to be backed up) that "package" the desired data and send it to the tape server where it is ultimately written to tape. Some backup software packages provide their own agents, while others utilize the TSAs provided by NetWare as part of the SMS specification. Some products may also use a combination of the two for coverage of various platforms or allow the user to choose between the two for a specific platform. Different agents are usually made available to address the various types of data to be backed up. These may include various workstation operating systems, server volumes supporting different file systems, and even special cases, such as live database files that must be backed up while in use. Obviously, you should ensure that the product you choose has agents available for all of the platforms that you wish to back up.
With today's heterogeneous networks, however, this may not be as simple as it sounds. Many of the packages have agents available for numerous flavors of UNIX workstations and OSs. With the growing popularity of 32-bit desktop operating systems such as OS/2, Windows NT, and Windows 95, you should carefully check whether your proposed backup software vendor has made agents available for all of the environments used on your network.
When evaluating agent coverage for your existing workstations, make sure that the product will function well with the way in which your users work. Remember, it is going to be the responsibility of the user to make sure that a workstation agent is loaded whenever a backup is scheduled. Whenever possible, it is a good idea to arrange your users' workstation configurations so that the agent is loaded automatically. Most products will include both a DOS and a Windows agent, but the two are likely to be mutually exclusive. That is, the DOS agent will not function when Windows is loaded, and vice versa.
When conducting overnight backups, you must be conscious of the state in which your users leave their workstations at the end of the day. A DOS agent can be placed in the workstation's AUTOEXEC.BAT file or a Windows agent into the Windows Startup group, but neither will function if the workstation is turned off. Developing and enforcing a company policy in this respect will minimize the resource drain entailed by the loading of multiple agents and ensure that backups are performed reliably and on schedule.
Aside from coverage of all of the platforms on your network, there are also matters of price and performance to consider. Make sure of the agent's capabilities before you rely on them. For example, some Macintosh agent packages can back up and restore files to and from a Mac workstation, but cannot restore those files to a file server volume with MAC name space. Such limitations are obviously not well advertised by the manufacturers, but you should try to anticipate your backup and restore needs as completely as possible and determine if the products that you are considering can fulfill them.
Perhaps the most important consideration and the area in which you will find the most variance between vendors is in the price of agent coverage for your network. Most packages will ship with DOS, Windows, and OS/2 agents, but may not allow you to back up all of your workstations of those types with the base product license. Network backup products are usually priced either on a per server basis or in accordance with the NetWare user license installed. With Cheyenne Software's ARCserve, for example, you must purchase the same user level as that of the NetWare server that you will be installing it on, regardless of whether you want to back up workstations at all. However, this license will allow you to back up an unlimited number of servers (of the same user level or less) or workstations. Legato Software's Networker is priced on a per server basis. The base package will run on any user level of NetWare, but it will only back up that one server and up to 50 workstations. Backing up additional servers and workstations requires the purchase of additional licenses.
Both of these policies effectively overcharge a substantial portion of their user base. An installation with one 250-user server that needs only to back up the server drives should not have to pay several thousand dollars more for a backup package that supports a 250-user license, nor should an installation with many smaller servers have to pay an equal amount of money for what amounts to nothing more than a piece of paper.
Another place in which additional charges for these products may accrue is in the purchase of additional agents. Most of the major packages ship with a small subset of their available agents and sell the others as add-on packages. Be sure to check the prices of these add-ons before committing to a particular product and also whether the additional cost includes a license for a single or an unlimited number of workstations.
In general, it would be a mistake to choose a particular software package on the basis of licensing issues alone, but being aware of these additional charges in advance can avoid severe budgeting problems later. It should also be noted that, with the release of NetWare version 4.1, Novell has altered its own pricing stratification system, allowing additional user licenses to be added to an existing server license. This will force the third-party backup software developers (particularly those who charge on a user level basis) to rethink and hopefully restructure their pricing plans so that users of all levels are paying a fair price.
Reporting. Another important aspect of any backup software package is its ability to inform you of its activities. When backup jobs are going to run unattended, it is important to check whether the jobs are running successfully each night. All of the software packages available offer logs by which all of the activities of the software and hardware can be tracked or monitored. These logs can be the most valuable diagnostic tools available for hardware and job configuration debugging. Several can optionally track all SCSI activity on the bus, aiding in the resolution of hardware compatibility problems. Many packages also offer varying levels of notification options that can be particularly useful for the offsite administrator or consultant. These options can range from a daily report that is automatically sent to a network print queue after every job, to fax and e-mail notifications, to pager and SNMP (Simple Network Management Protocol) support (see also chapter 34, "Network Management and SNMP"). This is another way to ensure that backup tapes are being changed regularly. Some products, like Cheyenne's ARCserve, can also generate a wide assortment of reports containing information in varying degrees of detail concerning particular jobs, targets, or tapes.
The Manager. The manager is a client program that provides the actual user interface to the backup system. With this program, backup and restore jobs can be created, scheduled, and maintained, and real time tape manipulations, such as formatting and erasing, performed. All of the major packages support a client manager running on a DOS and/or Windows workstation. Some also have a manager interface on the file server console itself. This can be very convenient if your file server closet lacks a nearby workstation, but this interface should also be as simple as possible, or better yet, optional. Since NetWare file server memory that is devoted to the creation of popup screens on the file server console is taken from the operating system's Alloc Short Term Memory pool, it often cannot be returned to the main file cache buffer pool without restarting the server.
A good workstation-based manager program should be able to control all of the software's functions. There should be an interface that allows for direct manipulation of the tape drive, a means for monitoring the current activity of the backup system as well as the jobs that are currently queued for later execution, and the ability to view, utilize, and maintain the databases. This is in addition to a logical interface for the creation of backup and restore jobs.
Evaluation of the manager software, as with everything else about a backup system, should be performed with an eye toward the demands that your own usage will place upon it. A company that simply backs up its servers each night and only performs an occasional restore will be using the manager software far less often than a firm that archives large amounts of data to a tape library for regular access.
This could be an important consideration, as some of the manager programs available require significant amounts of workstation resources to function. As mentioned earlier, large TSRs may be required for database connectivity and the diverse nature of the manager's communications with the file server may require reconfiguration of existing network access protocol drivers. Users of workstations running other operating systems than that which the manager was designed for (such as OS/2, Windows NT, and Windows 95) should also be sure that the manager is compatible with their environment.
In the following sections, we will examine some of the various backup scheduling and tape rotation schemes advocated by the various software vendors. Since the manager program is the means by which these schemes are enacted, we will also be examining some of the ways in which user interfaces are designed to provide access to these features.
When considering how to configure and schedule backup jobs to support a specific network installation, the amount of data to be protected should be compared to the capacity of the tape device being used and the amount of time available to actually perform the backup. The more data there is and the less time available, the greater the speed and capacity the tape drive must have.
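The comparison amounts to simple arithmetic. The following sketch uses hypothetical figures (8,000 MB of data, a six-hour window, a drive rated at 30 MB per minute, and 4,000 MB tapes) and ignores compression, tape changes, and network bottlenecks, all of which affect real throughput.

```python
import math


def plan_backup(data_mb: float, window_minutes: float, drive_mb_per_min: float, tape_capacity_mb: float):
    """Return (required rate in MB/min, whether the drive is fast enough, tapes needed)."""
    required = data_mb / window_minutes           # throughput the job must sustain
    fits = drive_mb_per_min >= required           # can the drive keep up?
    tapes = math.ceil(data_mb / tape_capacity_mb) # whole tapes needed for one full backup
    return required, fits, tapes


required, fits, tapes = plan_backup(8000, 6 * 60, 30, 4000)
```

Here the drive needs roughly 22 MB per minute, which it can sustain, but the job will not fit on one tape, so an operator or an autoloader must supply the second.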
The other major factor to consider is how often specific data types are to be backed up in order to provide the protection that the network needs. Most of the network software products have one or more preprogrammed backup rotation schedules that can be implemented with minimum user intervention, or you can create your own custom schedule. The alternative that you choose should depend on your perception and understanding of the concepts involved as well as the needs of your operation. By spending some time assessing the capabilities of your system and the ways in which they are implemented by the backup software, you should be able to decide on a backup schedule that will accommodate both your data protection and administrative needs.
Full or Partial Backups? In its simplest form, a backup rotation strategy consists of a full backup of all targets, every night of the week. Many shops follow this practice, and when implemented as a measure of additional security, it may be justified. But when it is done simply because it is the easiest way, then too much money has probably been spent on hardware and media. It is not necessary to have a tape drive that can store your entire network's data on one tape. As discussed earlier in this chapter, different data types make different demands on a backup system. Keeping dozens of copies of the same executable files serves no purpose other than to waste time and money.
Incremental and Differential Backups. Incremental and differential backups are the basic means by which data types may be distinguished. They also incorporate the simplest form of tape rotation into a backup solution. The idea behind both of these concepts is to make a full backup of a particular target on one day, and then back up only the files that have changed on each succeeding day. This may be through the use of the DOS archive bit, or by using the date-last-accessed attribute of the NetWare file system. The archive attribute is a single bit allocated by the DOS file system to keep track of a file's modified state. Whenever a file is altered, its archive bit is turned on during the file save process. If the bit is already on, then it is left on. When backup software is used to create a full backup of a target, it can be set to strip off the archive bits from all of the files as they are backed up. This leaves a drive with no bits set on any of its files. As specific files are altered throughout the next day, their archive bits are turned on. When an incremental or differential backup job is performed on that drive, then only the files that have an archive bit turned on are backed up. This will usually amount to a far smaller number of files than would comprise a full backup job.
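The archive bit mechanism can be modeled in a few lines of Python. This toy `Volume` class is a hypothetical illustration rather than the DOS file system itself: saving a file sets its bit, a full backup copies everything and clears every bit, and afterward only files saved since then still carry the bit.

```python
class Volume:
    """Toy model of files carrying a DOS-style archive bit."""

    def __init__(self, names):
        # Newly created files start with the archive bit turned on.
        self.archive = {name: True for name in names}

    def save(self, name):
        # Saving turns the bit on (or leaves it on if it is already set).
        self.archive[name] = True

    def full_backup(self):
        # Copy every file, then strip the archive bits from all of them.
        backed_up = sorted(self.archive)
        for name in backed_up:
            self.archive[name] = False
        return backed_up

    def files_with_archive_bit(self):
        # The candidates for the next incremental or differential job.
        return sorted(n for n, bit in self.archive.items() if bit)
```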
The difference between an incremental and a differential job is whether the archive bits are turned off again or left intact during these secondary backup jobs. During an incremental job, the archive bits will again be stripped away, leaving no files with archive bits on the drive. The next day, the process is performed again in the same way.
Should the entire contents of that drive be lost, it will then be necessary to perform a restore operation of the original full backup tape, followed by additional restores of each successive incremental tape up to the day on which the data was lost. This is necessary because a file may have been written to tape during the full backup on Monday, then altered and therefore backed up on Tuesday, then left alone on Wednesday, and then altered again (and backed up again) on Thursday (see fig. 17.3). The most recent version of that file is therefore on the Thursday tape, and it won't be until restores are performed from all four days' tapes that the drive is returned to its original state.
Figure 17.3
In an incremental job, each day's altered files are saved to individual tapes.
A differential backup job is the same as an incremental job, except that the archive bits are not reset during the secondary jobs. Thus, after a drive is backed up in full on Monday, Tuesday's job will back up only the files that have been changed since the full backup. However, since the archive bits have not been reset, Wednesday's job will back up all of the files that have been altered on both Tuesday and Wednesday, and Thursday's job will back up all of the files that have been changed on Tuesday, Wednesday, and Thursday (see fig. 17.4). It is not until the next full backup that the archive bits will be reset and the process will start over again.
Figure 17.4
In a differential job, archive bits are not reset and all of the altered files since the last full backup are written to each tape.
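To make the archive-bit mechanics concrete, here is a minimal Python sketch. It models a drive as an in-memory table of file names and archive bits. The class `Drive`, the function `run_week`, the job-type strings, and the file names are all hypothetical. Real backup software reads the attribute from the file system rather than from a table like this.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Toy model of a DOS-style volume: file name -> archive bit."""
    archive: dict[str, bool] = field(default_factory=dict)

    def modify(self, name: str) -> None:
        # Saving a file turns its archive bit on; if it is already on, it stays on.
        self.archive[name] = True

    def backup(self, kind: str) -> list[str]:
        """Run a 'full', 'incremental', or 'differential' job; return the files written."""
        if kind == "full":
            selected = sorted(self.archive)
        elif kind in ("incremental", "differential"):
            selected = sorted(name for name, bit in self.archive.items() if bit)
        else:
            raise ValueError(f"unknown job type: {kind}")
        # Full and incremental jobs strip the bits; differential jobs leave them set.
        if kind != "differential":
            for name in selected:
                self.archive[name] = False
        return selected


def run_week(kind: str) -> list[list[str]]:
    """Full job first, then one secondary job per day for a fixed pattern of edits."""
    drive = Drive()
    for name in ("a.dbf", "b.doc", "c.exe"):
        drive.modify(name)
    tapes = [drive.backup("full")]
    for edited_today in (["a.dbf"], ["b.doc"], [], ["a.dbf"]):
        for name in edited_today:
            drive.modify(name)
        tapes.append(drive.backup(kind))
    return tapes


print(run_week("incremental"))   # each tape holds only that day's changes
print(run_week("differential"))  # each tape repeats everything since the full job
```

The only difference between the two job types is the single `if kind != "differential"` test. That one test decides whether the bits are cleared after the job.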
Incremental backups, therefore, use the least amount of tape, because files are only backed up on the days that they have been modified. However, they are the most complex and lengthy to restore because each successive backup tape must be restored in the proper order to recreate a drive. Differentials utilize more tape because a file that is altered only once will be backed up in every successive differential job until the next full backup. When recreating a drive, however, it is only necessary to perform restores from the full backup tape and from the most recent differential tape, since all of the modified files for the week have been accumulated into that last backup job.
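This restore trade-off can be expressed as a small function. It is a sketch that assumes one tape per job, listed oldest first. The tape names and the `last_good` parameter are hypothetical.

```python
def restore_set(tapes: list[str], kind: str, last_good: int) -> list[str]:
    """Tapes to restore, in order, to rebuild a lost drive.

    tapes[0] is the full backup and tapes[1:] are the secondary jobs in date order;
    last_good is the index of the most recent tape made before the loss.
    """
    if not 0 <= last_good < len(tapes):
        raise IndexError("last_good is outside the tape list")
    if kind == "incremental":
        # Each incremental holds different files, so every one is needed, oldest first.
        return tapes[: last_good + 1]
    if kind == "differential":
        # The latest differential already holds every change since the full backup.
        return tapes[:1] if last_good == 0 else [tapes[0], tapes[last_good]]
    raise ValueError(f"unknown job type: {kind}")


week = ["Monday-full", "Tuesday", "Wednesday", "Thursday"]
print(restore_set(week, "incremental", 3))   # all four tapes
print(restore_set(week, "differential", 3))  # ['Monday-full', 'Thursday']
```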
Since some file types (like Macintosh files) do not possess an archive bit, it may be preferable to utilize one of the NetWare file system's attributes, such as date-last-accessed or date-last-modified, if your backup software supports it. The date-last-modified attribute can be utilized in the same manner as the archive bit, but the date-last-accessed attribute may result in files being backed up that have not been altered, such as executables.
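A date-based selection can be sketched without relying on an archive bit. The following Python example is only an illustration. It uses the modification timestamp exposed by the local operating system as a stand-in for NetWare's date-last-modified attribute. It also assumes the backup software records the time of the previous job; `changed_since` and `last_backup` are hypothetical names.

```python
import os


def changed_since(root: str, last_backup: float) -> list[str]:
    """Files under `root` modified after `last_backup` (seconds since the epoch).

    A modification date, unlike an access date, does not change when a program
    merely reads a file, so unaltered executables are not selected again.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)
```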
The use of these features does not mean that you must make a decision to use one or another of these methods for all of your data, however. You may decide that an incremental job is best suited for your applications, while a differential is preferable for your data. There is no reason why you cannot split your network backups into multiple jobs addressing your different needs, as long as your backup software is capable of supporting it, your hardware is sufficient to the task, and your files are organized in such a way as to make the division convenient. This is why questions regarding the actual type of backup jobs needed are better considered before purchases have been made, software installed, and decisions made that may be difficult or impossible to change later.
Once you have determined which of these basic backup methods is most suitable to your network, the next task is to create a working process that accommodates both your administrative needs and those of your staff. Creating a tape rotation schedule is the process of setting up a system by which the greatest possible protection can be provided through the repeated use of the smallest number of tapes with the least amount of administrative overhead. In a system such as this, a predetermined number of tapes is used and reused in a fixed, repeating order. This is to ensure that no tapes are being overused, and to make certain that an adequate "history" of your backup jobs is maintained at all times.
Remember that although the most frequent request is to restore a file that was backed up yesterday, it is not the only one. Particularly when using incremental or differential backups, you may at some point have to retrieve a file from a backup made several days, a week, or several months ago. With the proper software, a proper tape rotation, and a good administrative routine in place, you will be able to do so with no difficulty. If, however, when faced with this request, you must delve into that large box of unlabeled tapes in the back of the closet and try to determine which one contains the version of the file that you need, be prepared to work late that night (and probably the rest of the week).
At this point you must decide how many tapes are to be used, where they are to be stored, and who is going to be responsible for making sure that the tapes are changed regularly. The ideal network backup system is one in which the only regular maintenance necessary is to swap tapes in and out of the drive and to periodically clean it. When creating and scheduling backup jobs that are to run unattended, the one thing that you want to avoid is having a single job that requires more than one tape.
Some backup packages are able to track the patterns of your backup activities and make a reasonably accurate estimate as to whether or not you have enough tape available for the execution of a particular job; but prior planning is still the best way to ensure that your jobs execute properly and that an adequate backup history is maintained at all times. By examining some of the rotation schemes that are pre-configured into several of the backup software packages available, you should be able to judge whether or not they meet the needs of your network, and accumulate enough information to create a tape rotation scheme of your own, should you so desire.
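The kind of pre-planning check described here can be sketched in a few lines of Python. Both the heuristic and the parameters are assumptions, not any product's method. The heuristic treats the next job as being as large as the biggest recent job. The parameters are a guessed compression ratio and a safety margin; real compression ratios depend heavily on the data.

```python
def projected_size(history_mb: list[float], window: int = 5) -> float:
    """Hypothetical heuristic: expect the next job to match the largest recent one."""
    recent = history_mb[-window:]
    if not recent:
        raise ValueError("no job history to project from")
    return max(recent)


def fits_on_one_tape(job_mb: float, tape_native_mb: float,
                     compression_ratio: float = 1.0, margin: float = 0.1) -> bool:
    """Estimate whether an unattended job will fit on a single tape.

    compression_ratio is an assumption (2.0 means 2:1); margin reserves a fraction
    of the capacity for data growth and for error in the estimate itself.
    """
    usable_mb = tape_native_mb * compression_ratio * (1.0 - margin)
    return job_mb <= usable_mb


history = [2800.0, 3100.0, 2950.0]
print(fits_on_one_tape(projected_size(history), tape_native_mb=4000))
```

Keeping a margin is deliberate. A job that only just fits today will need a second tape soon after, and that second tape is exactly the situation to avoid with unattended jobs.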
Grandfather-Father-Son. The "Grandfather-Father-Son" tape rotation method is so named because three sets of media, usually corresponding to Daily, Weekly, and Monthly tapes, are used to represent the "generations" (see fig. 17.5). In this scheme, as with all tape rotation systems, a full backup of all targets is performed first. Subsequent jobs, which may be incremental or differential, are then run each day using the first media set. These are the "Son" tapes, so named because these tapes will be reused each week and will therefore remain the "youngest" in the rotation. After a full week's worth of backups, another full backup is performed. This is written to a tape from the second media set, that is, a "Weekly" or "Father" tape. The final weekly job of every month is then written to a "Monthly" or "Grandfather" tape from the third media set (see fig. 17.6).
Figure 17.5
In a Grandfather-Father-Son rotation, three "generations" of tapes are used to ensure that an adequate backup history is maintained.
Figure 17.6
The scheduling of the daily, weekly, and monthly backup jobs in a typical Grandfather-Father-Son tape rotation.
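The daily, weekly, and monthly pattern can be reduced to a rule that maps any date to a tape. The following Python sketch assumes one particular schedule: Monday through Thursday use Son tapes, Friday uses a Father tape, and the last Friday of each month uses a Grandfather tape. The label formats are also assumptions. Actual products set their own schedules and names.

```python
from datetime import date, timedelta

# Fixed English names so the labels do not depend on the system locale.
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def gfs_tape(day: date) -> str:
    """Tape label for `day` in a simple Grandfather-Father-Son rotation.

    Assumed (hypothetical) schedule:
      Monday to Thursday   -> Son (daily) tape named for the weekday
      Friday               -> Father (weekly) tape numbered by week of the month
      last Friday of month -> Grandfather (monthly) tape named for the month
    """
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday > 4:
        raise ValueError("no job is scheduled on weekends in this sketch")
    if weekday < 4:
        return f"Daily-{DAYS[weekday]}"
    if (day + timedelta(days=7)).month != day.month:
        # No Friday remains in this month, so this is the month's final weekly job.
        return f"Monthly-{MONTHS[day.month - 1]}"
    return f"Weekly-{(day.day - 1) // 7 + 1}"
```

For example, in January 2024, Friday the 5th maps to "Weekly-1" and Friday the 26th maps to "Monthly-Jan".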
Usually, one or more of these media sets will be designated for off-site storage. The number of tapes included in each media set will, of course, depend on how much data is actually being backed up. But for a system in which one backup job is performed each day and the entire job fits on one tape, 4 daily tapes, 5 weeklies, and 12 monthlies will account for a year's worth of backups. Some of the software packages that provide this rotation scheme allow a great deal of configuration leeway, and others offer none. However, most of the systems that automate the setup process for this sort of rotation will automatically name the tapes for you and tell you which tape is to be inserted for each day's job.
This, in itself, may cause problems. Some of these products assign rather cryptic names to the tapes. Usually, the name will be some combination of the date, the media set, and the type of job being run. The names alone would not be much of a problem, but they will often change every time that a tape is reused. This means that every tape that is removed from the drive must be relabeled with the new name in order to accurately maintain the rotation. For whatever reason, this relabeling is a task that often manages not to get done. One of the more brilliant ideas that deserves to be more widely implemented in the backup industry is to build an interface into the backup program that will print tape labels on one of those tiny desktop label printers.
In light of this problem, an important consideration for any rotation scheme of this type, especially when you are able to alter the parameters being used, is how the system will behave if the proper tape is not inserted into the drive on schedule. Some systems will not run the job at all, and others may mistakenly overwrite an important tape that is left in the drive. This is why it is important to consider who will be responsible for changing the tapes. In most situations, this is a task that is best assigned to one specific person, rather than allowing it to be done on an ad hoc basis by whoever happens to be nearby. Depending on the physical location of the tape drive (which should be under lock and key, if at all possible) and the skill levels of the personnel involved, modifications to tape labeling and rotation methods may have to be made.
I heard of one site at which a large black Netframe file server was, for some unknown reason, installed in the waiting area of the office. Since Netframes are attractively sleek black boxes with no attached monitor or keyboard, this one had been pressed into service as an end table! Although she had no idea why she was doing it, part of the receptionist's daily routine was to water the plant sitting on the file server and change the tape in the backup drive.
While this is certainly not a recommended procedure, it serves to illustrate the fact that the person responsible for changing the tapes may not be (and need not be) fully versed in the intricacies of the tape rotation scheme, as long as a proper system is put into effect and the tapes labeled correctly. In extreme cases like this, it may indeed be preferable to create your own simple rotation rather than using a predesignated one. When you do this, you can give the tapes whatever names you choose and schedule the jobs according to your own backup and storage needs. You can also predetermine the way in which the system will react when an unattended backup job begins with the wrong tape left in the drive. If the job will continue regardless of the tape name, then a single tape could conceivably be left in the drive for months at a time by a lazy operator. The alternative, however, is for a backup job not to run because someone forgot to change the tape. The decision, as always, depends on your needs.
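The wrong-tape decision can be written down as an explicit policy. The following Python sketch is purely illustrative. The policy names, the `may_write` function, and the idea of a protected set of labels (for example, archive tapes that must never be overwritten, whatever the policy) are all hypothetical. Real products expose this choice in their own ways, if at all.

```python
from enum import Enum


class WrongTapePolicy(Enum):
    ABORT = "abort"          # skip the job: risk is a night with no backup
    OVERWRITE = "overwrite"  # run anyway: risk is one tape reused for months


def may_write(expected: str, loaded: str, policy: WrongTapePolicy,
              protected: frozenset = frozenset()) -> bool:
    """Decide whether an unattended job may write to the tape now in the drive."""
    if loaded == expected:
        return True
    if loaded in protected:
        # Never overwrite an archive tape that was left in the drive by mistake.
        return False
    return policy is WrongTapePolicy.OVERWRITE
```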
Tower of Hanoi. The Tower of Hanoi is another tape rotation system that (like Grandfather-Father-Son) was adapted from mainframe use. Use of this system is far less prevalent, however, because although it arguably provides more comprehensive protection utilizing fewer tapes, it is far more complex and difficult to understand. While the products that offer the Tower of Hanoi, most notably Palindrome's Network Archivist, leave very little for the administrator to do in order to set up and use it, most people prefer to have a stronger grasp of the concepts involved before they rely on it for their backups.
The Tower of Hanoi is based on an ancient mathematical puzzle in which there are three vertical posts, one of which has a number of round donut-like disks threaded over it, stacked in descending size order, the largest disk on the bottom ranging to the smallest on the top (see fig. 17.7). The object of the puzzle is to move the entire stack of disks to another of the three posts while moving only one disk at a time and never placing a larger disk atop a smaller one. In order to solve the puzzle, the smallest disk has to be moved on every other turn, while each successively larger disk is moved proportionately fewer times. The disks correspond to the media sets of the tape rotation scheme (each of which may consist of one tape or several) and the moves to the backup jobs themselves.
In Palindrome's rotation scheme, therefore, the first media set will be used for every other backup job, while the second set will only be used half as often, that is, for one out of every four jobs. The third set will be used for one out of every eight jobs, the fourth for one out of every 16, and so on. The number of media sets can vary, usually from five to eight. This "binary exponential" rotation scheme will therefore retain several recent copies of any one particular file, and fewer, but regularly spaced older copies. Every additional media set that is added to the rotation also effectively doubles the period for which a full "history" of a target is maintained.
Figure 17.7
This is the Tower of Hanoi puzzle.
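The "every 2nd, every 4th, every 8th" pattern can be computed from the binary form of the job number. The following Python sketch shows a common formulation of the Tower of Hanoi rotation; it is not necessarily Palindrome's exact implementation. It assumes the last media set picks up every job left over, so that set runs at the same frequency as the set before it. The function name `hanoi_set` is hypothetical.

```python
def hanoi_set(job: int, sets: int) -> int:
    """Media set (0 = 'A', 1 = 'B', ...) for backup job number `job`, counting from 1.

    Set A is used every 2nd job, B every 4th, C every 8th, and so on. The last set
    takes every job the others do not, so it runs at the same rate as the one before it.
    """
    if job < 1 or sets < 1:
        raise ValueError("job and sets must both be at least 1")
    # The number of trailing zero bits in `job` says how many times it divides by 2.
    trailing_zeros = (job & -job).bit_length() - 1
    return min(trailing_zeros, sets - 1)


schedule = "".join("ABCDE"[hanoi_set(n, 5)] for n in range(1, 17))
print(schedule)  # ABACABADABACABAE
```

The printed sequence mirrors the order in which the disks move when the puzzle is solved. It also shows why the scheme keeps many recent copies of a file but only a few widely spaced older ones.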
The basic theory by which this system works is difficult to grasp. An even more complex task would be to figure out exactly which tape contains a particular file. Fortunately, there is no real need for the backup administrator to understand these concepts. Virtually all of the major network backup products include a mode by which these schemes can be implemented with no considerations other than a dedication to following the instructions given by the software. Almost every case in which something goes wrong during the use of these systems is due to user error, either in carrying out the required tasks (such as labeling the tapes) or in attempting to alter the configuration of the preprogrammed scheme.
Custom Tape Rotation. Usually, the best course of action with these preconfigured tape rotation systems is to follow them to the letter or to abandon them completely. A great deal of time and effort can be spent on working out the nuances of these systems in order to alter the configuration, after which there is usually little to show for the effort other than an increased sense of confusion and a rotation that is ultimately less reliable.
The custom rotation scheme that I usually set up for the simplest possible administration consists of several complete weeks' worth of tapes. Each week consists of a full backup and four or more incrementals or differentials. The tapes are simply labeled "Monday-1," "Tuesday-1," and so on for the first week, "Monday-2," and so on for the second week, and so forth. Each week's worth of tapes is kept in a separate box, and at the end of each week, the just-completed tapes are removed and stored off-site. You can create as many weeks' worth of tapes as you need to preserve the history you want, and optionally save the full backup tapes from each week or month for an archive.
In this way, the person assigned the task of changing media has only to put the tape named for the appropriate day of the week into the drive. A rotation of this sort can be made increasingly more complex, incorporating multiple incremental jobs on a single tape if desired, or archives of seldom-used files that are never overwritten. The single most important consideration, though, when setting up your own rotation, is the skill level of the person who is actually going to be interacting with the system on a daily basis. A cryptic, complex system may be perfectly adequate if you are the only person using it, but do you really want to get calls from the office on your day off just because someone needs a file restored?
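Because the labels follow a fixed pattern, the tape for any given day can be worked out mechanically, either by the backup software or by the person changing tapes. The following Python sketch assumes a five-day week and assumes that `start` is the Monday on which week set 1 was first used. The function name is hypothetical.

```python
from datetime import date

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")


def weekly_set_tape(day: date, start: date, weeks: int) -> str:
    """Label such as 'Wednesday-2' for a rotation of `weeks` boxed week sets.

    `start` is assumed to be the Monday on which week set 1 was first used;
    the week sets then cycle every `weeks` weeks.
    """
    if start.weekday() != 0:
        raise ValueError("start must be a Monday")
    if day.weekday() > 4:
        raise ValueError("no job is scheduled on weekends in this sketch")
    week_index = (day - start).days // 7
    return f"{DAYS[day.weekday()]}-{week_index % weeks + 1}"
```

With three week sets starting on Monday, January 1, 2024, Wednesday, January 10 falls in the second week and maps to "Wednesday-2". Monday, January 22 wraps back to "Monday-1".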
Once a backup system has been installed and configured, it should be rigorously tested before the backups are deemed reliable. Just because no error messages are generated and the backup jobs are logged as having completed successfully doesn't necessarily mean that everything is running perfectly. I have seen many situations in which administrators have planned a server upgrade, purchased a new backup software package, installed it, and performed what appeared to be a successful backup. They then would blow away their server volumes, only to find themselves unable to restore them. The only way to positively ascertain the validity of your backups is to perform test restores from the tapes you have made. This is also the perfect time to familiarize yourself with the functionality of the software's restore capabilities. The wrong time to have to learn a new interface is when someone is looking over your shoulder waiting for a file to be restored.
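A test restore is only convincing if the restored files are actually compared with the originals. The following Python sketch assumes the files were restored into a scratch directory that mirrors the original layout. It compares SHA-256 digests and reports any file that is missing or different. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original_root: str, restored_root: str) -> list[str]:
    """Compare a test restore with the source tree; return a sorted list of problems."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(original_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, original_root)
            dst = os.path.join(restored_root, rel)
            if not os.path.isfile(dst):
                problems.append(f"missing: {rel}")
            elif file_digest(src) != file_digest(dst):
                problems.append(f"differs: {rel}")
    return sorted(problems)
```

Files that changed after the backup was made will also be reported as differing. For that reason, a check like this works best against data that has been left untouched since the backup ran.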
When performing test restores, you should also examine the restoration capabilities of your system. A good software package should be able to restore any file or combination of files to any compatible target, with a number of options. It is generally recommended, when restoring selected files, that they be written to a scratch directory and then copied to their final resting place, but if you choose to restore files directly to their ultimate home, there are some factors to consider:
The process of backing up local area networks has provided administrators with a number of unique problems not to be found when backing up stand-alone workstations. These problems have been addressed by software developers in different ways and with varying degrees of success. Some of these situations may be of vital importance to your installation, causing them to weigh heavily on your choice of software depending on the solutions arrived at by the manufacturer, while others may not apply to your network at all.
The first problem, one which applies to all networks, is the proper backup and restore of user accounts and trustee rights. In NetWare 3.1x, this information is found in the bindery. The bindery is composed of three hidden files that are located in the SYSTEM directory on the SYS: volume of a NetWare server. These files contain all of the information concerning user accounts and the properties assigned to those users, including trustee assignments and account restrictions. The protection of these files is not a terribly difficult problem to resolve, as the bindery is composed of files that, although hidden, are visible to the NetWare file system, and can be treated as such. During the restoration of an entire server, it is simply a matter of making sure that these files are restored to the SYS: volume first, so that trustee assignments and file owners can be properly registered when they are restored later. This is usually done automatically by the backup software.
With the introduction of NetWare 4.x, however, and the growing popularity of other network operating systems such as Windows NT, this problem has grown significantly. The NetWare Directory Services database, which is so integral a part of NetWare 4.x, is not visible to the NetWare file system. Also, the placement of the NDS as an enterprise-wide network resource, along with NetWare's capability to maintain replicas and distribute the database across various servers on the network, complicates the backup process and can make it necessary to restore specific parts of the NDS instead of always treating it as a whole. While virtually all of the major network backup products can successfully back up and restore the NDS database, several of them have not yet fully integrated themselves into the NetWare 4.x environment.
Most of the products are not yet fully NDS-compliant, requiring that a user log in to the network under bindery emulation to install and run the software. Most of the products also create the user accounts that give the backup server access to remote targets as bindery accounts instead of fully qualified NDS objects. The products also vary in their ability to perform partial restorations of the NDS database. Most of these perceived shortcomings are largely due to the continuing development of these NOSs themselves. Significant changes have been made to the NDS and to the tools used to maintain it in each successive revision of the NetWare 4.x operating system.
Software developers are inevitably forced to play "catch up" in situations such as this, when the environments that they are developing products for are changing as quickly as everything else does in this industry. The usual strategy for these developers is to provide a "Band-Aid" solution as quickly as possible and then integrate a fuller functionality into their next major revision. When exploring the capabilities of these backup products, I recommend that the software companies be contacted directly to ascertain the current status of their development work on these capabilities. When dealing with technical issues such as these, I also have found it advantageous to bypass the sales operations at these companies and talk instead to technical support personnel, who often have a better grasp of the software's current status.
Another extremely important problem to many administrators is the protection of database files while they are in use. Many organizations rely very heavily on their databases and wish to back them up while they are being accessed by users. This may be because they are in use 24 hours a day, or it may be that they want the added protection of several daily backups of these critical files. As a general rule, a file cannot be backed up when it is locked in an open state by user access. This does not apply to files like executables, which are accessed only for short periods while they are read into memory. If a file such as this is found to be in use during a backup operation, backup software will typically retry access to the file, which usually results in a successful backup. Many programs can also be configured to retry open files a specified number of times, and at specified intervals.
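The retry behavior described here follows a simple loop. The following Python sketch takes the attempt count and interval as parameters, mirroring the configurable settings mentioned above. It assumes that a busy file raises `PermissionError`, but which error actually signals a file in use depends on the platform. The function name is hypothetical. Injecting `sleep` lets the logic be exercised without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_open_file(read: Callable[[], T], attempts: int = 3, interval_s: float = 30.0,
                    busy: type[BaseException] = PermissionError,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Retry a read that fails while another user holds the file open.

    Returns None (the file is skipped and should be logged) if it is still busy
    after `attempts` tries, waiting `interval_s` seconds between tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except busy:
            if attempt < attempts:
                sleep(interval_s)
    return None
```

This works for files that are opened only briefly. A database held open around the clock will simply exhaust its retries, which is the problem the following paragraph addresses.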
Database files, however, are usually far too large for any one machine to hold in memory. In addition there may be dozens, or even hundreds of users accessing and writing changes to these files at the same time. There is no way, under normal conditions, to back up a file that is perpetually opened to this kind of access. Many database systems, however, do have features integrated within them that allow a developer of backup software to create an agent that will, when it receives a request for access from the backup server, divert all of the changes destined for the database file to what is called a "delta" file. This is a temporary storage area for the modifications sent to the database manager, which is maintained while the agent closes the database file so that it can be backed up. When the file has been successfully copied, then the changes from the delta file are applied to the live database, and access continues as before.
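The delta-file sequence can be modeled in miniature. The following Python sketch is a toy built on assumptions: the "database" is a dictionary, and the `DeltaAgent` class and its methods are hypothetical. Real agents are written against a specific database system's own interfaces. The sketch shows only the order of operations: begin diverting writes, copy the now-stable file, then apply the delta to the live data.

```python
from typing import Any, Dict, List, Optional, Tuple


class DeltaAgent:
    """Toy database backup agent that diverts writes to a delta log during a copy."""

    def __init__(self, database: Dict[str, Any]):
        self.database = database
        self.delta: Optional[List[Tuple[str, Any]]] = None  # None: no backup running

    def write(self, key: str, value: Any) -> None:
        if self.delta is not None:
            self.delta.append((key, value))  # divert changes while the copy runs
        else:
            self.database[key] = value

    def read(self, key: str) -> Any:
        # Pending changes are visible, so users see no interruption in service.
        if self.delta is not None:
            for k, v in reversed(self.delta):
                if k == key:
                    return v
        return self.database[key]

    def begin_backup(self) -> None:
        self.delta = []

    def copy(self) -> Dict[str, Any]:
        if self.delta is None:
            raise RuntimeError("begin_backup() must be called first")
        return dict(self.database)  # the file is not changing, so the copy is consistent

    def end_backup(self) -> None:
        pending, self.delta = self.delta or [], None
        for key, value in pending:  # apply the delta to the live database
            self.database[key] = value
```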
Several backup software manufacturers, including Novell, have such database agents available, but they are always individually written for specific database systems, and often for specific revisions of those systems. These agents are usually sold as add-on products to the basic backup software package, and I strongly recommend that potential users of these products speak to the manufacturers of both the backup and database software packages. Attempt to ascertain whether or not the relationship between the companies is a strong one that will survive upgrades of both products in the future. Use of these products with live databases that contain mission-critical data, such as order entry and so on, should not be considered until extensive testing on off-line databases has been conducted. Remember, the entire object of this exercise is to protect your data. Be sure not to risk it unnecessarily on an untried product, especially when the result of a compatibility problem is not only a failed backup job, but may involve corruption of the original data as well.
In recent years, there has been a much greater tendency to run workstations with different operating systems on the same network. This may be done to accommodate a particular business application or the special needs of users, but the quest for universal access to basic network services such as e-mail, printing, and Internet access has led to interoperability problems that were largely unheard of not long ago. For this reason, make sure that the backup products you purchase are capable of accessing all of the platforms that are in use on your network, as well as those that you are considering for the future. The move to 32-bit desktop operating systems that is currently being engineered by the industry has resulted in a lot of administrators doing testing on new OSs, such as OS/2, Windows NT, and Windows 95. It is a good idea to examine a software developer's plans for support of these platforms before a purchase is made.
This is particularly true for OSs that use proprietary file systems. When testing, make sure that a Mac file backed up from a Mac workstation or server volume can really be restored and used on a Mac. Very often, there are configuration settings to be adjusted for files like this, and failure to do so may cause the files to be incorrectly backed up as DOS files.
Another highly significant issue with network backup systems is security. By definition, a backup system must be able to access all files on the network in order to protect them. This opens a number of avenues for security gaps that must be closed by the administrator. Very often, the backup software provides tools that help to do this, but it is up to the administrator to anticipate the need for them and make proper use of them. At the most fundamental level, users can be warned of the potential for danger, and instructed to use application-level password protection on their most sensitive files. But action must also be taken by the administrator, as the most well-meaning users can become lax in their protective measures over time.
Fortunately, most workstation agents do not require the workstation to be logged in to the network in order to back it up. Instead, they function using the IPX connection that is established as soon as the network drivers are loaded. There is, therefore, no direct security hole at the site of the workstation itself. However, if your nightly backup job can access the workstation's files, then someone else's backup job can also. This problem can be remedied by a feature included in most major software packages, that is, the ability to password-protect the workstation agent. The user of the workstation specifies a password when loading the agent that must be duplicated by the administrator when creating the backup job. Otherwise, access to the workstation will not be granted.
Restrictions can also be set at several levels regarding access to the backup software's manager interface. This can be done by installing the client portion of the software to the local drive of a workstation that is kept in a secure environment, or by granting only specific network users the rights to access the software and create backup or restore jobs. Some software packages can also distinguish between backup users with full access to the system and those who are allowed to create and modify their own jobs, but cannot affect the status or properties of other jobs that have already been defined. In this way, users can be given the power to perform their own restores without the danger of other aspects of the system being affected.
Finally, the ultimate security hole is the backup tapes themselves. It should be possible to password-protect them as well, and in any case they should be kept under lock and key or off-site. It may even be desirable to schedule backups of more sensitive material, such as accounting or personnel files, as separate jobs in order to allow other users access to their own files without endangering the security of the rest of the company's data.
Other, more complex, scenarios can also be developed in cases, for example, when it is desired that a person be given the ability to create a full backup job without giving him access to full supervisory privileges. In most cases, backup software can be configured to utilize a specified account and password (other than the one currently being used) when scheduling access to a server for backup. By creating a network user account on the target server with time and station restrictions that limit its access only to the midnight hours from a node address equivalent to that of the backup server, this account name and password can be supplied to a user without giving that user full supervisor access.
In many office environments, ignorance can be the best security tool available. By keeping backup software and equipment out of sight, and by keeping the details of your backup routines close to the vest, smart users will be less tempted to experiment on their own and not-so-smart users will remain blissfully ignorant of the payroll data coursing through the wires all around them.
We have, thus far, examined the components of a basic local area network backup system, as well as some of the options available. For the average network installation, this type of system should be able to provide sufficient protection. However, today's network requirements are growing at a phenomenal rate. New data types such as high-resolution graphics, sound, and video require ever-increasing amounts of bandwidth and storage capacity, and backup systems are growing in order to provide the protection that they need.
Most backup software developers provide support for tape autochangers, usually as an add-on module to their products. An autochanger is a device that consists of one or more tape drives and a robotic mechanism that can swap tapes in and out of the drive(s). Changers can range from small models, containing one drive and slots for four tapes, to huge refrigerator-sized devices holding four tape drives or more, and upwards of 100 tapes. Some high-end models also have bar-code readers to facilitate the location and labeling of tapes. There exists another sort of device, called a stacker, which is no longer in general use. A stacker is capable of removing a tape from its tape drive and inserting the next one in a series, thus allowing a job to span more than one tape without operator intervention. An autochanger, however, can address any tape in the magazine at any time, allowing multiple jobs to be run on the same device.
A changer software module accomplishes this by causing each of the tapes in the magazine to be loaded into a tape drive for identification. This inventory is then maintained in memory, allowing a job to be configured to address any tape in any order. Depending on the capacity of the changer, this inventory process can take from several minutes to several hours, making it rather inconvenient to change tapes in the device frequently. It is important to know that a changer is composed of two or more SCSI devices that interact. Some changers will actually require two separate SCSI IDs to be set, while others utilize logical unit numbers (LUNs), which allow more than one device to be addressed through a single SCSI ID.
In many cases, the tape drives within the changer are no different from the stand-alone drives made by that manufacturer. This allows for easy swapping of drives should replacement be required, but can also result in some peculiar design aspects to the device. Despite their high prices, some autochangers are remarkably ill-designed devices. The best and most reliable models are the ones that have been designed as an integrated unit. Others seem to be composed of disparate components that are slapped together as the need arises. I have even seen changers in which the tape is made to be ejected from the drive not through a software command sent to the tape drive itself, but through a command sent to the changer that sets a small robotic finger into motion which actually presses the tape eject button on the front of the drive. This sort of Rube Goldberg contraption is to be avoided. Never purchase an autochanger without actually seeing it run, and look for those in which the design is logically thought out, allowing the smallest possible amount of movement to change tapes.
These devices can vastly increase the capacity and convenience of a backup system, but considerable thought should be given to support and service of the unit by the dealer or manufacturer, as the hardware can be notoriously cranky. In fact, it is good to consider the subject of product replacement before any purchases of tape hardware are made.
If you make your purchases from a reputable source, you should be able to arrange a guarantee that you will never be without a drive for more than a day. Some manufacturers are extremely responsive in this respect, sending out replacement devices by overnight mail at the first hint of trouble, and others are less so, perhaps making a purchase through a VAR preferable. You should also check whether the devices you purchase have firmware that is upgradeable through software. You should no longer have to send a drive out for a firmware upgrade.
As data storage needs increased over time, many administrators realized that many of the files populating their network hard drives were only accessed occasionally, if at all. As a result, it was found to be economically more practical to archive the less-used data to tape or another less expensive medium, rather than purchase additional hard drives. This process, while economical from a hardware standpoint, causes a significant increase in administrative costs. Companies with large amounts of archived data began to have trouble tracking the exact location of files as they were requested, resulting in delays to users and increased demands on MIS staff.
To address this need, a new class of data storage products has been created that is similar to, but not a replacement for, backup systems. Using storage devices with far lower per-megabyte costs than hard disk drives, such as tape autochangers and optical jukeboxes, these storage management systems can be configured to track automatically how files are accessed and to alter their storage location accordingly. For example, in a typical three-tier arrangement, a file that has not been accessed for three months is moved from a hard drive to an optical disk. In its place on the hard drive, a tiny "key" file is left, which functions as a pointer to the file's actual location. The key also allows users to see the file in a directory listing as though it had never been removed from the hard drive.
If the file is still not accessed after three more months, it is migrated to a tape in an autochanger and the key file is updated. The next time a user requests the file, the system automatically loads the proper optical disk or tape into a drive, accesses the file, and delivers it to the user in the normal manner. The only indication to the end user that a migration has occurred is the delay caused by accessing the slower, less expensive media.
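The three-tier migration policy just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular storage management product works: "three months" is approximated as 90 days, and the `StubFile` class, the tier names, and the media labels are hypothetical stand-ins for the key file and a real media catalog.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed thresholds from the three-tier example: 3 idle months moves a
# file from disk to optical, 3 more months moves it on to tape.
DISK_TO_OPTICAL = timedelta(days=90)
OPTICAL_TO_TAPE = timedelta(days=180)
TIERS = ["disk", "optical", "tape"]  # fastest and costliest first


@dataclass
class StubFile:
    """Hypothetical model of the "key" file left on the hard drive."""
    name: str
    tier: str      # where the file's data currently lives
    location: str  # illustrative media label, e.g. an optical disk ID


def target_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, given how long it has sat idle."""
    idle = now - last_access
    if idle >= OPTICAL_TO_TAPE:
        return "tape"
    if idle >= DISK_TO_OPTICAL:
        return "optical"
    return "disk"


def migrate(stub: StubFile, last_access: datetime, now: datetime, label: str) -> bool:
    """Demote the file if its idle time calls for it, updating the key file.

    Returns True if the file moved. This sketch only moves files down to
    slower media; recall behavior is left to the real product.
    """
    wanted = target_tier(last_access, now)
    if TIERS.index(wanted) > TIERS.index(stub.tier):
        stub.tier = wanted
        stub.location = label
        return True
    return False
```

For instance, a file idle for 100 days would be moved to an optical disk, and a later pass after 200 idle days would send it on to tape, with the stub's `tier` and `location` updated each time so that a directory listing still shows the file.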
Systems such as these are not designed to replace backup systems, and in fact they complicate the task of performing a backup. They do, however, use much of the same hardware, and because most are made by backup software developers, they generally work very well in conjunction with their manufacturer's backup products. They are also an indication of the direction in which backup system development is headed. Most of these storage management systems are designed to handle truly vast amounts of data, and to do so they allow hardware to be distributed throughout the network, which helps avoid resource depletion and LAN traffic bottlenecks at any single point. Backup systems are headed in the same direction, and I think that we will begin to see more completely integrated data storage, backup, and management systems that allow the full measure of usefulness to be extracted from the hardware on the network.
Because backup systems interact with virtually every component of a network, they can be notoriously difficult to troubleshoot. Breakdowns or bottlenecks can occur at any of several key points between the targets and the tape, causing problems ranging from reduced throughput to corrupted data to complete failure. When resolving such situations, the first step is to identify where the problem actually lies. In most cases, backup problems can be traced to one of four places: the tape drive, the SCSI subsystem, the backup server, or the network itself. The best practice is usually to work your way back from the tape to the source of the data.
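Working back from the tape toward the data amounts to running one check per component in a fixed order and stopping at the first failure, so that a fault near the tape is not mistaken for one further along the chain. A minimal sketch, assuming each check is a hypothetical function you supply (for example, a test backup of a local volume to exercise the tape drive):

```python
from typing import Callable, Optional, Sequence, Tuple

# The four usual suspects, ordered from the tape back toward the data.
COMPONENT_ORDER = ("tape drive", "SCSI subsystem", "backup server", "network")

# Each check pairs a component name with a function returning True if it passes.
Check = Tuple[str, Callable[[], bool]]


def locate_fault(checks: Sequence[Check]) -> Optional[str]:
    """Return the first component whose check fails, or None if all pass.

    Stopping at the first failure matters: if the tape drive itself is bad,
    every later test that writes to tape will fail too.
    """
    for name, passes in checks:
        if not passes():
            return name
    return None


# Hypothetical results: the drive tests clean, the SCSI subsystem does not.
results = {"tape drive": True, "SCSI subsystem": False,
           "backup server": True, "network": True}
checks = [(name, lambda n=name: results[n]) for name in COMPONENT_ORDER]
print(locate_fault(checks))  # SCSI subsystem
```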
The most common causes of malfunctions when writing to tape are media errors and SCSI hardware configuration problems. These may appear as error messages from the backup software indicating write errors or controller problems, or as reduced throughput during backup jobs because the drive must rewrite blocks. Most tape drives have some error-correcting capability, usually a read-after-write check in which each block is read back after it is written, compared to the source, and rewritten if incorrect. These drives often report statistics on the frequency of these corrections to the backup software, and this information may show up in the software logs as Recovered Read or Write Errors. While a small number of such errors may be considered normal, a steadily increasing or very large number (for example, over 100) may be cause for alarm.
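A simple way to watch for these warning signs is to scan the Recovered Read or Write Error counts that the backup software logs after each job. The sketch below assumes you have already extracted those per-job counts into a list, oldest first; the threshold of 100 is the text's example, while treating three consecutive rising counts as "steadily increasing" is an assumption to be tuned for your own drives.

```python
from typing import Sequence

ALARM_THRESHOLD = 100  # the text's example of a "very large" count
TREND_JOBS = 3         # assumed: this many rising jobs in a row = a trend


def recovered_errors_alarming(counts: Sequence[int]) -> bool:
    """Judge per-job recovered error counts, listed oldest first.

    Flags the most recent job if its count exceeds ALARM_THRESHOLD, or if
    the last TREND_JOBS counts rise strictly from job to job.
    """
    if not counts:
        return False
    if counts[-1] > ALARM_THRESHOLD:
        return True
    recent = counts[-TREND_JOBS:]
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return len(recent) == TREND_JOBS and rising


print(recovered_errors_alarming([4, 2, 5, 3]))  # False: small and erratic
print(recovered_errors_alarming([5, 20, 60]))   # True: steadily climbing
print(recovered_errors_alarming([3, 150]))      # True: one very large count
```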
By asking a few simple questions, you can usually determine whether a media error or a hardware error is the more likely cause of the problem.
Has the tape drive ever functioned properly, or have there been problems from the outset?
If a tape drive has been functioning normally for some time and suddenly begins exhibiting problems, a media error becomes the most likely cause, and the media error steps should be performed first. If the tape drive is newly installed, or functioned properly in another environment and is now malfunctioning, check for hardware errors first.
Are there other devices on the same SCSI bus that are functioning properly?
If the answer is yes, the problem most likely lies in the tape drive itself or in its connection to the SCSI bus.
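The two questions above can be folded into a small triage helper. This is only a sketch: the `Symptoms` fields are hypothetical names for the answers, and the message returned when other devices on the bus are also failing is an assumption, since the text addresses only the case where they work.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    worked_before: bool         # has the drive run properly for some time?
    recently_installed: bool    # new, or moved from another environment?
    other_bus_devices_ok: bool  # do other devices on the same SCSI bus work?


def first_checklist(s: Symptoms) -> str:
    """Pick which checklist to run first: 'media' or 'hardware'."""
    # A long-working drive that suddenly fails points to the media;
    # a new or relocated drive points to the hardware configuration.
    if s.worked_before and not s.recently_installed:
        return "media"
    return "hardware"


def suspect_location(s: Symptoms) -> str:
    """Narrow down where a hardware fault is likely to be."""
    if s.other_bus_devices_ok:
        return "the tape drive or its connection to the SCSI bus"
    # Assumption: with other devices also failing, the bus itself is suspect.
    return "the SCSI bus or adapter as a whole"
```

A drive that ran for months and then began failing, on a bus whose other devices still work, would send you to the media checklist first, with the drive and its cabling as the hardware suspects if the media checks come up clean.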
Media Error Troubleshooting Checklist.
The term "media error" does not always mean that the problem lies within the tape cartridge itself. Rather, it indicates that a problem was encountered while reading from or writing to the tape. The following steps should be taken when a media error is suspected. They are intended to eliminate transient causes as the source of the problem; if the problem persists, the cause very likely resides in the tape drive or some other hardware component.
Hardware Error Troubleshooting Checklist.
This section lists the items that should be checked first when a problem with the SCSI or tape hardware in the backup system is suspected.
SCSI DISCONNECTION should be ENABLED.
SYNCHRONOUS NEGOTIATION should be DISABLED.
PARITY should be ENABLED.
SCSI TRANSFER RATE should be set for 5 MB/sec.
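When several servers or adapters are involved, it can help to compare each adapter's configuration against the recommended values listed above. The sketch assumes you have recorded each adapter's settings by hand (for example, from its setup utility) into a dictionary; the key names are hypothetical and do not correspond to any real utility's output.

```python
from typing import Any, Dict, Tuple

# Recommended SCSI adapter settings from the checklist above.
RECOMMENDED: Dict[str, Any] = {
    "scsi_disconnection": True,        # ENABLED
    "synchronous_negotiation": False,  # DISABLED
    "parity": True,                    # ENABLED
    "transfer_rate_mb_s": 5,           # 5 MB/sec
}


def setting_mismatches(actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each setting that differs from RECOMMENDED to (actual, wanted).

    A setting missing from `actual` is reported with an actual value of None.
    """
    return {
        key: (actual.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if actual.get(key) != wanted
    }


adapter = {
    "scsi_disconnection": True,
    "synchronous_negotiation": True,
    "parity": True,
    "transfer_rate_mb_s": 10,
}
print(setting_mismatches(adapter))
# {'synchronous_negotiation': (True, False), 'transfer_rate_mb_s': (10, 5)}
```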
If backup malfunctions continue after the media, the tape drive, and the SCSI bus hardware have been eliminated as possible causes, it is time to look at the file server on which the backup software is installed. Problems here may manifest themselves as reduced backup throughput; a slowdown or stoppage in logins, file transfers, and general network performance; or server lockups and abends.
Backup Server Troubleshooting Checklist.
As discussed earlier, backing up a remote target (that is, any computer other than the server on which the backup software is installed) generates far more traffic over the network medium than virtually any other process. A workstation can function well under normal conditions but exhibit problems when being backed up, because of the excessive traffic levels.
Problems of this type will usually manifest themselves as reduced throughput or failure to contact a target server or workstation for backup. It is usually fairly easy to isolate this as the cause of the difficulty, because operations internal to the backup server will proceed normally, and only those involving remote systems will cause problems.
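The isolation rule just described (local jobs succeed, remote ones fail) is easy to express as a check over recent job results. In this sketch the `JobResult` record is hypothetical: you would fill it from your backup software's job log, marking each target as local or remote and noting whether the job completed at normal throughput.

```python
from typing import Iterable, NamedTuple


class JobResult(NamedTuple):
    target: str
    remote: bool  # True if the target is not the backup server itself
    ok: bool      # finished at normal throughput, no contact failures


def network_suspected(results: Iterable[JobResult]) -> bool:
    """True if every local job succeeded but at least one remote job failed."""
    results = list(results)
    local = [r for r in results if not r.remote]
    remote = [r for r in results if r.remote]
    return bool(local) and all(r.ok for r in local) and any(not r.ok for r in remote)
```

Requiring at least one successful local job is deliberate: without that baseline, a remote failure says nothing about whether the problem is in the network or in the backup server itself.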
Network Communications Troubleshooting Checklist.
As we have seen, backup systems are not "sexy," but they are necessary. The time spent assembling and configuring a stable, reliable system gives the LAN administrator the freedom to experiment safely with new products, without fear of damaging the network or interrupting users. This chapter has covered the software issues and theoretical background needed to develop a reliable network backup strategy. Clearly, an equally significant part of the system is the hardware involved. SCSI systems are covered at length in chapter 5, "The Server Platform." Both of these chapters should be studied before any firm decisions regarding software are made.
© Copyright, Macmillan Computer Publishing. All rights reserved.