In this chapter, you learn the fundamentals of replication, including how to install and use replication and how it might be a good fit for your projects. Replication is an intricate feature of SQL Server. The boundaries of it have not been defined yet, and there are innovative projects coming that will stretch these boundaries.
Replication is a broad term, and to understand how to install and use it, you first need to understand what it is and what it is not. To help with this, you'll first be introduced to some of the terms that Microsoft uses, how these terms implement the replication capabilities, and what they mean to your system.
Replication with SQL Server provides you with the ability to duplicate information between servers. This brings with it the ability to synchronize information sources for multiple domains, even in cases where they are physically separated by distance and possibly poor communications links. Replication, at its most fundamental level, recognizes changes made to information in a database and sends these changes to the remote system or systems for their use.
Microsoft has employed what is called loose integration in its replication model. You might have heard of the phrase real-enough time in systems implementation. Loose integration follows the real-enough time model. This means that information will flow across the system of servers not in real-time and not necessarily in batch mode, but as quickly as it can. The phrase real-enough time was first introduced when information was being distributed by e-mail to remote locations. It was certainly not real-time, as no live connection was maintained between sites.
Using e-mail didn't implement a batch approach, as transactions were often still addressed on a transaction-by-transaction basis. With the e-mail system, you're certain that your transactions are going to make it to the destination when a connection is completed between the remote system and yours.
SQL Server synchronizes the database tables, almost immediately in many cases, but not concurrently with the transaction that made the original change that is to be replicated.
Certainly there are pros and cons to this approach. On the positive side, the user of the system will not be waiting for a remote server to respond before the application can continue. This alone can be a big time-saver. Another benefit is that the system can manage some fault- tolerance in the queue of things to be done.
If you have a transaction that is going to a remote server and the remote system is down, the transaction can be queued for later processing. The server engine on the local side, the distribution engine, can retry the connection to, and update of, the remote server at regular intervals and make sure the transaction happens as soon as possible.
A positive side effect of this approach is to consider that, in reality, wide area networks might need to be connected over slow links. Transactions can be handled as quickly as the connection allows, without bogging down applications that might be using the database tables that are to be replicated.
On the negative side, it's possible to have information in two databases using this schema that is out-of-date or different on different sides of the replication equation. With true replication as implemented in SQL Server at this time, out-of-date schemas is a situation you cannot control, but might have to address if this presents a problem. Remember, there are limited situations where this might occur, such as if a scheduled connection is unavailable or a server goes down. In either case, as soon as the connection is available, the databases are re-synchronized at the next communication with each other.
NOTE: If you have multiple databases, tables, or other entities that must be absolutely in sync, you need to work with the two-phase commit capabilities of SQL Server. This entails custom development on your part using the C API and working with DB-Library. Be sure to review Chapters 22 "Communicating with SQL Server," and 23, "Understanding SQL Server and the Internet," for more information about developing applications for use with SQL Server.
There are three different key physical layer components to the SQL replication model: Subscriber, Publisher, and Distributor. Each of these components must be in place to have a functioning replication model.
There is a distinct implementation difference between distributing your information and replicating it. In many installations, you'll probably find that you have both scenarios. When you decide to implement a system based on multiple SQL Servers, you need to be very careful to define what information will reside on which SQL Servers. This sounds obvious, but you need to take it to the next step, which is beyond just determining functional requirements.
In many systems that are disbursed, the requirement to implement replication comes from the desire to have information available at a central location. This requirement, combined with the information to be centralized having a remote starting location, provides the foundation for a replication scenario.
This is certainly a system that can be automated using the replication capabilities of SQL Server. The primary requirement is that you put a SQL Server at each location and initiate the replication from the remote locations to the central location. Another alternative might be available if you reverse the flow of information shown in Figure 17.1.
FIG. 17.1
Basic replication model shows information flowing between differ-ent points in
your system.
Adding a distributed approach to this scenario might be helpful to enhance your application's performance. Distributed systems, or components of systems, lend themselves to situations that include some or all of the following attributes:
A good example of a distributed system is a point-of-sale (POS) system. In a POS system, you often have sales information at each individual location, but other locations do not have to have that sales information until it has been posted to a central location, if at all (see Figure 17.2).
FIG. 17.2
A typical WAN point-of-sale system lends itself to a mixed distribution- and replication-based
system.
In this configuration, you can see that information flow between the store locations is minimal, and, when it does occur, it's likely to come from the corporate headquarters. Systems like this lend themselves to both a distributed information approach and a replicated scenario. First, you need to understand that replicated information is read-only at the recipient location. You can use the fact that things are read-only to your advantage.
TIP: SQL Server's replicated tables are read-only on the subscribing system. If needed, you have to either create triggers on the incoming table that insert the replicated information into a working table, or you need to modify your system to use the tables differently, allowing for the read-only nature of the replicated tables.
In the POS example, you can use replication to send all inventory information, including pricing, store inventory levels, and other stores' sales information, to the different remote locations. Because this information is reasonably static, you can use a scheduled exchange of this information rather than a continuous update. You might want to keep this in mind as the process of setting up replication is discussed in the upcoming sections.
Distributed information can be maintained at the stores. This information includes sales information, customer update information about purchases, credit payments, and so on. You can refer to this as distributed because the stores operate on this information separately from the corporate headquarters. They can create, update, delete, and otherwise manage this information on their own.
TIP: Using SQL Server to manage distributed information at the remote locations is not a requirement. You can consider using less expensive options if needed. Other databases that use ODBC are key candi-dates for the remote locations, as they are able to connect most easily to SQL Server later to update the main database systems.
In short, you should use a distributed environment to provide a fault-tolerant, independent system. The information produced under these routines, and the routines designed to create the distributed information, might or might not be required at the central location. Replicated systems are ideal for information that doesn't change very frequently at the recipients' location. Keep this in mind as you review the different replication and installation options throughout this chapter.
NOTE: The processes that make up the replication model are all managed by the SQL Executive service. If you experience problems with replication, one of the things you might want to check out is whether the USERID and PASSWORD used by the SQL Executive are still available and have not been changed. You specified these values during the installation process.
SQL Server's replication features follow a magazine-type analogy to relate the different roles that are parts of the equation. The model follows an I'll Publish--You Subscribe approach that's easy to follow and implement. In the next sections, setting up the different options for replication on both sides of the equation is covered.
NOTE: SQL Server 6.5 has added the ability to replicate to ODBC subscribers, such as Access and Oracle. As stated in the SQL Server documentation, the ODBC subscriber must meet the following criteria:
- Must be ODBC Level 1-compliant.
- Must be 32-bit, thread safe, and for the processor architecture--for example, Intel, PPC, MIPS, Alpha--that the distribution process runs on.
- Must be transaction-capable.
- Must support the Data Definition Language (DDL).
- Cannot be read-only.
One last consideration that you might want to be aware of concerns the replication between a SQL Server version 6.5 environment and one running version 6.0 of SQL Server. If you'll be working in a mixed environment such as this, you might want to make sure that the Publisher and Distributor components are running on the 6.5 Server.
In addition, you have to treat the 6.0 Server as if it were a standard ODBC data source. This means you need to set up an ODBC configuration in the Control Panel. The positive side is that you'll be working around the limitations of the 6.0 replication, and you can use all of the features outlined in this chapter against the downlevel server. This is the same approach you use if you want to replicate to any other non-SQL Server database.
The final step in getting the downlevel server ready is to install the sqlole65.sql script. This script makes the replication possible, and it makes it possible to use Enterprise Manager from 6.5 against the 6.0 Server. This script is found in the \<platform> subdirectory on your SQL Server 6.5 CD.
TIP: If you need to install the ODBC driver pack, it's included on your SQL Server CD in the \<platform>\odbc directory. In addition, for the Intel platform, there are additional driver sets in the \i386\odbcrepl directory.
Before you go much further, you need to test your connection to remote servers to make sure you can connect to them appropriately. Microsoft provides a utility, located in your BINN sub-directory on the server, that helps you test the ODBC connection between your server and the remote system.
TIP: Even if you're not using an ODBC connection directly to work with the other server, as would be the case with a remote SQL Server 6.5 system, it's still a good idea to check your connection with this fast test. It removes a variable should you encounter any problems later.
As you can test a TCP/IP connection with the PING utility, you can test an ODBC connection with the ODBCPING utility. ODBCPING has the following syntax:
odbcping [-s servername] [-d dsn] [-u username] [-p password]
Though the utility is a command-line, DOS window utility, it provides a solid test of the connection between your servers.
When you run the utility, you can check a direct login by specifying the server name, or test a DSN that you've set up by indicating the DSN. Either way, provide the username and password for the connection.
When the utility runs, you should receive a message similar to the following:
CONNECTED TO SQL SERVER ODBC SQL Server Driver Version: 02.65.0213 SQL Server Version: Microsoft SQL Server 6.50 - 6.50.201 (Intel X86) Apr 3 1996 02:55:53 Copyright 1988-1996 Microsoft Corporation
If you cannot connect successfully, you receive an error message that resembles the following:
COULD NOT CONNECT TO SQL SERVER SQLState: 08001 Native Error: 6 Info. Message: [Microsoft][ODBC SQL Server Driver][dbnmpntw]Specified SQL server not found. SQLState: 01000 Native Error: 53 Info. Message: [Microsoft][ODBC SQL Server Driver][dbnmpntw]ConnectionOpen (CreateFile()).
If you cannot connect successfully, check the following:
If all else fails, consider re-creating the connection in ODBC and try connecting to the server using standard network drive mapping conventions. Chances are good that the server is unavailable or the SQL Server process on the remote server has not yet been started.
When you decide you're going to be using replication to provide information to other systems, you, in effect, become a publisher. You publish your information for other systems to receive. At the highest level, the information you provide to other systems is called a publication. Publications consist of articles, which are the items that are provided to the other systems. Articles are discrete pieces of information that range from the entire contents of a database to a single row or the result of a query. As you'll see later in the chapter, there are several different ways to dissect the information you'll provide via replication, as follows:
Each of these has its advantages and disadvantages. Which you use depends entirely on what types of information you're replicating and how that information will be used by the remote system.
NOTE: Any table you want to replicate must have a unique index defined for it. If you don't have one defined, it will not show up in the list of available tables when you define the publication.
If, after you've installed and configured the necessary components of replication, you have problems, consider the following troubleshooting tips:
Entire Databases When you provide entire database articles to the remote system, you are asking SQL Server to monitor all activity for that database and update the users of the information with any changes that occur. This is probably the simplest form of replication to administer, as you're telling SQL Server to send everything. Keep in mind the basic covenant of replicated tables: The information is read-only at the recipient. It might be that this would be too restrictive on your system to implement this type of full-blown replication.
NOTE: If you're creating a data warehousing application--one in which you simply provide snap-shot type information to other locations--this just might be the ticket. In these types of applications, you want to provide a picture of the information at a given time. You can create a database of queriable information, provide it to these remote users, and need not worry about the information being changed. You also can be assured of the most recent and updated information at these locations, as SQL Server will monitor it for you and initiate updates on the schedule you designate.
The replication of tables and specific partitions of tables is often the way to go when you have a modifiable, production environment that you're exporting information to.
Entire Tables When you replicate entire tables, you specify a table that is monitored and sent out by the replication engine whenever changes are made. This table is kept up-to-date based on the export timing and criteria you establish in setting up the publication article.
NOTE: You can establish more than one article for a given table. This might be helpful if you want to provide all information to a production, administrative server, but only limited information to a reports-only server that is used by non-administrative personnel. Be aware, though, that each publication you establish requires resources on the server to process the replication event. If possible, consider publishing the table once. After this, you can create a view or other protected viewing mechanism on the receiving server as it works with the client-side software.
Horizontal and Vertical Partitions of Information, Custom Views When you publish horizontal partitions, you're using a SELECT statement that provides all columns of information, but is selective on the rows of information that will be considered. For example, consider the following simple SELECT statement:
Select * from Authors where au_lname like "W%"
This selects only those authors whose last name starts with a W. These are the authors that are included in published information.
On the other hand, you might want to only publish certain columns of information to the remote system. For example, the Authors table in the Pubs database installed by default with SQL Server includes a Contract field indicating whether the author is under contract. If you're replicating a list of authors to a remote location, the users might not need to know whether you've established a contract with the author. In that case, you can select all fields except the Contract field to be included in the replication article.
You also can export a combination of a vertical and horizontal partition, or you can replicate the results of a view. The selective replication capabilities of SQL Server are the power behind the tool that really begins to broaden the appeal of replication.
After you've set up what information you'll be publishing with the articles and other information covered earlier, you need to move to the subscription system and let it know where to find the information that it will receive. The recipient is called the subscriber in the replication scenario. The subscriber sets up a connection to the distribution server and receives the information at the intervals you established when you created the article to be replicated.
Subscriptions include the capability to designate where the information will go, which enables you to control who has access to it by using the standard NT and SQL security capabilities.
There are two silent partners in the replication process that are always running and performing their tasks, but are much less visible when compared to the subscriber and publisher roles. These are the log reader process and the distribution database.
Replication in SQL Server is transaction log-based. As changes are made to articles that are declared for replication, they show up in the transaction log. The log reader process, an automatic background task on the server, detects these changes and logs them to the distribution server's distribution database.
After a transaction has been processed for replication and put in the distribution database, the replication engine publishes it and makes it available to the other servers that need it. You can see the log reader process if you look at the running processes in SQL Server. You can see the process in the list of active connections.
NOTE: If there are any synchronization jobs pending, you also can see those jobs listed in the pending tasks area of the running tasks list. n
The distribution server and database serve as the go-between for the publication server and the subscription server. When you set up the system, as you'll see when you go through the installation process, you have the option to set up a local distribution server or a remote system.
While the distribution database is separate from the other databases in your SQL Server system, it can still be on the same physical server. When you set up a local distribution database on the publication server, SQL Server uses this new database as the mechanism to keep track of the different items that need to be provided to the subscription servers on the network.
The distribution database is also the storage location for the different stored procedures that make up the replication engine. These stored procedures are automatically called when you use replication and are used to automate the different processes that happen under the covers when replication is running.
If you choose to use a remote system as your distribution database, the database is created or referenced on a different server from the publication server. This can be beneficial on a system where you might have high transaction volumes or might otherwise have a server that is bogged down just in the normal processing of transactions or network requests. In these situations, it can be beneficial to have this other system--the distribution database hosting system--manage the replication of information to the subscription servers.
CAUTION: You must have 32M of RAM on the server that will be the distribution system if you combine the distribution and publication operations on a single server. In addition, you might need to install additional memory if a given distribution server is to be used by more than one publication server. It might also be beneficial to implement a multi-processor server in cases where workload on the server is substantial.
You generally won't work directly with the log reader processes or the distribution database, but it's important to understand their functions to correctly set up your system.
When you establish replication, if you take time now to figure out exactly the best way to lay out your system, you save yourself time later. As with database definitions and ERDs discussed in Chapter 4, "Data Modeling, Database Design, and the Client/Server Model," planning early in the development process really pays off when it comes time to move your system into production.
The configuration options for replication, at least on a physical implementation level, boil down to three different approaches, although there are certainly variations on these that can work well for some situations. The following are the basic three topologies:
Combined Single Publisher/Distribution, Single Subscribers The most basic type of installation is the combined publisher/distribution server and a single subscriber system. This enables you to configure a server where the transactions for your system take place and the information is replicated to the subscribing server. Figure 17.3 shows a simple replication configuration.
FIG. 17.3
This is a simple system using a single server as the publication and distribution
system and a single subscriber.
NOTE: The figures and examples indicate a one-way flow of information. You also can install replication's publication features on a subscribing server and effectively replicate other information back to the original publication server. To put it another way, because a system can be both a publication and a subscription server, information can flow in both directions. In this case, you just need to establish appropriate articles for publication.
It's likely that many systems that are not extremely high-volume fall into this category, as it's the most straightforward to set up and maintain. Be sure, however, to consider moving the distribution processes if your transaction volume begins to cause the server to become bogged down in processing both application and replication requests.
Single Publisher, Distribution System, Subscribers By splitting the publication and distribution model, you can begin to address specific performance bottlenecks that can be apparent in your system. If you implement a model of this type, you'll be able to add processors, memory, or other resources to systems as they become overused. As you can see in Figure 17.4, it can be helpful to distribute the configuration of your servers.
FIG. 17.4
Moving the distribution processes enables you to optimize system configurations
for op-timum performance.
This system topology is also the first step toward a data warehousing system. By copying the information to the intermediary server, if you implement multiple subscription servers using this information, you can ensure that accurate, timely information is distributed without an overwhelming impact on the initial publication system.
In systems where you are publishing to multiple subscribers--especially in cases where the publications might be going out over a slow or remote link--a separate distribution server should be carefully considered. In each of these situations, more server attention, which translates into CPU cycles, is required to complete the replication and is best handled by a separate system.
NOTE: If you do implement this type of scenario, each system--the publication server, the distri-bution server, and each of the subscribers--is required to obtain separate SQL Server licenses. This cost needs to be part of the analysis when you are considering using a distribution server in this manner. n
Single Publisher, Subscriber, Publisher, Subscribers A big topic in the database world right now is data warehousing. As mentioned in the last section, replication lends itself well to data warehousing implementations, as it provides you with an automated way to provide read-only access to users of your system. You know the information will be correct and as up-to-date as you need, and because no intervention will be required, SQL Server's replication engine can be set up to update the remote systems at intervals required by your application. In this scenario, you are publishing the information to an intermediary server that is responsible for secondary replication to subscribers. This is an excellent way to leverage your servers for their distribution to the users of your information. As you can see in Figure 17.5, one approach is to link servers to further distribute information.
FIG. 17.5
Re-publishing infor-mation leverages your resources and can be helpful in mass dis-tribution
or slow-link to multiple subscriber installations.
This type of installation is also a good candidate for a separate distribution server if there are other operations to be required of the initial subscription and re-publication server. It's also important to keep in mind that memory requirements can increase on the re-publication server to work with the increased number of subscribers.
The first step to installing replication is registering the remote server in the Enterprise Manager. There is nothing unique that you need to do to accomplish this, but you must have the server predefined prior to beginning the installation of the replication and subscription services.
NOTE: When you define servers in the Enterprise Manager, it's important that you select the correct type of security to be installed at each server. Selecting the Trusted Connection indicates that Integrated Security is in use. If you select the standard logon connection, you need to be sure to specify someone who is a valid user on the remote system and who has sufficient rights to work with the database tables you are replicating. If you do not, you'll receive errors when the replication service tries to connect to the remote server.
When you start the replication installation process, one of the first things SQL Server does is verifies that you've declared enough memory for SQL Server processes. The minimum required memory allocated to SQL Server is 8,196K, or almost 8M. You set this option in the Server Configuration dialog box, selected from the Server menu's Configurations option. An example of this is seen in Figure 17.6.
TIP: It's generally recommended that you allocate as much RAM as possible to SQL Server. A good guide-line is that you should allocate all RAM except approximately 24M to the SQL Server process. For example, in a 64M system, you should allocate approximately 40M to SQL Server. The 24M handles the network operating system and other base operating requirements of the system.
FIG. 17.6
You must have a minimum of nearly 8M allocated to the SQL Server replication features
of your system.
NOTE: After you establish the memory configuration, you need to restart SQL Server. You can do this from the Enterprise Manager by selecting the server and then right-clicking it. Select Stop from the menu. After the server has been stopped, as indicated by the red traffic light, restart it. This institutes the changes to the memory configuration you just made.
The next step starts the process of installing replication as a publishing server on your system. Prior to setting up the distribution database, you need to have established both database and log devices. These must have sufficient disk space allocated to support the database that you'll be replicating. These are the devices that you'll place the distribution database and logs onto. The recommended minimum size for these items is 30M, but your size depends significantly on several different factors, as follows:
Select Re_plication Configuration from the Server menu, and then select Install Publishing from the submenu. This begins the process of setting up the distribution database and other options that govern the server's operation for replication. Figure 17.7 shows the dialog box that you use to set up replication options.
FIG. 17.7
Set the options for the distribution database and processes for replication, including
device sizes and naming.
This dialog box enables you to define whether the distribution process will be local or located on a remote system. If you select local, you need to indicate where the distribution database will be installed. If you indicate a remote server, you need to know the server name and the SQL Server process must be already running on that system.
When you click OK, the new database is added to the system, and you are ready to start setting up specific options to enable publishing on your system. If all goes well, you receive a prompt, as shown in Figure 17.8, that enables you to go directly to set up publications on your system.
FIG. 17.8
After the distribution database is created, you can go directly to the definition
of your publications and subscriptions.
NOTE: If the installation of the distribution database fails for any reason, you need to address the problem before you can continue. A possible reason for experiencing a problem here is if you've registered a server with a user that is nonexistent or has insufficient rights on the remote system. If this is the problem you experience, make sure you create an appropriate user prior to continuing.After you address any system problems that are presented, you might be faced with needing to completely uninstall replication before you can continue. For more information, see the "Uninstalling Replication" section at the end of this chapter.
The first step to creating your own publications is to, in essence, turn on publication services, designate a frequency for the services, and indicate which databases are candidates for replication. You can get to the configuration options, shown in Figure 17.9, from the Yes/No dialog box after defining the distribution database, or you can select the Publishing option from the Server, Re_plication Configuration menu.
FIG. 17.9
With the configuration dialog box, you indicate frequency of updates and which databases
are available for replication.
For each server you have registered to the Enterprise Manager, you can indicate whether it is allowed to subscribe to this server by selecting or deselecting the Enable option from the Enable Publishing grid. For each server you allow to subscribe, you can indicate at what frequency you'll be replicating information to the server. The default is to replicate information continuously starting at 12:00 AM and continuing throughout the day.
If you are using a remote link or are only providing snapshot information to the subscription servers, you might want to change this value to a time when the system is generally under less load. This can help decrease the impact on network traffic that will be incurred when the replication process kicks off. Of course, this places your data into a batch-type mode and might not be acceptable at the end user's application.
Selecting or deselecting databases controls which databases you will have to select from when you define publications. If you have a secure database that you want to control, this is the ultimate protection. By not making the database available to the replication process, you can be assured that it will not be included in publications.
You might notice the working directory option shown in the dialog box just below the designation of the distribution database. This directory is where information is kept and staged as it is sent out to the subscription servers. You can change this directory if needed. When SQL replicates a database, it is using a method somewhat like a BCP to copy the data from one system to another. This directory is where the data files are built.
NOTE: It might be desirable to indicate a secured location for this directory if you have concerns about other users modifying or reviewing this information when it's staged. The information is not readily discernible, but should be protected nonetheless.
If you want to create a custom distribution schedule or control how often information is sent based on transaction volume, you can click the Distribution O_ptions button. As shown in Figure 17.10, this option also enables you to indicate how long information is maintained at the distribution server after it has been sent to the remote system.
FIG. 17.10
You can create custom distribution schedules on a server-by-server basis. This can
be helpful to take into account time zone changes and other loading factors.
If you do opt to establish a custom distribution schedule, a separate dialog box enables you to change a number of options. As shown in Figure 17.11, you can set up on a macro level how often the replication takes place. This option allows values that range from daily to monthly. As you select this highest level interval, the balance of the options change to reflect values that enable you further to define how you would like to have the replication carried out.
FIG. 17.11
Custom timing for replication can be defined for a number of time intervals.
At this point, you're ready to set up the specific publications and articles that you'll make available to other systems. You've defined the servers that have access to the information and the frequency at which it will be provided, and you've indicated where the information will come from in your system. The next step is to create the publications that will be replicated.
TIP: If you find that you're having trouble getting information to the subscribing systems, make sure the replication users, repl_publisher and repl_subscriber, are still defined on the systems that are taking part in the replication.
You might recall that you have two different options when it comes to replicating information on your system. You can either publish the entire database, or you can select specific information in the database to replicate. In this section, setting up replication for an entire database is covered. In the coming sections, setting up targeted articles is discussed. To begin, go into the Replication management options (see Figure 17.12).
NOTE: Remembering which option to use from the menus or the topology dialog box can be confusing. It helps to keep in mind that you don't generally use the options on the Server menu after you've installed and set up the basic configuration for replication on your system. All aspects that deal with the specific publications and subscriptions are controlled from the Manage menu.
FIG. 17.12
Use the Manage menu to control all aspects of the replication
process after you complete the initial setup.
Select Manage, Replication, Publications from the menus to begin working with the different publications that are on your system (see Figure 17.13). This takes you to the dialog boxes that enable you to define and configure the publications and articles that you will be offering.
FIG. 17.13
The initial Manage Publications dialog box shows the databases you've selected to
allow for replication.
The dialog box shows each database for which you've enabled replication during the setup phase. Of course, there are no publications listed initially because you've not defined them yet. You use the New, Change, and Remove buttons to manage the different replications in your system. The tree display expands under each database, showing the different publications that are available after they are defined. Under each publication, you can see the articles that have been created. From this display, you can manage all active publications.
Selecting the Tables to Be Published In the first example, an AllPubs publication is set up to replicate all eligible tables in the PUBS database. Remember, you must have a primary key defined for each table that you need to replicate. You can use the ALTER TABLE command to institute a primary key if needed. When you select New, you're presented with the dialog box that enables you to set up the publication and indicate the information that should be replicated. Figure 17.14 shows the dialog box that lists the different tables that are available and enables you to set up the specifics of the article.
First, you need to name the article that you're defining. When you do, make sure you do not include spaces or wild-card characters, such as an asterisk, in the name. If you do, when you click Add to save the publication, you are prompted to change the name.
NOTE: You might want to create a naming convention for your publications to make it easier to know what information they provide. This is completely optional, but if you're publishing numerous articles, this information tells the subscription systems to determine which publications they need to be working with.
As this article publishes all eligible tables from the PUBS database, simply selecting the <all tables> option will suffice for the definition. Each table is copied to the Articles in Publication pane, and you can tell that they're included by the shared icon that you're familiar with in Explorer and File Manager.
FIG. 17.14
Setting up the article to publish an entire database only entails selecting <all
tables> for the article definition.
Establishing Table Synchronization When you establish a publication, you are guaranteed that the subscribing system's tables will be initialized to contain the same information as the replicated table. To do this, there is an initial synchronization step that is completed. You can control how and when this step occurs by selecting the Synchronization tab from the Edit Publications dialog box shown in Figure 17.15.
FIG. 17.15
You can defer synchronization to a time that suits your systemrequire-ments better,
possibly saving money if remote connections are used.
If you are replicating information between SQL Servers, you want to make sure you leave the default copy method, using Native Format data files, selected. This enables SQL Server to save the information in as optimized a format as possible when it creates the export file used to initialize the subscriber's database tables. The other option, Character format, saves the exported information in a format that you can use if you are importing to the receiving database using other utilities, or if you're using the information produced by the initial synchronization in a third-party system.
TIP: If the transaction log fills up and you're forced to dump and truncate it, you should first unsubscribe any replication servers. After you have done this, dump and truncate the log, then re-subscribe the servers. When you do, they automatically re-synchronize with the publishing server.
You can also set up the schedule of times that controls when the synchronization occurs by clicking the Chan_ge button. Establishing these times is very similar to the time frames you set up during the initial configuration of the replication services.
Controlling the Recipients of Subscriptions A key feature and concern regarding replication of databases is security. You've seen when you set up your system for replication that you can determine what subscribing systems will have overall access to your replication services. This presents a possible challenge, though, when you need to provide access to some, but not all, publications that reside on your replication server. SQL Server enables you to indicate, on a case-by-case basis, what servers have access to any given publication. By default, when you set up a new subscription, SQL Server makes it available to any server with overall access to the replication system of your server. To change this, as indicated in Figure 17.16, select the Security tab of the Edit Publications dialog box.
FIG. 17.16
You can easily prevent access to specific publications depending on what servers
are requesting the information.
When servers look to your system to select available publications, if their system has not be authorized to use a given publication, it does not show up in the list of available items. You can change this option later to allow or disallow access to a publication by selecting Security from the initial Publications dialog box.
Making Changes to Publications After you've set up publications, you can review them from the Manage Publications window. In Figure 17.17, you can see that each database shows its related publications and articles as they've been defined on the server.
FIG. 17.17
You can use the tree view of the replication system to select a publication or article
to review or modify.
If you want to make a change to a publication, simply highlight it and select Chan_ge. You are presented with the same dialog box that enabled you to define the publication in the first place. You can modify the tables that are included, how the information is synchronized, and the security aspects of the publication.
Now that you understand how to publish entire databases, this section reviews how to selectively publish information from your replication server. As you might imagine, publishing select information can be quite complicated, although it depends a great deal on how you're trying to limit the information being replicated and how you go about defining that information.
Limiting Publications to a Single Table The first and most basic selective publication is limiting the publication to a single table. You might have noticed that each table available for replication is listed when you define a given publication. In the first example, the entire database was published by selecting the <all tables> option at the top of the list of tables. The alternative is to select the individual tables that you want to include in the subscription from the Database Tables list. After you've highlighted the different tables you want to include, click the Add button, and they are copied to the Articles list (see Figure 17.18).
FIG. 17.18
You can select one or more individual tables to be included in the publication.
If you do nothing more for this publication, the entire table will be monitored and replicated as part of this publication. You have the same options for managing how the publication will be synchronized and what servers will have access to it as you do when setting up a full-database replication.
The alternative to publishing the entire database table is to partition it. You might recall from earlier sections in this chapter on "Publishing Partial Databases and/or Tables" that you can partition a table in one of two ways: horizontally or vertically. The next section shows how you can control this functionality with your publication.
Partitioning Information to Be Included in a Publication The two partitioning options, horizontal and vertical, correspond to looking at views of your information based on selectively retrieving rows (horizontal) or columns (vertical) from your database tables. Of course, you can combine these two techniques to provide a concise view of the information in the table if needed. To establish a partition of information to be used in the article, click the Edit button on the Edit Publications dialog box's Articles tab. This calls up the Manage Articles dialog box that shows two new tabs, Filters and Scripts (see Figure 17.19).
FIG. 17.19
You can select which columns you want to be included in the replication process.
In the example, as discussed earlier in this section, the Contract column information shouldn't be included in the replicated information that is provided to other sites with this publication. By scrolling down the list of columns, you can select and deselect the different columns that you want to include. When you've completed the selection, you will have vertically partitioned the database if you've deselected any columns because the publication will include only those fields that you allowed.
Normally, the SQL Server replication process manages insert, delete, and update operations automatically. This can be a problem if you're inserting information and need to have another process be kicked off by the action of inserting a new row, for example. In these cases, you can click the Advanced button and specify the Insert, U_pdate, and Delete scripts that you'd like to run when records are modified with these operations.
In addition, if you need to do so, you can provide a table-creation script that you need to use in initializing tables. This script is run instead of the standard table script generated by SQL Server based on the source table's definition. This can be helpful if you're defining tables that will receive replicated data, but perhaps not all columns that need to be in a table.
By default, when the table is created, it is created with columns to match the article's definition. This means that, in the example of not copying the Contract column, the column would not exist at all in the recipients' databases. You can modify the creation script here and make sure all appropriate information and columns are included. Figure 17.20 shows how you can modify these values.
FIG. 17.20
You can override SQL's default methods that are used to implement changes to the
under-lying database tables.
CAUTION: Unless you are very comfortable with modifying these types of SQL scripts, you should consider alternative means of making the changes you need. For example, perhaps you could create a series of triggers on the subscriber that would institute changes needed in the storage of the information so that only certain columns were reused in the end user's system. Remember that changes made here will affect all subscribing systems, and you should ensure that this will not cause unforeseen problems on systems that might vary in their implementation of database structures and table layouts.
You also have the option of controlling the synchronization scripts that will be run by SQL Server. Clicking the Advanced button lets you edit what actions will be taken on synchronization of tables between the publication and subscription systems. The dialog box is shown in Figure 17.21.
FIG. 17.21
You have full control over how indexes and existing tables will be managed during
the synchronization process.
TIP: If there is a chance that the table might already exist on the user's system, be sure to check the DROP Table First option. If you don't, you might have a database table in an unknown state.
If you want to see or edit the actual script that will be created when the table is synchronized, you can click the Auto-Generate button. You are presented with the listing that will be used to create the table when it's initialized.
After you've completed your changes, click OK from the Manage Article dialog box, then Add from the Manage Publications dialog box. The publication is added to the list of available publications, and you're ready to allow other users to start accessing the publications on your system.
You might notice in Figure 17.22 that the AllPubs publication shows a secure publication, which is one with additional limits on who can see and use the information. The key that is showing over the book icon next to the publication name indicates this. On the NonSecureAuthorsOnly publication, there is no key, indicating that the publication is open to subscription by any subscription server with access and overall replication authority on your server.
FIG. 17.22
The new publication and articles are added to the list of active publi-cations, ready
for a subscriber to begin receiving the replicated information.
In the next section, setting up subscriptions on the recipient side of the equation is discussed. Now that you've set up the publications, it's a straightforward process to indicate which publications any given subscription server needs to begin receiving replication updates for.
After the publications have been set up on the publication server, it's a simple task to set up the subscription server to begin receiving the information. Before you can subscribe to a given article, you must follow the same installation steps outlined earlier in this chapter. These steps set up the distribution database and prepare your system to begin working with the replicated information. The only difference is that, when it comes to specifying which databases are visible to the replication process, you'll specify which ones can be the recipients of information rather than the source of the information.
See "Installing SQL Server's Replication Services," Chapter 17.
When you're ready to initiate subscriptions, select Manage, Replication, Subscriptions from the menu in Enterprise Manager. As shown in Figure 17.23, this brings up a dialog box very similar to the Manage Publications dialog box used to define and create publications.
FIG. 17.23
Select the publication that you want to subscribe to from the list of available items
for the server and database in which you are interested.
You can drill down into any given publication to see what articles are available and whether you're already subscribed to them. After you've found the publication you want to subscribe to, highlight it and select Subscribe. You can specify where you want the information to go on your local system after it begins arriving from the replication server (see Figure 17.24).
FIG. 17.24
When you select a publication, you need to indicate where you want to store the incoming
information on your server.
You might notice that there are three different options available to you for synchronization of the database. The last option is to turn off the synchronization altogether. This means that you know for a fact that the database and the concerned tables are in sync with the master system by some other means.
CAUTION: If you are unsure at all about the validity of the information in the database, it's best to allow the system to perform an initial synchronization on the database.
The first option enables you to let the system do the work for you and do the synchronization of information over the standard network connection between this system and the publication server. If you're on a locally attached LAN or a fast connection that is reliable, this is the best option to select and is the default.
If you're on a slower link, it might be that you've opted to bring the update to the system by another means. This includes copying the initial synchronization files from the publication server to tape or disk and then physically bringing them to the subscription system. If you have a slow link and initially populated tables, this might be a good option, as it avoids the initial rush of information across an already slow connection between the two systems.
TIP: Remember, you can schedule the time at which synchronization takes place. Moving it to off-hours can ease network loading significantly.
After you've set up the subscription to the publication server, you might notice that the display of subscriptions is updated to show the active subscription. You can see in Figure 17.25 that, although the subscription is active, it has not yet been synchronized with the publication server.
FIG. 17.25
The list of subscriptions changes to indicate that you're now subscribed to the publication.
It also indicates the destination database.
There might be times when you need to uninstall replication, either on a server-specific basis or on an overall basis. Examples include a server changing physical locations and no longer needing access to sensitive information, or when you've installed several test scenarios and simply want to start over.
The next two sections show how you can cover both of these types of situations using facilities within SQL Server.
If you find that you need to remove a server from the list of servers that are eligible to receive information from your publication server, you need to determine whether you need to restrict only individual publications or if you need to revoke access altogether. If your intent is to remove access completely, you simply remove the server from the list of eligible candidates that can see the publications on this server.
From the Server menu, select Re_plication, Publishing. You can indicate which servers are able to subscribe to your server. Figure 17.26 shows the server selection dialog box that appears.
FIG. 17.26
Deselect any servers you no longer want to make publications available to.
You are presented with a list of subscriptions that are about to be removed and asked to confirm that you want to remove the server from the list of eligible systems. After you confirm, the server still shows up on the list, but the server will not be able to select publications on the publication server.
Carefully consider all options before you opt to remove replication entirely from the system. This operation shuts down all replication to all subscribing servers, and it does so in a somewhat brute force manner by removing the replication option from the server and deleting the distribution database.
There are three steps for uninstalling replication in SQL Server, as follows:
You need to follow these steps to restore your system to its state prior to installing replication. Removing connections to other servers is covered in the section immediately preceding this one: "Disallowing Specific Servers." If you're removing replication from a subscription server, select Server, Re_plication, Subscriptions instead of Publications; the operations of removing references to foreign servers are the same.
The next step requires issuing an interactive SQL command, so you need to use ISQL/W or another utility to enter the command. Enter the following stored procedure call to turn off the replication option on your system:
sp_ServerOption "<servername>", "dist", false
NOTE: You must be logged in as SA to complete this option.
Replace <servername> with the name of your server. In our example, because our server name is PLUTO, the command would be:
sp_ServerOption "pluto", "dist", false
NOTE: No results are returned from the call.
This removes the distribution option from your system, which disables replication. The distribution database you established when you first installed replication still remains in the system. Dropping this database is the final step in removing replication from your system.
From the Enterprise Manager, right-click the table that you specified.
CAUTION: Be sure you're selecting the distribution database you specified, not your databases that contain active, important production information. If you have any doubt about which database to select, you can review the stored procedures for the database, and you can see the replication stored procedures there.If you select an incorrect database, you permanently drop production information, and your only recourse is to restore from a backup.
From the menu, select Drop Database and confirm that you want to drop the database. If all has gone well, you can make a quick check by selecting Server, Re_plication from the menus. You should have the Install Replication option available once again.
Probably one of the most often encountered problems with a replication installation is that, when SQL Server was installed, the administrator account was used for the Executive service. Later, when the password is changed on the Administrator account, the replication services stop working, seemingly without cause.
The fact that the SQL Executive must be able to log in, and that it's set up with the Administrator account when you run Setup, makes it easy to forget that the account and the process are linked. You can change the account used by the process by accessing the Control Panel and selecting Services (see Figure 17.27).
FIG. 17.27
To update the account used by the SQL Exe-cutive service, select the service and
choose SETUP.
Select the SQL Executive Service and select Setup. When you do, you can establish the various startup options that are used when the NT Server is started. You should consider either assigning an account specific to the SQL Executive service, or you can set up the process to use a system account, preventing future problems with user names and passwords. See Figure 17.28 for an example.
FIG. 17.28
It's a good idea to establish an NT account specifically for the job of supporting
the SQL Executive process.
By setting up the Executive in this manner, you are preventing the password change problems in the future. Also, make sure you set up the account with the Password Never Expires option.
One performance area that has helped in implementations has been to use stored procedures in the replication process. Remember, you can indicate the SQL statement that is used to divide up the information that you work with in the replication process by using a stored procedure in place of the SQL statement. You can take advantage of the fact that SQL Server compiles stored procedures making them faster and more efficient.
Replication is a powerful tool, especially in distributed environments. By putting replication into your systems, you can provide an excellent level of usability for applications, a great security layer for the underlying database tables, and better overall data availability.
For additional information about the topics covered in this chapter, please refer to the following chapters:
© Copyright, Macmillan Computer Publishing. All rights reserved.