Table of Contents

Chapter 15

Implementing Replication Scenarios

Certification Objectives *

Planning Distributed Data Scenarios *

Is Replication the Appropriate Tool? *

What server will hold the master copy of the data? *

What data needs to be replicated? *

What servers need to receive the data? *

How often will the data be replicated? *

Two-phase Commit Decisions *

Replication Models *

Single-Publisher *

Multi-Subscriber *

Single Subscriber *

Multi-Publisher *

Remote Distribution Server *

From the Classroom *

Internet Security *

Configuring Your Replication Environment *

From the Classroom *

Testing The Replication Process *

Network Configuration *

Number of Subscribers *

Frequency of Replication *

Certification Objectives

Planning Distributed Data Scenarios

In Chapter 14 you learned about the complex technology that allows data replication to work properly in Microsoft SQL Server. You also learned that Microsoft SQL Server provides multiple technologies that can be tailored to your application’s specific requirements.

Each technology produces different benefits and restrictions, depending on your requirements. Three requirements factor in three important dimensions:

The way business requirements fit with and across these three dimensions will determine the appropriate technology to implement. In Chapter 15 we will review the various replication models, and demonstrate how they relate to various business requirements. First we will understand how distributed data environments relate to business requirements. Then we will discuss the choices that must be made working with most production systems. In production environments it is likely that the business requirements of one dimension will conflict with another dimension.

For example, it is commonly accepted that an application must use the two-phase commit update protocol to guarantee full transactional consistency and transactional atomicity. This process also needs to occur without conflicts when modifying the same data at multiple locations. One requirement, however, for two-phase commit is fully available and reliable communication between all participating sites. This makes the use of two-phase commit inappropriate for many distributed applications because not all sites can be "well-connected" to each other.

After reading the previous chapter you now have a solid understanding of the technology that makes Microsoft SQL Server function properly. Understanding the technology is only half the battle to a successful implementation. The other half of the battle involves understanding the relationship between this distributed database technology and your business requirements. Your business requirements will drive your selection of a distributed database technology. At the end of this chapter you will be to select the appropriate solution to meet these needs and provide the best solution possible.

Planning a Replication Scenario

Planning a replication model is very similar to creating a logical database design. It is necessary to have a working model before you can implement replication successfully. Many of the performance problems you will run into while working with replication will be the result of poor or insufficient design work up front. When you encounter the need for a distributed database solution, you need to consider the following issues prior to doing any configuration work.

Is Replication the Appropriate Tool?

Replication can solve many distributed data situations, but is not the answer to all problems. You should evaluate an alternate solution if your system matches any of the requirements listed below. As we mentioned above, distributed data decisions are often tradeoffs between opposing business requirements.

What server will hold the master copy of the data?

When evaluating where the data should be replicated from, it is important to determine where the best location is to make updates. To make this decision, you must balance the combination of best user experience (performance) and locations that can be properly administered. This machine will be the publisher. (Q1)

What data needs to be replicated?

When distributing data, subscribers typically only need a subset of the data that is available on the publisher. For performance and security reasons, it is important to only replicate data that is needed. Other data should be filtered out (horizontally and/or vertically).

What servers need to receive the data?

What are the requirements of the servers that will be receiving distributed data? Is there currently a continuous network connection in place? If so, how much available bandwidth is in place? What portion of the data in the publisher is needed? Is read-only access sufficient to meet the needs of the users of this server? If users need to edit the data, or all data in the publishing database is needed, we may choose to use a different solution. If we do choose replication, this information will definitely impact the configuration that we choose.

How often will the data be replicated?

It is important to know how current the information needs to be for a particular distributed data site. The database might support a Decision Support System or a summary reporting system. If this is the case, then frequent changes will have a negative impact as reports could not be reconciled against each other over a period of time. Conversely, if the site is used to quote current stock prices or monitor very volatile inventory, (such as concert tickets) it will be critical that data is replicated immediately. If data cannot be replicated immediately, other solutions should be considered.

Two-phase Commit Decisions

The two-phase commit protocol is a DB-Library tool that provides a solution for distributed database needs. As we discussed in the previous chapter, the greatest benefit of two-phase commit is immediate guaranteed consistency. This makes it an appropriate solution for environments that require multiple databases be updated simultaneously.

There are multiple business requirements that could lead us to a two-phase commit solution. Here are some of the benefits two-phase commit can offer.

When we look at Figure 15-1, we see an example of how two-phase commit allows us to verify that a sales transaction is recorded everywhere that is necessary. The client records a sales transaction to the sales database server that functions as the Transaction Coordinator. The Transaction Coordinator communicates with the Resource Coordinators for each database server to verify that databases get updated properly. This single transaction implements the following: it reduces the existing inventory in the inventory database, adds units to the manufacturing process, and generates an entry into the accounts receivable database.

fig15-01.gif (8901 bytes)

Figure 1:Two-phase Connect in Production

The greatest disadvantages of two-phase commit come from the limitations it places on each database’s availability for update. As the name implies, there are two steps to a two-phase commit transaction. In the first phase of each transaction, the Transaction Manager verifies that all Resource Managers are available and willing to apply the update. In the second phase, the Transaction Manager instructs all Resource Managers to commit the transaction.

If the Transaction Manager does not receive a positive response from all Resource Managers during phase 1 and phase 2, the transaction is rolled back. This is increasingly significant if one or more of your databases are remote. Take a look back at the example in Figure 15-1. Certain constructs may apply if the manufacturing database is located in the manufacturing facility halfway around the world while all other databases are located at the corporate office. If this is so, we may get some unwanted consequences from two-phase commit. For instance, we would be unable to record sales anytime the connection between the corporate office and the manufacturing facility becomes unavailable. This would occur even if the items being sold are available in inventory. This is particularly undesirable since entries to the manufacturing database are not processed the same day they are entered.

Replication Models

As we discuss the different models that are available with Microsoft SQL Server, it is important to remember certain things. One issue is that the minimum requirements for replication consist of one publisher, one distributor, and one subscriber.

Exam Watch. While certification requires that a publisher, a distributor, and a subscriber all be present for replication to occur, a single server with multiple databases can be configured to perform all three roles. This type of configuration provides a cost-effective way for you to get the experience you need to properly prepare for the certification exam.

Single-Publisher

The basic replication scenario is based around a single-publisher environment. In prior versions of SQL Server, this was referred to as ‘central publisher’ replication. As the name implies, single-publisher replication configuration contains a single publisher that provides data to any number of subscribers. The publication server is the primary owner of the replicated data. As such, the publisher is typically the only database where data can be edited. The subscription servers will usually treat this as read-only data.

You can see an example of a single-publisher scenario in Figure 15-2. The accounts receivable database is edited in the Manchester office. The data is replicated to the other offices is for read-only use. If a new office is opened, it subscribes to the existing publications and immediately has current resource scheduling data.

Exam Watch: In Chapter 14 we mentioned that running a SQL Server database in read-only mode provides a significant performance improvement. When using this technique it is critical to understand that the database cannot be in read-only mode during the replication period. For this reason, it works best with snapshot replication.

fig15-02.gif (6720 bytes)

Figure 2: A Single Publisher Scenario

Multi-Subscriber

In a multi-subscriber scenario there are one or more publishers replicating data to multiple subscribers. As the number of subscribers increase, the load on the distributor increases significantly. This occurs because the distributor must track which subscribers have received each article.

In Figure 15-2 you saw an example of a multi-subscriber scenario where there is a single publisher. In Figure 15-3, you see an example of a multi-subscriber scenario that includes more than one publisher. The Manchester database is publishing the accounts receivable database, while the Redmond database is publishing the human resources database. The subscribing databases receive data from both locations.

fig15-03.gif (10100 bytes)

Figure 3: A Multi-Subscriber Scenario

Single Subscriber

As we discussed earlier, the simplest form of replication includes one publisher, one distributor, and one subscriber. This is one example of the single subscriber replication scenario. A single subscriber scenario describes any replication scenario in which there is one subscriber and one or more publishers. In Figure 15-4, you see an example of a single publisher and a single subscriber. The Manchester database replicates accounts receivable information to the Jacksonville subscriber.

fig15-04.gif (3802 bytes)

Figure 4: A Single Subscriber Scenario with a Single Publisher.

In Figure 15-5, you see an example of a multiple-publisher and single subscriber scenario. The Manchester database replicates accounts receivable data to the Jacksonville subscriber. The Redmond database replicates the human resources data to the Jacksonville subscriber.

fig15-05.gif (6462 bytes)

Figure 5: A Single Subscriber Scenario with Multiple Publishers.

Multi-Publisher

In a multi-publisher scenario there are multiple publishers replicating data to one or more subscribers. As the number of publishers increase, the load on the distributor also increases. The distributor must store and track data from all publishers. In Figure 15-5, you saw an example of a multi-publisher, single subscriber scenario. This is the simplest multi-publisher environment. In figure 15-3, you saw an example of a multi-publisher, multi-subscriber scenario that is more typical for distributed enterprises.

Remote Distribution Server

In all of the scenarios that we have studied so far, the distributor has been located on the publisher. In many production scenarios, the distributor function will be moved to a dedicated server. This may be driven by server performance or by network optimization.

As the volume of data being published and/or the number of subscribers continue to increase, the system resources being consumed by the distributor will impact the performance of the server’s other functions. When this occurs, the distributor will be moved to another server. In Figure 15-6, you see the simplest replication scenario. The Manchester server publishes data to a dedicated distributor that passes it to a single subscriber.

fig15-06.gif (5004 bytes)

Figure 6: Replication through a Remote Distributor.

A second reason for moving the distribution database to a dedicated server is network optimization. If the company in our example decides to expand their Jacksonville operations and open two more offices that will each house subscribers, we would need to reassess our configuration. In Figure 15-7, we see the network configuration. There is a single 256K connection between Manchester and Jacksonville. Between the three Jacksonville offices we have T1 connections.

fig15-07.gif (8567 bytes)

Figure 7: Network Diagram.

Given this network configuration, it should make sense that we would move the distributor to the Jacksonville District Office. By doing this, our replication data only passes through the 256K WAN link one time. If we place the distributor in the Manchester office, the data must pass through the 256K link three times—once for each Jacksonville Subscriber. You can see our recommended configuration in Figure 15-8. It is important to note that the distributor can exist on the same server as the subscriber or on a dedicated server as required for performance.

fig15-08.gif (10211 bytes)

Figure 8: Using a Remote Distributor to Avoid Network Bottlenecks.

Implementing An Anonymous Subscription

In SQL Server 7.0, Microsoft has introduced a new concept called anonymous subscribers. Anonymous subscribers are a unique type of pull subscribers that can be used to reduce the overhead that replication normally produces. With standard replication, information is stored at the Publisher about each Subscriber. In addition, performance information about each Subscriber is kept at the Distributor. Anonymous subscription reduces this overhead by not storing this detailed information.

As you would expect, there are a number of functional limitations that come with this reduced overhead. By not having the publisher track anonymous subscribers, the effectiveness of audit logs and security are reduced. In addition, the lack of subscriber information at the publisher makes it more difficult to perform performance tuning and optimization of the replication process.

It is the publisher that determines whether it will support anonymous subscribers for a given publication. When running the Create Publication wizard, the administrator will receive the prompt shown in Figure 15-9. Once the publisher has made anonymous subscriptions available, it is the subscriber that must create the anonymous subscription. The subscriber also carries all responsibility for keeping the anonymous subscription synchronized.

Figure 9: Supporting Anonymous Subscribers via the Publication Wizard

Anonymous Subscriptions are typically used in environments where you have a large number of subscribers and have a desire to reduce the corresponding overhead. This reduced overhead comes with a tradeoff. If you enable anonymous replication, you are declaring the given publication to be public in the scope of users that have network access to the server. This is particularly important to understand if you are using anonymous subscription to support Internet Subscribers.

An anonymous subscription is created by performing the same steps that are used to create a standard pull subscription. If the publication is enabled for anonymous subscriptions and the subscribing server is not registered at the Publisher, then the Pull Subscription Wizard will create an anonymous subscription.

From the Classroom

Internet Security

If you are using SQL Server as a back-end for an Internet application, you need to isolate your SQL Server as much as possible from the Internet. Depending on your application, you could use SQL Server Replication to create a level of indirection between the Internet server and your production servers.

—David Smith, MCSE + Internet

Implementing Multimaster Replication

Within the database industry, the term multimaster is used to describe a replication environment that allows the same data to be modified in more than one location. In relating this to SQL Server’s metaphor, it is the Publisher that holds the master copy of the data that is being replicated. As we have discussed, the publisher is typically the only server that can modify the data. With the introduction of merge replication to SQL Server, it is now possible to modify the data in more than one location, therefore supporting the multiple master concept.

When we take a closer look, the interchangeability of the terms Master and publisher breaks down. While the addition of merge replication in SQL Server 7.0 provides the ability to update data in multiple locations, it does not create multiple publishers. Instead, there continues to be only one publisher per publication. SQL Server version 7.0 replication does not support true "multimaster" replication. Instead it does support the editing of an article from more than one location (via merge replication).

Exam Watch: In both a multi-publisher and a multimaster model, data is being edited on multiple servers, but they are different configurations. A multimaster environment has multiple servers (one publisher and one or more subscribers) that may edit the same data. In a multi-publisher environment, there are multiple servers editing different data.

Implementing a Central Subscriber (Roll-Up Server)

In many production environments you will find that business requirements dictate that local offices have full site autonomy. These users must be able to edit data in the local office. The requirements may be complicated by the following fact: managers in a central office environment must have access to this data for decision support systems (DSS) and/or for consolidated report generation.

In this situation, you will need to implement an environment that we typically call a central subscriber, or roll-up server. Each office will have a publisher that manages all the data that is controlled within that office. A SQL Server in the corporate office will be a subscriber to each publisher and as a result receives all data updates from all locations and maintains a consolidated copy of the cumulative data.

In figure 15-10 you see an example of a central subscriber environment. During the latest reorganization (all companies have them) it was determined that the field offices need to manage their own accounts receivable information. The management in the corporate office argued that this would not work because they needed the information that was stored in their central database for financial reporting and decision making.. The database administrator was able to meet the needs of all users by designing the environment you see in figure 15-10.

fig15-10.gif (11323 bytes)

Figure 10: Central Subscriber Scenario

In Figure 15-10 you see that each local office now maintains its own master accounts receivable database. These databases have been configured to publish their data. The Manchester Corporate Office subscribes to each office to obtain that data automatically. In the corporate office they continue to have the information necessary to generate reports and perform decision support. It is important to note that the data in the corporate database is read-only, as it is all received from replication.

Configuring Your Replication Environment

Now you should understand both the concepts of replication and how they apply to Microsoft SQL Server, as well as and the business problem that you are trying to solve. The next step is to get a better understanding of the physical environment and the details of the requirements. Once you understand the model that best meets your users' needs, you must determine where to locate servers, how to connect them, and the detailed configuration of their replication. When you have this information, you will be prepared to estimate the size of databases and transaction logs before beginning your implementation. Let’s take a look at the key issues that will impact your database design.

From the Classroom

Testing The Replication Process

It is important that you work with replication in a test environment before you set up a production server, or write the Microsoft SQL Server Exam. If you are like most people you do not have two or three NT Servers at home or in a lab at the office. Here are two ideas that can allow you to set up replication with the minimal number of computer systems.

—David Smith, MCSE + Internet

Network Configuration

In is important to know what network connectivity you will have between locations. Your design might place multiple servers on the other side of a slow network connection from the publisher. If this is the case, you may place a distributor on the remote side of this link to reduce the data that must be carried across it.

It is also important to understand the traffic patterns of the network. You may have a Subscriber that is connected to the Distributor via a WAN connection that is heavily used during business hours but completely available during non-business hours. When you combine this information with the fact that the Subscriber does not require 100 percent current information, you may elect to replicate once per day after hours.

For example, contrast the requirements for a distributed application supporting a company’s distributed sales force with a distributed application supporting the same company’s accounts receivable department. Both applications use replication for distributing data. The company’s traveling sales force, however, must be able to take customer orders on laptop computers that are disconnected from the central office network.

Number of Subscribers

Before you can finalize your configuration, it is important to have a good estimate of how many subscribers there will be. This information will help in two ways. It will help you to determine the load that will be placed on a publishing server. It will also help you to estimate more accurately the space that will need to be allocated for the transaction log.

Frequency of Replication

Estimating the frequency of replication will help you to determine the following:

Certification Summary

In this chapter we explained the configuration scenarios that are available when implementing the replication technology available in Microsoft SQL Server version 7.0. We showed the simplest form of replication, which is a single publisher and single subscriber configuration. From there we showed you configurations with multiple publishers, multiple subscribers, remote distributors and combinations of these. During the chapter you learned that complex environments all build on the basic replication configuration.

You also saw how business requirements, hardware availability, and network topology combine to determine the best configuration for production. You might have users in 15 locations worldwide that require frequent access to large amounts of relatively static information. If so, it is likely that we will replicate this information to their local area network, rather than maintaining a single database and placing the full user load on the wide area network. While either configuration will meet the business needs, they will result in completely different user experiences and satisfaction. You also saw that if you need to replicate to multiple servers in a single remote location, a remote distributor can help you to minimize the traffic over the WAN. It can also minimize the impact of replication on the network’s performance.

Finally, we showed how replication can support users with differing requirements. We saw a very good example of this when we discussed the roll-up server. We provided local databases with full access to a subset of the total data, therefore meeting the needs of the accounts receivable representatives in each field office. We used replication to also provide a central subscriber database with read-only access to all data. This meets the requirements of managers with decision support systems and reporting requirements.

Two-Minute Drill