Yesterday you learned how to set up automated tasks and alerts, as well as have an operator notified by an alert.
Today's lesson focuses on one of the more interesting SQL Server abilities--the ability to replicate or copy data to other servers. You learn the skills necessary to setup, monitor, and configure data distribution.
Data distribution, replication, allows data and changes made to one database to automatically be applied to another database. Replication can be set up to copy data from one database to another on the same SQL Server, or it can be configured to update a SQL Server across the wide area network.
Microsoft has added the capability of replicating data from one database to another in SQL Server 6.5. Replication made its appearance in SQL Server 6.0, and was one of the features most requested in SQL Server 4.21.
There are two basic ways to synchronize transactions across databases.
SQL Server has the ability to replicate data to ODBC-compliant database engines. Examples of ODBC compliant database engines include Access, Foxpro, Oracle, Sybase, and DB2. If you replicate data from SQL Server 6.5 to SQL Server 6.0, the 6.0 server is also treated as an ODBC database, instead of a native SQL Server 6.5 database.
Replication is transaction log-based, which means that only changes to the database--not the entire database--are sent across the network after an initial synchronization.
NOTE: For most replication situations, replication should be designed as one-way replication. Setting up two-way replication is relatively complicated and beyond the scope of this book. Refer the BackOffice Resource Kit (Volume 2) or TechNet for more information on two-way replication.
There are various reasons that a company may have to replicate data from one SQL Server to another one. Some common reasons include:
Before SQL Server 6.0, which introduced data replication, each of these scenarios would be cause for many custom scripts and automated tasks in order to get data where you needed it. Because data replication is now built into SQL Server and Enterprise Manager, it is relatively easy to set up and configure.
WARNING: Installing replication, while relatively easy to do from Enterprise Manager, dramatically changes the way your database's work. You should carefully plan your needs, scenario, and design before implementing replication on production servers.
Microsoft has adopted the use of publishing terminology when dealing with replication. Here are the most common terms when dealing with replication:
Publisher
The SQL Server, and more precisely, the database that holds the master copy of the data. Data is added, edited, or deleted from the publishing database, and then the replication process copies those changes to all the subscribing databases.
Distributor
The SQL Server, and more precisely, the database in charge of reading data and changes from the publishing database, storing it, and eventually forwarding the data and changes to one or more subscribing SQL Servers, and thus, subscribing database. The distribution server is the workhorse of the replication process.
Local Distributor
When the publishing and distribution servers are running on the same SQL Server, it is considered a local distribution server. Advantages of local distribution servers are discussed later in this chapter.
Remote Distributor
When the publishing and distribution servers are separate SQL Servers, it is considered a remote distribution server. Advantages of remote distribution servers are discussed later in this chapter.
Subscriber
The SQL Server, and more precisely, the database that receives data and changes from the distributing (and thus, publishing) database. Subscribing databases should be considered read-only databases, as changes made to subscribing databases are not replicated to the publication databases, just as doing the crossword puzzle in the back of the local newspaper does not affect the crossword puzzle the next night.
Push Subscription
Subscriptions can be initiated from the publishing server if the administrator of the publishing server has SA permissions on the subscribing server. This is called a push subscription. For example, if you start receiving a magazine that you didn't subscribe to, it would be considered a push subscription, because it was pushed to you without any intervention on your part.
Pull Subscription
Subscriptions that are initiated by the subscription server are considered pull subscriptions. For example, if you fill out a subscription card for a new magazine, it is considered a pull subscription, because you (the subscriber) are initiating the subscription.
Article
One or more tables of a database that are published, thus making them available to subscribing servers. The article is the basic unit of replication, although an article must be part of a publication. An article contains all columns and all rows of a particular table by default, but you can restrict the rows (horizontal partitioning) and the columns (vertical partitioning) that are replicated.
Publication
A collection containing one or more articles.
Horizontal Partitioning
When certain rows are selected for replication, instead of every row in a particular table. For example, headquarters may replicate only those customers located in California to the California subscription server, those located in Nevada to the Nevada subscription server, and so on.
Vertical Partitioning
When only certain columns are selected for replication, instead of every column of a table. For example, headquarters may want the branch offices to get a copy of the payroll database, but only the columns that deal with names and phone numbers, not the columns on salary or birth date. Horizontal and vertical partitioning can be combined.
NOTE: The distribution server needs to have at least 16MB of RAM configured for SQL Server (8,192 2KB units) in order for installation to work. Of course, additional RAM should almost certainly be configured for SQL Server, in order to increase performance. The distribution and publication servers should also be the same version and same service pack level as each other.
Although there are many aspects to replication, the major steps can be summarized as:
Each of these processes is discussed and demonstrated later in this chapter.
Deciding which SQL Servers will be publishing servers, and which will be subscription servers, is probably the most important decision you face when designing your replication environment. There are several generic scenarios that your environment may fall into.
The simplest replication scenario has a single publishing server sending one or more publications to a single subscriber (see Figure 19.1).
Figure 19.1. The single publisher to single subscriber model.
There are two replication servers--one for the original data (the publishing server), and one to receive data (the subscribing server)--in the single publisher to single subscriber model. Replication is often used to keep two copies of a database for performance or convenience.
For example, data from a SQL Server database residing on the West coast may be replicated nightly to a SQL Server on the East coast, so that the WAN does not have to be used by the East coast office to run queries.
Configuring multiple subscribers to a single publisher is very similar to a single subscriber (see Figure 19.2).
Figure 19.2. The single publisher to multiple subscriber model. There are various reasons to set up multiple subscribers to the same publisher.
In order to set up multiple subscribers, you need to do the following steps. They are similar to the single publisher/single subscriber model.
Notice that the single publisher/single subscriber model can easily be modified to the single publisher/multiple subscriber by choosing Server | Replication Configuration | Pub-lishing and enabling subscribing to additional servers.
Another model for replication is multiple publishers updating a single subscriber (see Figure 19.3).
Figure 19.3. The multiple publisher to single subscriber model.
This model is useful when:
This model is very useful for those environments where relatively independent branches operate, and whose data is periodically requested by a central office. If the central office can do with data that is up to 24 hours old, this model is excellent for scheduling replication at night, thus copying that day's transactions from the branch SQL Servers to the central SQL Server. Any reports done at the central office contain data from the branch offices that is current up to the end of the previous day.
Because data in the subscribing server is originating from multiple servers, you need to have a column in your publication tables that is location-specific. For example, if you have five branch offices, you need to have a column for branch number. All data from branch one has a 01 in the column, all data from the second branch has a 02 in the column, and so on. The unique identifier is not only used by SQL Server, it lets the central database know from where the updates came.
Any changes that the central office wants made cannot be done to its copy of the database, because it is a subscriber and has read-only data. Changes that the central office requests must take place at the branch (publishing) offices.
Follow these steps to enable multiple subscribers to a single publisher:
The multiple publisher to multiple subscriber model is when a server is both publishing and subscribing to the same database, although special stored procedures ensure that any given row originates from only a single, specific publishing server (see Figure 19.4).
Figure 19.4. The multiple publisher to multiple subscriber model.
This is the only replication scenario that acts like two-way replication, as the data appears to go in a round robin format, although individual rows still only originate from their respective publishing server.
Only general setup guidelines are given because configuring multiple publishers to multiple subscribers requires custom scripts, which may be unique for your situation.
NOTE: If multiple publishers and subscribers are replicating different tables (even in the same database) as each other, they act like single publishers to multiple subscribers. It's only when you have multiple publishers and subscribers to the same table within a database that you have to be very careful to avoid sending the same data around, and around, and around...
Multiple publishers need a column designated to hold site-specific information, just as in a multiple publishers to single subscriber model.
The multiple publisher to multiple subscriber model is useful for more peer-to-peer models, where there no central database either initiating updates or being updated.
Installing publishing sets up support for replication, and is a one-time process that is done on the publishing server. When publishing is installed, it does four major things:
Installation of publishing support can only be done by the SA. Enable publishing support by going to Server | Replication Configuration | Install Publishing (see Figure 19.5).
Figure 19.5. Installing publication.
When the installation of publishing has been started, you can select the distribution server, whether it be local or remote (see Figure 19.6).
Figure 19.6. Installing Publication options.
TIP: Because the distribution server does most of the work in the replication process, it needs to be a fairly fast SQL Server, with unused CPU, RAM, and hard disk capacity. That said, creating a local distribution server is much easier than creating a remote distribution server, and is recommended unless you are fairly certain that a remote distribution server better fits your needs.
If a local distribution server is selected, you are prompted for the database devices on which you want the distribution database to be created. You can specify existing devices, or select <new> to create the devices. Although the default name of the database is distribution, you can specify a different name. The devices and database are created after you select OK, although it may take a minute or so. Choose OK to configure the publication server.
Picking a size for the distribution server is definitely more of an art than a science, but there are a few things to consider. Because the distribution database temporarily holds a copy of the transactions of the publication database, the busier the publication database is, the larger you should make the distribution database. By default, the distribution database is cleaned every 24 hours, so it has to only be large enough to hold 24 hours worth of transactions. For example, suppose you have a 600MB database that gets about 5MB of transactions per day (you obtain this number by watching the space used by the log). The distribution database would have to be at least 5MB to hold one day's worth of data. If you increase the holding time to 72 hours, the distribution database has to be at least 15MB in size. Whatever size you make the distribution database, you should put the distribution database's transaction log on a separate device than the data, and make the log approximately 25 percent the size of the database.
The next screen lets you specify which SQL Servers are allowed to be subscribing servers, and which databases are allowed to publish tables (see Figure 19.7). Enable publication to your own server, and enable the pubs database to be a publication database. These screens are not well-labeled. What you are doing is specifying which SQL Servers can be recipients (subscribers) to this server. In addition, by placing a check next to a database, you are labeling it as a potential source database. You have to go to that database and create articles and publications even after you select a database as a source.
Figure 19.7. Publication configurations.
You define subscription server configurations and the destination database(s) at a later time.
You can select subscription servers from those already registered in Enterprise Manager, or new subscription servers can be registered by selecting New Subscriber.
Every database that you want to configure as a potential publishing database (articles must be created later) must be specified by adding a check to the Enabled box of the Database screen, as shown in Figure 19.7.
If Distribution options is selected, you can specify when and how the data is distributed to subscription servers (see Figure 19.8). There are two major ways to specify how data is to be sent to subscription servers.
Figure 19.8. Selecting Distribution options.
Transaction-based means that the distribution server keeps track of all of the transactions that occur on the publishing database, and sends those to the subscribing servers in batch sizes that you specify. The default batch size is 100, which means that up to 100 transactions at a time are recorded by the distribution database and sent to subscribing servers.
TIP: Time-based allows you to specify replication to take place at a certain time. This is very useful for databases that need to be synchronized during off hours, but don't need real-time synchronization. Specify a time when network traffic is relatively light.
You can set different distribution schedules for different subscription servers, making replication traffic very customizable. For example, if you had 20 different servers you were replicating to every night, you could set the synchronization start time for each one approximately 10 minutes apart, thus lowering the chance of flooding the network with replication traffic.
Selecting Distribution Publishers allows you to specify which SQL Servers (if any) can use this distribution server as a remote distribution server. For example, if your replication environment included SERVER2 as the publication server, and SERVER1 as the remote distribution server, from this screen on SERVER1 you must allow SERVER2 to use this server by placing a check next to SERVER2.
Choosing OK saves your distribution configuration. You should probably take the default distribution configurations, unless you have special replication requirements.
Articles and subscription servers need to be configured after publication can been successfully installed.
You need to configure subscription servers, as shown in Figure 19.9. Open the Subscription Server Configuration window by choosing Tools | Replication | Configuration | Sub-scribing. Enable the servers that you want to be subscribing servers, as well as which databases should be subscribing databases. Servers are unable to subscribe to your publications until they have been configured as potential subscription servers, and then only those databases that have been selected as potential subscribing databases can actually subscribe to publications.
Figure 19.9. Configuring Subscription options.
Creating an article consists of selecting a database that has previously been enabled as a publishing database, and selecting a table within that database to publish.
NOTE: You can only publish tables that have primary keys. Unique indexes do not qualify as primary keys--you have to specifically designate one or more columns as the primary key. Only tables with primary keys show up in the Create/Edit Publications screen.
In order to create a publication select either of the following choices:
The Manage Publications screen opens, and all of the databases that have been enabled for publication are shown, even if they don't have any publications defined (see Figure 19.10).
Figure 19.10. Setting Publication options.
Select New to create a new publication, which takes you to a list of tables for that database. Enter a name for the publication, and then begin creating one or more articles for the publication (see Figure 19.11). You have to use standard SQL Server naming conventions (no spaces, hyphens, maximum of 30 characters, and so on).
To create an article, simply highlight a table and select Add. The article includes all columns and all rows for that table by default.
Selected columns can be published by doing a vertical partitioning, which simply allows you to specify which columns will be replicated (see Figure 19.12).
Figure 19.11. Creating publications.
Figure 19.12. Vertical partitioning.
For example, you may have an employee table that contains names, ages, and phone numbers. You may want only the names and phone numbers replicated, so a vertical partition has to be created. Select Edit to set up a vertical partition; the Filter tab allows you to choose columns by removing checkmarks from the Replicate column (see Figure 19.13). To follow this example (and keep the phone and address columns from replicating), remove the checkmark from the Phone and Address columns.
If you don't want every row replicated, you can set up a horizontal partition, which restricts which rows are sent out. To enable horizontal partitioning, simply enter the matching criteria in the Restrictive Clause box. For example, if you only wanted rows that were from the state of Washington, add state = 'WA' in the box. The syntax is the same as that of the where portion of a select statement.
Figure 19.13. Setting up vertical partitioning.
The Use Column Names in SQL Statements is an option you probably want to enable. It enables replication based on column names, instead of assuming that the columns in both the publishing and subscribing servers are in the same order.
By selecting the Synchronization tab of the Create/Edit Publications screen, you can specify how the initial synchronization takes place (see Figure 19.14).
Figure 19.14. Publication Synchronization options.
Notice that the default is for bcp to use native mode and to check every five minutes. Native mode bcp fails if a subscribing server is DB2, Access, or Oracle. For more information on bcp, please see Day 9, "Importing and Exporting Data."
TIP: You may want to set the interval to every one minute to make initial synchronization happen faster when you are experimenting with replication. On production servers, the default of five minutes is probably a better option. If you leave it at five-minute intervals, it takes up to five minutes to do the initial synchronization (and confirm that you did everything successfully) when you set up your subscriptions.
The Security tab allows you to specify which subscription servers can subscribe to this publication (see Figure 19.15). This allows you to fine-tune your replication environment's security, as by default all SQL Servers that have been designated as subscription servers can subscribe to all publications. By going to the Security tab and designating only those subscription servers that actually subscribe to this publication, you lesson the chance that subscriptions happen without your knowledge. Of course, if you later want a subscription to occur to a server that has not been designated, you have to edit the publication and allow that subscription server access.
Figure 19.15. Security options of Publications.
All subscribing servers that you earlier enabled are allowed to subscribe to any and all publications by default, unless you specifically change the default security.
The publication is created when you choose Add. You can edit the publication by highlighting it and choosing Change from the Manage Publications screen.
A publication needs to be subscribed to after it has been created. Subscriptions can be initiated in one of two ways.
In order to do a push subscription you need to have SA rights, not only on the publishing server, but on the subscribing server. This allows a single administrator with SA rights on both the publication and subscription servers to easily administer both from one location.
Push subscriptions are also necessary when the subscribing server is not a SQL Server, because non-SQL Servers don't have the necessary administration software to manage replication.
In order to initiate a pull subscription, select Subscribers from the Edit Publications screen (from within the publication).
Pull subscriptions usually occur when the subscribing server SA is different from the publishing server SA.
To initiate a pull subscription, choose Manage | Replication | Subscriptions. This opens a screen with all the publishing servers that have been defined for this subscribing server. Open the server, database, and publications to see all available articles (see Figure 19.16).
Figure 19.16. Viewing publications.
Simply highlight the publication or article and choose Subscribe to subscribe to a publication. You are prompted for the subscribing database and a synchronization method (see Figure 19.17).
Figure 19.17. Subscribing to a publication.
The initial synchronization method must be chosen when you subscribe to a publication. The default is for SQL Server to initially synchronize the tables, after which replication occurs based on the transaction log.
As an administrator, you are responsible for the initial synchronization of the tables if you change the default synchronization method. If you select the Manual method of synchronization, you must synchronize the tables yourself, and later start the replication process. If you select No Synchronization, the replication process assumes that the tables are in perfect synchronization, and starts immediately. Manual and No Synchronization are used when you have a backup of the table involved, and can load the backup faster than SQL Server can do a synchronization across the network. For example, if you were replicating a 10GB database, the initial synchronization, if performed by SQL Server, may take several hours across the WAN. If you had received a database backup, you may be able to load the data manually in considerably less time. Manual and No Synchronization are advanced methods for configuring replication, as the data cannot change between the time the backup is made and then loaded on the subscription server, or the databases are no longer in sync. These advanced replication methods should only be used if your replication scenario requires them.
After a subscription has been established, you can check to see if the two databases are in synchronization by highlighting an article and choosing More Info from the Manage Subscription Info screen (see Figure 19.18).
Figure 19.18. Viewing information about subscriptions.
You can view the publication, distribution, and subscription topology from the Topology screen (see Figure 19.19) after replication is installed.
Figure 19.19. Replication topology.
To access the Topology screen, go to Server | Replication Configuration | Topology. From the Topology screen, you can manage publication and subscription configurations, as well as create and manage articles and publications.
The + and - magnifying glass icons can be used to zoom in and zoom out, respectively.
There are various resources that can help when troubleshooting replication.
If replication is working incorrectly, the first question to ask is, "Has it ever worked?" If replication doesn't work at first, it is often a security issue. If replication was once working, but has stopped, then the replication tasks can be examined in order to see why they have stopped working.
The replication process is controlled by the distribution server, which reads and stores transactions that occur on the publishing server, and then reapplies those transactions on the subscription server(s). Because the distribution server is making connections to the publishing and then to the subscription server(s), security can be an issue.
The SQL Executive service is in charge of all replication tasks, and must be able to connect to all the servers involved in replication. This means that the user account assigned to the SQL Executive service must have access to each and every server involved in replication. If replication doesn't work initially, it may be because different users and passwords have been configured for the different SQL Servers, thus making connections between servers impossible.
The main thing to remember is that the SQL Executive needs SA permissions on all of the servers involved in replication. Because the SQL Executive has to utilize a user account to access servers other than itself, that means the user account that SQL Executive is assigned must have SA permissions on all servers involved in replication.
There are several ways to check whether the user account that SQL Executive has been assigned has sufficient rights on other SQL Servers. One way to test the account is to connect to the various servers using that name and password. If a connection cannot be made, you likely encounter errors when SQL Executive also attempts to connect to that server. Simply change the user account or password, so that all servers can be connected to using the same account.
The user account and password can be changed utilizing User Manager, and the account and password can be assigned to SQL Executive via the Control Panel | Services applet.
TIP: Because security issues often involve the user account assigned to SQL Executive, a good knowledge of Windows NT and user accounts can help you troubleshoot other types of security issues that may arise.
Another problem that can arise is when integrated security is being used, and the default NT Administrators group association with the SA account has been broken via SQL Security Manager. You have to add the user account that SQL Executive is using to a group that has SA permissions on the SQL Servers.
Several tasks are created on the distribution server when replication is installed. There are three types of replication tasks that are created when replication is installed.
Because replication tasks are similar to others type of tasks, the history, including time of the last task, and success or failure can be examined. To look at replication tasks, connect to the distribution server and go to Server | Manage Tasks. A list of configured tasks is shown, as you can see in Figure 19.20.
Figure 19.20. Replication tasks.
To check the replication tasks simply highlight the replication task you want to examine and select the History icon. You can examine a failed task to determine why the task failed (see Figure 19.21).
Figure 19.21. Examining failed replication tasks.
The History screen shows you the last 100 times the task was run (see Figure 19.22).
Figure 19.22. Examining the history of replication tasks.
There are various reasons replication tasks may fail. Some common reasons include:
Often, reading the error log for either the tasks involved, or for SQL Server gives you a good idea as to what the replication problem, and how to fix it.
Replication allows data and changes from one database to be automatically be sent to another database. Publishing servers contain the original database, distribution servers manage initial synchronization, as well as ongoing transactions, and subscription servers receive the data.
Articles are the base unit of replication; they are based on tables. Vertical partitioning allows you to specify certain columns for replication, and horizontal partitioning allows only certain rows to be replicated.
There are four basic models for replication, the simplest being the single publisher to single subscriber model. The single publisher to multiple subscriber model allows one database to update many databases.
Multiple publishers to one subscriber can be enabled, as long as a site-specific column is designated. Multiple publishers to multiple subscribers is the hardest model to implement, because custom scripts must be used to ensure that the data is not sent more than once.
Replication topology can be viewed, and publishing and subscription configuration can be done from the Topology window.
Follow this exercise by installing and configuring publishing, and set up the pubs database to replicate to a database called Copy_pubs.
© Copyright, Macmillan Computer Publishing. All rights reserved.