I had written the below paper as a solution to a problem faced during SQL 2000 issue analysis at Microsoft. This white paper will help provide real time data availability to the users by skipping the large transaction which is blocking subsequent replication transactions and thus adding to the data latency. Below is a part of the paper, detailed solution is posted in the Microsoft internal website.
Replication is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases for consistency.
This white paper discusses about the Transactional Replication. Merge and Snapshot replication are not in scope of this white paper.
With transactional replication, an initial snapshot of data is applied at Subscribers, and then when data modifications are made at the Publisher, the individual transactions are captured and propagated to Subscribers. A minimum of three server roles are required for transactional replication:
- Publisher, hosting the publication database
- Distributor, hosting the distribution database
- Subscriber, hosting the subscription database
CRMPublish Case Study:
CRMPublish Overview
CRMPublish is a subscriber (replicated database) of the CRM Ecosystem and is a data source for ~120 downstream applications. There are almost 162 objects which are being replicated from the CRM Ecosystem into the CRMPublish DB.
Large Number of Commands in a Single Transaction
In a transactional replication, it is expected to have near real time data availability in the Subscriber database. If there is a delay in replicating the data on the subscriber, we call it as replication latency. There are multiple causes for latency and one of them is due to the large number of commands in a single transaction.
We experienced a similar issue in our application CRM Publish. Following were the observations during the incident:
- The replication slowed down on CRM Publish increasing the latency.
- A single transaction with 6.2 Million commands resulted in the issue.
- The transaction was introduced into the publisher through a bulk upload.
- Adding to the above transaction, the replication from Distributor to Subscriber further slowed down due to the subsequent transactions flowing from the publisher and getting piled up on the distributor.
If the commands are more than a million and is followed by subsequent large number of commands pending to be replicated and is adding to huge latency, you may opt for the below to avoid latency.
One of the ways to surmount this problem is:
Skipping a LSN – which means skipping the transaction having large number of commands and later applying the skipped data contained in the LSN manually over the Subscriber. This is applicable to one way transactional replication
Skipping a LSN (XACT_SEQNO) or a Transaction in a replication environment
This section presents the steps and code required to skip a single transaction. STEP 1 is specific for the transaction with large number of commands but in all cases involving a transaction skip, first we need to identify the transaction which has to be skipped (STEP 1) and then check if the transaction is currently being replicated (STEP 2).
This paper talks about a single transaction having large number of commands. There could be multiple such transactions having large number of commands and it does not describe about this scenario.
STEP 1: Identify the LSN which has large number of commands
STEP 2: Check if the above noted LSN (XACT_SEQNO) is the same as being currently replicated in Subscriber
STEP 3: To find article(s) involved in the large single transaction under replication
STEP 4: To skip the transaction from replication
a) Skip the transaction
b) Skip a table from the transaction
STEP 5: Apply the Data on the Subscriber database
STEP 6: Restart the Distribution Agent job
STEP 7: To confirm if the transaction has been skipped
Conclusion
This whitepaper shows how a replication transaction with huge number of commands can be skipped from being replicated.
We may decide to skip the transaction because of the following reasons:
1. The transaction takes a long time to commit increasing the replication latency.
2. Other transactions are waiting in the queue for the completion of this batch of transaction. This increases the load on the server further slowing down the server performance.
3. If the transaction rolls back before committing the complete batch of commands, it will restart the process of replication. This will lead to phenomenal increase in the data latency.
No comments:
Post a Comment