2. Managing Replication#

This chapter explains the replication steps and how to use Altibase replication functions for various faults and errors that can occur while performing replication.

Replication Procedures#

Choose the replication target servers
The database character sets and the national character sets of both servers must be the same.
Choose the tables or partitions to be replicated
Every table to be replicated must have a primary key.
Create a replication object using the CREATE REPLICATION statement
The replication object must have the same name in both databases.
Start replication using the ALTER REPLICATION statement

When replication is started, the local server creates a replication Sender thread and this thread connects to a replication manager on the remote server. At this time, the replication manager on the remote server generates a repliation Receiver thread.
The replication service is started.

Troubleshooting#

Abnormal local or remote server shutdown
Communication interruption between local and remote servers
Service line failure

Abnormal Local or Remote Server Shutdown#

[Figure 2-2] Replication in the Event of Server Failure

Server A abnormally terminates#

The Receiver thread on Server B terminates and the Sender thread on Server B attempts to connect to Server A at regular intervals (e.g., every 60 seconds).

Server A restarts (the Sender thread calls the Receiver thread on the remote server)#

Server A's Sender thread automatically starts and performs replication with Server B.
Server B's Sender thread starts Server A's replication Receiver thread and it performs replication.
Server B's Sender thread starts Server A's Receiver thread.
Server A's Sender thread starts Server B's Receiver and it performs replication.

Communication Interruption Between Local and Remote Servers#

[Figure 2-3] Replication in Response to Communication Failure with Remote Server

The local and remote servers fail to communicate0#

The Receiver threads on Server A and B roll back and terminate uncommitted transactions
The Sender threads on Server A and B record the Restart SN and attempt to connect to the corresponding servers every 60 seconds.

Connection is restored#

The Sender threads on Server A and B wake up the receiver threads on the corresponding servers and perform replication by transmitting all XLOGs, starting with the XLOG corresponding to the Redo Log having the Restart SN.
Receiver threads on Server A and B are created in response to connection requests from the Sender threads on corresponding servers, and perform replication.

Service Line Failure#

Primary line is disconnected#

Server B provides service with a backup line.

Primary Line is restored#

Once the primary line is restored, Server A provides service.
Even if the primary line is down, Server B can still send task contents to Server A with the Altibase replication functionality.

Conflict Resolution#

Resolution#

A data conflict occurs when the master transaction makes data changes, but the replication transaction cannot apply the changes due to duplicate primary keys or constraints.

For Deferred Replication, the best way to avoid data conflicts is to have different update data sets per database.

There are three types of data conflicts:

INSERT Conflicts#

An INSERT conflict occurs if the replication transaction tries to insert data that has the same primary key as an existing record.
If the replication transaction tries to insert data into a table that is already locked by another local transaction, the replication transaction needs to wait to acquire a lock. An INSERT conflict occurs due to lock timeout.
An INSERT conflict occurs if the replication transaction tries to insert a duplicate value into a primary key column.

UPDATE Conflicts#

An UPDATE conflict occurs if the replication transaction tries to update a record with a nonexistent primary key.
An UPDATE conflict occurs if the replication transaction tries to update a record whose data is different from the record's before image (i.e. data prior to changes ) updated by the master transaction.
An UPDATE conflict occurs if a duplicate key value is created by an update operation.

DELETE Conflicts#

A DELETE conflict occurs if the replication transaction tries to delete a record that has a nonexistent primary key.
If the replication transaction tries to delete a record that is already locked by the local transaction, the replication transaction needs to wait to acquire a lock. A DELETE conflict occurs due to lock timeout.

Unlike distributed DBMSs that use the 2-Phase Commit (2-PC) or 3-Phase Commit (3-PC) protocols, the replication functionality cannot guarantee data consistency against conflicts for commercial DBMSs. Contrariwise, 2-PC/3-PC have performance degradation issues and additional measures are required for system or network failure.

As a result, commercial DBMSs mainly use deferred (asynchronous) replication to alleviate the constraints on data consistency and maintain solid performance.

Conflict Resolution#

Conflict resolution refers to a variety of methods for eliminating data conflicts. Deferred Replication does not offer a perfect solution to data conflicts. Once a conflict occurs, it is merely resolved by synchronizing the data on the database servers. Altibase provides the following conflict resolution methods to resolve data conflicts:

User-Oriented Scheme
Master-Slave Scheme
Timestamp-Based Scheme

Altibase performs the following operations for the above methods:

Synchronizes a server's data with another server.
Logs information about conflicts for issue tracking.

However, LOB columns are excluded from conflict resolution. LOB columns cannot detect data conflict because they neither log before images nor define primary or unique keys.

The policies for each conflict situations are provided in detail below.

Note: For more detailed description of the CREATE REPLICATION command, please refer to the description of the CREATE REPLICATION statement.

User-Oriented Scheme#

Syntax#

CREATE REPLICATION replication_name
WITH 'remote_host_ip', remote_host_port_no 
FROM user_name.table_name TO user_name.table_name
FROM user_name.table_name TO user_name.table_name,
… 
FROM user_name.table_name TO user_name.table_name;

Description#

INSERT Conflict#

If an INSERT conflict occurs, the INSERT statement fails and a conflict error message is output to altibase_rp.log.

Use the REPLICATION_INSERT_REPLACE property to set the conflict resolution policy for an INSERT conflict that arises due to trying to insert data having the same primary key as an existing record.

REPLICATION_INSERT_REPLACE=1: Delete and insert
REPLICATION_INSERT_REPLACE=0: Either does not delete or insert; outputs an error message

UPDATE Conflict#

If an UPDATE conflict occurs, the UPDATE statement fails and a conflict error message is output to altibase_rp.log

Use the REPLICATION_UPDATE_REPLACE property to set the conflict resoultion policy for an UPDATE conflict that arises due to trying to update data with a different before image or update data with a nonexistent primary key. For example, the following policies can be used when there is a data of the value 10 and the replication transaction tries to update that value from 20 to 30.

REPLICATION_UPDATE_REPLACE=1 : Update
REPLICATION_UPDATE_REPLACE=0 : Does not update; outputs a conflict error message

DELETE Conflict#

If a DELETE conflicts occurs, the DELETE statement fails and a conflict error message is written to altibase_rp.log.

Summary#

The user determine the conflict resolution policy on a case-by-case basis.
The altiComp utility is provided as a solution for with data inconsistency. For more detailed information, please refer to the Utilities Manual > altiComp.

Master-Slave Scheme#

Syntax#

CREATE REPLICATION replication_name {AS MASTER|AS SLAVE}
WITH 'remote_host_ip', remote_host_ port_no
FROM user_name.table_name TO user_name.table_name,
FROM user_name.table_name TO user_name.table_name,
… 
FROM user_name.table_name TO user_name.table_name;

Description#

Specify AS MASTER or AS SLAVE in the command to specify whether the server is the Master or Slave. On omission, the value specified by the REPLICATION_INSERT_REPLACE or REPLICATION_UPDATE_REPLACE property is used.
The user can check whether a server is the Master or Slave from the CONFLICT_RESOLUTION column in the SYS_REPLICATIONS_ meta table.
- 0 = not specified
- 1 = Master
- 2 = Slave
The handshake¹ is only successful if the CONFLICT_RESOLUTION column has the following values: 0 with 0, 1 with 2, and 2 with 1. Any other combinations will fail. If one server is specified as the Master but the other server is omitted , the following error will be output when replication starts:

iSQL> ALTER REPLICATION rep1 START;
[ERR-6100D : [Sender] Failed to handshake with the peer server (Master/Slave conflict resolution not allowed [1:0])]

Master/Slave Replication Conflict Handling Method#

Operating as Master#

INSERT conflict: Not committed.
UPDATE conflict: Not committed.
DELETE conflict: Not committed.
Other: XLOG transferred from the Slave is processed as usual.

Operating as Slave#

INSERT conflict

If an insert conflict occurs because an attempt was made to insert data having the same primary key as an existing record, the existing record is deleted and a new record is added. If an insert conflict occurs for any other reason, the INSERT statement fails, and a conflict error message is recorded in altibase_rp.log.
UPDATE conflict

If an update conflict occurs because an attempt was made to update a record having a value different from the "Before Image" value on another database server, from which data for replication are propagated, the conflict is ignored, and the UPDATE statement succeeds despite the conflict. If an update conflict occurs for any other reason, the UPDATE statement fails, and a conflict error message is recorded in altibase_rp.log.
DELETE conflict

If a delete conflict occurs because no record having that primary key exists, the DELETE statement fails, and a conflict error message is not recorded in altibase_rp.log. If a delete conflict occurs for any other reason, the DELETE statement fails, and a conflict error message is recorded in altibase_rp.log.
Other

The XLOG transferred from the Master is processed as usual.

Example#

Suppose that the IP address and replication port number of the local server are 192.168.1.60 and 25524, and that the IP address and replication port number of the remote server are 192.168.1.12 and 35524, that there is a master-slave relationship between the local and remote servers, and that a table called employees and one called departments are replication target tables. In this situation, replication is specified as follows:

Local Server (IP: 192.168.1.60)#

iSQL> CREATE REPLICATION rep1 AS MASTER
    2 WITH '192.168.1.12', 35524
    3 FROM SYS.employees TO SYS.employees,
    4 FROM SYS.departments TO SYS.departments;
Create success.

Remote Server (IP: 192.168.1.12)#

iSQL> CREATE REPLICATION rep1 AS SLAVE
    2 WITH '192.168.1.60', 25524
    3 FROM SYS.employees TO SYS.employees,
    4 FROM SYS.departments TO SYS.departments;
Create success.

Whether a server is a Master or Slave can be determined by checking the CONFLICT_RESOLUTION field, which is located in the SYS_REPLICATIONS_ meta table. (0 = not specified; 1 = Master; 2 = Slave)

iSQL> SELECT REPLICATION_NAME, CONFLICT_RESOLUTION FROM SYSTEM_.SYS_REPLICATIONS_;
REPLICATION_NAME                CONFLICT_RESOLUTION 
-------------------------------------------------------
REP1                            1
1 row selected.

Timestamp-based Scheme#

Syntax#

CREATE REPLICATION replication_name
WITH 'remote_host_ip', remote_host_port_no 
FROM user_name.table_name TO user_name.table_name,
FROM user_name.table_name TO user_name.table_name,
… 
FROM user_name.table_name TO user_name.table_name;

Description#

The Timestamp-Based Scheme is provided to ensure that both servers have the same data in an Active-Active replication environment.

The following restrictions apply when using the Timestamp-Based Scheme:

Every table must contain a TIMESTAMP column.
The REPLICATION_TIMESTAMP_RESOLUTION property must be set to 1.

Because Altibase supports the Timestamp-Based Scheme on the basis of tables, even if a replication target table has a TIMESTAMP column, if the value of the REPLICATION_TIMESTAMP_RESOLUTION property for that table has been set to 0, a conventional conflict resolution scheme will be used.

Supposing for example that a user wishes to replicate a table called "foo" and another called "bar" between two servers, if the REPLICATION_TIMESTAMP_RESOLUTION property is set to 1 for the "foo" table, the Timestamp-Based Scheme will be used for that table, whereas a conventional conflict resolution scheme will be used for the "bar" table.

CREATE TABLE foo (a DOUBLE PRIMARY KEY, b TIMESTAMP);
CREATE TABLE bar (a DOUBLE PRIMARY KEY, b CHAR(3));
CREATE REPLICATION rep WITH '127.0.0.1', 30300 
  FROM SYS.foo TO SYS.foo, FROM SYS.bar TO SYS.bar;

Timestamp-based Replication Processing Method#

Altibase supports the Timestamp-Based Scheme only for INSERT and UPDATE operations.

INSERT#

If data to be inserted have the same key as existing data, the timestamp value of the After-Image of the data is compared with that of the existing data.
If the TIMESTAMP value of the After-Image of the data is equal to or greater (i.e. more recent) than that of the existing data, the existing data are deleted, and new data, having the value of the After-Image of the data, are added.

UPDATE#

The TIMESTAMP value of the After-Image of the data is compared with that of the data to be updated.
If the TIMESTAMP value of the After-Image of the data is equal to or greater (more recent) than that of the existing data, the data are updated with the After-Image of the data
When UPDATE is performed, the TIMESTAMP value in the After-Image of the data is kept. In other words, independent system time values are not used.

Restrictions#

When a TIMESTAMP column is added to a table, 8 additional bytes of storage space are needed per record.
If the time is set differently on the two servers to be replicated, database inconsistencies can result.

Eager Replication Failback#

This section examines how the data of both servers are synchronized and then failback when a node fails in the replication environment of eager mode.

To configure a replication environment in eager mode, replication of both servers in eager mode must be created and started. In addition, in order to failback in a replication environment of the eager mode, replication must be operated in the eager mode even in the failback environment (recovery).

Incremental Sync#

In the replication environment in Eager mode, a failure (equipment issue or Altibase server failure) may occur on a Server A without a commit log written on one server (Server A) and the commit log is not sent to the other server (Server B). In this case, Server A returns an error to the application that executed the transaction commit, and Server B will roll back the transaction because it does not receive the commit log. Then, application will fail over and continue working on Server B. AS a result, there may be a situation in which data in the same record does not match on Server A and Server B.

Since this data affects the replication operation in the future, a failback process for inconsistent data of both nodes is required. This is called 'incremental sync', a data synchronization operation for removing inconsistencies caused by replication node failures of EAGER mode. Targets of this operation are records with any possibility of committed on only one node. Incremental Sync analyzes data that can be different from the other node, requests and synchronizes those missing transaction logs.

When Server A comes back online, it is determined as the value of the REMOTE_FAULT_DETECT_TIME column in the SYS_REPLICATIONS meta tables of both servers (when the failure of the other nodes is detected), to become master or slave respectively. That is, the server with later value becomes the master. In this case, Server B will be the master server. Once the master and the slave are determined, the replication sender of the slave (Server A) analyzes its transaction log from the restart SN (sequence number) and determines the data that have not been sent to the other server (i.e., may be different from the master), and import the data from the master and perform the synchronization. The master's replication sender sends the data requested by the slave. That is, the replication senders of both servers operate as master and slave.

Thus, in order to increment synchronization to complete successfully, neither of the replications of both servers should be stopped.

Then, replication sender of Server A updates the restart SN (sequence number) to the latest. This is to prevent the replication sender of Server A from retransmitting the data that are already syncrhonized in the above process.

Incremental Sync can be enabled or disabled by adjusting the REPLICATION_FAILBACK_INCREMENTAL_SYSN property value, which must be set to the same value on both nodes' server.

Normal Sync#

After the server completes or skips the incremental sync, before the Eager mode replication starts normally, it synchronizes the data from the transaction log that failed to transfer from Server B to Server A during the failure. This synchronization is called normal sync. When transmitting a log that could not be transmitted could be transmitted due to a failure, replication switches to the Lazy mode, and when all the logs are transmitted and there is no replication gap, replication starts by switching back to the Eager mode.

When the REPLICATION_FAILBACK_INCREMENTAL_SYNC property value is set to 1, normal synchronization will be performed after the above incremental sync. Otherwise, incremental sync will be skipped and normal synchronization will be performed immediately.

Parallel Replication#

Parallel replication indicates using multiple sending and receiving threads when executing the replication.

Altibase supports the following parallel replication based upon the replication mode:

Lazy mode: This is one of the additional features which allows receiver parallel replication.
Eager mode: This allows sending parallel replication by controlling properties

In the eager mode, the parallel replication is used by controlling REPLICATION_EAGER_PARALLEL_FACTOR property in order to manage multiple sending threads.

In case of replication in the eager mode, it is available to replicate a transaction unit since commit execution is possible if nodes in both sides are read. Altibase materializes the parallel replication with multiple replication sender threads so that each thread can process one transaction. Also, this type of parallel replication offers much enhanced replication performance with rapid speed that that of the conventional synchronous replication.

The parallel replication method in lazy mode is described in the Extra Features section.

Note#

If using the parallel replication, there might be an impossible situation to commit due to reversed transaction sequence in the process of replicating. Thus, the application program should implement logic in order to overtly execute a rollback if a commit failure is returned when using the parallel replication.

The following performance views are provided to monitor replication progress. For more information on performance views, see the Altibase Data Dictionary.

V$REPEXEC
V$REPGAP
V$REPGAP_PARALLEL
V$REPLOGBUFFER
V$REPOFFLINE_STATUS
V$REPRECEIVER
V$REPRECEIVER_COLUMN
V$REPRECEIVER_PARALLEL
V$REPRECEIVER_STATISTICS
V$REPRECEIVER_TRANSTBL
V$REPRECEIVER_TRANSTBL_PARALLEL
V$REPRECOVERY
V$REPSENDER
V$REPSENDER_PARALLEL
V$REPSENDER_STATISTICS
V$REPSENDER_TRANSTBL
V$REPSENDER_TRANSTBL_PARALLEL
V$REPSYNC

Handshaking is the process of checking whether the other server is alive and whether the information about the objects to be replicated between the local server and the remote server matches before replication starts. ↩

2. Managing Replication#

Replication Procedures#

Troubleshooting#

Abnormal Local or Remote Server Shutdown#

Server A abnormally terminates#

Server A restarts (the Sender thread calls the Receiver thread on the remote server)#

Communication Interruption Between Local and Remote Servers#

The local and remote servers fail to communicate0#

Connection is restored#

Service Line Failure#

Primary line is disconnected#

Primary Line is restored#

Conflict Resolution#

Resolution#

INSERT Conflicts#

UPDATE Conflicts#

DELETE Conflicts#

Conflict Resolution#

User-Oriented Scheme#

Syntax#

Description#

INSERT Conflict#

UPDATE Conflict#

DELETE Conflict#

Summary#

Master-Slave Scheme#

Syntax#

Description#

Master/Slave Replication Conflict Handling Method#

Operating as Master#

Operating as Slave#

Example#

Local Server (IP: 192.168.1.60)#

Remote Server (IP: 192.168.1.12)#

Timestamp-based Scheme#

Syntax#

Description#

Timestamp-based Replication Processing Method#

INSERT#

UPDATE#

Restrictions#

Eager Replication Failback#

Incremental Sync#

Normal Sync#

Parallel Replication#

Note#

Performance View related to Replication#