7. Fail-Over#
Altibase provides a Fail-Over function to overcome failures and provide service regardless of failures while operating the database system. This chapter describes the functions and usage of Fail-Over supported by Altibase.
About Fail-Over#
Fail-Over means overcoming a failure during DBMS operation and allowing the service to continue as if no failure occurred.
Possible faults include the failure of the DBMS server itself, the failure of the network path to the equipment, or the failure of the DBMS due to a software error. Fail-Over allows users to connect to another DBMS server in the event of a failure, regardless of the type of failure, thereby allowing the application to continue the service without being aware of the failure.
One of the following two kids of Fail-Over is performed, depending on the time point at which the fault is discovered:
-
CTF (Connection Time Fail-Over)
-
STF (Service Time Fail-Over)
With CTF, it recognizes the fault at the connection time of the DBMS and retries connection to the DBMS of the other available node instead of the failed DBMS.
With STF, it detects the fault during DBMS service and restores the properties of the session by reconnection to the DBMS of another available node, so that the business logic of the user application can be continuously executed.
Because STF only performs the Fail-Over for DB connection, the failed transaction must be reprocessed by the user.
In order to obtain correct operation results in such Fail-Over, database consistency between the fault DBMS and the available DBMS should be ensured. Altibase provides a database replication method using Off-line replicators to ensure database consistency. Offline replication is a method of matching the database by reading the log of the active server from the stand-by server.
Due to the characteristic of the replication method, database consistency may be inconsistent, so it is recommended to check the consistency using the Fail-Over callback function. Fail-Over callbacks are described in detail in the next section.
Altibase's Fail-Over setting is done by registering the Fail-Over attribute in the application program. The integrity of the database can be checked before executing Fail-Over using the Fail-Over callback function.
The three kinds of Fail-Over-related tasks that must be executed by the client application are summarized as follows:
-
Fail-Over connection property registration
-
Fail-Over callback function registration
-
Process business logic based on callback results.
For more detailed information, please refer to the Replication Manual.
How to Use Fail-Over#
Setting the Fail-Over Connection Property#
If the Fail-Over connection property has been set, Altibase will detect it when a failure occurs and perform Fail-Over tasks as specified by the connection property.
These are two ways to show property values:
-
By viewing the connection property string used by the API's "Connect" function
-
By viewing the Altibase properties files:
altibase_cli.ini file or odbc.ini file (WinODBC)
For more details about how to set this property, please refer to the Replication Manual.
Checking Whethere Fail-Over has Succeeded.#
In the case of CTF (Connection Time Fail-Over), attempting to connect to the database makes it immediately obvious whether Fail-Over was successful. In contrast, in the case of STF (Service Time Fail-Over), whether Fail-Over was successful is determined by checking for exceptions and errors.
For example, in the case of JDBC, when a SQLExeception is caught, the value of SQLStates.status is checked using the SQLException's getSQLState() method, and if this value is found to be ES_08FO01, then it is known that Fail-Over succeeded.
In the case of CLI and ODBC, if the result of a SQLPrepare, SQLExecute, or SQLFetch statement or the like is an error rather than SQL_SUCCESS, a statement handle is returned in response to SQLGetDiagRec, and if the result of the call to SQLGetDiagRec is ALTIBASE_FAILOVER_SUCCESS, then it is confirmed that STF (Service Time Fail-Over) succeeded.
When using embedded SQL, after executing an EXEC SQL statement, the value of the return code "sqlca.sqlcode" is checked, and if it is ALTIBASE_FAILOVER_SUCCESS (rather than SQL_SUCCESS), then it is confirmed that STF (Service Time Fail-Over) succeeded.
For more detailed information on these settings, please refer to the Replication Manual.
How to Write a Fail-Over Callback Function#
The way to write a Fail-Over function differs depending on the form of the client application, but the basic structure is usually the same and consists of the following:
- Defining Fail-Over related data structures
- Writing the body of Fail-Over Callback function that will be called when Fail-Over related events occur.
- Checking whether Fail-Over has succeeded.
At the step of defining Fail-Over-related data structures, the Fail-Over-related data structure is defined or the interface (header file) for it is included.
At the step of writing the body of Fail-Over Callback functions, necessary codes such as checking consistency are implemented. The codes will be executed in case the Fail-Over start event or finish event occurs.
At the step of checking whether Fail-Over has succeeded, it checks if Fail-Over is successfully completed and Fail-Over callback functions finish without any errors. If it is true, the application service can resume.
For more detailed information on how to write such functions in various client application environments, please refer to the Replication Manual.