3. How to Use dataCompJ#
This section provides a step-by-step guide to running dataCompJ and setting up its configuration file.
This chapter consists of the following sections:
- How to run dataCompJ
- Executing dataCompJ
- Configuration File Setup
How to run dataCompJ#
The following commands execute dataCompJ from the command-line interface (CLI).
- Linux
  $ dataCompJCli.sh -f dataCompJ_env_file_path
- Windows
  C:\dataCompJ> dataCompJ.bat -f dataCompJ_env_file_path
dataCompJ_env_file_path#
This is the path of the dataCompJ configuration file, and it is a required option. dataCompJ.xml is provided as a configuration file when dataCompJ is installed, but the user can create another file and use it as the configuration file instead.
Executing dataCompJ#
dataCompJ operates according to the configuration file set up by the user. Execution is divided into two phases: build and run.
Build Phase#
The build phase is an initial check that determines whether the run phase can be performed with the given configuration file. If any issue is discovered, it is written to the report file (dataCompJ_report.txt) and dataCompJ terminates. The build phase performs the following steps:
- Read the configuration file set up by the user.
- Verify whether the connection information described in the configuration file is valid.
- Connect to both databases to validate the target tables described in the configuration file and verify their meta information. If an issue is detected in either table, it is recorded in the report file and dataCompJ terminates.
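The first two steps can be approximated before dataCompJ is ever launched. The following is a minimal sketch of such a local pre-check, not dataCompJ's own validation logic. It assumes that the connection elements not described as optional in this guide are required, and the root element name, the JDBC URLs, and the jar path in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: elements not described as optional in this guide are required.
REQUIRED_FIELDS = ("JdbcUrl", "JdbcFilePath", "UserId", "Password")

# Hypothetical configuration: the root element name and every value are
# placeholders. The slave password is left empty on purpose.
SAMPLE_CONFIG = """
<dataCompJ>
  <Connections>
    <MasterDB>
      <JdbcUrl>jdbc:Altibase://127.0.0.1:20300/mydb</JdbcUrl>
      <JdbcFilePath>/path/to/Altibase.jar</JdbcFilePath>
      <UserId>sys</UserId>
      <Password>placeholder</Password>
    </MasterDB>
    <SlaveDB>
      <JdbcUrl>jdbc:Altibase://127.0.0.2:20300/mydb</JdbcUrl>
      <JdbcFilePath>/path/to/Altibase.jar</JdbcFilePath>
      <UserId>sys</UserId>
      <Password></Password>
    </SlaveDB>
  </Connections>
</dataCompJ>
"""


def find_connection_problems(xml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    root = ET.fromstring(xml_text)
    problems = []
    for db in ("MasterDB", "SlaveDB"):
        node = root.find(f".//{db}")  # search below the root, whatever it is named
        if node is None:
            problems.append(f"{db}: section missing")
            continue
        for field in REQUIRED_FIELDS:
            value = (node.findtext(field) or "").strip()
            if not value:
                problems.append(f"{db}/{field}: missing or empty")
    return problems


problems = find_connection_problems(SAMPLE_CONFIG)
for problem in problems:
    print(problem)
```

A check like this only catches structural omissions. Whether the databases are reachable and the tables are valid can be confirmed only by dataCompJ itself during the build phase.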
Run Phase#
In the run phase, the data in the target tables is compared, and then either comparison (DIFF) or synchronization (SYNC) is performed, according to the user's choice.
The execution result of each TablePair is written to the report file (dataCompJ_report.txt).
Output Files#
When dataCompJ is executed, one report file and two log files are generated.
The output files generated by the comparison (DIFF) function are described in detail in the section 'Comparison (DIFF) Function'.
- dataCompJ_report.txt: A plain-text report file that presents the execution results to the user.
- dataCompJ.log: A log file that records the events that occur while the program runs. It can also be used to trace the program's execution history.
- dataCompJ_data.log: A log file generated in the run phase. When <TraceInconsistentRecord> in the configuration file is set to true, it records the details of the inconsistent records processed during comparison (DIFF) or synchronization (SYNC). Enable it only when a detailed report of inconsistent records is required: if there are many inconsistent records, the file becomes large and program performance degrades.
Configuration File Setup#
A configuration file is required to execute dataCompJ. The dataCompJ.xml file is provided by default when dataCompJ is installed, and any other file created by the user can also be used as the configuration file. The configuration file should follow the XML rules described in dataCompJ.xml, and it should be encoded in UTF-8 if it contains text in multiple languages.
The dataCompJ configuration file is divided into three sections: Connections, Options, and TablePairs.
Connections#
The Connections section records the information required to connect to the Master DB and the Slave DB.
<MasterDB>#
This element records the connection information of the Master DB. The Master DB must be an Altibase database. The following XML elements are the sub-elements of <MasterDB>.
<JdbcUrl>
This is the JDBC connection string, excluding the user ID and password.
<JdbcFilePath>
This specifies the path of the JDBC jar file used to connect to the Master DB.
<UserId>
This is used to specify the user ID in order to connect to the database.
<Password>
This is used to specify the user password in order to connect to the database.
<FetchSize>
This specifies the number of records fetched at once when reading data from the database. This element is optional; the default value is 1,000.
<BatchSize>
This specifies the number of records processed at once when updating the database with INSERT/DELETE/UPDATE statements. For instance, if 10 is specified, ten INSERT/DELETE/UPDATE statements are executed at once. This element is optional; the default value is 1,000.
<SlaveDB>#
This element records the connection information of the Slave DB. Its sub-elements are identical to those of <MasterDB>.
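To make the optional elements concrete, the following sketch reads a Connections fragment and fills in the documented default of 1,000 when <FetchSize> or <BatchSize> is omitted. It is an illustration only: the URLs, path, and credentials are placeholders, and the helper name read_connection is hypothetical.

```python
import xml.etree.ElementTree as ET

# Defaults documented in this guide.
DEFAULT_FETCH_SIZE = 1000
DEFAULT_BATCH_SIZE = 1000

# Placeholder values throughout; only the element names come from this guide.
SAMPLE_CONNECTIONS = """
<Connections>
  <MasterDB>
    <JdbcUrl>jdbc:Altibase://127.0.0.1:20300/mydb</JdbcUrl>
    <JdbcFilePath>/path/to/Altibase.jar</JdbcFilePath>
    <UserId>sys</UserId>
    <Password>placeholder</Password>
    <FetchSize>500</FetchSize>
  </MasterDB>
  <SlaveDB>
    <JdbcUrl>jdbc:Altibase://127.0.0.2:20300/mydb</JdbcUrl>
    <JdbcFilePath>/path/to/Altibase.jar</JdbcFilePath>
    <UserId>sys</UserId>
    <Password>placeholder</Password>
    <BatchSize>100</BatchSize>
  </SlaveDB>
</Connections>
"""


def read_connection(node):
    """Collect one database's settings, applying defaults to optional ones."""

    def optional_int(tag, default):
        text = (node.findtext(tag) or "").strip()
        return int(text) if text else default

    return {
        "jdbc_url": node.findtext("JdbcUrl"),
        "jdbc_file_path": node.findtext("JdbcFilePath"),
        "user_id": node.findtext("UserId"),
        "fetch_size": optional_int("FetchSize", DEFAULT_FETCH_SIZE),
        "batch_size": optional_int("BatchSize", DEFAULT_BATCH_SIZE),
    }


root = ET.fromstring(SAMPLE_CONNECTIONS)
master = read_connection(root.find("MasterDB"))
slave = read_connection(root.find("SlaveDB"))
print(master["fetch_size"], master["batch_size"])
print(slave["fetch_size"], slave["batch_size"])
```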
Options#
In this section, the user sets the property values that control how dataCompJ executes.
<Operation>#
This specifies which function is executed to process inconsistent data: comparison (DIFF) or synchronization (SYNC).
<FileEncoding>#
This specifies the encoding of the files generated when dataCompJ is executed.
<Diff>#
<DirPath>
This specifies the directory in which a CSV file is created for each target table as the result of the comparison (DIFF) function.
<Sync>#
The following are the options for the synchronization (SYNC) function.
<MOSO UPDATE_TO_SLAVE="true"/>
This option specifies whether or not to update records in the slave table based on records in the master table if MOSO inconsistent data is detected. If it is set to 'false', the MOSO inconsistent data will not be processed.
<MOSX INSERT_TO_SLAVE="true"/>
This option specifies whether or not to insert the records that exist only in the master table into the slave table if MOSX inconsistent data is detected. If it is set to 'false', the MOSX inconsistent data will not be processed.
<MXSO DELETE_FROM_SLAVE="true"/>
This option specifies whether or not to delete the records in the slave table that do not exist in the master table if MXSO inconsistent data is detected. If it is set to 'false', the MXSO inconsistent data will not be processed.
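The effect of the three options can be sketched with plain dictionaries keyed by primary key. This is an illustrative model, not dataCompJ's implementation. It reads MOSO as records present in both tables with differing values, MOSX as records present only in the master table, and MXSO as records present only in the slave table, which is how the option descriptions above read. The function names are hypothetical.

```python
def classify(master, slave):
    """Group primary keys by the kind of inconsistency they represent."""
    return {
        "MOSO": sorted(k for k in master.keys() & slave.keys() if master[k] != slave[k]),
        "MOSX": sorted(master.keys() - slave.keys()),
        "MXSO": sorted(slave.keys() - master.keys()),
    }


def plan_sync(master, slave, update_to_slave=True, insert_to_slave=True,
              delete_from_slave=True):
    """List the statements a SYNC run would apply to the slave table."""
    kinds = classify(master, slave)
    actions = []
    if update_to_slave:
        actions += [("UPDATE", key, master[key]) for key in kinds["MOSO"]]
    if insert_to_slave:
        actions += [("INSERT", key, master[key]) for key in kinds["MOSX"]]
    if delete_from_slave:
        actions += [("DELETE", key, None) for key in kinds["MXSO"]]
    return actions


# Toy data: primary key -> remaining column values.
master_rows = {1: "a", 2: "b", 3: "c"}
slave_rows = {1: "a", 2: "x", 4: "d"}

kinds = classify(master_rows, slave_rows)
# Equivalent to <MXSO DELETE_FROM_SLAVE="false"/>: slave-only rows are kept.
actions = plan_sync(master_rows, slave_rows, delete_from_slave=False)
print(kinds)
print(actions)
```

With DELETE_FROM_SLAVE disabled, record 4 stays in the slave table, so the two tables remain inconsistent after the run. Turning an option off trades full synchronization for safety.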
<Log>#
The following XML elements are the sub-elements of <Log>.
<DirPath>
This element specifies the directory path of log files created when dataCompJ is executed.
<TraceInconsistentRecord>
This element specifies whether or not to record the details of all the inconsistent records detected while the DIFF/SYNC function is executed.
<MaxThread>#
This element specifies the maximum number of threads that can be allocated. If it is set to 0, the number of CPU cores of the machine on which dataCompJ is running is used as MaxThread.
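The rule for 0 can be expressed in a few lines. This is a sketch of the documented behavior only. Rejecting negative values and falling back to 1 when the core count cannot be determined are assumptions, and the function name is hypothetical.

```python
import os


def effective_max_thread(configured):
    """Resolve a <MaxThread> value to the thread limit it implies."""
    if configured < 0:
        # Assumption: the guide does not say how negative values are handled.
        raise ValueError("MaxThread must be 0 or a positive integer")
    if configured == 0:
        # 0 means "use the number of CPU cores of this machine".
        return os.cpu_count() or 1
    return configured


print(effective_max_thread(4))
print(effective_max_thread(0))
```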
TablePairs#
The TablePairs section records the target tables for comparison. There are two ways to specify them: describing each target table individually, or giving the path of a text file that lists all the table names. The two methods can be used separately or together.
Describing each table individually, as shown below, has the advantage of precise control over how the data is compared. For instance, specific columns can be excluded, or only the data satisfying certain conditions can be compared.
A target table's name must be valid in both the Master DB and the Slave DB. In the XML file, the table name must be enclosed in double quotation marks (") if it contains a space or a special character, or if it is case-sensitive. For instance, if the comparison target is the table Employee 01 in the SYS schema, the name must be written as SYS."Employee 01" because it contains a space.
<TablePair>#
This is a unit of comparison consisting of one master table and one slave table. The following XML elements are the sub-elements of <TablePair>.
<MasterTable>
This element is the name of the target table in the Master DB, specified in the [SchemaName].TableName format. If the schema name is omitted, the UserId of the Master DB is used as the default schema name. This is a required entry; an error occurs if it is omitted.
<SlaveTable>
This element is the name of the target table in the Slave DB, specified in the [SchemaName].TableName format. If the schema name is omitted, the UserId of the Slave DB is used as the default schema name. This is a required entry; an error occurs if it is omitted.
<Exclude>
This element specifies the columns to exclude from the comparison. Multiple columns can be listed, separated by commas (,). This is an optional entry; if it is omitted, all the columns whose data types are supported by dataCompJ are selected as comparison targets.
<Where>
This element specifies a condition for selecting table records. It has the same format as the WHERE clause of an SQL statement, and multiple conditions are allowed. This is an optional entry; if it is omitted, all the records are comparison targets.
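The naming rules for a TablePair can be illustrated by parsing a sample entry. This is a minimal sketch under stated assumptions: the table names, column names, and condition in the sample are hypothetical, and splitting the name on the first dot outside double quotation marks is one plausible reading of the [SchemaName].TableName format rather than dataCompJ's actual parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: table names, column names, and the condition are made up.
SAMPLE_PAIR = """
<TablePair>
  <MasterTable>SYS."Employee 01"</MasterTable>
  <SlaveTable>employee_copy</SlaveTable>
  <Exclude>memo, updated_at</Exclude>
  <Where>dept_id = 10</Where>
</TablePair>
"""


def split_qualified_name(name):
    """Split on the first dot that is not inside double quotation marks."""
    in_quotes = False
    for index, char in enumerate(name):
        if char == '"':
            in_quotes = not in_quotes
        elif char == "." and not in_quotes:
            return name[:index], name[index + 1:]
    return None, name


def resolve_table(name, user_id):
    """Return (schema, table), defaulting the schema to the connection's UserId."""
    schema, table = split_qualified_name(name.strip())
    return (schema if schema else user_id), table


def read_table_pair(xml_text, master_user_id, slave_user_id):
    node = ET.fromstring(xml_text)
    master_name = (node.findtext("MasterTable") or "").strip()
    slave_name = (node.findtext("SlaveTable") or "").strip()
    if not master_name or not slave_name:
        raise ValueError("MasterTable and SlaveTable are required")
    exclude_text = node.findtext("Exclude") or ""
    where_text = (node.findtext("Where") or "").strip()
    return {
        "master": resolve_table(master_name, master_user_id),
        "slave": resolve_table(slave_name, slave_user_id),
        "exclude": [col.strip() for col in exclude_text.split(",") if col.strip()],
        "where": where_text or None,  # None means every record is compared
    }


pair = read_table_pair(SAMPLE_PAIR, "APP", "APP")
print(pair)
```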
<TableNameFilePath>
Alternatively, the path of a text file listing all the table names can be provided as shown below. This makes it easy to specify the targets when many tables need to be compared at once.
<TableNameFilePath>table_name_file_path</TableNameFilePath>
table_name_file_path is the path of the text file listing the table names. Each comparison target can be specified in the [SchemaName].TableName format, one table per line. The name of a comparison target in the Master DB must be identical to its name in the Slave DB.
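The file format is simple enough to model directly. The sketch below turns the lines of such a file into master/slave pairs that share the same name. Skipping blank lines is an assumption, the table names are made up, and an in-memory file stands in for the real one.

```python
import io


def read_table_names(lines):
    """Turn one-name-per-line text into (master_name, slave_name) pairs."""
    pairs = []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # assumption: blank lines are ignored
        # The same name is used on both sides, as the file format requires.
        pairs.append((name, name))
    return pairs


# In-memory stand-in for the file named by <TableNameFilePath>.
sample_file = io.StringIO('SYS.ORDERS\nSYS."Employee 01"\n\nCUSTOMERS\n')
table_pairs = read_table_names(sample_file)
for master_name, slave_name in table_pairs:
    print(master_name, slave_name)
```

Because both sides share one name, this method cannot express per-table options such as excluded columns or conditions. Those require an individual <TablePair> entry.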
Restrictions#
The following restrictions apply when selecting comparison target tables. If any restriction is violated, dataCompJ writes the issue found during the build phase to the report file (dataCompJ_report.txt) and does not proceed to the run phase.
- The two tables of a comparison pair should have the same composition (column names, column order, data types, and primary key). Where the two databases differ, the data types must be compatible with each other.
- Unsupported data types (e.g., binary types such as LOB) are automatically excluded from the comparison targets.
- There should be at least one column, other than the primary key columns, whose values can be compared.
- Example 1: table1 (c1 int, c2 int, c3 CLOB, primary key (c1, c2))
- Example 2: table1 (c1 int, c2 int, c3 varchar(100), primary key (c1, c2))
In Example 1, c3 is the only column that could satisfy the third restriction. However, its data type, CLOB, is not supported by dataCompJ, so c3 is excluded under the second restriction. Therefore, table1 cannot be compared.
In Example 2, c3 is again the only column that could satisfy the third restriction, and its data type, varchar, is supported by dataCompJ. Thus, table1 can be compared.
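The reasoning behind the two examples can be written as a small check. This is a sketch of the rule as described here, not dataCompJ's own validation. The set of unsupported types is an illustrative subset (CLOB comes from the example above, and BLOB is assumed as another LOB type), and the function names are hypothetical.

```python
# Assumption: illustrative subset of the types dataCompJ does not support.
UNSUPPORTED_TYPES = {"CLOB", "BLOB"}


def base_type(type_name):
    """Normalize a declared type: 'varchar(100)' -> 'VARCHAR'."""
    return type_name.split("(")[0].strip().upper()


def comparable_columns(columns, primary_key):
    """Return the non-key columns whose types can be compared."""
    return [
        name
        for name, type_name in columns
        if name not in primary_key and base_type(type_name) not in UNSUPPORTED_TYPES
    ]


def can_compare(columns, primary_key):
    """A table qualifies only if at least one comparable non-key column remains."""
    return len(comparable_columns(columns, primary_key)) > 0


example_1 = [("c1", "int"), ("c2", "int"), ("c3", "CLOB")]
example_2 = [("c1", "int"), ("c2", "int"), ("c3", "varchar(100)")]
key = {"c1", "c2"}

print(can_compare(example_1, key))
print(can_compare(example_2, key))
```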