Skip to content

3. Functions#

This chapter explains Altibase Hadoop Connector functions with examples.

Command-line Options#

Because the Altibase Hadoop connector is based on the sqoop, it supports all the fucntions provided by the sqoop. Since this chapter only describes some of the features, please refer to Sqoop User Guide (v1.4.4).

Syntax#

Listed below is an example of the Altibase Hadoop Connector syntax:

sqoop <command> --connect <url>
--driver <driver> 
--username <user> 
--password <password> 
--connection-manager    com.altibase.sqoop.manager.AltibaseManager

Options#

Options Description
<command> Since only the following commands are described in this document, refer to the sqoop document mentioned above for other commands.
export import list-databases list-tables
--connect <url> Specifies the Altibase URL:
jdbc:Altibase://*ip*:*port*/mydb
--driver <driver> Specifies the Altibase JDBC driver:
class:Altibase.jdbc.driver.AltibaseDriver
--username <user> Specifies the database user.
--password <password> Specifies the password.
--connection-manager <connector> Specifies the Altibase Hadoop Connector:
com.altibase.sqoop.manager.AltibaseManager

Additional commands, import and export options can be found at: Sqoop User Guide (v1.4.4).

Import#

Import is a function which imports the data of Altibase to HDFS or Hive. This section provides explanations of these functions with examples.

Importing to HDFS in a Text File#

The following options can be used with the Sqoop command to store data of a certain table existing in Altibase to a configured directory on HDFS in text file format.

% sqoop import 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
-split-by <split_name>

By default, text files are stored as CSV. Separators, such as field termination characters, line termination characters can be specified with the options listed in the following table. If the user does not specify one of the options, default values are taken.

Option Description Default Value
--enclosed-by The character which encloses the field; this option is required. \"
--escaped-by The escape character
--fields-terminated-by The field separator character ,
--lines-terminated-by The line termination character \n
--mysql-delimiters The MySQL default delimiter; the following are available:
fields-terminated-by: ,
lines-terminated-by: \n
escaped-by: \
optionally-enclosed-by: '
--optionally-enclosed-by The character which encloses the field; this option is optional.

Caution

If a line separator or a field separator is included in the field, it must be enclosed in punctuation marks, such as quotation marks("). If a quotation mark is included in the field, an escape character must be prefixed to the quotation mark to indicate that it is not a punctuation mark. Due to a sqoop bug found at the time of writing(sqoop operates abnormally when the punctuation mark and the escape character are duplicate), the Altibase Hadoop Connector only provides quotation marks as the default value for punctuation marks and offers support for the user to directly specify the escape character.

Importing to HDFS in a Sequence File#

The following options can be used with the Sqoop command to store data of a certain table existing in Altibase to a specified directory on HDFS in sequence file format.

% sqoop import 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
-split-by <split_name> 
--as-sequencefile

Importing to HDFS in an Avro File#

The following options can be used with the Sqoop command to store data of a certain table existing in Altibase to a specified directory on HDFS in Avro file format.

% sqoop import 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
-split-by <split_name> 
--as-avrodatafile

Importing to HDFS Using a Query#

The following options can be used with the Sqoop command to store data of a certain table existing in Altibase into HDFS using a user-specified query statement.

% sqoop import 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
-split-by <split_name> 
--boundary-query <query>

Importing to Hive#

The following options can be used with the Sqoop command to store data of a certain table existing in Altibase into Hive.

% sqoop import 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
-split-by <split_name> 
--hive-import

Export#

Export is a function which exports HDFS data to Altibase. Depending on the option, data is inserted into the database or existing data is updated.

Inserting Data#

The following options can be used with the Sqoop command to store HDFS data in a certain table existing in Altibase.

% sqoop export 
-D sqoop.export.records.per.statement=<size> 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
--export-dir <dir>

If the value set for sqoop.export.records.per.statement is larger than 1, the Altibase Hadoop Connector operates in batch mode. Batch mode inserts multiple records in one EXECUTE statement. This option must precede other options. If the value set for sqoop.export.records.per.statement is 1, it is possible that the Altibase Hadoop Connect will not operate in batch mode.

Inserting Data From a CSV File#

When exporting with a CSV file, options which specify delimiters are available for use. For more detailed information on options available for use, please refer to "Importing to HDFS in a Text File".

Inserting Data in Batch Mode#

The following options can be used with the Sqoop command to store HDFS data to a certain table existing in Altibase using the batch mode feature of the Altibase JDBC driver.

% sqoop export 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
--export-dir <dir> 
--batch

Regardless of whether or not the batch option has been specified, the Altibase Hadoop Connector runs in batch mode and 100 records are inserted by default for one Execute operation. To disable the Altibase Hadoop Connector from running in batch mode, the -D sqoop.export.records.per.statement option should be set to 1.

Updating Data#

The following options can be used with the Sqoop command to update data of a certain table existing in Altibase to HDFS data

% sqoop export 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
--export-dir <dir> 
-update-key <column>

Updating or Inserting Data#

This function performs an update if HDFS data exists in Altibase, and performs an insert if not. The following options can be used with the Sqoop command:.

% sqoop export 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
--export-dir <dir> 
-update-key <column> 
--update-mode allowinsert

Note: Since this function uses the MERGE statement of Altibase, it is only supported when the Altibase Hadoop Connecter is used with Altibase version 6.3.1 or above.

Inserting Data Using a Staging-Table#

Since Sqoop exports with multiple transactions, some data can fail to be committed. The -- staging-table option is used to prevent such failure and first inserts HDFS data to the table specified for this option, and then moves this data to a target table in Altibase. The following options can be used with the Sqoop command:

% sqoop export 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager 
-table <table_name> 
--export-dir <dir> 
-update-key <column> 
--staging-table <table_name>

list-databases#

The following options can be used with the Sqoop command to retrieve the database list of Altibase.

% sqoop list-databases 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager com.altibase.sqoop.manager.AltibaseManager

list-tables#

The following options can be used with the Sqoop command to retrieve the tables existing in Altibase.

% sqoop list-tables 
--connect <url> 
--driver <jdbc_driver> 
--username <user> 
--password <password> 
--connection-manager  com.altibase.sqoop.manager.AltibaseManager