Appendix A. Regular Expressions
This section explains the regular expressions supported by Altibase.
Regular Expression Support
A regular expression is a notation for describing text patterns, made up of one or more literal character strings and metacharacters. Altibase partially supports POSIX Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). Regular expressions in Altibase have the following limitations and features:
- Multibyte characters are not supported.
- Backreferences (`\digit`) are not supported.
- Lookaheads (`?=`) and lookbehinds (`?<=`) are not supported.
- Conditional regular expressions (e.g., `(?(condition)B|C)`) are not supported.
- The escape character (`\`) is supported.
The following table describes character classes.
| Character class | Shorthand | Description |
|---|---|---|
| [:alnum:] | | Any alphanumeric character |
| [:alpha:] | \a | Any alphabetic character |
| [:blank:] | | A space or tab character |
| [:cntrl:] | \c | Any non-printing ASCII control character (ASCII codes 0 to 31 and 127) |
| [:digit:] | \d | Any numeric digit |
| [:graph:] | | Any printing character other than the space (that is, [:print:] without the space) |
| [:lower:] | \l | Any lowercase alphabetic character |
| [:print:] | | Any printing ASCII character (ASCII codes 32 to 126) |
| [:punct:] | \p | Any ASCII printing character (32 to 126) that is not a space, numeric digit, or alphabetic character |
| [:space:] | \s | Any whitespace character (e.g., space, carriage return, newline, vertical tab, form feed) |
| [:upper:] | \u | Any uppercase alphabetic character |
| [:word:] | \w | Any alphabetic character, numeric digit, or underscore (_) |
| [:xdigit:] | \x | Any hexadecimal digit: 0-9, a-f, A-F |
| | \A | Any character other than \a |
| | \W | Any character other than \w |
| | \S | Any character other than \s |
| | \D | Any character other than \d |
| | \X | Any character other than \x |
| | \C | Any character other than \c |
| | \P | Any character other than \p |
| | \b | A word boundary |
| | \B | Any position that is not a word boundary (the opposite of \b) |
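The ASCII ranges in the table above can be made concrete with a small model. The Python sketch below builds each class as a set of characters directly from the table's wording. It assumes the classes are limited to ASCII, which is consistent with the lack of multibyte support. It illustrates those definitions only and is not Altibase's implementation; the set names are hypothetical.

```python
import string

# Every ASCII character (code points 0 to 127); the model is ASCII-only.
ASCII = {chr(c) for c in range(128)}

PRINT = {chr(c) for c in range(32, 127)}  # [:print:]            codes 32 to 126
CNTRL = ASCII - PRINT                     # [:cntrl:]  \c        codes 0 to 31 and 127
GRAPH = PRINT - {" "}                     # [:graph:]            printing characters minus space
ALPHA = set(string.ascii_letters)         # [:alpha:]  \a
DIGIT = set(string.digits)                # [:digit:]  \d
ALNUM = ALPHA | DIGIT                     # [:alnum:]
PUNCT = GRAPH - ALNUM                     # [:punct:]  \p        printing, not space/digit/letter
WORD = ALNUM | {"_"}                      # [:word:]   \w
XDIGIT = set(string.hexdigits)            # [:xdigit:] \x        0-9, a-f, A-F
NOT_DIGIT = ASCII - DIGIT                 # \D, modeled over ASCII only

if __name__ == "__main__":
    for name, chars in [("print", PRINT), ("cntrl", CNTRL), ("graph", GRAPH),
                        ("punct", PUNCT), ("word", WORD), ("xdigit", XDIGIT)]:
        print(f"[:{name}:] -> {len(chars)} characters")
    print("".join(sorted(PUNCT)))
```

Under these definitions, [:print:] has 95 characters and [:cntrl:] has 33. [:punct:] works out to the same 32 characters as Python's `string.punctuation`. The underscore is the only character shared by [:word:] and [:punct:].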
The following table describes the metacharacters that can be used in Altibase regular expressions and their meanings.

| Metacharacter | Description |
|---|---|
| `.` | Matches any single character other than a newline. Inside square brackets, the dot (`.`) matches a literal dot. For example, `a.c` matches "abc", but `[a.c]` matches only "a", ".", or "c". |
| `[]` | A character class expression. Matches any single character listed inside the square brackets. For example, `[abc]` matches "a", "b", or "c", and `[a-z]` matches any lowercase letter from "a" to "z". Ranges and single characters can be mixed: both `[a-cx-z]` and `[abcx-z]` match "a", "b", "c", "x", "y", or "z". A right square bracket (`]`) is treated as a literal when it is the first character in the list, or the first character after a leading circumflex (`^`), as in `[]abc]`. If the circumflex (`^`) is the first character inside the square brackets, the expression matches any character other than those listed. For example, `[^abc]d` matches "ed" and "fd", but not "ad", "bd", or "cd", and `[^a-z]` matches any single character that is not a lowercase letter. |
| `^` | Matches the beginning of a string. |
| `$` | Matches the end of a string, or the position immediately before a newline at the end of the string. |
| `*` | Matches the preceding element zero or more times. For example, `ab*c` matches "ac", "abc", "abbbc", etc.; `[xyz]*` matches "", "x", "y", "z", "zx", "zyx", "xyzzy", etc.; and `(ab)*` matches "", "ab", "abab", "ababab", etc. |
| `+` | Matches the preceding element one or more times. |
| `?` | Matches the preceding element zero times or once. |
| `{m,n}` | Matches the preceding element at least m and at most n times. For example, `a{3,5}` matches "aaa", "aaaa", and "aaaaa". |
| `{m}` | Matches the preceding element exactly m times. |
| `{m,}` | Matches the preceding element m or more times. |
| `\|` | Alternation: matches any one of the expressions it separates. |
| `()` | Groups a subexpression so that multiple expressions can be treated as a single unit (for example, as the operand of `*`). |
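The examples in the metacharacter table can be checked mechanically. The sketch below uses Python's standard `re` module. That is a different engine from Altibase's, but it treats these basic constructs the same way, so the sketch is a way to experiment with the patterns rather than a test of Altibase itself. `re.fullmatch` requires the whole string to match, while `re.search` (used for the anchor cases) may match anywhere.

```python
import re

# (pattern, string, expected) triples taken from the metacharacter table;
# each is checked with re.fullmatch, so the whole string must match.
FULL_CASES = [
    (r"a.c", "abc", True),
    (r"a.c", "a\nc", False),       # '.' does not match a newline
    (r"[a.c]", ".", True),         # inside [], '.' is a literal dot
    (r"[a.c]", "b", False),
    (r"[a-cx-z]", "y", True),
    (r"[]abc]", "]", True),        # ']' first in the list is literal
    (r"[^abc]d", "ed", True),
    (r"[^abc]d", "ad", False),
    (r"ab*c", "ac", True),
    (r"ab*c", "abbbc", True),
    (r"(ab)*", "", True),
    (r"(ab)*", "ababab", True),
    (r"a{3,5}", "aaaaa", True),
    (r"a{3,5}", "aaaaaa", False),  # six a's exceeds the maximum of five
    (r"a{3}", "aaa", True),
    (r"abc|xyz", "xyz", True),
]

# Anchor cases are checked with re.search, which may match anywhere,
# so only '^' and '$' restrict where the match can occur.
ANCHOR_CASES = [
    (r"^a", "abc", True),
    (r"^b", "abc", False),
    (r"c$", "abc\n", True),        # '$' also matches just before a final newline
]


def mismatches(cases, matcher):
    """Return (pattern, string) pairs whose result differs from the expectation."""
    return [(p, s) for p, s, want in cases if (matcher(p, s) is not None) != want]


if __name__ == "__main__":
    print(mismatches(FULL_CASES, re.fullmatch))   # []
    print(mismatches(ANCHOR_CASES, re.search))    # []
```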
Perl Compatible Regular Expressions (PCRE2) Library
Starting with Altibase 7.1.0.7.7, Altibase supports the Perl Compatible Regular Expressions (PCRE2) library, version 10.40. Unlike the Altibase Regular Expression Library, this library supports searching Korean text. It also adds search features such as backreferences and lookaheads.
Perl Compatible Regular Expressions Library Limitations
- This library is supported only when the Altibase server character set is US7ASCII or UTF-8.
- Its syntax differs in some respects from that of the Altibase Regular Expression Library.
Syntax Differences Between the Two Regular Expression Libraries
The table below shows the differences in syntax between the two regular expression libraries.
| Regular expression syntax | Altibase Regular Expression Library example | PCRE2 library example |
|---|---|---|
| POSIX character class | `SELECT REGEXP_COUNT('ABCDEFG1234567abcdefgh!@#$%^&*(','[:punct:]+');` `SELECT REGEXP_COUNT('ABCDEFG1234567abcdefgh!@#$%^&*(','\l+');` | `SELECT REGEXP_COUNT('ABCDEFG1234567abcdefgh!@#$%^&*(','[[:punct:]]+');` `SELECT REGEXP_COUNT('ABCDEFG1234567abcdefgh!@#$%^&*(','[[:lower:]]+');` |
| POSIX collating element or equivalence class | `SELECT I1 FROM T1 WHERE REGEXP_LIKE( I1, '[=A=]' );` `SELECT I1 FROM T1 WHERE REGEXP_LIKE( I1, '[A-[.CH.]]' );` | Unsupported |
| Backreference | Unsupported | `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'(ALTI(BASE)7) DATA\2');` `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'(ALTI(?` |
| Lookahead | Unsupported | `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'ALTI.*(?=DATABASE)');` `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'ALTI.*(?!DATABASE)');` |
| Lookbehind | Unsupported | `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'(?<=ALTIBASE7) DATABASE');` `SELECT * FROM T1 WHERE REGEXP_LIKE(I2,'(?` |
| Conditional pattern | Unsupported | `SELECT REGEXP_SUBSTR(I2,'(?(?=ALTIBASE)ALTIBASE7\|DATABASE)') FROM T1;` |
| Character with (`\p{xx}`) or without (`\P{xx}`) the xx property | Unsupported | `SELECT REGEXP_SUBSTR(I2,'\P{HANGUL}+') FROM T1;` |
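The backreference, lookahead, and lookbehind patterns in the PCRE2 column use syntax that Python's `re` module also accepts, so their behavior can be previewed outside the database. In the sketch below, `like` and `substr` are hypothetical helpers. They approximate REGEXP_LIKE and REGEXP_SUBSTR under the assumption that both look for a match anywhere in the string, and the sample values of I2 are invented. Python's `re` does not support a conditional pattern with a lookahead condition or `\p{...}` properties, so those rows are left out.

```python
import re

# Patterns copied from the PCRE2 column of the table; the sample strings
# used below are hypothetical values of column I2.
BACKREF = r"(ALTI(BASE)7) DATA\2"         # \2 repeats whatever group 2 matched ("BASE")
LOOKAHEAD = r"ALTI.*(?=DATABASE)"         # "ALTI..." only if "DATABASE" follows
LOOKBEHIND = r"(?<=ALTIBASE7) DATABASE"   # " DATABASE" only if "ALTIBASE7" precedes it


def like(pattern, text):
    """Hypothetical stand-in for REGEXP_LIKE: True if pattern matches anywhere."""
    return re.search(pattern, text) is not None


def substr(pattern, text):
    """Hypothetical stand-in for REGEXP_SUBSTR: first matching substring or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(like(BACKREF, "ALTIBASE7 DATABASE"))             # True
    print(like(BACKREF, "ALTIBASE7 DATAMART"))             # False
    print(repr(substr(LOOKAHEAD, "ALTIBASE7 DATABASE")))   # 'ALTIBASE7 ' (lookahead text not consumed)
    print(repr(substr(LOOKBEHIND, "ALTIBASE7 DATABASE")))  # ' DATABASE'
    print(like(LOOKBEHIND, "ALTIBASE6 DATABASE"))          # False
```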
For details of the PCRE2 regular expression syntax, refer to the PCRE2 pattern manual page (pcre2pattern).
Altering the Regular Expression Library
Because Altibase provides both the Altibase Regular Expression Library and the PCRE2 library, one of the two must be selected to process regular expressions. The default is the Altibase Regular Expression Library. To use the PCRE2 library instead, change the regular expression library with one of the following methods.
- Change the regular expression library for the whole system while the Altibase server is running. The change takes effect for a session only after it reconnects.
  `ALTER SYSTEM SET REGEXP_MODE=1;`
- Change the regular expression library for the current session while the Altibase server is running.
  `ALTER SESSION SET REGEXP_MODE=1;`
- Change the regular expression library permanently by adding `REGEXP_MODE=1` to the altibase.properties file and restarting the Altibase server.
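Making the setting permanent means editing altibase.properties before the restart. If that step is scripted, a helper along the lines of the Python sketch below could be used. The helper is hypothetical. It assumes the file uses `NAME=value` lines with `#` comment lines and that the file contents are passed in as text. Reading and writing the file, and restarting the server, are left to the caller.

```python
import re

# Matches an active (uncommented) REGEXP_MODE line; [ \t]* rather than \s*
# so the match never spans line breaks.
_REGEXP_MODE_LINE = re.compile(r"^[ \t]*REGEXP_MODE[ \t]*=.*$", re.MULTILINE)


def set_regexp_mode(properties_text: str, mode: int = 1) -> str:
    """Return properties_text with REGEXP_MODE set to `mode` (hypothetical helper).

    Assumes NAME=value lines and '#' comment lines. The first active
    REGEXP_MODE line is rewritten; if there is none, a new line is appended.
    """
    new_line = f"REGEXP_MODE={mode}"
    if _REGEXP_MODE_LINE.search(properties_text):
        return _REGEXP_MODE_LINE.sub(new_line, properties_text, count=1)
    if properties_text and not properties_text.endswith("\n"):
        properties_text += "\n"
    return properties_text + new_line + "\n"


if __name__ == "__main__":
    # EXAMPLE_PROPERTY is a made-up placeholder, not a real Altibase property.
    print(set_regexp_mode("EXAMPLE_PROPERTY=1\n"), end="")
```

A commented-out line such as `# REGEXP_MODE=0` is left untouched, and a new active line is appended instead.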