Window(Analytic) Functions
Window (Analytic) Functions#
Commonly known as analytic functions, window functions(ANSI/ISO SQL standard uses the term window functions) calculate aggregate values based on groups of rows. Window functions(or analytic functions) are similar to regular aggregate functions in the sense that they operate on multiple numbers of rows or groups of rows within result sets returned by queries. However, the groups of rows that window functions operate on are defined by the PARTITION BY and the ROWS/RANGE subclause nested in the OVER clause, rather than by the GROUP BY clause. Order within these groups is also decided by the ORDER BY subclause nested in the OVER clause, rather than by the ORDER BY clause of the main query.
In this section, the group divided by the PARTITION BY subclause will be called a "partition", and the group divided by the ROWS/RANGE subclause will be called a "window".
Altibase version 6.3.1 onwards supports the following window functions:
-
Aggregate window functions
AVG, COUNT, MAX, MIN, STDDEV, SUM, VARIANCE, GROUP_CONCAT, RATIO_TO_REPORT
-
Ranking window functions
RANK, DENSE_RANK, ROW_NUMBER, LAG, LEAD, NTILE
-
Row order-related window functions
FIRST_VALUE, LAST_VALUE, NTH_VALUE
Aggregate window functions execute the calculation of the sum or average of the column values within a partition. Altibase supports the use of all aggregate functions listed in the "Aggregate Function" section as window functions (except CUME_DIST, PERCENT_RANK). Aggregate functions generally return one result row per group; when used as window functions, however, values are returned per row.
Ranking window functions return the order value of each row within a partition. Functions of this type require the ORDER BY subclause nested in the OVER clause.
Functions associated with row order search for the first or last value within an ordered partition set or the preceding or following value within a particular set.
Descriptions on other functions other than aggregate functions are provided in the following section.
Syntax#
window_function ::=#
window_specification ::=#
window_partition_clause ::=#
window_order_clause ::=#
window_frame_clause ::=#
Description#
Analytic functions can only appear in a SELECT list or in an ORDER BY clause.
The steps taken when processing queries containing analytic functions are as follows:
- Step 1: The query is processed, with the exception of any window functions and the ORDER BY clause (if present).
- Step 2: If any window functions contain the PARTITION BY subclause, the query results of Step 1 are divided into partitions on which the window functions are to be executed.
- Step 3: If an ORDER BY expression is present, the results are ordered for each of the partitions.
- Step 4: If the ROWS/RANGE clause is present, the window frame is decided.
- Step 5: The window functions are executed.
- Step 6: If an ORDER BY clause is present, it is processed.
window_function#
This specifies the name for the function to be used as a window function.
arg_expr#
This specifies expressions to be used as arguments for the window function.
IGNORE NULLS#
When this function is used, it will return the values excluding NULL values.
OVER window_specificatoin#
OVER is a keyword that instructs the query processor that this function will operate on a queried result set.
Window functions that include this clause can be specified by the SELECT list or the ORDER BY clause.
For all window functions used in the select list of a query, the OVER clause follows. The OVER clause decides how to divide and order a result set for window function execution. After the OVER keyword, the following three subclauses that specify division and sort policies may follow:
window_partition_clause#
This clause specifies one or more expressions or columns as the criteria for merging queried result sets into groups (partitions). All window functions support this clause; its use is optional. On omission, window functions process the overall result set as one partition.
If aggregate-related functions use this clause in the absence of the ORDER BY subclause, the DISTINCT keyword can be used as a function argument.
window_order_clause#
This clause specifies one or more expressions or columns as the criterion for ordering data within a partition. The use of this clause is optional for aggregate functions; for ranking functions, however, its use is mandatory.
For aggregate-related functions, the DISTINCT keyword cannot be used as a function argument with this clause.
With the NULLS FIRST or NULLS LAST keyword, the window_order_clause can be used to position NULL at the beginning or end of the order.
window_frame_clause#
This clause defines a row-based window (a set of physical or logical rows) on which functions are to be operated; functions apply to all rows within the window. A window traverses queried result sets or partitions from top to bottom.
ROWS specifies a window based on the number of rows and RANGE specifies a window based on row values. Refer to the following examples:
-
RANGE BETWEEN 50 PRECEDING AND 150 FOLLOWING
All rows corresponding to (current row value - 50) and (current row value + 150) within the partition are defined as a window. -
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
The row preceding the current row and the row following it within the partition are included in the definition of a window -
RANGE UNBOUNDED PRECEDING
The current row up to the first row within the partition are defined as a window. -
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
The first row down to the last row within the partition are defined as a window.
Ranking-related functions cannot use this clause; its use is optional for aggregate-related functions. For row ranking-related functions, its use is optional - however, it would be pointless to execute the function without using this clause.
The ORDER BY subclause is mandatory in order to specify this clause. If this clause is included, the DISTINCT keyword cannot be used as a window function argument.
If ROWS/RANGE clauses are not specified while using window functions that support ROWS/RANGE clauses, the default value is 'RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW'.
Only positive integers are accepted for value; expressions are not accepted.
If a DATE type expression is used in an ORDER BY subclause of a window function, the following forms can come at value of the RANGE clause:
INTERVAL n [YEAR | MONTH | DAY | HOUR | MINUTE | SECOND]
DENSE_RANK#
Syntax#
DENSE_RANK () OVER {...}
Description#
Like the RANK function, this function assigns rankings based on a certain members of a result set or partition. However, DENSE_RANK does not leave a gap in the sequence after the occurrence of duplicate values. The return value type is BIGINT.
FIRST_VALUE#
Syntax#
FIRST_VALUE (expr) OVER {...}
Description#
This function obtains the value of the first row within a partition or window.
Limitations#
- The DISTINCT keyword cannot be used as the function argument.
- The OVER clause is mandatory.
FIRST_VALUE_IGNORE_NULLS#
Syntax#
FIRST_VALUE_IGNORE_NULLS (expr) OVER {...}
Description#
This function obtains the value of the first row excluding NULL values within a partition or window.
Limitations#
Same as the FIRST_VALUE function.
LAG#
Syntax#
LAG (expr [, offset [, default_value]]) OVER {...}
Description#
This function obtains the value of the row at the given physical offset preceding the current row within each ordered partition. Only positive integers are accepted for offset; on omission, the default value is 1. For rows that exceed the specified offset within the partition, default_value is applied. On default_value omission, the default value is NULL.
Limitation#
- The DISTINCT keyword cannot be used as a function argument.
- The ORDER BY subclause must be present within the OVER clause.
- Only positive integers are accepted for offset
LAG_IGNORE_NULLS#
Syntax#
LAG_IGNORE_NULLS (expr [, offset [, default_value]]) OVER {...}
Description#
This function obtains the first value (excluding NULL) of the row at the given physical offset preceding the current row within each ordered partition. Only positive integers are accepted for offset; on omission, the default value is 1. For rows that exceed the specified offset within the partition, default_value is applied. On default_value omission, the default value is NULL.
Limitation#
Same as the LAG function.
LAST_VALUE#
Syntax#
LAST_VALUE (expr) OVER {...}
Description#
This function obtains the value of the last row within a partition or window.
Limitations#
- The DISTINCT keyword cannot be used as a function argument.
- The OVER clause is mandatory.
LAST_VALUE_IGNORE_NULLS#
Syntax#
LAST_VALUE_IGNORE_NULLS (expr) OVER {...}
Description#
This function obtains the value of the last row excluding NULL values within a partition or window.
Limitation#
Same as the LAST_VALUE function.
LEAD#
Syntax#
LEAD (expr [, offset [, default_value]]) OVER {...}
Description#
This function obtains the value of the row at the given physical offset following the current row within each ordered partition. Only positive integers are accepted for offset; on omission, the default value is 1. For rows that exceed the specified offset, default_value is applied. On default_value omission, the default value is NULL.
Limitations#
- The DISTINCT keyword cannot be used as a function argument.
- The ORDER BY subclause must be present within the OVER clause.
- Only positive integers are accepted for offset.
LEAD_IGNORE_NULLS#
Syntax#
LEAD_IGNORE_NULLS (expr [, offset [, default_value]]) OVER {...}
Description#
This function obtains the first value (excluding NULL) of the row at the given physical offset following the current row within each ordered partition. Only positive integers are accepted for offset; on omission, the default value is 1. For rows that exceed the specified offset, default_value is applied. On default_value omission, the default value is NULL.
Limitation#
Same as the LEAD function
LISTAGG#
Syntax#
Description#
LISTAGG converts column values that correspond to exp as a column, in the order specified for order_by_clause. arg separates the returned string; on omission, the string is not separated.
LISTAGG can be used as aggregate and analytic functions.
Example#
<Query> The following example uses LISTAGG as an aggregate function. Print name from table emp, grouped by empno, in the order of job. Separate each name with a semicolon(;).
SELECT dno
, CAST(LISTAGG(e_lastname,';') WITHIN GROUP( ORDER BY emp_job) AS VARCHAR(100)) "emp_job"
FROM employees
GROUP BY dno;
DNO emp_job
-------------------------------------------------------------------------------------------------
1001 Kobain ;Chen
1002 Jones ;Momoi
1003 Miura ;Bae ;Davenport ;Liu
2001 Fubuki
3001 Foster
3002 Moon ;Ghorbani
4001 Wang ;Huxley ;Diaz
4002 Fleischer ;Blake ;Hammond ;Marquez
Davenport
9 rows selected.
<Query> The following example uses LISTAGG as an analytic function. Print name from the table emp, in the order of job. Separate each name with a semicolon(;).
SELECT dno
, CAST(LISTAGG(e_lastname,';') WITHIN GROUP(ORDER BY emp_job) OVER ( PARTITION BY dno ) AS VARCHAR(100)) "emp_job"
FROM employees;
DNO emp_job
-------------------------------------------------------------------------------------------------
1001 Kobain ;Chen
1001 Kobain ;Chen
1002 Jones ;Momoi
1002 Jones ;Momoi
1003 Miura ;Bae ;Davenport ;Liu
1003 Miura ;Bae ;Davenport ;Liu
1003 Miura ;Bae ;Davenport ;Liu
1003 Miura ;Bae ;Davenport ;Liu
2001 Fubuki
3001 Foster
3002 Moon ;Ghorbani
3002 Moon ;Ghorbani
4001 Wang ;Huxley ;Diaz
4001 Wang ;Huxley ;Diaz
4001 Wang ;Huxley ;Diaz
4002 Fleischer ;Blake ;Hammond ;Marquez
4002 Fleischer ;Blake ;Hammond ;Marquez
4002 Fleischer ;Blake ;Hammond ;Marquez
4002 Fleischer ;Blake ;Hammond ;Marquez
Davenport
20 rows selected.
NTH_VALUE#
Syntax#
NTH_VALUE (expr, offset) OVER {...}
Description#
This function obtains the value of the row at the given physical offset within a partition or window.
Limitations#
- The DISTINCT keyword cannot be used as a function argument.
- The OVER clause is mandatory.
NTH_VALUE_IGNORE_NULLS#
Syntax#
NTH_VALUE_IGNORE_NULLS (expr, offset) OVER {...}
Description#
This function obtains the value of the row at the given physical offset excluding NULL values within a partition or window.
Limitations#
Same as the NTH_VALUE function.
NTILE#
Syntax#
Description#
The NTILE function determines the group order by equally dividing the number of expr inserted based upon particular buckets in an sorted data. The return value is BIGNT.
Example#
<Query> Equally divide the sorted salary into 3 groups and verify the group order.
SELECT e_firstname
, salary
, NTILE(3) OVER(ORDER BY salary)
FROM employees;
E_FIRSTNAME SALARY NTILE(3)OVER(ORDERBYSALARY)
------------------------------------------------------------------
Gottlieb 500 1
Mitch 980 1
Jason 1000 1
Curtis 1200 1
Takahiro 1400 1
Susan 1500 1
Ryu 1700 1
Aaron 1800 2
Alvar 1800 2
Sandra 1890 2
John 1900 2
Ken 2000 2
Yuu 2003 2
Wei-Wei 2300 2
Farhad 2500 3
Zhen 2750 3
Elizabeth 4000 3
Chan-seung 3
Xiong 3
William 3
20 rows selected.
PERCENTILE_CONT#
Syntax#
Description#
PERCENTILE_CONT is an inverse distribution function based on a continuous distribution model, which sorts values and returns a value corresponding to the specified percentile value. This function returns a FLOAT data type and ignores nulls.
expr1 is a constant between 0 and 1; expr1 cannot be a NUMERIC data type.
PERCENTILE_CONT can be used as aggregate and analytic functions.
Examples#
<Query> The following example uses PERCENTILE_CONT as an aggregate function. Print the median salary for empno from the table emp.
SELECT dno
, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary ASC) "median asc cont"
, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) "median desc cont"
FROM employees
GROUP BY dno;
DNO median asc cont median desc cont
-------------------------------------------------
1001 2150 2150
1002 1340 1340
1003 2376.5 2376.5
2001 1400 1400
3001 1800 1800
3002 2500 2500
4001 1550 1550
4002 1800 1800
1500 1500
9 rows selected.
<Query> The following example uses PERCENTILE_CONT as an analytic function.
SELECT dno
, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary ASC) OVER ( PARTITION BY dno ) "median asc cont"
, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) OVER ( PARTITION BY dno ) "median desc cont"
FROM employees;
DNO median asc cont median desc cont
-------------------------------------------------
1001 2150 2150
1001 2150 2150
1002 1340 1340
1002 1340 1340
1003 2376.5 2376.5
1003 2376.5 2376.5
1003 2376.5 2376.5
1003 2376.5 2376.5
2001 1400 1400
3001 1800 1800
3002 2500 2500
3002 2500 2500
4001 1550 1550
4001 1550 1550
4001 1550 1550
4002 1800 1800
4002 1800 1800
4002 1800 1800
4002 1800 1800
1500 1500
20 rows selected.
PERCENTILE_DISC#
Syntax#
Description#
PERCENTILE_DISC is an inverse distribution function based on a discrete distribution model, which sorts values and returns a value corresponding to the specified percentile value. This function returns a FLOAT data type and ignores nulls.
expr1 is a constant between 0 and 1; expr1 cannot be a NUMERIC data type.
PERCENTILE_DISC can be used as aggregate and analytic functions.
Examples#
<Query> The following example uses PERCENTILE_DISC as an aggregate function. Print the median salary for empno from the table emp. Whereas PERCENTILE_CONT returns the average of two middle values for an even number of groups, PERCENTILE_DISC returns the first of the two middle values.
SELECT dno
, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary ASC) "median asc cont"
, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) "median desc cont"
FROM employees
GROUP BY dno;
DNO median asc cont median desc cont
-------------------------------------------------
1001 2000 2300
1002 980 1700
1003 2003 2750
2001 1400 1400
3001 1800 1800
3002 2500 2500
4001 1200 1900
4002 1800 1800
1500 1500
9 rows selected.
<Query> The following example uses PERCENTILE_DISC as an analytic function.
SELECT dno
, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary ASC) OVER ( PARTITION BY DNO ) "median asc cont"
, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) OVER ( PARTITION BY dno ) "median desc cont"
FROM employees;
DNO median asc cont median desc cont
-------------------------------------------------
1001 2000 2300
1001 2000 2300
1002 980 1700
1002 980 1700
1003 2003 2750
1003 2003 2750
1003 2003 2750
1003 2003 2750
2001 1400 1400
3001 1800 1800
3002 2500 2500
3002 2500 2500
4001 1200 1900
4001 1200 1900
4001 1200 1900
4002 1800 1800
4002 1800 1800
4002 1800 1800
4002 1800 1800
1500 1500
20 rows selected.
RANK#
Syntax#
Description#
The RANK function calculates and determines the ranking based upon particular members of a result set or partitions. The identical values are equally ranked. To that extent, raking is skipped, then the following rank is determined. The type of return value is BIGINT. Also, the RANK function can be used as an aggregate and analytical functions.
Note : The number of arguments of RANK and that of the WITHIN GROUP should be identical, and the type of arguments are not compatible each other. Even though there is no restriction on the data type of RANK argument; however, a constant is not suggested for using.
Example#
<Query> Display the rank of the employees with DNO of 1003 and SALARY of 1001.
SELECT dno
, salary
FROM employees
ORDER BY 1, 2;
DNO SALARY
---------------------------
1001 2000
1001 2300
1002 980
1002 1700
1003 1000
1003 2003
1003 2750
1003 4000
2001 1400
3001 1800
3002 2500
3002
4001 1200
4001 1900
4001
4002 500
4002 1800
4002 1890
4002
1500
20 rows selected.
SELECT RANK(1003, 1001) WITHIN GROUP (ORDER BY dno, salary )
FROM employees ;
RNK
-----------------------
6
1 row selected.
RATIO_TO_REPORT#
Syntax#
Description#
The RATIO_TO_REPORT function calculates the ratio of expr inserted based upon particular buckets of partitions. If null is inserted, the null is returned, and if the window_partition_clause statement is omitted, the ratio is calculated based on all the returning rows.
Example#
<Query> Confirm the ratio of employee salary in each department.
SELECT e_firstname
, dno
, salary
, RATIO_TO_REPORT(salary) OVER (PARTITION BY dno) AS result
FROM employees LIMIT 9;
E_FIRSTNAME DNO SALARY RESULT
---------------------------------------------------------------
Ken 1001 2000 0.465116279
Wei-Wei 1001 2300 0.534883721
Ryu 1002 1700 0.634328358
Mitch 1002 980 0.365671642
Elizabeth 1003 4000 0.410130216
Zhen 1003 2750 0.281964524
Yuu 1003 2003 0.205372706
Jason 1003 1000 0.102532554
Takahiro 2001 1400 1
9 rows selected.
ROW_NUMBER#
Syntax#
ROW_NUMBER () OVER {...}
Description#
The ROW_NUMBER function sequentially assigns unique numbers based on the members of a result set or partition. The numbers are assigned starting from 1 in the order determined by the ORDER BY expression.
If duplicate values exist in the result set, the order in which the row numbers are assigned to the duplicate values cannot be guaranteed. If there is a need to guarantee the order in which the row numbers are assigned, use an ORDER BY clause that references some other column. The return value type is BIGINT.
Examples#
<Query> Rank the employees in each department by salary in decreasing order.
SELECT e_lastname
, dno
, salary
, RANK() OVER (PARTITION BY dno ORDER BY salary DESC) rank
FROM employees;
E_LASTNAME DNO SALARY RANK
--------------------------------------------------------------
Chen 1001 2300 1
Kobain 1001 2000 2
Momoi 1002 1700 1
Jones 1002 980 2
Bae 1003 4000 1
Liu 1003 2750 2
Miura 1003 2003 3
Davenport 1003 1000 4
Fubuki 2001 1400 1
Foster 3001 1800 1
Ghorbani 3002 2500 1
Moon 3002 2
Huxley 4001 1900 1
Diaz 4001 1200 2
Wang 4001 3
Hammond 4002 1890 1
Marquez 4002 1800 2
Fleischer 4002 500 3
Blake 4002 4
Davenport 1500 1
20 rows selected.
<Query> This example illustrates the differences in the output of the RANK, DENSE_RANK, and ROW_NUMBER functions.
SELECT salary
, RANK() OVER (ORDER BY salary DESC) rank
, DENSE_RANK() OVER (ORDER BY salary DESC) dense_rank
, ROW_NUMBER() OVER (ORDER BY salary DESC) row_number
FROM employees;
SALARY RANK DENSE_RANK ROW_NUMBER
---------------------------------------------------------------------------
4000 1 1 1
2750 2 2 2
2500 3 3 3
2300 4 4 4
2003 5 5 5
2000 6 6 6
1900 7 7 7
1890 8 8 8
1800 9 9 9
1800 9 9 10
1700 11 10 11
1500 12 11 12
1400 13 12 13
1200 14 13 14
1000 15 14 15
980 16 15 16
500 17 16 17
18 17 18
18 17 19
18 17 20
20 rows selected.
<Query> Output each employee's salary as a percentage of the maximum salary in the employee's department.
SELECT e_lastname
, dno
, salary
, ROUND(salary/MAX(salary) OVER (PARTITION BY dno)*100) rel_sal
FROM
E_LASTNAME DNO SALARY REL_SAL
---------------------------------------------------------------
Kobain 1001 2000 87
Chen 1001 2300 100
.
.
.
Diaz 4001 1200 63
Fleischer 4002 500 26
Marquez 4002 1800 95
Blake 4002
Hammond 4002 1890 100
Davenport 1500 100
20 rows selected.
<Query> The following example obtains the lowest salary, highest salary and 3rd lowest salary by gender using FIRST_VALUE, LAST_VALUE, NTH_VALUE functions.
SELECT sex
, salary
, FIRST_VALUE( salary ) OVER ( PARTITION BY sex ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) f_value
, LAST_VALUE( salary ) OVER ( PARTITION BY sex ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) l_value
, NTH_VALUE( salary, 3 ) OVER ( PARTITION BY sex ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) n_value
FROM employees;
SEX SALARY F_VALUE L_VALUE N_VALUE
------------------------------------------------------------------
F 1500 1500 4000 2300
F 1890 1500 4000 2300
F 2300 1500 4000 2300
F 4000 1500 4000 2300
M 500 500 1000
M 980 500 1000
M 1000 500 1000
M 1200 500 1000
M 1400 500 1000
M 1700 500 1000
M 1800 500 1000
M 1800 500 1000
M 1900 500 1000
M 2000 500 1000
M 2003 500 1000
M 2500 500 1000
M 2750 500 1000
M 500 1000
M 500 1000
M 500 1000
20 rows selected.
<Query> The following example uses LAG, LEAD functions.
SELECT salary
, RANK() OVER (ORDER BY salary DESC) rank
, DENSE_RANK() OVER (ORDER BY salary DESC) dense_rank
, ROW_NUMBER() OVER (ORDER BY salary DESC) row_number
, LAG( salary ) OVER (ORDER BY salary DESC) lag
, LEAD( salary ) OVER (ORDER BY salary DESC) lead
FROM employees;
SALARY RANK DENSE_RANK ROW_NUMBER LAG LEAD
----------------------------------------------------------------------------------------------------------
4000 1 1 1 2750
2750 2 2 2 4000 2500
2500 3 3 3 2750 2300
2300 4 4 4 2500 2003
2003 5 5 5 2300 2000
2000 6 6 6 2003 1900
1900 7 7 7 2000 1890
1890 8 8 8 1900 1800
1800 9 9 9 1890 1800
1800 9 9 10 1800 1700
1700 11 10 11 1800 1500
1500 12 11 12 1700 1400
1400 13 12 13 1500 1200
1200 14 13 14 1400 1000
1000 15 14 15 1200 980
980 16 15 16 1000 500
500 17 16 17 980
18 17 18 500
18 17 19
18 17 20
20 rows selected.