Aggregate Functions
Aggregate Functions#
Aggregate functions process multiple rows and return a single resultant value. Aggregate functions can be used in the SELECT list and in the ORDER BY and HAVING clauses.
If a SELECT statement contains a GROUP BY clause, then only constants, aggregate functions, the expressions in the GROUP BY clause, and expressions that are based on the foregoing can be specified in the SELECT list.
AVG#
Syntax#
AVG ( [ALL | DISTINCT] expression)
Description#
This function calculates the average of the input expressions. NULL values are excluded from the calculation. The function returns a FLOAT type value.
Example#
<Query> Calculate and display the average price in the goods table.
SELECT AVG(price) FROM goods;
AVG(PRICE)
--------------
30406.173
1 row selected.
CORR#
Syntax#
CORR (expr1, expr2) OVER {...}
Description#
This function calculates the coefficient of correlation of inserted expr1 and expr2. The result would be greater than or equal to -1 , and smaller than or equal to 1. As the result goes nearer to 0, the coefficient of correlation would be smaller, however, if it gets closer to +1, the coefficient of correlation would be much deeper. The return type is DOUBLE, and variables should be calculated by converting with DOUBLE.
The CORR function can be used as an aggregate functions and an analytic function.
Example#
<Query> Calculate the coefficient of correlation between the employee number and salary.
SELECT CORR(ENO,SALARY) FROM employees;
CORR(ENO,SALARY)
-------------------------
-0.02180715597157
1 row selected.
COUNT#
Syntax#
COUNT ( [ * | [ALL | DISTINCT] expression ] )
Description#
COUNT returns the number of rows returned by the query. NULL values are not counted.
Examples#
<Query> Display the number of records in the employees table.
SELECT COUNT(*) rec_count
FROM employees;
REC_COUNT
-----------------------
20
1 row selected.
<Query> Display the number of birthday tables in the table employees.
SELECT COUNT(birth) rec_count
FROM employees;
REC_COUNT
-----------------------
13
1 row selected.
COVAR_SAMP#
Syntax#
COVAR_SAMP (expr1, expr2) OVER {...}
Description#
Calculate the sample covariance of inserted expr1 and expr2. If an input value rises along with another value, the result would be positive numbers, but if another value falls while the input value rises, the result would be negative numbers. The return type is DOUBLE, and variables should be calculated by converting the variables with DOUBLE.
The COVAR_SAMP function can be used as an aggregate a function and analytic function.
Example#
<Query> Calculate the sample voariance of employees number and salary.
SELECT COVAR_SAMP(ENO,SALARY) FROM employees;
COVAR_SAMP(ENO,SALARY)
-------------------------
-95.0698529411784
1 row selected.
COVAR_POP#
Syntax#
COVAR_POP (expr1, expr2) OVER {...}
Description#
This function is used to calculate the population covariance of inserted expr1 and expr2. If an input value rises along with another value, the result would be positive numbers, but if another value falls while the input value rises, the result would be negative numbers. The return type is DOUBLE, and variables should be calculated by converting the variables with DOUBLE.
The COVAR_POP function can be used as an aggregate function and an analytic function.
Example#
<Query> Calculate the population covariance of the employee number and salary
SELECT COVAR_POP(ENO,SALARY) FROM employees;
COVAR_POP(ENO,SALARY)
-------------------------
-89.4775086505208
1 row selected.
CUME_DIST#
Syntax#
Description#
The CUME_DIST function calculates the accumulative distribution of a group of values that are lined up based on a result set (or partitions) in particular. The return value is greater than 0 or less than 1.
Note: The number of arguments in the CUME_DIST and that of the WITHIN GROUP clause should be corresponding one another, but the argument type is not automatically compatible. Also, the data type of CUME_DIST argument is not restricted; however, using a constant is recommended.
Example#
<Query> Confirm the accumulative distribution of SALARY by using the CUME_DIST function.
SELECT SALARY
FROM EMPLOYEES
ORDER BY 1;
SALARY
--------------
500
980
1000
1200
1400
1500
1700
1800
1800
1890
1900
2000
2003
2300
2500
2750
4000
...
20 rows selected.
SELECT CUME_DIST(1500) WITHIN GROUP (ORDER BY salary )
FROM employees;
CUME_DIST(1500) within group (order by SAL
---------------------------------------------
0.333333333333333
1 row selected.
FIRST#
Syntax#
Description#
This function aggregates only the first part of the data sorted by ORDER BY clause. The functions corresponding to aggregation_function are MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV.
Example#
<Query> Find the employee number and salary of the employee who receives the most salary by department from the employee table.
SELECT dno
, MAX(eno) KEEP(DENSE_RANK FIRST ORDER BY salary DESC) AS empno
, MAX(salary) AS max_sal
FROM employees
GROUP BY dno;
DNO EMPNO MAX_SAL
----------------------------------------
1001 16 2300
1002 6 1700
1003 10 4000
2001 17 1400
3001 4 1800
3002 1 2500
4001 8 1900
4002 20 1890
2 1500
9 row selected.
GROUP_CONCAT#
Syntax#
GROUP_CONCAT (expr1 [, arg1])
Description#
GROUP_CONCAT returns a character string of concatenated, non-NULL, expr1 values for each group.
arg1: The separator. On omission, the separator is not inserted in the character string.
Example#
<Query> Return a character string that concatenates the last names of the employees for each department.
SELECT dno
, CAST(GROUP_CONCAT(e_lastname, '|') AS VARCHAR (100)) AS names
FROM employees
GROUP BY dno;
DNO NAMES
-------------------------------------------------------------------------------------------------
1001 Kobain |Chen
1002 Momoi |Jones
1003 Bae |Liu |Miura |Davenport
2001 Fubuki
3001 Foster
3002 Moon |Ghorbani
4001 Wang |Diaz |Huxley
4002 Fleischer |Hammond |Marquez |Blake
Davenport
9 rows selected.
LAST#
Syntax#
Description#
This function aggregates only the last part of the data sorted by the ORDER BY clause. The functions corresponding to aggregation_function are MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV.
Example#
<Query> Find the employee number and salary of the employee who receives the least amount of salary by department from the employee table.
SELECT dno
, MIN(eno) KEEP(DENSE_RANK LAST ORDER BY salary DESC) AS empno
, MIN(salary) AS min_sal
FROM employees
GROUP BY dno;
DNO EMPNO MIN_SAL
-----------------------------------------
1001 3 2300
1002 13 1700
1003 15 4000
2001 17 1400
3001 4 1800
3002 5 2500
4001 9 1900
4002 7 1090
2 1500
9 row selected.
MAX#
Syntax#
MAX ([ALL | DISTINCT] expression)
Description#
MAX returns the greatest value of expr from among the input values.
Example#
<Query> Display the highest price in the table goods.
SELECT MAX(price) FROM goods;
MAX(PRICE)
--------------
100000
1 row selected.
MIN#
Syntax#
MIN ([ALL | DISTINCT] expression)
Description#
MIN returns the lowest value of expr from among the input values.
Example#
<Query> Display the lowest price in the table goods.
SELECT MIN(price) FROM goods;
MIN(PRICE)
--------------
966.99
1 row selected.
PERCENT_RANK#
Syntax#
Description#
The PERCENT_RANK function determines the percentage rank based upon results sets(or partitions). The range of values returned by this function is 0 to 1.
Note: The number of arguments in the PERCENT_RANK and WITHIN GROUP should be corresponding, but the argument type is not automatically compatible.
Example#
<Query> Display the percentage rank of employees with department number (DNO) 1003 and salary (SALARY) 1000.
SELECT dno
, salary
FROM employees
ORDER BY 1, 2;
DNO SALARY
---------------------------
1001 2000
1001 2300
1002 980
1002 1700
1003 1000
1003 2003
1003 2750
1003 4000
2001 1400
3001 1800
3002 2500
3002
4001 1200
4001 1900
4001
4002 500
4002 1800
4002 1890
4002
1500
20 rows selected.
SELECT PERCENT_RANK(1003, 1000) WITHIN GROUP (ORDER BY dno, salary )
FROM EMPLOYEES;
RNK
-------------------------
0.2
1 row selected.
STATS_ONE_WAY_ANOVA#
Syntax#
Description#
STATS_ONE_WAY_ANOVA is a one-way ANOVA function that returns the value for the third argument in the FLOAT data type.
This function takes three arguments and returns a value in SIG if the third argument is omitted. expr1 groups the data and expr2 contains the values of expr1.
The following table describes the third argument.
Return Value |
Description |
---|---|
SIG |
Significance |
F_RATIO |
(Mean squares within groups / mean squared between groups) |
MEAN_SQUARES_WITHIN |
Mean squares within groups |
MEAN_SQUARES_BETWEEN |
Mean squares between groups |
DF_WITHIN(DFW) |
Degree of freedom within groups/p> |
DF_BETWEEN(DFB) |
Degree of freedom between groups |
SUM_SQUARES_WITHIN |
Sum of squares within groups |
SUM_SQUARES_BETWEEN |
Sum of squares between groups |
Example#
SELECT * FROM t3;
ID VALUE
---------- ----------
1 1
1 2
1 3
2 3
2 4
SELECT STATS_ONE_WAY_ANOVA(id, value, 'SUM_SQUARES_BETWEEN') SSB
, STATS_ONE_WAY_ANOVA(id, value, 'SUM_SQUARES_WITHIN') SSW
, STATS_ONE_WAY_ANOVA(id, value, 'DF_BETWEEN') DFB
, STATS_ONE_WAY_ANOVA(id, value, 'DF_WITHIN') DFW
, STATS_ONE_WAY_ANOVA(id, value, 'MEAN_SQUARES_BETWEEN') MSB
, stats_one_way_anova(id, value, 'MEAN_SQUARES_WITHIN') MSW
, STATS_ONE_WAY_ANOVA(id, value, 'F_RATIO') F
, STATS_ONE_WAY_ANOVA(id, value, 'SIG') P_VALUE
FROM t3;
SSB SSW DFB DFW MSB MSW F P_VALUE
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
2.7 2.5 1 3 2.7 .833333333 3.24 .169679927
STDDEV#
Syntax#
STDDEV ([ALL | DISTINCT] expression)
Description#
STDDEV returns the standard deviation of the input expressions.
Because error can accumulate when using system calls to perform repeated operations on real numbers, the return value may have some amount of inherent tolerance (error).
Example#
<Query> Calculate the standard deviation of the wages in the employees table.
SELECT STDDEV(salary) standard_deviation
FROM employees;
STANDARD_DEVIATION
-------------------------
797.706786762566
1 row selected.
STDDEV_POP#
Syntax#
STDDEV_POP (expression)
Description#
The STDDEV_POP returns the standard deviation of all rows which are not NULL of input expression.
NULL is returned when the number of row is less than 1 which is not NULL. There might be an error in the return value due to accumulation of repetitive computing error by a system call.
Example#
<Query> Calculate the standard deviation of wages in the employees table.
SELECT STDDEV_POP(salary) FROM employees;
STDDEV_POP(SALARY)
-------------------------
773.889256492736
1 row selected.
STDDEV_SAMP#
Syntax#
STDDEV_SAMP ([ALL | DISTINCT] expression)
Description#
The STDDEV_SAMP returns the sample standard deviation of all rows which are not NULL of input expression.
NULL is returned when the number of row, which is not NULL, is less than 2.
There might be an error in the return value due to the accumulaion of repetitive computing error by a system call.
Example#
<Query> Calculate the sample standard deviation of wages in the table employees.
SELECT STDDEV_SAMP(salary) FROM employees;
STDDEV_SAMP(SALARY)
-------------------------
797.706786762566
1 row selected.
SUM#
Syntax#
SUM ([ALL | DISTINCT] expression)
Description#
SUM returns the result of addition of the input expressions.
Example#
<Query> Calculate the total number of items stored using the table goods.
SELECT SUM(stock) FROM goods;
SUM(STOCK)
-----------------------
379420
1 row selected.
VARIANCE#
Syntax#
VARIANCE ([ALL | DISTINCT] expression)
Description#
VARIANCE returns the variance of the input expressions.
Because error can accumulate when using system calls to perform repeated operations on real numbers, the return value may have some amount of inherent tolerance (error).
Example#
<Query> Calculate the variance of the wages in the table employees.
SELECT VARIANCE(salary) variance
FROM employees;
VARIANCE
-------------------------
636336.117647059
VAR_POP#
Syntax#
VAR_POP (expression)
Description#
The VAR_POP returns population variance of all rows which are not NULL of input expression.
NULL is returned when the number of row is less than 1.
There might be an error in the return value due to accumulation of repetitive computing error by a system call.
Example#
<Query> Calculate the population standard deviation of wages in the table employees
SELECT VAR_POP(salary) FROM employees;
VAR_POP(SALARY)
-------------------------
598904.581314879
1 row selected.
VAR_SAMP#
Syntax#
VAR_SAMP (expression)
Description#
The VAR_SAMP returns the sample variance of all rows which are not NULL of input expression.
NULL is returned when the number of row is less than 2.
There might be an error in the return value due to accumulation of repetitive computing error by a system call
Example#
<Query> Calculate the sample standard deviation of wages in the table employees.
SELECT VAR_SAMP(salary) FROM employees;
VAR_SAMP(SALARY)
-------------------------
636336.117647059
row selected.