Aggregate Functions

Aggregate Functions#

Aggregate functions process multiple rows and return a single resultant value. Aggregate functions can be used in the SELECT list and in the ORDER BY and HAVING clauses.

If a SELECT statement contains a GROUP BY clause, then only constants, aggregate functions, the expressions in the GROUP BY clause, and expressions that are based on the foregoing can be specified in the SELECT list.

AVG#

Syntax#

AVG ( [ALL | DISTINCT] expression)

Description#

This function calculates the average of the input expressions. NULL values are excluded from the calculation. The function returns a FLOAT type value.

Example#

<Query> Calculate and display the average price in the goods table.

SELECT AVG(price) FROM goods;

AVG(PRICE)  
--------------
30406.173   
1 row selected.

CORR#

Syntax#

CORR (expr1, expr2) OVER {...}

Description#

This function calculates the coefficient of correlation of inserted expr1 and expr2. The result would be greater than or equal to -1 , and smaller than or equal to 1. As the result goes nearer to 0, the coefficient of correlation would be smaller, however, if it gets closer to +1, the coefficient of correlation would be much deeper. The return type is DOUBLE, and variables should be calculated by converting with DOUBLE.

The CORR function can be used as an aggregate functions and an analytic function.

Example#

<Query> Calculate the coefficient of correlation between the employee number and salary.

SELECT CORR(ENO,SALARY) FROM employees;

CORR(ENO,SALARY)
-------------------------
-0.02180715597157
1 row selected.

COUNT#

Syntax#

COUNT ( [ * | [ALL | DISTINCT] expression ] )

Description#

COUNT returns the number of rows returned by the query. NULL values are not counted.

Examples#

<Query> Display the number of records in the employees table.

SELECT COUNT(*) rec_count 
  FROM employees;

REC_COUNT            
-----------------------
20                   
1 row selected.

<Query> Display the number of birthday tables in the table employees.

SELECT COUNT(birth) rec_count
  FROM employees;

REC_COUNT            
-----------------------
13                   
1 row selected.

COVAR_SAMP#

Syntax#

COVAR_SAMP (expr1, expr2) OVER {...}

Description#

Calculate the sample covariance of inserted expr1 and expr2. If an input value rises along with another value, the result would be positive numbers, but if another value falls while the input value rises, the result would be negative numbers. The return type is DOUBLE, and variables should be calculated by converting the variables with DOUBLE.

The COVAR_SAMP function can be used as an aggregate a function and analytic function.

Example#

<Query> Calculate the sample voariance of employees number and salary.

SELECT COVAR_SAMP(ENO,SALARY) FROM employees;

COVAR_SAMP(ENO,SALARY)
-------------------------
-95.0698529411784
1 row selected.

COVAR_POP#

Syntax#

COVAR_POP (expr1, expr2) OVER {...}

Description#

This function is used to calculate the population covariance of inserted expr1 and expr2. If an input value rises along with another value, the result would be positive numbers, but if another value falls while the input value rises, the result would be negative numbers. The return type is DOUBLE, and variables should be calculated by converting the variables with DOUBLE.

The COVAR_POP function can be used as an aggregate function and an analytic function.

Example#

<Query> Calculate the population covariance of the employee number and salary

SELECT COVAR_POP(ENO,SALARY) FROM employees;

COVAR_POP(ENO,SALARY)
-------------------------
-89.4775086505208
1 row selected.

CUME_DIST#

Syntax#

cume_dist_with_group

window_order_clause::=

Description#

The CUME_DIST function calculates the accumulative distribution of a group of values that are lined up based on a result set (or partitions) in particular. The return value is greater than 0 or less than 1.

Note: The number of arguments in the CUME_DIST and that of the WITHIN GROUP clause should be corresponding one another, but the argument type is not automatically compatible. Also, the data type of CUME_DIST argument is not restricted; however, using a constant is recommended.

Example#

<Query> Confirm the accumulative distribution of SALARY by using the CUME_DIST function.

SELECT SALARY
  FROM EMPLOYEES
 ORDER BY 1;

SALARY
--------------
500
980
1000
1200
1400
1500
1700
1800
1800
1890
1900
2000
2003
2300
2500
2750
4000
...
20 rows selected.

SELECT CUME_DIST(1500) WITHIN GROUP (ORDER BY salary )
 FROM employees;

CUME_DIST(1500) within group (order by SAL
---------------------------------------------
0.333333333333333
1 row selected.

FIRST#

Syntax#

first_clause

Description#

This function aggregates only the first part of the data sorted by ORDER BY clause. The functions corresponding to aggregation_function are MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV.

Example#

<Query> Find the employee number and salary of the employee who receives the most salary by department from the employee table.

SELECT dno
     , MAX(eno) KEEP(DENSE_RANK FIRST ORDER BY salary DESC) AS empno
     , MAX(salary) AS max_sal
  FROM employees
 GROUP BY dno;

DNO         EMPNO       MAX_SAL     
----------------------------------------
1001        16          2300        
1002        6           1700        
1003        10          4000        
2001        17          1400        
3001        4           1800        
3002        1           2500        
4001        8           1900        
4002        20          1890        
            2           1500
9 row selected.

GROUP_CONCAT#

Syntax#

GROUP_CONCAT (expr1 [, arg1])

Description#

GROUP_CONCAT returns a character string of concatenated, non-NULL, expr1 values for each group.

arg1: The separator. On omission, the separator is not inserted in the character string.

Example#

<Query> Return a character string that concatenates the last names of the employees for each department.

SELECT dno
     , CAST(GROUP_CONCAT(e_lastname, '|') AS VARCHAR (100)) AS names
  FROM employees
 GROUP BY dno;

DNO         NAMES                                                                             
-------------------------------------------------------------------------------------------------
1001        Kobain              |Chen                                                         
1002        Momoi               |Jones                                                        
1003        Bae                 |Liu                 |Miura               |Davenport          

2001        Fubuki                                                                            
3001        Foster                                                                            
3002        Moon                |Ghorbani                                                     
4001        Wang                |Diaz                |Huxley                                  
4002        Fleischer           |Hammond             |Marquez             |Blake              

            Davenport           
9 rows selected.

LAST#

Syntax#

last_clause

Description#

This function aggregates only the last part of the data sorted by the ORDER BY clause. The functions corresponding to aggregation_function are MIN, MAX, SUM, AVG, COUNT, VARIANCE, and STDDEV.

Example#

<Query> Find the employee number and salary of the employee who receives the least amount of salary by department from the employee table.

SELECT dno
     , MIN(eno) KEEP(DENSE_RANK LAST ORDER BY salary DESC) AS empno
     , MIN(salary) AS min_sal
  FROM employees
 GROUP BY dno;

DNO          EMPNO          MIN_SAL
-----------------------------------------
1001         3               2300
1002         13              1700
1003         15              4000
2001         17              1400
3001         4               1800
3002         5               2500
4001         9               1900
4002         7               1090
             2               1500
9 row selected.

MAX#

Syntax#

MAX ([ALL | DISTINCT] expression)

Description#

MAX returns the greatest value of expr from among the input values.

Example#

<Query> Display the highest price in the table goods.

SELECT MAX(price) FROM goods;

MAX(PRICE)  
--------------
100000      
1 row selected.

MIN#

Syntax#

MIN ([ALL | DISTINCT] expression)

Description#

MIN returns the lowest value of expr from among the input values.

Example#

<Query> Display the lowest price in the table goods.

SELECT MIN(price) FROM goods;

MIN(PRICE)  
--------------
966.99      
1 row selected.

PERCENT_RANK#

Syntax#

percent_rank_with_group

window_order_clause::=

Description#

The PERCENT_RANK function determines the percentage rank based upon results sets(or partitions). The range of values returned by this function is 0 to 1.

Note: The number of arguments in the PERCENT_RANK and WITHIN GROUP should be corresponding, but the argument type is not automatically compatible.

Example#

<Query> Display the percentage rank of employees with department number (DNO) 1003 and salary (SALARY) 1000.

SELECT dno
     , salary
  FROM employees
 ORDER BY 1, 2;

DNO         SALARY
---------------------------
1001        2000
1001        2300
1002        980
1002        1700
1003        1000
1003        2003
1003        2750
1003        4000
2001        1400
3001        1800
3002        2500
3002
4001        1200
4001        1900
4001
4002        500
4002        1800
4002        1890
4002
            1500
20 rows selected.

SELECT PERCENT_RANK(1003, 1000) WITHIN GROUP (ORDER BY dno, salary ) 
  FROM EMPLOYEES;

RNK
-------------------------
0.2
1 row selected.

STATS_ONE_WAY_ANOVA#

Syntax#

stats_one_way_anova

Description#

STATS_ONE_WAY_ANOVA is a one-way ANOVA function that returns the value for the third argument in the FLOAT data type.

This function takes three arguments and returns a value in SIG if the third argument is omitted. expr1 groups the data and expr2 contains the values of expr1.

The following table describes the third argument.

Return Value	Description
SIG	Significance
F_RATIO	(Mean squares within groups / mean squared between groups)
MEAN_SQUARES_WITHIN (MSW)	Mean squares within groups
MEAN_SQUARES_BETWEEN (MSB)	Mean squares between groups
DF_WITHIN(DFW)	Degree of freedom within groups/p>
DF_BETWEEN(DFB)	Degree of freedom between groups
SUM_SQUARES_WITHIN (SSW)	Sum of squares within groups
SUM_SQUARES_BETWEEN (SSB)	Sum of squares between groups

Example#

SELECT * FROM t3;

        ID      VALUE
---------- ----------
         1          1
         1          2
         1          3
         2          3
         2          4

SELECT STATS_ONE_WAY_ANOVA(id, value, 'SUM_SQUARES_BETWEEN') SSB
     , STATS_ONE_WAY_ANOVA(id, value, 'SUM_SQUARES_WITHIN') SSW
     , STATS_ONE_WAY_ANOVA(id, value, 'DF_BETWEEN') DFB
     , STATS_ONE_WAY_ANOVA(id, value, 'DF_WITHIN') DFW
     , STATS_ONE_WAY_ANOVA(id, value, 'MEAN_SQUARES_BETWEEN') MSB
     , stats_one_way_anova(id, value, 'MEAN_SQUARES_WITHIN') MSW
     , STATS_ONE_WAY_ANOVA(id, value, 'F_RATIO') F
     , STATS_ONE_WAY_ANOVA(id, value, 'SIG') P_VALUE
  FROM t3;

       SSB        SSW        DFB        DFW        MSB        MSW          F    P_VALUE
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       2.7        2.5          1          3        2.7 .833333333       3.24 .169679927

STDDEV#

Syntax#

STDDEV ([ALL | DISTINCT] expression)

Description#

STDDEV returns the standard deviation of the input expressions.

Because error can accumulate when using system calls to perform repeated operations on real numbers, the return value may have some amount of inherent tolerance (error).

Example#

<Query> Calculate the standard deviation of the wages in the employees table.

SELECT STDDEV(salary) standard_deviation 
  FROM employees;

STANDARD_DEVIATION
-------------------------
797.706786762566
1 row selected.

STDDEV_POP#

Syntax#

STDDEV_POP (expression)

Description#

The STDDEV_POP returns the standard deviation of all rows which are not NULL of input expression.

NULL is returned when the number of row is less than 1 which is not NULL. There might be an error in the return value due to accumulation of repetitive computing error by a system call.

Example#

<Query> Calculate the standard deviation of wages in the employees table.

SELECT STDDEV_POP(salary) FROM employees;

STDDEV_POP(SALARY)
-------------------------
773.889256492736
1 row selected.

STDDEV_SAMP#

Syntax#

STDDEV_SAMP ([ALL | DISTINCT] expression)

Description#

The STDDEV_SAMP returns the sample standard deviation of all rows which are not NULL of input expression.

NULL is returned when the number of row, which is not NULL, is less than 2.

There might be an error in the return value due to the accumulaion of repetitive computing error by a system call.

Example#

<Query> Calculate the sample standard deviation of wages in the table employees.

SELECT STDDEV_SAMP(salary) FROM employees;

STDDEV_SAMP(SALARY)
-------------------------
797.706786762566
1 row selected.

SUM#

Syntax#

SUM ([ALL | DISTINCT] expression)

Description#

SUM returns the result of addition of the input expressions.

Example#

<Query> Calculate the total number of items stored using the table goods.

SELECT SUM(stock) FROM goods;

SUM(STOCK)           
-----------------------
379420               
1 row selected.

VARIANCE#

Syntax#

VARIANCE ([ALL | DISTINCT] expression)

Description#

VARIANCE returns the variance of the input expressions.

Because error can accumulate when using system calls to perform repeated operations on real numbers, the return value may have some amount of inherent tolerance (error).

Example#

<Query> Calculate the variance of the wages in the table employees.

SELECT VARIANCE(salary) variance 
  FROM employees;

VARIANCE
-------------------------
636336.117647059

VAR_POP#

Syntax#

VAR_POP (expression)

Description#

The VAR_POP returns population variance of all rows which are not NULL of input expression.

NULL is returned when the number of row is less than 1.

There might be an error in the return value due to accumulation of repetitive computing error by a system call.

Example#

<Query> Calculate the population standard deviation of wages in the table employees

SELECT VAR_POP(salary) FROM employees;

VAR_POP(SALARY)
-------------------------
598904.581314879
1 row selected.

VAR_SAMP#

Syntax#

VAR_SAMP (expression)

Description#

The VAR_SAMP returns the sample variance of all rows which are not NULL of input expression.

NULL is returned when the number of row is less than 2.

There might be an error in the return value due to accumulation of repetitive computing error by a system call

Example#

<Query> Calculate the sample standard deviation of wages in the table employees.

SELECT VAR_SAMP(salary) FROM employees;

VAR_SAMP(SALARY)
-------------------------
636336.117647059
row selected.