SQL Aggregate Functions

Track the website visitors

Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.

SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

19 June 2020

SQL Aggregate functions exist in order to be able to generalize the obtained data in any way, i.e. to manipulate them as we want.

These functions are performed using keywords that are included in the SELECT query, and how they are spelled out will be explained later. To make it clear, here are some of the features of aggregate functions in SQL:

Summarize selected values
Find the arithmetic mean
Find the minimum and maximum of the values

Examples of SQL aggregate functions

We will analyze the most frequently used functions and give some examples.

SUM function

This function allows you to sum up the values of a field when requesting SELECT. It is a fairly useful function whose syntax is quite simple, like all other aggregate functions in SQL. For understanding we will start with an example at once:

Get the sum of all orders from the table of Orders, which were made in 2016.

You could just output the sum of the orders, but I think it is quite simple. Let us remind you of the structure of our table:

onum	amt	odate	cnum	snum
1001	128	2016-01-01	9	4
1002	1800	2016-04-10	10	7
1003	348	2017-04-08	2	1
1004	500	2016-06-07	3	3
1005	499	2017-12-04	5	4
1006	320	2016-03-03	5	4
1007	80	2017-09-02	7	1
1008	780	2016-03-07	1	3
1009	560	2017-10-07	3	7
1010	900	2016-01-08	6	8

The next code will make the necessary sampling:

SELECT SUM(amt)
FROM Orders
WHERE odate BETWEEN '2016-01-01' and '2016-12-31';

We’ll get the result:

SUM(amt)
4428

In this query we used the SUM function, after which we need to specify the summation field in brackets. Then we specified a condition in WHERE, which selected only rows from 2016. Actually this condition can be written in a different way, but now more important is the aggregate sum function in SQL.

AVG function

The following function calculates the average arithmetic data field, which we will specify as a parameter. The syntax of this function is identical to that of the summation function. Therefore, we will immediately move on to the simplest task:

Print the average cost of the order from the Orders table.

And let’s go straight to the query:

SELECT AVG(amt)
FROM Orders;

We’ll get the result:

AVG(amt)
591.5

In general, everything is similar to the previous function. And the syntax is simple enough. This is the peculiarity of SQL language – to be understandable to humans.

Functions MIN and MAX

Two more functions that are close in action. They find the minimum or maximum value respectively of the parameter to be passed in parentheses. The syntax is repeated and therefore the following example:

Print maximum and minimum order price values for those orders where the price is less than 1000.

We get such a request,

SELECT MAX(amt), MIN(amt)
FROM Orders
WHERE amt < 1000;

which will lead out:

MAX(amt)	MIN(amt)
900	80

It should also be noted that unlike the previous functions, these 2 can work with symbolic parameters, that is, you can write a query of the MIN(odate) type (in this case, the date is symbolic), and then we will return 2016-01-01.

The point is that these functions have a mechanism for converting characters into ASCII code, which they then compare.

Another important point is that we can perform some simple mathematical operations in a SELECT query, for example, such a query:

SELECT (MAX(amt) - MIN(amt)) AS 'Difference'.
FROM Orders;

Will return that answer:

Difference
1720

COUNT function

This function is necessary to calculate the number of selected values or rows. There are two main ways to use it:

With the DISTINCT keyword, to calculate the number of non-repeating values
Using “*” to calculate the number of all selected values

Now let’s look at an example of using COUNT in SQL:

Calculate the number of orders made and the number of sellers in the Orders table.

SELECT COUNT(*), COUNT(DISTINCT snum).
FROM Orders;

We got it:

COUNT(*)	COUNT(snum)
10	5

Obviously, the number of orders is 10, but if you have a large table, this function will be very convenient. As for unique sellers, it is necessary to use DISTINCT, because one seller can serve several orders.

GROUP BY operator

Now let’s look at 2 important operators that help extend the functionality of our queries in SQL. The first of them is GROUP BY operator, which groups by any field, which is sometimes necessary. And already for this group performs the specified action. For example:

Display the sum of all orders for each seller individually.

That is, now we need for each seller in the table Orders to highlight the fields with the price of the order and sum it up. All this will make the GROUP BY operator in SQL quite easy:

SELECT snum, SUM(amt) AS 'Sum of all orders'.
FROM Orders
GROUP BY snum;

And in the end we’ll get:

snum	Amount of all orders
1	428
3	1280
4	947
7	2360
8	900

As you can see, SQL has allocated a group for each seller and calculated the sum of all their orders.

Operator HAVING

This operator is used as an addition to the previous one. It is necessary to set conditions for data sampling during grouping. If the condition is met, the group is selected, if not, nothing will happen. Let’s consider the following code:

SELECT snum, SUM(amt) AS 'Sum of all the orders'.
FROM Orders
GROUP BY snum
HAVING MAX(amt) > 1000;

Who will create a group for the seller and calculate the amount of orders of this group, only if the maximum order amount exceeds 1000. Obviously, there is only one such seller, a group will be created for him and the sum of all orders will be counted:

snum	Amount of all orders
7	2360

It would seem why not use the WHERE condition here, but SQL is so constructed that in this case it will generate an error, and that is why SQL has the HAVING operator.

Examples of aggregate functions in SQL

1. Write a request that will calculate all amounts of orders completed on January 1, 2016.

SELECT SUM(amt)
FROM Orders
WHERE odate = '2016-01-01';

2. Write a query that counts the number of different, different from NULL, values of the city field in the customer table.

SELECT COUNT(DISTINCT city)
FROM customers;

3. Write a request that would select the lowest amount for each customer.

SELECT cnum, MIN(amt)
FROM orders
GROUP BY cnum;

4. Write a request that selects customers whose names begin with the letter H.

SELECT cname
FROM customers
WHERE cname LIKE 'H%' ;

5. Write a request that would select the highest rating in each city.

SELECT city, MAX(rating)
FROM customers
GROUP BY city;

Let’s consider such a task:

Find the maximum value among average PC prices calculated for each manufacturer separately.
It is not possible to calculate the average cost values for labor producers:

SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;

However, the standard prohibits the use of a subquery as an argument for an aggregate function, i.e. the task cannot be solved in the following way:

SELECT MAX(
SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
);

In such cases a subquery in the FROM offer is used:

SELECT MAX(avg_price)
FROM (SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
) X;

With the new features of the language – window functions – this task can be solved without any subquery:

SELECT DISTINCT MAX(AVG(price)) OVER () max_avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;

Note that window functions allow you to use an aggregate function as an argument. The keyword DISTINCT is required here as the maximum value calculated across the entire set of mean values will be “attributed” to each manufacturer.

The standard also prohibits the use of an aggregate function as an argument for another aggregate function. That is, we cannot solve our problem in the following way:

SELECT MAX(AVG(price)) max_avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;

But there are no rules without exceptions. Strange as it may seem, such constructs work in Oracle, and the above query will give the result:

MAX_AVG_PRICE
850

By the way, a solution using a window function will also work in Oracle. I can assume that the solution without a window function actually uses it, implicitly implying the OVER() offer.

You will surely find solutions to such problems based on sorting with a limit on the number of rows in the resulting set. However, such solutions are not legitimate from the point of view of the language standard and, consequently, have different syntax in different implementations.

As an example, I will give the solutions of our task in the SQL Server and MySQL dialects

SQL Server

SELECT TOP 1 AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;

MySQL

SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC
LIMIT 1;

Both these solutions take only the first line of the sorted down average price set.

Beginners to learn SQL often have the problem of determining the manufacturer for which the required maximum/minimum is reached. In other words, you need to find the maximum average price and the manufacturer, whose average PC price coincides with this maximum average price.

By non-standard means this problem is solved by actually considered above query:

SELECT TOP 1 maker, AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;

Using maker in the SELECT proposal column list is quite acceptable, since this column is used to group.

However, there is one “trap” here. This is due to the fact that the maximum can be achieved for several manufacturers, and in this task statement, they should display all, while we limit the selection to only one (first) row. In this case, the dialect T-SQL has an additional design WITH TIES.

Logically, the right solution will look like one:

SELECT TOP 1 WITH TIES maker, AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;

However, if you bear in mind the code portability problem, you should prefer a solution that uses standard means.

SQL Aggregate Functions – SUM, AVG(n), COUNT, MIN, MAX Functions

Tags: SQL, SQL Aggregate Functions, sql server, sqls, T-SQL

NoSQL life without a schema

NoSQL: life without a schema

PreambleNoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

SQLS*Plus - MongoDB Sample operators 1

MongoDB Sample operators

PreambleMongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administration

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

SQL Server data types

SQL Server data types

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

SQL Server Architecture

SQL Server Architecture

PreambleMS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

Get a better understanding of the MongoDB master slave configuration

Get a better understanding of the MongoDB master slave configuration

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Run a MongoDB data adapter for Atom Hopper with Netbeans

Run a MongoDB data adapter for Atom Hopper with Netbeans

PreambleAtom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Improve MongoDB aggregation structure

Improve MongoDB aggregation structure

PreambleMongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

Passing MongoDB data modeling

Passing MongoDB data modeling

FlexibilityOne of the most advertised features of MongoDB is its flexibility. Flexibility, however, is a double-edged sword. More flexibility means more...

SQLShell: SQL tool for multiple databases with NoSQL potential

SQLShell: SQL tool for multiple databases with NoSQL potential

PreambleSQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Use Mongosniff to clarify what your MongoDB hears and says

Use Mongosniff to clarify what your MongoDB hears and says

PreambleWriting an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Notes on Oracle Coherence

Notes on Oracle Coherence

PreambleOracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

FrankenQueries: when SQL and NoSQL collide

FrankenQueries: when SQL and NoSQL collide

PreambleIBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

PostgreSQL array

PostgreSQL array

What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

7 steps to create a new Oracle database from the command line

7 steps to create a new Oracle database from the command line

PreambleIf you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Restoring access to the SQL Server instance without restarting it

Restoring access to the SQL Server instance without restarting

PreambleStarting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...