REQUEST COMPLIMENTARY SQLS*PLUS LICENCE
SQL Aggregate Functions
SQL Aggregate functions exist in order to be able to generalize the obtained data in any way, i.e. to manipulate them as we want.
These functions are performed using keywords that are included in the SELECT query, and how they are spelled out will be explained later. To make it clear, here are some of the features of aggregate functions in SQL:
- Summarize selected values
- Find the arithmetic mean
- Find the minimum and maximum of the values
Examples of SQL aggregate functions
We will analyze the most frequently used functions and give some examples.
SUM function
This function allows you to sum up the values of a field when requesting SELECT. It is a fairly useful function whose syntax is quite simple, like all other aggregate functions in SQL. For understanding we will start with an example at once:
Get the sum of all orders from the table of Orders, which were made in 2016.
You could just output the sum of the orders, but I think it is quite simple. Let us remind you of the structure of our table:
onum | amt | odate | cnum | snum |
---|---|---|---|---|
1001 | 128 | 2016-01-01 | 9 | 4 |
1002 | 1800 | 2016-04-10 | 10 | 7 |
1003 | 348 | 2017-04-08 | 2 | 1 |
1004 | 500 | 2016-06-07 | 3 | 3 |
1005 | 499 | 2017-12-04 | 5 | 4 |
1006 | 320 | 2016-03-03 | 5 | 4 |
1007 | 80 | 2017-09-02 | 7 | 1 |
1008 | 780 | 2016-03-07 | 1 | 3 |
1009 | 560 | 2017-10-07 | 3 | 7 |
1010 | 900 | 2016-01-08 | 6 | 8 |
The next code will make the necessary sampling:
SELECT SUM(amt)
FROM Orders
WHERE odate BETWEEN '2016-01-01' and '2016-12-31';
We’ll get the result:
SUM(amt) |
---|
4428 |
In this query we used the SUM function, after which we need to specify the summation field in brackets. Then we specified a condition in WHERE, which selected only rows from 2016. Actually this condition can be written in a different way, but now more important is the aggregate sum function in SQL.
AVG function
The following function calculates the average arithmetic data field, which we will specify as a parameter. The syntax of this function is identical to that of the summation function. Therefore, we will immediately move on to the simplest task:
Print the average cost of the order from the Orders table.
And let’s go straight to the query:
SELECT AVG(amt)
FROM Orders;
We’ll get the result:
AVG(amt) |
---|
591.5 |
In general, everything is similar to the previous function. And the syntax is simple enough. This is the peculiarity of SQL language – to be understandable to humans.
Functions MIN and MAX
Two more functions that are close in action. They find the minimum or maximum value respectively of the parameter to be passed in parentheses. The syntax is repeated and therefore the following example:
Print maximum and minimum order price values for those orders where the price is less than 1000.
We get such a request,
SELECT MAX(amt), MIN(amt)
FROM Orders
WHERE amt < 1000;
which will lead out:
MAX(amt) | MIN(amt) |
---|---|
900 | 80 |
It should also be noted that unlike the previous functions, these 2 can work with symbolic parameters, that is, you can write a query of the MIN(odate) type (in this case, the date is symbolic), and then we will return 2016-01-01.
The point is that these functions have a mechanism for converting characters into ASCII code, which they then compare.
Another important point is that we can perform some simple mathematical operations in a SELECT query, for example, such a query:
SELECT (MAX(amt) - MIN(amt)) AS 'Difference'.
FROM Orders;
Will return that answer:
Difference |
---|
1720 |
COUNT function
This function is necessary to calculate the number of selected values or rows. There are two main ways to use it:
- With the DISTINCT keyword, to calculate the number of non-repeating values
- Using “*” to calculate the number of all selected values
Now let’s look at an example of using COUNT in SQL:
Calculate the number of orders made and the number of sellers in the Orders table.
SELECT COUNT(*), COUNT(DISTINCT snum).
FROM Orders;
We got it:
COUNT(*) | COUNT(snum) |
---|---|
10 | 5 |
Obviously, the number of orders is 10, but if you have a large table, this function will be very convenient. As for unique sellers, it is necessary to use DISTINCT, because one seller can serve several orders.
GROUP BY operator
Now let’s look at 2 important operators that help extend the functionality of our queries in SQL. The first of them is GROUP BY operator, which groups by any field, which is sometimes necessary. And already for this group performs the specified action. For example:
Display the sum of all orders for each seller individually.
That is, now we need for each seller in the table Orders to highlight the fields with the price of the order and sum it up. All this will make the GROUP BY operator in SQL quite easy:
SELECT snum, SUM(amt) AS 'Sum of all orders'.
FROM Orders
GROUP BY snum;
And in the end we’ll get:
snum | Amount of all orders |
---|---|
1 | 428 |
3 | 1280 |
4 | 947 |
7 | 2360 |
8 | 900 |
As you can see, SQL has allocated a group for each seller and calculated the sum of all their orders.
Operator HAVING
This operator is used as an addition to the previous one. It is necessary to set conditions for data sampling during grouping. If the condition is met, the group is selected, if not, nothing will happen. Let’s consider the following code:
SELECT snum, SUM(amt) AS 'Sum of all the orders'.
FROM Orders
GROUP BY snum
HAVING MAX(amt) > 1000;
Who will create a group for the seller and calculate the amount of orders of this group, only if the maximum order amount exceeds 1000. Obviously, there is only one such seller, a group will be created for him and the sum of all orders will be counted:
snum | Amount of all orders |
---|---|
7 | 2360 |
It would seem why not use the WHERE condition here, but SQL is so constructed that in this case it will generate an error, and that is why SQL has the HAVING operator.
Examples of aggregate functions in SQL
1. Write a request that will calculate all amounts of orders completed on January 1, 2016.
SELECT SUM(amt)
FROM Orders
WHERE odate = '2016-01-01';
2. Write a query that counts the number of different, different from NULL, values of the city field in the customer table.
SELECT COUNT(DISTINCT city)
FROM customers;
3. Write a request that would select the lowest amount for each customer.
SELECT cnum, MIN(amt)
FROM orders
GROUP BY cnum;
4. Write a request that selects customers whose names begin with the letter H.
SELECT cname
FROM customers
WHERE cname LIKE 'H%' ;
5. Write a request that would select the highest rating in each city.
SELECT city, MAX(rating)
FROM customers
GROUP BY city;
Let’s consider such a task:
Find the maximum value among average PC prices calculated for each manufacturer separately.
It is not possible to calculate the average cost values for labor producers:
SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;
However, the standard prohibits the use of a subquery as an argument for an aggregate function, i.e. the task cannot be solved in the following way:
SELECT MAX(
SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
);
In such cases a subquery in the FROM offer is used:
SELECT MAX(avg_price)
FROM (SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
) X;
With the new features of the language – window functions – this task can be solved without any subquery:
SELECT DISTINCT MAX(AVG(price)) OVER () max_avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;
Note that window functions allow you to use an aggregate function as an argument. The keyword DISTINCT is required here as the maximum value calculated across the entire set of mean values will be “attributed” to each manufacturer.
The standard also prohibits the use of an aggregate function as an argument for another aggregate function. That is, we cannot solve our problem in the following way:
SELECT MAX(AVG(price)) max_avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker;
But there are no rules without exceptions. Strange as it may seem, such constructs work in Oracle, and the above query will give the result:
|
By the way, a solution using a window function will also work in Oracle. I can assume that the solution without a window function actually uses it, implicitly implying the OVER() offer.
You will surely find solutions to such problems based on sorting with a limit on the number of rows in the resulting set. However, such solutions are not legitimate from the point of view of the language standard and, consequently, have different syntax in different implementations.
As an example, I will give the solutions of our task in the SQL Server and MySQL dialects
SQL Server
SELECT TOP 1 AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;
MySQL
SELECT AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC
LIMIT 1;
Both these solutions take only the first line of the sorted down average price set.
Beginners to learn SQL often have the problem of determining the manufacturer for which the required maximum/minimum is reached. In other words, you need to find the maximum average price and the manufacturer, whose average PC price coincides with this maximum average price.
By non-standard means this problem is solved by actually considered above query:
SELECT TOP 1 maker, AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;
Using maker in the SELECT proposal column list is quite acceptable, since this column is used to group.
However, there is one “trap” here. This is due to the fact that the maximum can be achieved for several manufacturers, and in this task statement, they should display all, while we limit the selection to only one (first) row. In this case, the dialect T-SQL has an additional design WITH TIES.
Logically, the right solution will look like one:
SELECT TOP 1 WITH TIES maker, AVG(price) avg_price
FROM Product P JOIN PC ON P.model = PC.model
GROUP BY maker
ORDER BY avg_price DESC;
However, if you bear in mind the code portability problem, you should prefer a solution that uses standard means.
SQL Aggregate Functions – SUM, AVG(n), COUNT, MIN, MAX Functions
MORE NEWS
PreambleNoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...
PreambleMongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...
5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...
The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...
PreambleMS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...
First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...
PreambleAtom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...
PreambleMongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...
FlexibilityOne of the most advertised features of MongoDB is its flexibility. Flexibility, however, is a double-edged sword. More flexibility means more...
PreambleSQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...
PreambleWriting an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...
PreambleOracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...
PreambleIBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...
What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....
PreambleIf you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...
PreambleStarting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...