SQL Server - Grouping data using ROLLUP, CUBE and GROUPING SETS

Hey guys,
Good morning!

In this post I will talk about a subject that is nothing new in SQL Server, it has been present since SQL Server 2008, but I don't see many people using it in their queries, which is data grouping (summarization) using ROLLUP, CUBE and GROUPING SETS.

This type of resource is especially useful for generating totals and subtotals without having to create multiple subqueries, as it allows you to perform this task in a single command, as I will demonstrate below.

To create an AutoSum feature in your dataset, just like Excel implements it, see more in the post SQL Server – How to create an AutoSum (same as Excel) using Window functions

Test Base

For the examples in this post, I will provide the base below so that we can create a testing and demonstration environment.

IF (OBJECT_ID('tempdb..#Produtos') IS NOT NULL) DROP TABLE #Produtos
CREATE TABLE #Produtos (
    Codigo INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Ds_Produto VARCHAR(50) NOT NULL,
    Ds_Categoria VARCHAR(50) NOT NULL,
    Preco NUMERIC(18, 2) NOT NULL
)

IF (OBJECT_ID('tempdb..#Vendas') IS NOT NULL) DROP TABLE #Vendas
CREATE TABLE #Vendas (
    Codigo INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Dt_Venda DATETIME NOT NULL,
    Cd_Produto INT NOT NULL
)


INSERT INTO #Produtos ( Ds_Produto, Ds_Categoria, Preco )
VALUES
    ( 'Processador i7', 'Informática', 1500.00 ),
    ( 'Processador i5', 'Informática', 1000.00 ),
    ( 'Processador i3', 'Informática', 500.00 ),
    ( 'Placa de Vídeo Nvidia', 'Informática', 2000.00 ),
    ( 'Placa de Vídeo Radeon', 'Informática', 1500.00 ),
    ( 'Celular Apple', 'Celulares', 10000.00 ),
    ( 'Celular Samsung', 'Celulares', 2500.00 ),
    ( 'Celular Sony', 'Celulares', 4200.00 ),
    ( 'Celular LG', 'Celulares', 1000.00 ),
    ( 'Cama', 'Utilidades do Lar', 2000.00 ),
    ( 'Toalha', 'Utilidades do Lar', 40.00 ),
    ( 'Lençol', 'Utilidades do Lar', 60.00 ),
    ( 'Cadeira', 'Utilidades do Lar', 200.00 ),
    ( 'Mesa', 'Utilidades do Lar', 1000.00 ),
    ( 'Talheres', 'Utilidades do Lar', 50.00 )

    

DECLARE @Contador INT = 1, @Total INT = 100

WHILE(@Contador <= @Total)
BEGIN

    INSERT INTO #Vendas ( Cd_Produto, Dt_Venda )
    SELECT 
        (SELECT TOP 1 Codigo FROM #Produtos ORDER BY NEWID()) AS Cd_Produto,
        DATEADD(DAY, (CAST(RAND() * 364 AS INT)), '2017-01-01') AS Dt_Venda

    SET @Contador += 1

END

In a simple data grouping, just bringing the quantities filtering by category and product, we were able to return this view using the query below:

SELECT 
    B.Ds_Categoria,
    B.Ds_Produto,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    B.Ds_Categoria,
    B.Ds_Produto
ORDER BY
    1, 2

Summarizing the values

But what if you need to group the data creating a totalizer for each category and at the end a general totalizer? How would you do it?

Using UNION ALL

A traditional way of meeting this need is using subquery with UNION ALL, so that you return 3 resultsets containing the analytical data, the totalizers by category and then the general sum. The greater the number of filters, the more complex your subquery will be, as it will need more queries at each level.

SELECT 
    *
FROM (

    SELECT 
        B.Ds_Categoria,
        B.Ds_Produto,
        COUNT(*) AS Qt_Vendas,
        SUM(B.Preco) AS Vl_Total
    FROM 
        #Vendas	A
        JOIN #Produtos B ON A.Cd_Produto = B.Codigo
    GROUP BY
        B.Ds_Categoria,
        B.Ds_Produto
    
    UNION ALL

    SELECT 
        B.Ds_Categoria,
        'Subtotal' AS Ds_Produto,
        COUNT(*) AS Qt_Vendas,
        SUM(B.Preco) AS Vl_Total
    FROM 
        #Vendas	A
        JOIN #Produtos B ON A.Cd_Produto = B.Codigo
    GROUP BY
        B.Ds_Categoria
    
    UNION ALL

    SELECT 
        'Total' AS Ds_Categoria,
        'Total' AS Ds_Produto,
        COUNT(*) AS Qt_Vendas,
        SUM(B.Preco) AS Vl_Total
    FROM 
        #Vendas	A
        JOIN #Produtos B ON A.Cd_Produto = B.Codigo
    
) A
ORDER BY
    (CASE WHEN A.Ds_Categoria = 'Total' THEN 1 ELSE 0 END),
    A.Ds_Categoria,
    (CASE WHEN A.Ds_Produto = 'Subtotal' THEN 1 ELSE 0 END),
    A.Ds_Produto

Using GROUP BY ROLLUP

A very simple and practical way to solve this problem is by using the ROLLUP() function in GROUP BY, which already creates groupings and summaries according to the columns grouped in the function.

Using this function, you will see that it creates the totalizers just below each grouping and the general totalizer in the last line of the resultset.

Example 1:

SELECT 
    B.Ds_Categoria,
    B.Ds_Produto,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    ROLLUP(B.Ds_Categoria, B.Ds_Produto)

As you may have noticed, the lines containing the summaries have a NULL value in the grouped columns. In this example, the subtotal by category has the Ds_Produto column with a NULL value and in the general total, both columns have a NULL value. For our query to produce the same result as another query with UNION ALL, we simply need to treat these NULL values, as shown below:

SELECT 
    ISNULL(B.Ds_Categoria, 'Total') AS Ds_Categoria,
    ISNULL(B.Ds_Produto, 'Subtotal') AS Ds_Produto,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    ROLLUP(B.Ds_Categoria, B.Ds_Produto)

Example 2:

SELECT
    ISNULL(CONVERT(VARCHAR(10), MONTH(A.Dt_Venda)), 'Total') AS Mes_Venda, 
    ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    ROLLUP(MONTH(A.Dt_Venda), B.Ds_Categoria)

Using GROUP BY CUBE

Just like the ROLLUP() function, the CUBE() function allows you to create grouped totalizers in a very practical and easy way and its syntax is the same as the ROLLUP function.

Using this function, you will see that the difference between it and ROLLUP is that the CUBE function creates totalizers for each type of grouping.

Example:

SELECT
    ISNULL(CONVERT(VARCHAR(10), MONTH(A.Dt_Venda)), 'Total') AS Mes_Venda, 
    ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    CUBE(MONTH(A.Dt_Venda), B.Ds_Categoria)

See that using the CUBE() function, I can obtain the general totalizer, the totalizer for the Month_Venda column and the totalizer for the Category column. In other words, in the CUBE function, it generates all the combination possibilities between the columns used in the function.

Using GROUP BY GROUPING SETS

Using the GROUPING SETS function in GROUP BY allows us to generate totalizers for our data using the columns inserted in the function, so that it generates different totalizers in a single query. Unlike the ROLLUP and CUBE functions, the GROUPING SETS function does not return a general totalizer.

Example 1:

SELECT
    ISNULL(B.Ds_Produto, 'Total') AS Ds_Produto, 
    ISNULL(B.Ds_Categoria, 'Subtotal') AS Ds_Categoria,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    GROUPING SETS(B.Ds_Categoria, B.Ds_Produto)

In this example, the GROUPING SETS function returns the totalizer by product and then the totalizer by category, where you would need to execute 2 queries to show this same view.

Example 2:

SELECT
    MONTH(A.Dt_Venda) AS Mes_Venda, 
    B.Ds_Categoria,
    B.Ds_Produto,
    COUNT(*) AS Qt_Vendas,
    SUM(B.Preco) AS Vl_Total
FROM 
    #Vendas A
    JOIN #Produtos B ON A.Cd_Produto = B.Codigo
GROUP BY
    GROUPING SETS(MONTH(A.Dt_Venda), B.Ds_Categoria, B.Ds_Produto)

In this example, the GROUPING SETS function returned the sum of the values grouped by the 3 columns that I used in the function: Sum by Sales_Month, Sum by Product and Sum by Category. To reproduce this same result, I would need to create a query with 3 queries. If there were 10 columns, the query would need 10 different queries to bring this result, which we achieved with a single query using the GROUPING SETS function.

Note: All 3 functions presented in this post accept N columns as parameters, not being limited to just 2 columns.

Performance and Performance

Now that we've seen how the 3 functions work and seen how practical their use is, saving many lines of code, let's analyze whether they are performant or not.

I will try to use the same query for all cases, although in some, the result (output) is different, as it really has different objectives, but just to demonstrate the cost and execution plan of each one.

UNION ALL (Example Query):
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(19 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Products’. Scan count 2, logical reads 204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Sales’. Scan count 3, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 79 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY ROLLUP (Query from example 1)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

(19 row(s) affected)
Table ‘#Sales’. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Products’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 57 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY CUBE (Query from example 1)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

(49 row(s) affected)
Table ‘Worktable’. Scan count 2, logical reads 71, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Products’. Scan count 0, logical reads 200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Sales’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 210 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

GROUP BY GROUPING SETS (query from example 1)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 5 ms.

(18 row(s) affected)
Table ‘Worktable’. Scan count 2, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Sales’. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘#Products’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 58 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Running the 4 queries together:
SQL Sentry Plan Explorer Execution Plan

SQL Server Management Studio Execution Plan

In other words, with these tests and using this mass of data, we can say that queries performed with these functions are more performant than using subqueries with several queries to return grouped data. This does not mean that this statement will always be true, it will depend a lot on your environment and your mass of data.

I hope you liked this post.
A hug!

Dirceu Resende

Database & BI Architect · Microsoft MVP · MCSE, MCSA, MCT, MTA, MCP.

WhatsApp Telegram View my certifications

Comentários (0)

Carregando comentários…