Detailed explanation of three methods to implement pivot tables in PostgreSQL

Pivot Table is a powerful tool for data summary, analysis, browsing and presentation. It can help us understand comparisons, patterns and trends in our data. It is one of the essential skills of data analysts and operations personnel. Today we will talk about how to implement the function of pivot tables in PostgreSQL.

The sample data used in this article canClick here to download。

Implementing Pivot Tables with CASE Expressions

A common way to implement data row-to-column conversion is to use CASE conditional expressions and grouping aggregation operations. First, use the following GROUP BY clause to classify and summarize sales data:

select coalesce(product, '【All Products】') "product",
       coalesce(channel, '【All channels】') "channel",
       coalesce(to_char(saledate, 'YYYYMM'), '【All Months】') "month",
       sum(amount) "Sales"
from sales_data
group by rollup (product,channel, to_char(saledate, 'YYYYMM'))
order by product, channel, to_char(saledate, 'YYYYMM');

Among them, group by summarizes the data according to products, channels, and months; the rollup option is used to generate subtotals, totals and totals at different levels; the coalesce function is used to display the NULL values in the summary row as corresponding information. The result returned by this query is as follows:

product       |channel       |month       |Sales    |
----------|-----------|----------|-------|
orange       |Storefront       |201901    |  41306|
orange       |Storefront       |201902    |  37906|
orange       |Storefront       |201903    |  48866|
orange       |Storefront       |201904    |  48673|
orange       |Storefront       |201905    |  58998|
orange       |Storefront       |201906    |  58931|
orange       |Storefront       |【所有month】| 294680|
orange       |JD.com       |201901    |  41289|
orange       |JD.com       |201902    |  43913|
orange       |JD.com       |201903    |  49803|
orange       |JD.com       |201904    |  49256|
orange       |JD.com       |201905    |  64889|
orange       |JD.com       |201906    |  62649|
orange       |JD.com       |【所有month】| 311799|
orange       |Taobao       |201901    |  43488|
orange       |Taobao       |201902    |  37598|
orange       |Taobao       |201903    |  48621|
orange       |Taobao       |201904    |  49919|
orange       |Taobao       |201905    |  58530|
orange       |Taobao       |201906    |  64626|
orange       |Taobao       |【所有month】| 302782|
orange       |【所有channel】|【所有month】| 909261|
...
banana       |【所有channel】|【所有month】| 925369|
【全部product】|【所有channel】|【所有month】|2771682|

Next, we display the data into different columns according to different months, that is, convert the rows into columns. This function can be implemented using a CASE expression:

select coalesce(product, '【All Products】') "product", coalesce(channel, '【All channels】') "channel", 
       sum(case to_char(saledate, 'YYYYMM') when '201901' then amount else 0 end) "January",
       sum(case to_char(saledate, 'YYYYMM') when '201902' then amount else 0 end) "February",
       sum(case to_char(saledate, 'YYYYMM') when '201903' then amount else 0 end) "March",
       sum(case to_char(saledate, 'YYYYMM') when '201904' then amount else 0 end) "April",
       sum(case to_char(saledate, 'YYYYMM') when '201905' then amount else 0 end) "May",
       sum(case to_char(saledate, 'YYYYMM') when '201906' then amount else 0 end) "June",
       sum(amount) "total"
from sales_data
group by rollup (product, channel);

product       |channel       |January   |February  |March   |April   |May  |June   |total    |
----------|-----------|------|------|------|------|------|------|-------|
orange       |Storefront       | 41306| 37906| 48866| 48673| 58998| 58931| 294680|
orange       |JD.com       | 41289| 43913| 49803| 49256| 64889| 62649| 311799|
orange       |Taobao       | 43488| 37598| 48621| 49919| 58530| 64626| 302782|
orange       |【所有channel】|126083|119417|147290|147848|182417|186206| 909261|
apple       |Storefront       | 43845| 40539| 44909| 55646| 56771| 64933| 306643|
apple       |JD.com       | 38269| 40593| 56552| 56662| 64493| 62045| 318614|
apple       |Taobao       | 42969| 43289| 48769| 58052| 58872| 59844| 311795|
apple       |【所有channel】|125083|124421|150230|170360|180136|186822| 937052|
banana       |Storefront       | 41210| 39420| 50884| 52085| 60249| 67597| 311445|
banana       |JD.com       | 36879| 36981| 51748| 54801| 64936| 60688| 306033|
banana       |Taobao       | 42468| 41955| 52780| 54971| 56504| 59213| 307891|
banana       |【所有channel】|120557|118356|155412|161857|181689|187498| 925369|
【全部product】|【所有channel】|371723|362194|452932|480065|544242|560526|2771682|

The CASE expression in the first SUM function only summarizes the sales volume in 201901, and the sales volume in other months is set to 0; the subsequent SUM function and so on, obtains the sales volume summary for each month and the total for all months.

The method of using CASE conditional expression plus grouping aggregation is also applicable to other databases.

Implementing a Pivot Table with the FILTER clause

In addition to using CASE expressions in aggregate functions, PostgreSQL also provides an easier method, which is the FILTER clause.

aggregate_function(<expression>) FILTER (WHERE <condition>)

Among them, aggregate_function can be any aggregate function or window function; the FILTER clause is used to specify an additional condition, and only data rows that meet this condition will participate in the calculation.

We can use the FILTER clause of the SUM function to achieve the same effect as the CASE expression above:

select coalesce(product, '【All Products】') "product", coalesce(channel, '【All channels】') "channel", 
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "January",
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "February",
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "March",
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "April",
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "May",
       sum(amount) filter(where to_char(saledate, 'YYYYMM') = '201901') "June",
       sum(amount) "total"
from sales_data
group by rollup (product, channel);

product       |channel       |January   |February  |March   |April   |May  |June   |total    |
----------|-----------|------|------|------|------|------|------|-------|
orange       |Storefront       | 41306| 37906| 48866| 48673| 58998| 58931| 294680|
orange       |JD.com       | 41289| 43913| 49803| 49256| 64889| 62649| 311799|
orange       |Taobao       | 43488| 37598| 48621| 49919| 58530| 64626| 302782|
orange       |【所有channel】|126083|119417|147290|147848|182417|186206| 909261|
apple       |Storefront       | 43845| 40539| 44909| 55646| 56771| 64933| 306643|
apple       |JD.com       | 38269| 40593| 56552| 56662| 64493| 62045| 318614|
apple       |Taobao       | 42969| 43289| 48769| 58052| 58872| 59844| 311795|
apple       |【所有channel】|125083|124421|150230|170360|180136|186822| 937052|
banana       |Storefront       | 41210| 39420| 50884| 52085| 60249| 67597| 311445|
banana       |JD.com       | 36879| 36981| 51748| 54801| 64936| 60688| 306033|
banana       |Taobao       | 42468| 41955| 52780| 54971| 56504| 59213| 307891|
banana       |【所有channel】|120557|118356|155412|161857|181689|187498| 925369|
【全部product】|【所有channel】|371723|362194|452932|480065|544242|560526|2771682|

PostgreSQL supports the use of subqueries in filter clauses, such as exists.

Create a pivot table using the CROSSTAB function

PostgreSQL's extension module tablefunc provides many functions that return results as data tables, where the crossstab function can be used to implement row-column conversion of data. This is an extension module, so we need to install the plugin first:

create extension if not exists tablefunc;
ERROR:  could not open extension control file "/usr/pgsql-12/share/extension/": No such file or directory

The above error indicates that the postgresql-contrib package is not installed, and we install it through operating system commands (CentOS):

sudo yum install postgresql12-contrib

Then execute the create extension command above to install the tablefunc module.

Next, you can convert rows into columns through the crossstab function, for example:

select *
from crosstab(
$$select product||'-'||channel pc,product,channel, to_char(saledate, 'YYYYMM'), sum(amount)
from sales_data
group by product||'-'||channel, product, channel, to_char(saledate, 'YYYYMM')
order by 1$$,
$$select distinct to_char(saledate, 'YYYYMM') from sales_data order by 1$$
)
as ct(pc text, product text, channel text,
      "201901" numeric, "201902" numeric, "201903" numeric,
      "201904" numeric, "201905" numeric, "201906" numeric);

The crossstab function contains 2 string type parameters, both of which are query statements; the first select statement is used to construct source data, and the source data of crossstab needs to specify a field that identifies each result row, which is used in the example; the result of the second select statement is used to construct the converted fields; the as clause is used to define the field type that returns the result.

pc      |product |channel |201901  |201902  |201903  |201904  |201905  |201906  |
--------|--------|--------|--------|--------|--------|--------|--------|--------|
orange-JD.com|orange     |JD.com     |41289.00|43913.00|49803.00|49256.00|64889.00|62649.00|
orange-Storefront|orange     |Storefront     |41306.00|37906.00|48866.00|48673.00|58998.00|58931.00|
orange-Taobao|orange     |Taobao     |43488.00|37598.00|48621.00|49919.00|58530.00|64626.00|
apple-JD.com|apple     |JD.com     |38269.00|40593.00|56552.00|56662.00|64493.00|62045.00|
apple-Storefront|apple     |Storefront     |43845.00|40539.00|44909.00|55646.00|56771.00|64933.00|
apple-Taobao|apple     |Taobao     |42969.00|43289.00|48769.00|58052.00|58872.00|59844.00|
banana-JD.com|banana     |JD.com     |36879.00|36981.00|51748.00|54801.00|64936.00|60688.00|
banana-Storefront|banana     |Storefront     |41210.00|39420.00|50884.00|52085.00|60249.00|67597.00|
banana-Taobao|banana     |Taobao     |42468.00|41955.00|52780.00|54971.00|56504.00|59213.00|

Next, you need to add a total row and total column:

select coalesce(product, '【All Products】') "product", coalesce(channel, '【All channels】') "channel",
       sum("201901") "January", sum("201902") "February", sum("201903") "March",
       sum("201904") "April", sum("201905") "May", sum("201906") "June",
       sum("201901"+"201902"+"201903"+"201904"+"201905"+"201906") "total"
from crosstab(
$$select product||'-'||channel pc,product,channel, to_char(saledate, 'YYYYMM'), sum(amount)
from sales_data
group by product||'-'||channel, product, channel, to_char(saledate, 'YYYYMM')
order by 1$$,
$$select distinct to_char(saledate, 'YYYYMM') from sales_data order by 1$$
)
as ct(pc text, product text, channel text,
      "201901" numeric, "201902" numeric, "201903" numeric,
      "201904" numeric, "201905" numeric, "201906" numeric)
group by rollup (product, channel);

We added some total data based on the result of the crossstab function and modified the name of the return field to make the result closer to the EXCEL pivot table:

product       |channel       |January   |February  |March   |April   |May  |June   |total    |
----------|-----------|------|------|------|------|------|------|-------|
orange       |JD.com       | 41289| 43913| 49803| 49256| 64889| 62649| 311799|
orange       |Storefront       | 41306| 37906| 48866| 48673| 58998| 58931| 294680|
orange       |Taobao       | 43488| 37598| 48621| 49919| 58530| 64626| 302782|
orange       |【所有channel】|126083|119417|147290|147848|182417|186206| 909261|
apple       |JD.com       | 38269| 40593| 56552| 56662| 64493| 62045| 318614|
apple       |Storefront       | 43845| 40539| 44909| 55646| 56771| 64933| 306643|
apple       |Taobao       | 42969| 43289| 48769| 58052| 58872| 59844| 311795|
apple       |【所有channel】|125083|124421|150230|170360|180136|186822| 937052|
banana       |JD.com       | 36879| 36981| 51748| 54801| 64936| 60688| 306033|
banana       |Storefront       | 41210| 39420| 50884| 52085| 60249| 67597| 311445|
banana       |Taobao       | 42468| 41955| 52780| 54971| 56504| 59213| 307891|
banana       |【所有channel】|120557|118356|155412|161857|181689|187498| 925369|
【全部product】|【所有channel】|371723|362194|452932|480065|544242|560526|2771682|

Summarize

Pivot tables are a very practical data analysis function that can be used to implement complex data classification summary and comparison analysis. This article introduces three ways to implement pivot tables in PostgreSQL, including using a combination of CASE conditional expressions and grouping aggregation, a FILTER clause of the aggregate function, and a crossstab function in the extension module tablefunc to generate a pivot table.

The above is a detailed explanation of the three methods of implementing pivot tables in PostgreSQL. For more information about PostgreSQL pivot tables, please pay attention to my other related articles!