Detailed explanation of the method of PostgreSQL to implement pivot table query

Requirement description

The easiest way to explain how this function works is to use an example with a pivot table. First, we will explain our initial point of view from a practical perspective and then define the required pivot table.

Assuming we are teachers, we need to count the grades (language, music, etc.) of all subjects you teach, and the school provides you with a system to record all evaluation or test results. The following SQL statement will display the calculation results loaded into the system:

SELECT *
FROM evaluations

The sample data is as follows:

Student	Subject	Evaluation_result	Evaluation_day
Smith, John	Music	7.0	2016-03-01
Smith, John	Maths	4.0	2016-03-01
Smith, John	History	9.0	2016-03-22
Smith, John	Language	7.0	2016-03-15
Smith, John	Geography	9.0	2016-03-04
Gabriel, Peter	Music	2.0	2016-03-01
Gabriel, Peter	Maths	10.0	2016-03-01
Gabriel, Peter	History	7.0	2016-03-22
Gabriel, Peter	Language	4.0	2016-03-15
Gabriel, Peter	Geography	10.0	2016-03-04

Expected results

The following table can easily track student progress. In computer science, we call this mesh a pivot table. If you analyze the pivot table, you will find that we use the values from the original data as the column title or field name (in this case geography, history, mathematics, etc.).

The desired data format is as follows:

Student	Geography	History	Language	Maths	Music
Gabriel, Peter	10.0	7.0	4.0	10.0	2.0
Smith, John	9.0	9.0	7.0	4.0	7.0

Enable tablefunc extension

As we mentioned earlier,crosstabFunctions are part of PostgreSQL extension tablefunc. To callcrosstabFunctions must first enable the tablefunction extension by executing the following SQL command:

CREATE extension tablefunc;

crossstab function

crosstabThe function receives SQL SELECT command as a parameter, and this parameter must comply with the following restrictions:

SELECT must return 3 columns.
The first column in SELECT will be the identifier of each row in the pivot table or the final result. In our case, this is the student’s name. Notice how the student's name (John Smith and Peter Gabriel) appears in the first column.
The second column in SELECT represents the category in the pivot table. In our case, these categories are subjects in the school. It should be noted that the value of this column will be extended to many columns in the PivotTable. If the second column returns 5 different values (geography, history, etc.), the pivot table will have 5 columns.
The third column in SELECT represents the value of each cell assigned to the pivot table. These are the evaluation results in our example.

If we think of a pivot table as a two-dimensional array, then the first SELECT column is the first dimension of the array, the second SELECT column is the second dimension, and the third is the value of the array element. For example grid [first_column_value, second_column_value] = third_column_value.

SELECT student, subject, evaluation_result FROM evaluations ORDER BY 1,2

crosstabThe function is called in the FROM clause of the SELECT statement. We must define the name of the column and data type that will enter the final result. For our purposes, the final result is defined as:

AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC)

Integrate the above content, complete statements:

SELECT *
FROM crosstab( 'select student, subject, evaluation_result from evaluations order by 1,2')
     AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC);

The query results are as follows:

Student	Geography	History	Language	Maths	Music
Gabriel, Peter	10.0	7.0	4.0	10.0	2.0
Smith, John	9.0	9.0	7.0	4.0	7.0

Pivot Table Practical Example

From a single dataset, we can generate many different pivot tables. Let's continue to take teachers and classes as examples and see some of our options.

Check students' monthly scores

As teachers, we may also need a report on the results of the students’ assessment so far this year. For example, suppose we want to get the average rating of John Smith from March to July. In the following grid, the table looks like this:

month text	geography numeric	history numeric	language numeric	maths numeric	music numeric
3	9.00	9.00	7.00	4.00	7.00
4	4.00	7.50	7.00	4.00	5.66
5	8.00	6.00	7.00	7.00	7.00
6	7.50	7.00	7.00	7.00	8.00
7	6.66	9.00	7.75	10.00	6.00

Implement pivot table SQL:

SELECT *
FROM crosstab( 'select extract(month from period)::text, ,
             trunc(avg(evaluation_result),2)
     from evaluation, subject 
     where evaluation.subject_id = subject.subject_id and student_id = 1
     group by 1,2 order by 1,2')  AS final_result(Month TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC);

Processing incomplete records

We can also call this section “The limitation of cross-tab and how to resolve it”. Before discussing this issue, let's set the scenario:

Suppose you want to see if some students don’t have test scores in certain subjects. Maybe you can try the previous query and add a WHERE clause to July. The code looks like this:

SELECT *
FROM crosstab( 'select student, subject, evaluation_result from evaluations where extract (month from evaluation_day) = 7 order by 1,2')
     AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC);

The following pivot table is the result of the query. We can see very quickly that we did not rate Peter's language, mathematics, and music.

Student	Geography	History	Language	Maths	Music
Gabriel, Peter	10.0	6.0			7.00
Smith, John	6.0	8.0	6.0	9.0	4.0

But if we try regular queries to get Peter's score in July...

SELECT * from evaluations
where extract ( month from evaluation_day)=7 and student like 'Gabriel, Peter'

Of course, the second query is correct because it shows the original data. The problem is that some types of information are missing during the pivot table construction process. To solve this problem, we can use a crosstab function with a second parameter that represents the complete list of categories. If there are missing values, the pivot table will still be built correctly.

The second parameter content: ‘select name from subject order by 1’ , the complete statement is as follows:

SELECT *
FROM crosstab( 'select student, subject, evaluation_result from evaluations
                where extract (month from evaluation_day) = 7 order by 1,2',
                'select name from subject order by 1')
     AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC);

Now the output results include missing subjects and use – to represent:

Student	Geography	History	Language	Maths	Music
Gabriel, Peter	10.0	–	6.0	–	–
Smith, John	6.0	8.0	6.0	9.0	4.0

Practice data

PivotTables provide us with a different way to view data. Additionally, we can use the crosstab function to create different pivot tables based on the same raw data. Try to build a pivot table that displays the maximum temperature for each city and month based on the original data in the table below.

CREATE TABLE weather (city text, when timestamp, temperature float);

City	When	Temperature
Miami	2016-01-01 08:00:00	68.6
Miami	2016-01-21 08:00:00	73.3
Orlando	2016-01-01 08:00:00	72.5
Miami	2016-02-01 18:00:00	58.6
Orlando	2016-02-02 18:00:00	62.5
Miami	2016-03-03 08:00:00	55.6
Orlando	2016-03-03 08:00:00	56.7
Miami	2016-04-04 18:00:00	50.6
Orlando	2016-04-04 18:00:00	61.5

The pivot table should have one row for each city and one column for each month. If you prefer, consider making other pivot tables with the same data. Roll up your sleeves and give it a try.

This is the end of this article about the detailed explanation of PostgreSQL's method of implementing pivot table query. For more related PostgreSQL pivot table query content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!