SoFunction
Updated on 2025-04-08

PostgreSQL: viewing the execution efficiency of SQL statements

The EXPLAIN command is the first tool to reach for when troubleshooting database performance; most performance problems can be diagnosed with it. EXPLAIN shows the execution plan of a SQL statement, which helps you choose better indexes and write better-optimized queries.

Explain syntax:

explain select … from … [where ...]

For example:

explain select * from mytable;

(PostgreSQL has no Oracle-style dual table, so any real table name is used here.)

Here is a simple example, as follows:

EXPLAIN SELECT * FROM tenk1;
               QUERY PLAN
----------------------------------------------------------------
   Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)

The numbers EXPLAIN reports are:

1). The estimated start-up cost (time spent before output can begin, e.g. the time needed to do the sorting in a sort node).

2). The estimated total cost.

3). The estimated number of rows output by the plan node.

4). The estimated average width (in bytes) of the rows output by the plan node.

Costs are measured in units of disk page fetches: a cost of 1.0 corresponds to one sequential disk page read. The cost of an upper-level node includes the cost of all of its child nodes. The rows value is the number of rows the plan node is estimated to emit, not the number of rows it processes or scans, so it is usually smaller than the number scanned. Generally, the row estimate at the top level will be close to the number of rows the query actually returns.
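For instance, adding a WHERE clause raises the estimated cost slightly (the condition must be checked against every scanned row) while lowering the row estimate. The numbers below follow the tenk1 example from the PostgreSQL documentation and will differ on other data:

```sql
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;
--                        QUERY PLAN
-- ------------------------------------------------------------
--  Seq Scan on tenk1  (cost=0.00..483.00 rows=7001 width=244)
--    Filter: (unique1 < 7000)
```

Note that the scan still reads all 10,000 rows; rows=7001 is only the estimate of how many rows survive the filter.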

Now we execute the following query based on the system table:

SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';

The query shows that tenk1 occupies 358 disk pages and holds 10,000 rows. However, to reproduce the cost value we also need one more system parameter.

postgres=# show cpu_tuple_cost;
   cpu_tuple_cost
  ----------------
   0.01
  (1 row)
cost = 358 (disk pages) * 1.0 (seq_page_cost) + 10000 (rows) * 0.01 (cpu_tuple_cost) = 458
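The arithmetic above can be checked against the planner's cost parameters. A sketch, assuming the default settings:

```sql
-- Sequential-scan cost model (defaults: seq_page_cost = 1.0, cpu_tuple_cost = 0.01)
SHOW seq_page_cost;    -- one sequentially fetched disk page costs 1.0
SHOW cpu_tuple_cost;   -- processing one row costs 0.01

-- total cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
--            = 358 * 1.0 + 10000 * 0.01
--            = 458.00   (the upper bound shown in the plan)
```

This is why the plan's total cost reads 458.00 even though the table has only 358 pages: the remaining 100 units are CPU cost for the 10,000 rows.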

Supplement: optimizing PostgreSQL COUNT(DISTINCT FIELD)

Background

We need to count the total number of distinct keywords over a certain period, also counting NULL as one extra value (the statistics table has 4,000,000+ rows and is about 600 MB in size). So we wrote this SQL:

select count(distinct keyword) +1 as count from statistics;

Problem

Even though this runs as a background query, it is too slow: execution takes 38.6 s. How can it be optimized?

Solution

Method 1 (treating the symptom)

Run the SQL periodically, cache the result, and have the application read the cache. Page access becomes faster, but the underlying slow SQL is not fixed.

Method 2 (treating the root cause)

To optimize the SQL itself, first consider why count(distinct FIELD) is slow: PostgreSQL evaluates it with a sort-based de-duplication inside the aggregate rather than a hash aggregate, which is expensive on a large table. For more detail, see this article: https:///article/

Optimized content:

select count(distinct FIELD) from table;

can be rewritten as

select count(1) from (select distinct FIELD from table) as foo;

Comparison

To compare the two execution plans, run each statement with EXPLAIN ANALYZE.
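A sketch of that comparison (keyword and statistics are the column and table from the background section above; actual timings will vary):

```sql
-- Original form: de-duplication happens inside the aggregate via a sort
EXPLAIN ANALYZE
SELECT count(DISTINCT keyword) + 1 AS count FROM statistics;

-- Rewritten form: the inner DISTINCT can be planned as a HashAggregate,
-- which is typically much faster on a large table
EXPLAIN ANALYZE
SELECT count(1) FROM (SELECT DISTINCT keyword FROM statistics) AS foo;
```

Comparing the two plans side by side shows where the time goes and confirms whether the rewrite actually changes the aggregation strategy on your data.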

The above is my personal experience; I hope it serves as a useful reference. If there are any mistakes or oversights, corrections and advice are welcome.