
Detailed explanation of the magical role of PostgreSQL limit

Recently I ran into a performance problem caused by a piece of SQL. The statement looks roughly like this:

SELECT *
FROM t1
WHERE id = 999
AND (case $1
    WHEN 'true' THEN
    info = $2
    ELSE info = $3 end) limit 1;

The developers' reaction: after LIMIT 1 was added, the SQL, after running fine for a while, switched from the original index scan to a full table scan. Why does a simple LIMIT 1 have such an impact? Shouldn't fetching only one row be faster?

Let's start with this SQL.

First, let’s take a look at the table structure. It is relatively simple. There is an index on the info column, as shown below:

bill=# \d t1
                            Table "public.t1"
  Column  |            Type             | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
 id       | integer                     |           |          |
 info     | text                        |           |          |
 crt_time | timestamp without time zone |           |          |
Indexes:
    "idx_t1" btree (info)

The info column has no duplicate values, which means that whatever value is passed into the WHERE condition, an index scan should always be possible. So why does adding LIMIT 1 turn it into a full table scan?

Let's first look at the execution plan when this SQL uses the index as expected:

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.027..0.027 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.025..0.026 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.158 ms
 Execution Time: 0.057 ms
(6 rows)

But the current execution plan is like this:

 Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

What's strange is that the full-table-scan plan with LIMIT has a lower cost, yet its actual time is far longer. And when we take the bind-variable values captured in the log, substitute them into the SQL and look at the execution plan, it still uses the index scan. Given that, the first thing that comes to mind is a bad plan caused by the plan cache.

The plan cache in PostgreSQL exists only at the session level. For a prepared statement, PostgreSQL builds a custom plan for each of the first 5 executions, recording the total cost of those 5 custom plans and how many custom plans have been built, and only after that does it generate a generic plan.

From then on, at every bind the planner compares the cost of the cached generic plan with the average cost of the stored custom plans. If the generic plan's cost is below that average, the cached generic plan is used; if it is above, a custom plan is generated again (i.e. the statement is re-planned), the custom-plan counter is incremented by 1, and that plan's cost is added to the running total.
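
As an aside (not something the original troubleshooting relied on), PostgreSQL 12 and later expose this choice through the plan_cache_mode parameter, which can be used to pin the behaviour while testing:

-- Override the automatic generic/custom choice described above (PostgreSQL 12+).
SET plan_cache_mode = force_custom_plan;    -- always re-plan with the real parameter values
-- SET plan_cache_mode = force_generic_plan; -- always use the cached generic plan
-- SET plan_cache_mode = auto;               -- default: the cost comparison described above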

That being the case, let's test it with a prepared statement:

bill=# prepare p1 as select * from t1 where id = 999
bill-# and (case $1 when 'true' then info = $2 else info = $3 end)  limit 1;
PREPARE
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.831..0.831 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.830..0.830 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.971 ms
 Execution Time: 0.889 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.038..0.039 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.037..0.037 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.240 ms
 Execution Time: 0.088 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.036..0.036 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.035..0.035 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.136 ms
 Execution Time: 0.076 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.051..0.051 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.049..0.050 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.165 ms
 Execution Time: 0.091 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.027..0.027 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.025..0.026 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.158 ms
 Execution Time: 0.057 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

Sure enough, the result we wanted appeared on the 6th time!

You can see that the cost of the first 5 index-scan plans was 3.18, while the cost of the full table scan was 0.35, so the optimizer naturally chose the full table scan. But why is the cost lower while the execution time is so much longer? Before answering that, we need to understand how the cost of a LIMIT clause is calculated.

Limit cost calculation method:

Let's start with the simplest example:

We fetch only 1 row; the cost is very low and the runtime is tiny.

bill=# explain analyze select * from t1 limit 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.02 rows=1 width=45) (actual time=0.105..0.106 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..110921.49 rows=5997449 width=45) (actual time=0.103..0.103 rows=1 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 0.133 ms
(4 rows)

What happens if we add a WHERE condition?

The cost suddenly jumps to 3703.39, which seems easy enough to understand: the WHERE condition has to filter the data before the LIMIT can apply, so the cost becomes much higher.

bill=# explain analyze select * from t1 where id = 1000 limit 1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3703.39 rows=1 width=45) (actual time=0.482..0.483 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..125915.11 rows=34 width=45) (actual time=0.480..0.481 rows=1 loops=1)
         Filter: (id = 1000)
         Rows Removed by Filter: 1008
 Planning Time: 0.117 ms
 Execution Time: 0.523 ms
(6 rows)

But when we change the conditions, the result is different:

Changing the condition from id = 1000 to id = 999, the cost suddenly drops to 0.13. It looks like we have found why the LIMIT cost of the earlier full table scan was lower than that of the index scan.

bill=# explain analyze select * from t1 where id = 999 limit 1;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.13 rows=1 width=45) (actual time=0.041..0.042 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..125915.11 rows=983582 width=45) (actual time=0.040..0.040 rows=1 loops=1)
         Filter: (id = 999)
         Rows Removed by Filter: 107
 Planning Time: 0.114 ms
 Execution Time: 0.079 ms
(6 rows)

So how is the LIMIT cost actually calculated? Why does it differ so much under different conditions?

The following is the limit cost calculation method:

limit_cost = ( N / B ) * A

N: the number of rows fetched by the LIMIT clause; for LIMIT 1, N = 1.

B: the estimated number of rows returned by the node beneath the Limit (the rows figure on the Seq Scan or Index Scan line).

A: the estimated total cost of that underlying node.

For example, in the above execution plan with cost=0.13, N = 1, B = 983582, A = 125915.11, then limit cost is:

(1/983582)*125915.11 = 0.128, that is, 0.13 displayed in the execution plan.
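
As a sanity check, the same arithmetic reproduces the earlier figures: for the id = 1000 example, (1 / 34) * 125915.11 ≈ 3703.38, essentially the 3703.39 shown in that plan; and for the problematic generic plan, (1 / 491791) * 170895.98 ≈ 0.35.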

In short, the more rows the WHERE condition is estimated to return, the lower the LIMIT cost will be.

Knowing this, a look back at the original SQL makes everything clear: the condition id = 999 matches a huge number of rows, so even for a full table scan the LIMIT cost comes out very low, lower even than the index scan's.

SELECT *
FROM t1
WHERE id = 999
AND (case $1
    WHEN 'true' THEN
    info = $2
    ELSE info = $3 end) limit 1;

Note, however, that even the cost we see in an EXPLAIN ANALYZE plan is an estimate, not the value the planner actually compares internally. The difference is usually not large, but when the costs themselves are small it can still be misleading.

For example, for the SQL above I wanted to push the LIMIT cost of the full table scan above that of the index scan so that the optimizer would always pick the index scan, so I changed LIMIT 1 to LIMIT 100 (i.e. increased N). It still didn't work:

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..5.58 rows=1 width=53) (actual time=0.049..0.051 rows=1 loops=1)
   ->  Index Scan using idx_scm_bind_scm_customer_id_index on scm_bind t  (cost=0.56..5.58 rows=1 width=53) (actual time=0.049..0.050 rows=1 loops=1)
         Index Cond: ((scm_customer_id)::text = 'wmGAgeDQAAXcpcw9QWkDOUQsIDI1xOqQ'::text)
         Filter: ((bind_status)::text = '2'::text)
 Planning Time: 0.160 ms
 Execution Time: 0.072 ms
(6 rows)

Time: 0.470 ms

                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..8.90 rows=100 width=53) (actual time=1047.859..16654.360 rows=1 loops=1)
   ->  Seq Scan on scm_bind t  (cost=0.00..552392.00 rows=6208050 width=53) (actual time=1047.858..16654.357 rows=1 loops=1)
         Filter: (((bind_status)::text = '2'::text) AND CASE $1 WHEN 'client'::text THEN ((scm_customer_id)::text = ($2)::text) ELSE ((scm_customer_id)::text = ($3)::text) END)
         Rows Removed by Filter: 12169268
 Planning Time: 0.147 ms
 Execution Time: 16654.459 ms
(6 rows)

Time: 16654.924 ms (00:16.655)

The second plan above, the full table scan, is what we get on the 6th execution with parameters. The full table scan's cost is 8.90 and the index scan's is 5.58, so why would the optimizer pick the plan with the higher cost of 8.90?
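
Incidentally, the 8.90 itself is consistent with the LIMIT cost formula above: (100 / 6208050) * 552392.00 ≈ 8.90, which is exactly the cost shown on the Limit node of the Seq Scan plan.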

But when we inspect the costs the plan cache has actually stored (the output below comes from debugging the backend process), we find:

$1 = {magic = 195726186, raw_parse_tree = 0x15df470,
query_string = 0x16d65b8 "PREPARE p1(varchar,varchar,varchar) as\n select\n t.scm_sale_customer_id,\n t.scm_customer_id\n from\n scm_bind t\n where t.bind_status = '2'\n and (case $1 when 'client' then scm_customer_id ="..., commandTag = 0x95b5ba "SELECT", param_types = 0x16d66c8, num_params = 3, parserSetup = 0x0, parserSetupArg = 0x0, cursor_options = 256, fixed_result = true,
resultDesc = 0x16d66e8, context = 0x15df250, query_list = 0x16dbe80, relationOids = 0x16e6138, invalItems = 0x0, search_path = 0x16e6168, query_context = 0x16dbd70, rewriteRoleId = 10,
rewriteRowSecurity = true, dependsOnRLS = false, gplan = 0x16ff668, is_oneshot = false, is_complete = true, is_saved = true, is_valid = true, generation = 6, next_saved = 0x0,
generic_cost = 8.8979953447539888, total_custom_cost = 52.899999999999999, num_custom_plans = 5}

The value the planner really compares against is the average custom plan cost, total_custom_cost / num_custom_plans = 52.9 / 5 ≈ 10.58, which differs noticeably from the 5.58 displayed in the execution plan. Since generic_cost ≈ 8.90 is lower than 10.58, the generic full-table-scan plan wins.

Let's get back to the point. Now that we know why the LIMIT cost of the full table scan is lower, let's tackle the next question: why is the cost so low while the actual execution time is so long?

Let's look at the execution plan again:

Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

Looking closely, you can see that the condition on the info column, which should have driven an index scan, is instead being applied as a plain filter on every row.

The root of the problem is the CASE WHEN expression in the WHERE clause: when the generic plan is built, the bind variables have no actual values yet, so the optimizer cannot tell which branch of the CASE applies and cannot turn the expression into an index condition. I do have to complain about this way of writing the query...

As a result, when the optimizer computes the LIMIT cost for the generic plan, it only knows that id = 999 matches a large number of rows; it has no way to judge how many rows the CASE WHEN condition will let through. So even though that condition actually leaves only a tiny fraction of rows, the row estimate behind the LIMIT cost is driven mainly by id = 999, the estimated cost comes out very low, and in reality almost the entire table has to be scanned and filtered.
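
If the query can be rewritten, one way to sidestep the whole issue (a hedged suggestion, not part of the original fix) is to keep the indexed column alone on the left of the comparison and move the CASE to the right-hand side, so that it can become an index condition even in a generic plan:

-- Untested sketch: the CASE now only computes the value to compare against,
-- so the planner should be able to match info against idx_t1 (an Index Cond)
-- instead of filtering every row.
SELECT *
FROM t1
WHERE id = 999
  AND info = (CASE WHEN $1 = 'true' THEN $2 ELSE $3 END)
LIMIT 1;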

It's remarkable how much knowledge hides inside such a "simple" piece of SQL.

This is the end of this article about the magical effect of PostgreSQL LIMIT. For more on PostgreSQL LIMIT, see my earlier articles.