Detailed introduction to PostgreSQL cursor and index selection instance

I wrote a case before that order by limit selected the wrong index because of uneven data distribution. This is because the optimizer cannot judge the distribution relationship of the data, and by default it is believed that the data distribution is uniform.

In addition to limit, when we use cursors, we should also pay attention to similar situations. Often, it is more difficult for us to find that this type of SQL in stored procedures has chosen the wrong execution plan, so we need to pay attention.

1. Create a test table

bill=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);
CREATE TABLE

2. Write a batch of random data, with IDs ranging from 1 to 10 million.

bill=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;  
INSERT 0 10000000

3. Write another batch of 1 million pieces of data, c1 and c2 are different from the previous 10 million values.

bill=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;  
INSERT 0 1000000

4. Create two indexes, which is what this article needs to focus on. Which index is more cost-effective to go to?

bill=# create index idx_tbl_1 on tbl(id);  
CREATE INDEX  
bill=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);  
CREATE INDEX

5. Collect statistical information

bill=# vacuum analyze tbl;  
VACUUM

6. Check the SQL execution plan below and go through the correct index

bill=# explain select * from tbl where c1=200 and c2=200 order by id;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Sort  (cost=72109.20..72344.16 rows=93984 width=20)
   Sort Key: id
   ->  Bitmap Heap Scan on tbl  (cost=1392.77..60811.81 rows=93984 width=20)
         Recheck Cond: ((c1 = 200) AND (c2 = 200))
         ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1369.28 rows=93984 width=0)
               Index Cond: ((c1 = 200) AND (c2 = 200))
(6 rows)

7. When we use this SQL in the cursor, we will find that the execution plan has deviations.

bill=# begin;
BEGIN
bill=*# explain declare tt cursor for select * from tbl where c1=200 and c2=200 order by id;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using idx_tbl_1 on tbl  (cost=0.43..329277.60 rows=93984 width=20)
   Filter: ((c1 = 200) AND (c2 = 200))
(2 rows)

Why does this happen? This is actually because SQL using cursor will automatically optimize based on the cursor_tuple_fraction parameter, and the default parameter is 0.1, which means that only the first 10% of the rows are retrieved for estimates, which is a bit similar to limit.

Because for this table, the optimizer believes that the data is uniformly distributed, but in fact, the data distribution is uneven, and the records of c1=200 and c2=200 are at the end of the table. When we retrieve only the top 10% of the rows in the cursor, we get an incorrect execution plan.

We can see the specific details in the &:

When using cursor or SPI_PREPARE_CURSOR function, the CURSOR_OPT_FAST_PLAN flag bit will be set, and then the SQL will be automatically optimized according to the cursor_tuple_fraction parameter. Therefore, for some uneven data distribution, it may be

The wrong execution plan was selected。
	/* Determine what fraction of the plan is likely to be scanned */
	if (cursorOptions &amp; CURSOR_OPT_FAST_PLAN)
	{
		/*
		 * We have no real idea how many tuples the user will ultimately FETCH
		 * from a cursor, but it is often the case that he doesn't want 'em
		 * all, or would prefer a fast-start plan anyway so that he can
		 * process some of the tuples sooner.  Use a GUC parameter to decide
		 * what fraction to optimize for.
		 */
		tuple_fraction = cursor_tuple_fraction;
		/*
		 * We document cursor_tuple_fraction as simply being a fraction, which
		 * means the edge cases 0 and 1 have to be treated specially here.  We
		 * convert 1 to 0 ("all the tuples") and 0 to a very small fraction.
		 */
		if (tuple_fraction &gt;= 1.0)
			tuple_fraction = 0.0;
		else if (tuple_fraction &lt;= 0.0)
			tuple_fraction = 1e-10;
	}
	else
	{
		/* Default assumption is we need all the tuples */
		tuple_fraction = 0.0;
	}

This is the end of this article about the detailed introduction of PostgreSQL cursor and index selection example. For more related PostgreSQL cursor and index selection content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!