Data in the cache can be accessed much faster than data on disk, so how do we load specific data into the cache in PostgreSQL? This is somewhat similar to Oracle's In-Memory feature.
Of course, keep in mind that loading data into memory is not always a good idea: memory is far more limited than disk, so we only load the required data into memory on special occasions to speed up access.
We can use the pg_prewarm extension to load a specified table into the OS page cache or into PostgreSQL's shared buffers.
Install:
bill=# create extension pg_prewarm;
CREATE EXTENSION
Performance Test:
Create the test tables t1 and t2 and insert 10 million rows of test data into each:
bill=# create table t1(id int,info text);
CREATE TABLE
bill=# create table t2(id int,info text);
CREATE TABLE
bill=# insert into t1 select generate_series(1,10000000),md5(random()::text);
INSERT 0 10000000
bill=# insert into t2 select generate_series(1,10000000),md5(random()::text);
INSERT 0 10000000
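As an optional sanity check (my addition, not part of the original test), you can confirm how large the tables are; the size matters later when we discuss why a plain sequential scan of a large table does not fill shared_buffers:

-- check the on-disk size of each test table (output will vary)
select pg_size_pretty(pg_relation_size('t1')) as t1_size,
       pg_size_pretty(pg_relation_size('t2')) as t2_size;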
Clear shared_buffers before testing (for example by restarting the instance). You can use the following SQL to view shared_buffers usage.
Install the pg_buffercache extension:
bill=# create extension pg_buffercache;
CREATE EXTENSION
Query shared_buffers usage:
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
 INNER JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY 2 DESC;

                 relname                 | buffers
-----------------------------------------+---------
 pg_attribute                            |      36
 pg_proc                                 |      27
 pg_class                                |      15
 pg_operator                             |      14
 pg_depend_reference_index               |      13
 pg_depend                               |      11
 pg_attribute_relid_attnum_index         |      10
 pg_proc_proname_args_nsp_index          |       9
......
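To check the two test tables specifically, you can filter the same view (a small variation on the query above, not from the original article):

-- count how many shared buffers currently hold blocks of t1 or t2
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
 WHERE c.relname IN ('t1', 't2')
 GROUP BY c.relname;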
You can see that neither t1 nor t2 is in shared_buffers, so let's manually load the t2 table into shared_buffers.
bill=# SELECT pg_prewarm('t2');
 pg_prewarm
------------
      83334
(1 row)
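The return value is the number of blocks that were read in. As a quick sanity check (my addition, not in the original), it should match the table's block count:

-- number of 8kB blocks in t2, which should equal pg_prewarm's return value
SELECT pg_relation_size('t2') / current_setting('block_size')::int AS blocks;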
Performance Test:
As the plans below show, a full table scan of t2 (now in shared_buffers) is noticeably faster than the same scan of t1:
bill=# explain analyze select * from t1;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.060..772.902 rows=10000000 loops=1)
 Planning Time: 0.294 ms
 Execution Time: 1044.922 ms
(3 rows)

Time: 1045.722 ms (00:01.046)

bill=# explain analyze select * from t2;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.012..519.691 rows=10000000 loops=1)
 Planning Time: 0.280 ms
 Execution Time: 790.607 ms
(3 rows)

Time: 791.314 ms
More about pg_prewarm:
The extension's main interface is the pg_prewarm function, whose creation statement is as follows:
CREATE FUNCTION pg_prewarm(regclass,
                           mode text default 'buffer',
                           fork text default 'main',
                           first_block int8 default null,
                           last_block int8 default null)
RETURNS int8
AS 'MODULE_PATHNAME', 'pg_prewarm'
LANGUAGE C;
The parameters are as follows (a usage example follows the list):
- regclass: the name (or OID) of the relation to prewarm
- mode: the prewarm mode. 'prefetch' issues asynchronous prefetch requests to the OS cache; 'read' reads the blocks synchronously into the OS cache; 'buffer' (the default) reads the blocks synchronously into PostgreSQL's shared buffers
- fork: the relation fork to prewarm. 'main' is generally used; other forks include the visibility map ('vm') and the free space map ('fsm')
- first_block & last_block: the first and last block numbers to prewarm. The first block of a table is 0; the last block number can be derived from the relpages field of pg_class (relpages - 1)
- Return value (int8): the function returns the number of blocks processed by pg_prewarm
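For example, here is a small sketch (my own illustration, not from the original article) of calling pg_prewarm with explicit parameters, using the t1 table from above:

-- asynchronously prefetch blocks 0..999 of t1's main fork into the OS page cache
-- (requires a platform with posix_fadvise support)
SELECT pg_prewarm('t1', 'prefetch', 'main', 0, 999);

-- load the whole table into PostgreSQL's shared buffers (same as pg_prewarm('t1'))
SELECT pg_prewarm('t1', 'buffer');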
Some people may think: why not just run select * over the whole table once to load the data into the cache, instead of using pg_prewarm? The reason is that for a sequential scan of a table larger than shared_buffers/4, PostgreSQL does not use the whole shared buffer pool but only a small ring buffer, so a large table cannot be pulled into the cache with a plain query; pg_prewarm exists precisely to meet this need.
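A minimal sketch of how you might verify this yourself (my own illustration, assuming the t1/t2 tables from above): scan t1 with a plain query, then compare its buffer count with the prewarmed t2.

-- a full scan of t1 goes through a small ring buffer, not the whole pool
SELECT count(*) FROM t1;

-- compare how many shared buffers each table occupies afterwards;
-- t1 should hold far fewer buffers than the prewarmed t2
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
 WHERE c.relname IN ('t1', 't2')
 GROUP BY c.relname;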
That's all for this article on loading data into the buffer cache in PostgreSQL. For more on PostgreSQL and the buffer cache, please search my previous articles. I hope everyone will continue to support me!