Tips for improving INSERT performance in DB2 (1)

The db2 tutorial I am watching is: Tips for improving INSERT performance in DB2 (1).

INSERT Processing Overview

First, let's take a quick look at the processing steps when inserting a row. Each of these steps has the potential to optimize, which we will discuss later.

Prepare statements on the client. For dynamic SQL, this step must be done before the statement is executed, and the performance here is very important; for static SQL, the performance of this step is actually not very important, because the preparation of the statement is completed in advance.
On the client, the column values of the rows to be inserted are assembled and sent to the DB2 server.
The DB2 server determines which page to insert this line into.
DB2 reserves a location in the buffer pool used for this page. If DB2 selects an existing page, then it needs to read the disk; if a new page is used, the page must be physically allocated in the tablespace (if it is SMS, that is, the tablespace stored in the system management). Each page where a new line is inserted is finally written from the buffer pool to disk.
Format the line in the target page and get an X (exclusive, exclusive) row lock on the line.
Write a record that reflects the insert to the log buffer.
Finally, the transaction containing the insert is submitted. If the records in the log buffer have not been written to the log file at this time, the records are written to the log file.

In addition, many types of additional processing can occur depending on the database configuration, such as the presence of an index or trigger. This additional processing is also of great significance to performance, which we will discuss later.

Alternatives to insert

Before discussing the optimization of insert in detail, let's consider two alternatives to insert: load and import. The import utility is actually a front-end for SQL INSERT, but some of its features are useful for you as well. Load has some useful extra features, but the main reason we use load instead of insert is that it can improve performance.

load formats the data page directly, avoiding most of the overhead of processing each row due to insertion (for example, logging is actually eliminated here). Moreover, load can better take advantage of parallelism on multiprocessor machines. There are two new features in V8 load that have special effects on load becoming an alternative to insert, both from cursor load and from call layer interface (CLI) applications.

Load from cursor

This method can be used for application program code (via the db2Load API), or for DB2 scripts. Here is an example of the latter case:
declare staffcursor cursor forselect * from staff; load from staffcursor of cursor insert into myschema.new_staff;
These two lines can be replaced by the following line:
insert into myschema.new_staff select * from staff
The same equivalent INSERT ... SELECT statement can improve performance by almost 20% compared to the SELECT statement.

Load from CLI

This approach is obviously limited to Call Layer Interface (CLI) applications, but it is very fast. This trick is very similar to array insertion, and DB2 comes with an example like this, using load is twice as fast as using a fully optimized array insertion, almost 10 times faster than unoptimized array insertion.

All the things that insert can improve

Let's look at some of the necessary steps for insertion processing, and the tips we can use to optimize these steps.

1. Statement preparation

As a SQL statement, the INSERT statement must be compiled by DB2 before execution. This step can occur automatically (for example in CLP, or in a CLI SQLExecDirect call) or can be performed explicitly (for example, through a SQL Prepare, CLI SQLPrepare, or JDBC prepareStatement statement). This compilation process involves authorization checking, optimization, and some other activities required when converting statements into executable formats. When compiling a statement, the access plan of the statement is stored in the package cache.

If the same INSERT statement is executed repeatedly, the access plan for that statement is (usually) into the package cache, thus eliminating the overhead of compilation. However, if the insert statement has a different value for each line, each statement will be regarded as unique and must be compiled separately. Therefore, repeat statements like the following:
insert into mytable values (1, 'abc') insert into mytable values (2, 'def')
etc,
It is very desirable to replace it with a statement with a parameter mark, prepare it once and execute it repeatedly:
insert into mytable values (?, ?)

Using parameter markers can increase the speed of a series of inserts several times. (Similar benefits can be obtained by using host variables in static SQL programs.)

2. Send column values to the server

There are several optimization techniques that can be classified into this category. One of the most important tricks is to include multiple lines in each insert statement, which can avoid client-server communication for each line, while also reducing DB2 overhead. Tips that can be used for multi-line insertion are:

[1] [2] Next article