Introduction to the usage of update subquery

Basic knowledge
1. Associated subquery and non-associated subquery

In a non-associative subquery, the internal query is executed only once and returns its value to the external query, which then uses the internal query to return the value to it in its processing. In the association subquery, the internal query must be executed once for each row of data returned by the external query. In addition, the information flow is bidirectional in the association subquery. Each row of data in the external query passes a value to the subquery, and the subquery executes once for each row of data and returns its record. The external query then makes a decision based on the returned records.

like:

SELECT , , 
FROM Orders o1
WHERE  = (SELECT Max(OrderDate)
FROM Orders o2
WHERE  = )

is an associated subquery

SELECT , , 
FROM Orders o1
WHERE  IN
(SELECT TOP 2 
FROM Orders o2
WHERE  = )
ORDER BY CustomerID

is a non-associated subquery

2. Tips (HINT)

Generally, when optimizing, whether using a rule-based or cost-based method, the Oracle system optimizer determines the execution path of the statement. This kind of path to choose should not be the best. Therefore, Oracle provides a method called prompt. It allows programmers to choose the execution path according to their own requirements, that is, prompting the optimizer to follow what execution rules to execute the current statement. This can be better in performance than Oracle optimization's independent decision.

Typically, programmers can use prompts to make optimization decisions. By using prompts, you can specify the following content:

l SQL statement optimization method;

l For a certain SQL statement, the goal of the overhead optimization program is based on;

l The access path to SQL statement access;
l The connection order of the connection statement;
l Connection operation in the connection statement.

If you want the optimizer to execute as required by the programmer, you need to give a prompt in the statement. The valid range of prompts is limited, that is, the statement block with prompts can be executed according to the prompt requirements. The following statement can specify the prompt:

l Simple SELECT ,UPDATE ,DELETE statement;
l Compound subject or subquery statement;
l Make up part of the query (UNION).

The prompt specifies the original comment statement with the addition of "+". The syntax is as follows:

[ SELECT | DELETE|UPDATE ] /*+ [hint | text ] */

[ SELECT | DELETE|UPDATE ] --+ [hint | text ]

Note that after "/*", do not add "+" directly, and "--+" is also written in conjunction.

Warning: If the prompt statement is written incorrectly, Oracle ignores the statement.

Common tips are:

Ordered Force joining in the order of tables specified in the from clause
Use_NL Forces to specify that the joining method between two tables is nested loops (Nested Loops)
Use_Hash Force to specify that the connection method between two tables is a hash join (Hash Join)
Use_Merge Forces to specify that the join between two tables is a merge sorting join (Merge Join)
Push_Subq Let non-associated subqueries be executed in advance
Index Forces the use of an index

3. Implementation plan

Select the SQL statement with the mouse or keyboard in SQL WINDOWS in PL/SQL Developer, and then press F5, and the interface for executing plan resolution will appear:

4. Features of Update

The internal execution of Update can be found in the attached article: Internal analysis of update transactions.doc

The basic point of using Update is

1) Try to use the index on the update table to reduce unnecessary updates
2) The update data source takes as short as possible. If it cannot be done, insert the update content into the intermediate table, then create an index for the intermediate table, and then update it.
3) If the updated primary key is recommended, it is recommended to delete and insert it.
5. Sample table

The following explanation will be based on the following two tables:

Create table tab1 (workdate varchar2(8), cino varchar2(15), val1 number, val2 number);
Create table tab2 (workdate varchar2(8), cino varchar2(15), val1 number, val2 number);
Create table tab3 (workdate varchar2(8), cino varchar2(15), val1 number, val2 number);
Create table tab4 (workdate varchar2(8), cino varchar2(15), val1 number, val2 number);

workdate, cino is the keywords of two tables, and no primary key index is created by default.

2. Update two situations

There are only two situations when updating a table with Update: update the fields according to the associated subquery; and limit the update scope through non-association subquery. If there is a third situation, it is the superposition of the first two situations.

1. Update fields according to the associated subquery

Update tab1 t
Set (val1, val2) = (select val1, val2
from tab2
where workdate = 
and cino = );

Update the corresponding field of tab1 through tab2. When executing SQL statements, the system will read records from tab1 one by one, and then find the corresponding fields to update through the association subquery. Whether the associated subquery can quickly find the corresponding record through tab1's conditions is a necessary condition for whether optimization can be implemented. Therefore, it is generally required to create Unique or Normal indexes with higher weight transfer on tab2. The execution time is approximately (time used to query a record in tab1 + time used to query a record in tab2) * Number of records in tab1.

If the subquery conditions are relatively complicated, such as the following statement:

Update tab1 t
Set (val1, val2) = (select val1, val2
from tab2 tt
where exists (select 1
from tab3
where workdate = 
and cino = )
and workdate = 
and cino = );

At this time, the time spent on subquery for updating each record in tab1 will increase exponentially. If there are many records in tab1, this update statement is almost incomplete.

The solution is to extract the subquery, put it into the intermediate table, then create an index on the intermediate table, and use the intermediate table instead of the subquery, so that the speed can be greatly improved:

Insert into tab4
select workdate, cino, val1, val2
from tab2 tt
where exists (select 1
from tab3
where workdate = 
and cino = );
create index tab4_ind01 on tab4(workdate, cino);

Update tab1 t
Set (val1, val2) = (select val1, val2
from tab4 tt
where workdate = 
and cino = );

2. Through non-associated subqueries, limit the update scope

Update tab1 t
set val1 = 1
where (workdate, cino) in (select workdate, cino from tab2)

According to the data range provided by tab2, the val1 field of the corresponding record in tab1 is updated.

In this case, the system's default execution method is often to first execute the select workdate and cino from tab2 subquery, form the system view in the system, and then select a record in tab1 to query whether there is a corresponding workdate and cino combination in the system view. If it exists, update tab1, and if it does not exist, select the next record. The query time in this way is roughly equal to: subquery query time + (Select a record time in tab1 + Scan the full table in the system view to find a record time) * Number of records in tab1. Among them, "Scan the full table in the system view to find a record time" will vary according to the size of tab2. If the number of tab2 records is small, the system can directly read the table into the system area; if the number of tab2 records is large, the system cannot form a system view, and every update action will be performed, the subquery will be done once, which will be very slow.

There are two optimizations for this situation

1) Add indexes to the workdate and cino fields on tab1, and add prompts.

The modified SQL statements are as follows:

Update /*+ordered use_nl(sys, t)*/ tab1 t
set val1 = 1
where (workdate, cino) in (select workdate, cino from tab2)

where sys represents the system view. If the ordered prompt is not added, the system will use the tab1 table as the driver table by default. At this time, the full table scan of tab1 must be performed. After adding the prompt, use the system view, namely select workdate, cino from tab2, as the driver table, under normal circumstances, the speed can be greatly improved.

2) Add the index to the workdate and cino fields on the tab2 table, and rewrite the SQL statements at the same time:

Update tab1 t
set val1 = 1
where exists (select 1
from tab2
where workdate = 
and cino = )

Three, indexing problem

The use of update index is quite special. Sometimes it seems that full index can be used, but in fact only part of it is used, so it is recommended to write the fields of the composite index together.

For example:

Update /*+ordered use_nl(sys, t)*/ tab1 t
set val1 = 1
where cino in (select cino from tab2)
and workdate = '200506'

This SQL statement cannot be used entirely with the composite index workdate + cino on tab1. All you can use is the constraint of workdate='200506'.

If written like this, there is no problem:

Update /*+ordered use_nl(sys, t)*/ tab1 t
set val1 = 1
where (workdate, cino) in (select workdate, cino from tab2)