Introduction
This article explores how to efficiently insert and delete large volumes of data in an Oracle database. With concrete code examples and detailed explanations, we will cover:
- How to use parallel queries for efficient data insertion.
- How to use cursors and batch processing to delete large volumes of data.
- Performance comparisons and optimization suggestions for insert and delete operations.
- Common pitfalls and solutions to watch for in practice.
Oracle Big Data Insertion Operation
Insertion scenarios and requirements
In big data environments, insert operations are usually used in the following scenarios:
- Data migration: Migrate data from one table to another, possibly for data archiving or structural optimization.
- Data synchronization: Load data from external data sources into Oracle database to keep the data up to date.
- Data backup: Create a backup copy of the data to prevent data loss or corruption.
In these scenarios, the amount of data is often very large, so efficient insertion methods are required to ensure quick completion of operations.
Data insertion using parallel query
To improve the efficiency of insertion operations, Oracle database supports the use of parallel queries to accelerate data processing. Parallel queries can use multiple CPU cores to process data simultaneously, thereby significantly improving performance.
Sample code: Create a new table and insert data
Here is sample code that uses a parallel query to create a new table and populate it with data:
CREATE TABLE BIG_TABLE_DATA20221228 AS SELECT /*+ parallel(t,8) */ * FROM BIG_TABLE_DATA t WHERE delete_flag = 0;
Key points in the code
- CREATE TABLE ... AS SELECT: A common SQL statement that creates a new table from the result of a query on an existing table. Here, the new table BIG_TABLE_DATA20221228 is created from rows selected from BIG_TABLE_DATA.
- Parallel query hint: /*+ parallel(t,8) */ is an Oracle hint that tells the database to use parallel processing when executing the query. t is the table alias and 8 is the degree of parallelism (i.e. up to 8 parallel execution servers). Parallel queries can significantly speed up the processing of large data volumes.
- WHERE clause: WHERE delete_flag = 0 filters for records that meet the condition; in this example, only records whose delete_flag equals 0 are selected.
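When the target table already exists, a direct-path insert is a common alternative to CREATE TABLE ... AS SELECT. Below is a minimal sketch, assuming a pre-created target table BIG_TABLE_ARCHIVE (a hypothetical name) with the same structure as BIG_TABLE_DATA:

ALTER SESSION ENABLE PARALLEL DML; -- parallel DML is disabled at the session level by default

INSERT /*+ APPEND PARALLEL(a, 8) */ INTO BIG_TABLE_ARCHIVE a
SELECT /*+ PARALLEL(t, 8) */ *
FROM BIG_TABLE_DATA t
WHERE delete_flag = 0;

COMMIT; -- a direct-path insert must be committed before the same session can query the table

The APPEND hint requests a direct-path insert, which writes blocks above the table's high-water mark and bypasses the buffer cache, substantially reducing undo generation for large loads.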
Performance optimization suggestions
- Set the degree of parallelism appropriately: Base the setting on the system's CPU core count and current load. Excessive parallelism can cause resource contention and actually reduce performance.
- Index optimization: Ensure that the columns used in the query criteria have appropriate indexes to speed up data retrieval.
- Avoid unnecessary columns: In the SELECT statement, select only the columns you need rather than using SELECT *, to reduce data transfer and memory usage.
- Regularly maintain statistics: Keep table and index statistics up to date so the optimizer can generate efficient execution plans (see the sketch after this list).
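For the last suggestion, here is a minimal sketch of refreshing statistics with the DBMS_STATS package after a large load, using the table created in the earlier example:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'BIG_TABLE_DATA20221228',
    cascade => TRUE -- also gather statistics for the table's indexes
  );
END;
/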
Oracle Big Data Deletion Operation
Deletion scenarios and requirements
In big data environments, deletion operations are usually used in the following scenarios:
- Data cleaning: Regularly clean out expired or no longer needed data to free up storage space and maintain database performance.
- Data archiving: After migrating historical data to archive tables or external storage, delete the data from the main table.
- Data repair: Delete erroneous or duplicate data to ensure data quality and consistency.
Since deletion operations can involve a large amount of data, efficient methods are needed to complete these operations to avoid negative impacts on system performance.
Data deletion using cursors and batch processing
When deleting data at scale, issuing a single massive DELETE can cause performance problems and lock contention. Using cursors and batch processing effectively controls the number of records deleted at a time, reducing the impact on system resources.
Sample code: Batch deletion of data
Here is sample code that deletes data using a cursor and batch processing:
DECLARE
  CURSOR c IS
    SELECT rowid FROM BIG_TABLE_DATA WHERE delete_flag = 0;
  TYPE rowid_table_type IS TABLE OF ROWID INDEX BY PLS_INTEGER;
  rowid_table rowid_table_type;
  l_limit PLS_INTEGER := 1000; -- number of records deleted per batch
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO rowid_table LIMIT l_limit;
    EXIT WHEN rowid_table.COUNT = 0;
    FORALL i IN 1 .. rowid_table.COUNT
      DELETE FROM BIG_TABLE_DATA WHERE rowid = rowid_table(i);
    COMMIT; -- commit after each batch to release locks
  END LOOP;
  CLOSE c;
END;
/
Key points in the code
- Cursor definition and opening: CURSOR c IS ... defines a cursor that selects the rowid of each record to be deleted. OPEN c; opens the cursor and prepares it for fetching.
- Collecting data in batches: FETCH c BULK COLLECT INTO rowid_table LIMIT l_limit; uses BULK COLLECT to fetch rowids from the cursor into rowid_table in batches; the number of records fetched each time is controlled by l_limit (set to 1000 here).
- Deleting data in batches: FORALL i IN 1 .. rowid_table.COUNT DELETE FROM ... uses the FORALL statement to delete the collected records as a single bulk operation, which significantly improves the performance of batch DML.
- Transaction control: The COMMIT; after each batch commits that batch as its own small transaction and promptly frees lock resources.
- Loop control: EXIT WHEN rowid_table.COUNT = 0; ends the loop when no more records are fetched.
Performance optimization suggestions
- Batch processing: Control the number of records deleted per batch to avoid holding locks for long periods and competing for resources.
- Index maintenance: After deleting a large amount of data, rebuild the relevant indexes so that query performance is not affected.
- Table partitioning: Partitioning large tables can significantly improve deletion performance; the delete can target a specific partition without touching data in other partitions (see the sketch after this list).
- Asynchronous deletion: For deletion tasks with no real-time requirement, consider running them during off-peak hours to reduce the impact on other operations.
- Statistics update: After deleting a large amount of data, promptly refresh statistics on the table and its indexes to help the optimizer generate efficient execution plans.
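For the partitioning and index maintenance suggestions above, a minimal sketch. It assumes BIG_TABLE_DATA is range-partitioned with a partition named P_2022, and that IDX_BIG_TABLE_FLAG is an index on the table; both names are hypothetical:

-- Remove a whole partition's rows without row-by-row DELETE, keeping global indexes usable
ALTER TABLE BIG_TABLE_DATA TRUNCATE PARTITION p_2022 UPDATE INDEXES;

-- Rebuild an index after heavy deletions, without blocking concurrent DML
ALTER INDEX idx_big_table_flag REBUILD ONLINE;

Note that truncating a partition is DDL: it is far faster than DELETE but cannot be rolled back.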
Comparison and precautions for insertion and deletion operations
Common pitfalls and solutions
Locking and performance issues caused by large transactions:
- Trap: Deleting a large amount of data in a single statement can hold table locks for a long time, affecting other concurrent operations.
- Solution: Use batched deletion to split one large transaction into many small ones, reducing lock hold time. PL/SQL blocks with cursors can process the deletion in batches, as in the example above.
Index and trigger impact:
- Trap: When inserting or deleting large amounts of data, maintaining the related indexes and triggers adds overhead and hurts performance.
- Solution: Before a bulk insert or delete, temporarily disable nonessential indexes and triggers, then re-enable them after the operation completes. Use this with caution to ensure data consistency; a sketch follows.
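A minimal sketch of this approach; the index name IDX_BIG_TABLE_FLAG is hypothetical:

ALTER TABLE BIG_TABLE_DATA DISABLE ALL TRIGGERS;
ALTER INDEX idx_big_table_flag UNUSABLE; -- stop maintaining the index during the bulk operation
ALTER SESSION SET skip_unusable_indexes = TRUE; -- allow DML to proceed despite the unusable index

-- ... perform the bulk insert or delete here ...

ALTER INDEX idx_big_table_flag REBUILD; -- rebuild once the bulk operation is done
ALTER TABLE BIG_TABLE_DATA ENABLE ALL TRIGGERS;

Note that an unusable unique index still blocks DML, because uniqueness must be enforced, so this technique applies only to non-unique indexes.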
Tablespace and storage management:
- Trap: Large-scale insert or delete operations may exhaust tablespace or cause fragmentation, degrading database performance.
- Solution: Regularly monitor and manage tablespace usage to ensure sufficient storage. After mass deletions, reorganize the table periodically (for example, using ALTER TABLE ... SHRINK SPACE) to reduce fragmentation, as sketched below.
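A minimal sketch of the reorganization sequence; SHRINK SPACE requires row movement to be enabled and the segment to reside in an ASSM (automatic segment space management) tablespace:

ALTER TABLE BIG_TABLE_DATA ENABLE ROW MOVEMENT;
ALTER TABLE BIG_TABLE_DATA SHRINK SPACE CASCADE; -- CASCADE also shrinks dependent segments such as indexes
ALTER TABLE BIG_TABLE_DATA DISABLE ROW MOVEMENT;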
Log and archive impact:
- Trap: Large-scale insert or delete operations generate large volumes of redo and archived logs, which can exhaust log space or overload the archiver process.
- Solution: Before performing large-scale data operations, ensure sufficient redo and archive space and adjust the archiving strategy appropriately. If possible, schedule the operation during periods of low system load. A monitoring sketch follows.
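If the database uses a fast recovery area, its usage can be checked before a large operation with a query such as the following (requires access to the v$recovery_area_usage view):

SELECT file_type, percent_space_used, percent_space_reclaimable
FROM v$recovery_area_usage;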
Points to note in practice
- Use batch processing: Whether inserting or deleting, process and commit in batches to control the amount of data per operation and avoid negative impacts on system performance.
- Parallel processing: For large data operations, use parallel queries and parallel DML judiciously to improve efficiency.
- Index and constraint management: Before large-scale data operations, consider temporarily disabling relevant indexes and constraints, and rebuild them afterwards to improve performance.
- Monitoring and tuning: Monitor system performance in real time and adjust strategies and parameters according to load and operational needs to keep the system stable and efficient (see the sketch below).
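For the monitoring point, a minimal sketch that tracks the progress of long-running operations through the v$session_longops view:

SELECT sid, opname, sofar, totalwork,
       ROUND(sofar / totalwork * 100, 1) AS pct_done
FROM v$session_longops
WHERE totalwork > 0
  AND sofar <> totalwork;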
Summary
The above is based on my personal experience. I hope it provides a useful reference, and I appreciate your continued support.