SoFunction
Updated on 2025-03-09

MySQL: A Detailed Guide to Creating Indexes in Batches

1. How to create indexes in batches

In production environments, creating indexes in batches is an effective strategy for large tables (for example, tables with tens of millions of rows). The goal is to reduce the impact on system performance and to avoid locking the table for a long time. The common approaches are described below.

Method 1: Create indexes in batches using temporary tables

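This method creates a staging table (an ordinary table, despite the name "temporary"), copies the original table's data into it in batches, adds the index on it, and finally swaps the two tables. Queries against the production table keep running during the copy. However, rows that are inserted, updated, or deleted in the original table after they have been copied are not carried over. Writes must therefore be paused during the copy and swap. pt-online-schema-change (Method 5) solves this with triggers.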

Steps:

Create a temporary table: Create a table with the same structure as the original table and add an index on the temporary table.

CREATE TABLE temp_table LIKE original_table;
CREATE INDEX idx_column_name ON temp_table (column_name);

Insert data in batches: Copy the original table's rows into the temporary table in batches (for example, 10,000 rows at a time) rather than all at once. Assuming an integer primary key id, selecting by key range keeps every batch deterministic. By contrast, LIMIT ... OFFSET without ORDER BY returns rows in no guaranteed order, and large offsets make MySQL read and discard every skipped row.

INSERT INTO temp_table (columns) SELECT columns FROM original_table
WHERE id BETWEEN 1 AND 10000;
  • Advance the range step by step: after each batch completes, move to the next key range and repeat until MAX(id) is reached.
INSERT INTO temp_table (columns)
SELECT columns FROM original_table WHERE id BETWEEN 10001 AND 20000;
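Writing each batch statement by hand is error-prone, so the statements can be generated by a small script. The Python sketch below has two assumptions:

- The table has an integer primary key named id, and its MIN and MAX values have already been read from original_table.
- The function name batch_copy_statements is hypothetical.

The function walks the key in fixed ranges instead of using OFFSET, so each batch touches only its own rows.

```python
def batch_copy_statements(src, dst, columns, pk, min_id, max_id, batch_size=10_000):
    """Yield one INSERT ... SELECT per primary-key range of `batch_size` keys.

    Identifiers are interpolated as-is: they are assumed to be trusted names,
    not user input.
    """
    cols = ", ".join(columns)
    lo = min_id
    while lo <= max_id:
        hi = lo + batch_size - 1
        yield (
            f"INSERT INTO {dst} ({cols}) "
            f"SELECT {cols} FROM {src} "
            f"WHERE {pk} BETWEEN {lo} AND {hi};"
        )
        lo = hi + 1  # next range starts right after this one, so no key is copied twice


if __name__ == "__main__":
    # Example: ids 1..25000 produce three batches (the last one partially filled).
    for stmt in batch_copy_statements(
        "original_table", "temp_table", ["id", "column_name"], "id", 1, 25_000
    ):
        print(stmt)
```

Each generated statement can run in its own transaction, optionally with a pause between batches to limit load. Gaps in the key sequence simply produce smaller batches.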

Swap the tables: Once all data has been copied into the temporary table (which already carries the index), swap it in by renaming. RENAME TABLE performs both renames as one atomic operation:

RENAME TABLE original_table TO backup_table, temp_table TO original_table;

Once the new table has been verified, you can drop backup_table. The new original_table keeps both the data and the index.
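Before running the RENAME TABLE swap, it is worth confirming that the copy is complete. This minimal Python sketch uses a hypothetical helper, swap_statement. It assumes both row counts come from SELECT COUNT(*) queries taken while writes to original_table are paused. Matching counts are a coarse sanity check, not proof that the data is identical.

```python
def swap_statement(original, temp, backup, original_count, temp_count):
    """Return the RENAME TABLE swap, refusing if the row counts differ."""
    if temp_count != original_count:
        raise ValueError(
            f"row count mismatch: {original}={original_count}, {temp}={temp_count}"
        )
    # Both renames happen in one atomic statement.
    return f"RENAME TABLE {original} TO {backup}, {temp} TO {original};"


if __name__ == "__main__":
    print(swap_statement("original_table", "temp_table", "backup_table", 1000, 1000))
```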

Method 2: Create indexes in batches using partition tables

Process data by partition: The idea is to work through a partitioned table one partition at a time, so that the entire table is never locked at once.

Create indexes by partition: MySQL does not support this directly. All partitions of a partitioned table must have the same indexes, and CREATE INDEX does not accept a PARTITION clause. A statement such as CREATE INDEX idx_column_name ON original_table PARTITION (p1) (column_name); is therefore rejected. Instead, the index is built on every partition in a single statement, and for InnoDB it can be requested as an online operation:

ALTER TABLE original_table ADD INDEX idx_column_name (column_name), ALGORITHM=INPLACE, LOCK=NONE;

Method 3: Create indexes online (online DDL)

If the storage engine supports online index creation (InnoDB does, from MySQL 5.6 onward), you can avoid locking the whole table while the index is built. Reads and writes continue during the build. MySQL needs only a brief exclusive metadata lock at the start and end of the operation, so the impact on production is much smaller.

Create indexes online: Add ALGORITHM=INPLACE and LOCK=NONE to the statement to avoid locking the table during index creation. If the operation cannot run with the requested algorithm or lock level, MySQL returns an error instead of silently taking a stronger lock.

CREATE INDEX idx_column_name ON original_table (column_name) ALGORITHM=INPLACE LOCK=NONE;

Progressively creating indexes: If you need several indexes, create them one at a time. For example, create the index on column1 first, create the index on column2 after it finishes, and so on.

CREATE INDEX idx_column1 ON original_table (column1) ALGORITHM=INPLACE LOCK=NONE;
CREATE INDEX idx_column2 ON original_table (column2) ALGORITHM=INPLACE LOCK=NONE;

This keeps the table available while the indexes are built and spreads the extra load over time.
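When several indexes are needed, the online DDL statements can be generated one per index. The Python sketch below uses the equivalent ALTER TABLE ... ADD INDEX form with ALGORITHM=INPLACE, LOCK=NONE. Note the following assumptions:

- The helper names quote_ident and online_index_ddl are hypothetical.
- The identifier check is deliberately simple and accepts only plain ASCII names.

```python
import re

# Only plain ASCII identifiers are accepted by this sketch.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_ident(name):
    """Backtick-quote a simple identifier, rejecting anything unusual."""
    if not _IDENT.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f"`{name}`"


def online_index_ddl(table, indexes):
    """Build one ALTER TABLE per (index_name, [columns]) pair, requesting a non-locking build."""
    statements = []
    for index_name, columns in indexes:
        cols = ", ".join(quote_ident(c) for c in columns)
        statements.append(
            f"ALTER TABLE {quote_ident(table)} "
            f"ADD INDEX {quote_ident(index_name)} ({cols}), "
            "ALGORITHM=INPLACE, LOCK=NONE;"
        )
    return statements


if __name__ == "__main__":
    for stmt in online_index_ddl(
        "original_table", [("idx_column1", ["column1"]), ("idx_column2", ["column2"])]
    ):
        print(stmt)
```

Each statement is meant to be run only after the previous one has finished, matching the one-index-at-a-time approach.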

Method 4: Execute ALTER TABLE in batches

ALTER TABLE ... ADD INDEX is equivalent to CREATE INDEX and can likewise be run in batches, one index at a time. Without the online options, or on engines and versions that do not support online DDL, building an index on a large table can block writes for the entire build, so be cautious in this case.

Lock time can also be reduced by splitting a large table into several smaller tables, or by copying it in batches as in Method 1.

ALTER TABLE original_table ADD INDEX idx_column_name (column_name);

If the table has a very large number of rows, the statement may run for a long time, so choose a time window that suits the actual business load.

Method 5: Use Percona's pt-online-schema-change tool

pt-online-schema-change, part of Percona Toolkit, changes MySQL table structures online, including adding indexes, while the table stays available. It works in four stages:

  • It creates an empty copy of the table with the new structure.
  • It copies rows into the copy in small chunks.
  • It uses triggers to apply ongoing changes from the original table to the copy.
  • It swaps the two tables with an atomic rename.

In most cases the tool requires the table to have a PRIMARY KEY or a UNIQUE index.

pt-online-schema-change --alter "ADD INDEX idx_column_name (column_name)" --execute D=your_db,t=original_table
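A common habit is to run pt-online-schema-change with --dry-run first and only then with --execute. The Python sketch below only builds the command line; the helper pt_osc_argv is hypothetical. The options --alter, --dry-run and --execute and the D=/t= DSN keys belong to the tool; the database and table names are placeholders.

```python
import shlex


def pt_osc_argv(database, table, alter, execute=False):
    """Build the pt-online-schema-change argument list; dry run unless execute=True."""
    dsn = f"D={database},t={table}"
    mode = "--execute" if execute else "--dry-run"
    return ["pt-online-schema-change", "--alter", alter, mode, dsn]


if __name__ == "__main__":
    argv = pt_osc_argv("your_db", "original_table", "ADD INDEX idx_column_name (column_name)")
    # Print a shell-safe version of the command for review.
    print(shlex.join(argv))
```

Once the dry run looks correct, the list returned with execute=True can be passed to subprocess.run(argv, check=True).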

Summary

Creating indexes in batches matters most in production environments with large data volumes. Common strategies include:

  • Use temporary tables: Create a temporary table that already has the index, copy the data into it in batches, and then swap the tables.
  • Partitioned tables: MySQL builds an index on all partitions in one statement, so combine partitioning with online DDL rather than expecting per-partition index builds.
  • Online DDL: Use ALGORITHM=INPLACE and LOCK=NONE to avoid blocking reads and writes while the index is built.
  • Batch ALTER TABLE: Add indexes one at a time at different points in time.
  • Use the pt-online-schema-change tool: Safely create indexes in production without affecting availability.

With sensible batching strategies and tools, you can index large production tables while keeping the system highly available.

2. Adding indexes to a table with tens of millions of rows

1. Evaluate the fields that need to be indexed

First, determine which fields need to be indexed. Consider the following:

  • Frequently queried fields: If a field often appears in WHERE clauses or is used for sorting, an index on it may significantly improve query performance.
  • JOIN fields: Columns used in join conditions usually need an index, because it speeds up the join.
  • Range query fields: Indexing columns used in range conditions such as BETWEEN, >, and < improves performance.
  • Uniqueness: PRIMARY KEY and UNIQUE constraints are enforced through indexes, which MySQL creates automatically when the constraint is defined.

2. Choose the appropriate index type

  • Single-column index: for queries whose conditions involve only one column.
  • Composite index: When a query filters on several columns (especially with multiple AND conditions), a composite index can improve performance. Column order matters, because MySQL can use the index only for a leftmost prefix of its columns.
  • Full-text index: for text search with MATCH … AGAINST.
  • Spatial index: for geographic data, consider MySQL's SPATIAL index.
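Column order in a composite index matters because MySQL uses the index from its leftmost column onward. The Python sketch below is a deliberately simplified model with these limits:

- It considers only equality conditions.
- It ignores range conditions and the skip-scan optimization available in MySQL 8.0.13 and later.
- The helper name and the column names are hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first column without a condition ends the usable prefix
        used.append(column)
    return used


# Hypothetical composite index on (user_id, status, created_at).
INDEX = ["user_id", "status", "created_at"]

if __name__ == "__main__":
    print(usable_prefix(INDEX, {"user_id", "status"}))      # ['user_id', 'status']
    print(usable_prefix(INDEX, {"status", "created_at"}))   # []
    print(usable_prefix(INDEX, {"user_id", "created_at"}))  # ['user_id']
```

Under this model, a query that filters only on status and created_at cannot use the index at all. This is why the most commonly filtered column usually goes first.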

3. Assess the impact and prepare for backup

Creating an index consumes system resources and, especially on large tables, may lock the table or cause performance bottlenecks. For safety, back up the current data first, so that you can restore it if index creation fails or causes problems.

4. Index creation strategy

For tables with tens of millions of rows, creating indexes directly in production may degrade performance or even lock the table for a long time. To reduce the impact on production, consider the following strategies:

4.1 Use online DDL (if supported)

MySQL 5.6 and later support online DDL for InnoDB. This lets you create an index while the table continues to accept queries and inserts, apart from brief metadata locks at the start and end of the operation. MySQL has no ONLINE keyword for this. Instead, request the behavior with the ALGORITHM and LOCK clauses:

ALTER TABLE table_name ADD INDEX idx_column_name (column_name), ALGORITHM=INPLACE, LOCK=NONE;

4.2 Create indexes in batches

  • If the table is very large, consider spreading the work over several steps. Note that MySQL has no partial (filtered) indexes: an index always covers every row, so a single index cannot be built one subset of the data at a time.
  • In practice, batching means copying the data in chunks (as in the temporary-table method or with pt-online-schema-change) or building one index at a time. Using ALGORITHM=INPLACE when creating indexes reduces how long the table is locked.

4.3 Perform index creation during low traffic periods

Production systems usually have low-traffic periods. Schedule index creation for these off-peak hours, for example with a scheduled job.
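A scheduled job can check that it is inside the agreed off-peak window before starting the DDL. In this Python sketch, the 02:00 to 05:00 window and the helper name in_window are hypothetical. The function also handles windows that wrap past midnight.

```python
from datetime import datetime, time


def in_window(now, start, end):
    """True if `now` falls within [start, end); supports windows crossing midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical off-peak window: 02:00 to 05:00 server time.
WINDOW_START, WINDOW_END = time(2, 0), time(5, 0)

if __name__ == "__main__":
    if in_window(datetime.now().time(), WINDOW_START, WINDOW_END):
        print("inside the off-peak window: safe to start index creation")
    else:
        print("outside the off-peak window: skipping")
```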

4.4 Gradually add indexes

On large tables, adding indexes one at a time avoids the excessive load of creating several at once. After each index is added, check system performance and make sure there are no major bottlenecks before adding the next one.
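The one-index-at-a-time approach can be scripted as a loop that pauses and checks system health after each index. In this Python sketch, run_sql and is_healthy are hypothetical callables you would supply. For example, run_sql could be a database cursor's execute method, and is_healthy a check against your monitoring thresholds. The five-minute pause is an arbitrary placeholder.

```python
import time


def add_indexes_gradually(statements, run_sql, is_healthy, wait_seconds=300, sleep=time.sleep):
    """Apply DDL statements one by one, stopping early if the health check fails."""
    applied = []
    for stmt in statements:
        run_sql(stmt)            # assumed to block until this index build finishes
        applied.append(stmt)
        sleep(wait_seconds)      # give the system time to settle before measuring
        if not is_healthy():
            break                # leave the remaining indexes for a later window
    return applied
```

The returned list shows which statements ran, so a later run can resume with the rest.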

4.5 Use the pt-online-schema-change tool

The pt-online-schema-change tool from Percona Toolkit can safely change a table's structure (for example, add an index) without interrupting service. It builds a new table with the desired structure, copies the data over, and keeps the copy in sync through triggers, so the impact on production is small.

pt-online-schema-change --alter "ADD INDEX idx_column_name (column_name)" D=your_db,t=your_table --execute

5. Monitor the performance impact

Monitor database performance closely while the index is being built. In production, index creation can cause:

  • CPU and memory usage: Building a large index consumes CPU and memory.
  • Disk I/O: Index creation reads and writes large amounts of data, which can cause disk I/O peaks.
  • Lock contention: Without online DDL the table is locked, which can block other queries. Even online DDL briefly needs an exclusive metadata lock.

Use monitoring tools such as MySQL Enterprise Monitor, Prometheus + Grafana, or Percona Monitoring and Management to track database metrics and keep the load during index creation within acceptable limits.
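Lock waits during index creation can be spotted in the process list, where blocked sessions report the state "Waiting for table metadata lock". A long-running transaction on the table can stall the DDL in this way, and new queries then queue behind it. The Python sketch below filters rows fetched with SELECT ID, TIME, STATE FROM information_schema.PROCESSLIST. The fetching is left out, the helper name is hypothetical, and the sample rows are illustrative.

```python
MDL_WAIT_STATE = "Waiting for table metadata lock"


def metadata_lock_waiters(processlist_rows):
    """Return (session id, seconds in current state) for sessions blocked on a metadata lock."""
    return [
        (row["ID"], row["TIME"])
        for row in processlist_rows
        if row.get("STATE") == MDL_WAIT_STATE
    ]


# Illustrative rows shaped like information_schema.PROCESSLIST output.
rows = [
    {"ID": 11, "TIME": 240, "STATE": "altering table"},
    {"ID": 12, "TIME": 35, "STATE": "Waiting for table metadata lock"},
    {"ID": 13, "TIME": 0, "STATE": None},
]

if __name__ == "__main__":
    print(metadata_lock_waiters(rows))  # [(12, 35)]
```

A growing list of waiters is a signal to find and end the blocking transaction, or to abort the DDL, before the queue affects the application.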

6. Things to note when creating indexes

  • Select the right fields: Make sure each index serves real queries. An index that no query uses wastes storage space and slows down writes.
  • Avoid too many indexes: Indexes speed up queries but add cost to every INSERT, UPDATE, and DELETE, so too many indexes hurt write performance. Avoid redundant indexes.
  • Check index usage regularly: SHOW INDEX lists a table's indexes and their cardinality, but not how often they are used. For usage data, query sources such as sys.schema_unused_indexes or performance_schema.table_io_waits_summary_by_index_usage, and drop indexes that are no longer used.
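To find indexes that nothing reads, the Performance Schema counts index accesses in performance_schema.table_io_waits_summary_by_index_usage. This Python sketch filters rows from that table, using the columns OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME and COUNT_STAR. The helper name and the sample rows are hypothetical; the sys.schema_unused_indexes view applies similar filtering. The counters reset when the server restarts, so a zero count only means "unused since the last restart".

```python
def unused_indexes(rows):
    """Return (schema, table, index) for secondary indexes with no recorded I/O."""
    return sorted(
        (r["OBJECT_SCHEMA"], r["OBJECT_NAME"], r["INDEX_NAME"])
        for r in rows
        if r["INDEX_NAME"] is not None      # rows with NULL cover access without an index
        and r["INDEX_NAME"] != "PRIMARY"   # never consider the primary key
        and r["COUNT_STAR"] == 0
    )


# Hypothetical sample rows.
sample_rows = [
    {"OBJECT_SCHEMA": "your_db", "OBJECT_NAME": "original_table", "INDEX_NAME": "PRIMARY", "COUNT_STAR": 0},
    {"OBJECT_SCHEMA": "your_db", "OBJECT_NAME": "original_table", "INDEX_NAME": "idx_column1", "COUNT_STAR": 5120},
    {"OBJECT_SCHEMA": "your_db", "OBJECT_NAME": "original_table", "INDEX_NAME": "idx_column2", "COUNT_STAR": 0},
    {"OBJECT_SCHEMA": "your_db", "OBJECT_NAME": "original_table", "INDEX_NAME": None, "COUNT_STAR": 42},
]

if __name__ == "__main__":
    print(unused_indexes(sample_rows))  # [('your_db', 'original_table', 'idx_column2')]
```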

7. Test

Before deploying to production, test thoroughly in a development or test environment that simulates the production load and query patterns. This ensures that the new indexes do not hurt system performance.

Tests should cover:

- The time and resources required to create the index.

- How the new indexes change query performance, including any regressions.

- System resource consumption and response times while the index is being created.
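Query performance before and after adding an index can be compared with a simple timing harness. In this Python sketch, run_query is a hypothetical callable you would supply, for example one that executes the SQL through a driver cursor and fetches all rows. The median is used so that a single slow run does not skew the result.

```python
import statistics
import time


def median_latency(run_query, sql, runs=20, clock=time.perf_counter):
    """Run `sql` several times and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = clock()
        run_query(sql)
        samples.append(clock() - start)
    return statistics.median(samples)
```

Run it against the test environment once before and once after the index is created. Check with EXPLAIN that the query actually uses the new index.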

8. Optimize the indexing strategy

As the data volume grows, the index design needs to be adjusted accordingly. Keep optimizing in the following areas:

  • Regularly delete indexes that are no longer in use.
  • Analyze the query mode, adjust the index design, and use composite indexes to improve query performance.
  • Consider using partitioned tables to better manage large table data.

Summary

  • When indexing tables with tens of millions of rows in production, avoid peak periods and run the operation during low-traffic periods.
  • Use online DDL or tools such as pt-online-schema-change to reduce the impact on production.
  • Monitor database performance to make sure index creation does not overload the production system.
  • Test and optimize index design to avoid creating redundant indexes.
