SoFunction
Updated on 2025-04-14

10 methods of query optimization for billion-level MySQL large tables

When optimizing queries against billion-row MySQL tables, the following ten strategies, each with example SQL, summarize the core techniques in current practice:

1. Partition table optimization (horizontal partitioning)

-- Create a time range partition table
CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT,
    order_date DATE NOT NULL,
    amount DECIMAL(10,2),
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p_max VALUES LESS THAN MAXVALUE
);

-- Query a specific partition explicitly
SELECT * FROM orders PARTITION (p2022) WHERE order_date >= '2022-06-01';

-- Or let partition pruning select the partition automatically
SELECT * FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31';

Partitioning by a time dimension lets queries scan only the relevant partitions (partition pruning). It is recommended to automate the creation of new partitions monthly or quarterly.

2. Database and table sharding (horizontal splitting)

-- Create shard tables (route by user_id modulo 64, giving 64 tables)
CREATE TABLE user_00 LIKE users;
CREATE TABLE user_01 LIKE users;
...
CREATE TABLE user_63 LIKE users;

-- Shard query routing (the application computes the zero-padded table suffix)
SELECT * FROM user_${user_id % 64} WHERE user_id=123456;

Middleware such as ShardingSphere can make the sharding transparent to the application. Choose a high-cardinality numeric column as the shard key to avoid data skew.
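The routing rule above can also be done directly in application code. A minimal Python sketch of the 64-way modulo scheme (table names follow the `user_00`..`user_63` convention from the example; the helper names are illustrative, not from any middleware):

```python
def shard_table(user_id: int, shards: int = 64) -> str:
    """Route a user_id to its shard table, matching the user_00 .. user_63 naming."""
    return f"user_{user_id % shards:02d}"

def shard_query(user_id: int) -> str:
    # Keeping the shard key in the WHERE clause ensures the query stays single-shard.
    return f"SELECT * FROM {shard_table(user_id)} WHERE user_id = {user_id}"
```

Because 123456 % 64 == 0, `shard_query(123456)` targets `user_00`; a modulo scheme spreads sequential ids evenly but makes resharding (changing the shard count) expensive, which is one reason middleware is often preferred.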

3. Index depth optimization

-- Create a covering index
ALTER TABLE orders ADD INDEX idx_cover (user_id, order_date, amount);

-- Force index usage with an optimizer hint (MySQL 8.0.20+)
SELECT /*+ INDEX(orders idx_cover) */ order_date, amount 
FROM orders 
WHERE user_id=123 AND order_date > '2024-01-01';

Covering indexes avoid lookups back to the clustered index; run ANALYZE TABLE regularly to keep statistics fresh. Keep a single table to no more than about five indexes.

4. Separation of hot and cold data

-- Archive historical data
CREATE TABLE orders_archive LIKE orders;
INSERT INTO orders_archive SELECT * FROM orders 
WHERE order_date < '2023-01-01';
DELETE FROM orders WHERE order_date < '2023-01-01';

-- Create a joint view
CREATE VIEW orders_union AS
SELECT * FROM orders 
UNION ALL
SELECT * FROM orders_archive;

It is recommended to migrate data older than three years to the archive table. Alternatively, automated partition management (e.g., dropping expired partitions) can replace this manual scheme.
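Running the INSERT...SELECT and DELETE above as single statements over a billion-row table holds locks for a long time; in practice the move is done in bounded batches. A hedged Python sketch that only generates the batched SQL by primary-key range (the batch size and the id-range approach are assumptions; table and column names follow the example):

```python
def archive_batches(max_id: int, batch: int = 10_000):
    """Yield (insert_sql, delete_sql) pairs that move archived rows in id ranges,
    so each INSERT...SELECT and its matching DELETE touch the same bounded set
    of rows and every transaction stays short."""
    for lo in range(0, max_id, batch):
        hi = lo + batch
        cond = f"id >= {lo} AND id < {hi} AND order_date < '2023-01-01'"
        yield (f"INSERT INTO orders_archive SELECT * FROM orders WHERE {cond};",
               f"DELETE FROM orders WHERE {cond};")
```

Batching by id range (rather than `LIMIT` without `ORDER BY`) guarantees the insert and delete of each batch cover exactly the same rows.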

5. Query rewrite optimization

-- Inefficient deep paging: scans and discards 1,000,000 rows
-- SELECT * FROM orders LIMIT 1000000, 10;
-- Keyset pagination: seek on an indexed column instead
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 10;

-- Steer a range query onto the intended index (idx_create_time assumed to exist)
SELECT user_id FROM orders FORCE INDEX(idx_create_time)
WHERE create_time BETWEEN '2024-01-01' AND '2024-03-01';

Use EXPLAIN to analyze execution plans, avoid SELECT *, and add a FORCE INDEX hint to range queries when the optimizer chooses poorly.
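The difference between OFFSET and keyset paging can be simulated in memory. A toy Python sketch over a sorted list of primary keys (purely illustrative; the server-side cost difference is what matters):

```python
def page_by_offset(rows, offset, limit):
    # OFFSET paging: the server still walks past `offset` rows before returning any.
    return rows[offset:offset + limit]

def page_by_keyset(rows, last_id, limit):
    # Keyset paging: seek directly to id > last_id via the index, then read `limit` rows.
    return [r for r in rows if r > last_id][:limit]

rows = list(range(1, 101))  # pretend these are ordered primary keys
```

Both return the same page, but with an index on `id` the keyset form does an index seek regardless of page depth, while the OFFSET form degrades linearly as the offset grows.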

6. Column storage engine

-- Create a columnar table (ColumnStore is MariaDB's columnar engine; stock MySQL has no built-in one)
CREATE TABLE logs (
    id BIGINT,
    log_time DATETIME,
    content TEXT
) ENGINE=ColumnStore;

Columnar storage suits OLAP scenarios: analytical queries on a columnar engine such as ClickHouse can run 10x faster or more, but this requires a data synchronization pipeline from the OLTP database.
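Why columnar layout speeds up aggregation can be shown with a toy sketch: summing one column touches a single contiguous array instead of every field of every row. This is an illustration of the storage layout only, not an engine implementation:

```python
# Row store: each record carries every field; an aggregate must read through them all.
row_store = [{"id": i, "log_time": f"2024-01-{i:02d}", "bytes": i * 10}
             for i in range(1, 6)]
row_sum = sum(rec["bytes"] for rec in row_store)

# Column store: each column is a separate array; the aggregate reads exactly one array.
col_store = {"id": [1, 2, 3, 4, 5], "bytes": [10, 20, 30, 40, 50]}
col_sum = sum(col_store["bytes"])
```

The same effect at engine scale (reading only the referenced columns, plus per-column compression) is what makes columnar engines fast for wide tables with narrow analytical queries.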

7. Server parameter tuning

# Core parameters (my.cnf)
[mysqld]
innodb_buffer_pool_size=128G
innodb_flush_log_at_trx_commit=2
innodb_io_capacity=20000
innodb_read_io_threads=16
query_cache_type=0   # query cache exists only in 5.7 and earlier; removed in MySQL 8.0

On a dedicated database server, set innodb_buffer_pool_size to 70%-80% of physical memory. On SSDs, raise innodb_io_capacity and the other I/O-related parameters accordingly.
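The 70%-80% rule can be combined with InnoDB's resize granularity in a small sizing helper. A hedged sketch: the 128 MB value is MySQL's default `innodb_buffer_pool_chunk_size`, and InnoDB sizes the pool in multiples of chunk size times `innodb_buffer_pool_instances`; the 75% ratio and rounding-down choice are assumptions:

```python
CHUNK = 128 * 1024**2  # default innodb_buffer_pool_chunk_size (128 MB)

def buffer_pool_size(physical_ram_bytes: int, ratio: float = 0.75,
                     instances: int = 8) -> int:
    """Suggest an innodb_buffer_pool_size: ~75% of RAM, rounded down to a
    multiple of chunk_size * instances (the granularity InnoDB actually uses),
    so the configured value never overshoots the target."""
    unit = CHUNK * instances
    target = int(physical_ram_bytes * ratio)
    return max(unit, (target // unit) * unit)
```

For a 192 GB server this suggests 144 GB; note the server itself rounds a non-aligned value up, which is why aligning in the calculation avoids surprises.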

8. Asynchronous batch processing mechanism

-- Batch insert optimization
INSERT INTO orders (user_id,amount) 
VALUES (1,100),(2,200),(3,300);

-- Relax redo-log flushing during bulk loads (durability trade-off)
SET GLOBAL innodb_flush_log_at_trx_commit=2;
START TRANSACTION;
-- ... batch operations ...
COMMIT;

Batching reduces the number of transaction commits; combine it with LOAD DATA INFILE for high-speed bulk imports. Note that innodb_flush_log_at_trx_commit=2 trades some durability (up to one second of transactions can be lost on an OS crash) for write throughput.
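Building multi-row INSERT statements from a large stream of records is usually done in fixed-size chunks. A minimal Python sketch that only generates the SQL text (the batch size is an assumption; the column list follows the example above):

```python
from itertools import islice

def batched_inserts(rows, batch_size=1000):
    """Yield multi-row INSERT statements with batch_size rows per statement,
    so one round trip and one parse cover many rows."""
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        values = ",".join(f"({uid},{amt})" for uid, amt in chunk)
        yield f"INSERT INTO orders (user_id,amount) VALUES {values};"
```

In production the values would go through parameterized placeholders rather than string formatting; the sketch only shows the chunking shape. Batch sizes of 500-1000 rows usually balance packet size against commit overhead.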

9. Distributed architecture upgrade

-- TiDB distributed query
SELECT /*+ READ_FROM_STORAGE(tikv[orders]) */ *
FROM orders 
WHERE user_id IN (SELECT user_id FROM vip_users);

When single-node performance hits a bottleneck, migrating to a TiDB cluster provides automatic sharding and elastic scaling.

10. Field type optimization

-- IP address storage optimization (IPv4 as INT UNSIGNED)
ALTER TABLE access_log 
MODIFY ip INT UNSIGNED,
ADD INDEX idx_ip (ip);

-- Enumeration type transformation
ALTER TABLE users 
MODIFY gender ENUM('M','F') NOT NULL DEFAULT 'M';

Store IPv4 addresses as INT UNSIGNED (convert with INET_ATON()/INET_NTOA()), and replace low-cardinality string columns with ENUM; this can cut the storage for those columns by 50% or more.
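INET_ATON()'s mapping is simply the big-endian integer value of the four octets, which is easy to verify outside MySQL. A Python equivalent using the standard library's ipaddress module:

```python
import ipaddress

def inet_aton(ip: str) -> int:
    """Same mapping as MySQL's INET_ATON(): IPv4 dotted quad -> unsigned int."""
    return int(ipaddress.IPv4Address(ip))

def inet_ntoa(n: int) -> str:
    """Inverse mapping, like MySQL's INET_NTOA()."""
    return str(ipaddress.IPv4Address(n))
```

The integer form fits in 4 bytes versus up to 15 bytes for VARCHAR, and because the mapping preserves numeric order, range queries such as "all addresses in 192.168.0.0/16" become simple BETWEEN conditions on the indexed column.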

Suggestions for optimizing implementation routes:

  • Start with architectural design (sharding and partitioning)
  • Then refactor indexes and rewrite queries
  • Configure reasonable server parameters
  • Establish a data archiving mechanism
  • Consider a distributed solution last

Use SHOW GLOBAL STATUS regularly to monitor key metrics such as QPS, TPS, and cache hit rate. Above roughly 100,000 QPS, consider NewSQL solutions such as TiDB.

For more implementation details, please refer to the official MySQL documentation and technical white papers such as ShardingSphere and TiDB. Actual optimization requires a combination of solutions based on business characteristics.

This is the end of this article about 10 methods for query optimization of billion-level MySQL large tables. For more related content on query optimization of MySQL large tables, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!