For query optimization of billion-level MySQL large tables, the following are 10 proven core optimization strategies and corresponding SQL implementation solutions, combined with the latest technical practice summary:
1. Partition table optimization (horizontal segmentation)
-- Create a time range partition table CREATE TABLE orders ( id INT NOT NULL AUTO_INCREMENT, order_date DATE NOT NULL, amount DECIMAL(10,2), PRIMARY KEY (id, order_date) ) PARTITION BY RANGE (YEAR(order_date)) ( PARTITION p2020 VALUES LESS THAN (2021), PARTITION p2021 VALUES LESS THAN (2022), PARTITION p2022 VALUES LESS THAN (2023), PARTITION p_max VALUES LESS THAN MAXVALUE ); -- Query the specified partition SELECT * FROM orders PARTITION (p2022) WHERE user_id=123;
By partitioning the table by time dimension, the query can only scan related data blocks. It is recommended to automatically create new partitions every month/quarterly
2. Library and table division (horizontal splitting)
-- Create shard table(user_idTake the modulus score64Zhang Biao) CREATE TABLE user_00 LIKE users; CREATE TABLE user_01 LIKE users; ... CREATE TABLE user_63 LIKE users; -- Shard query routing SELECT * FROM user_${user_id % 64} WHERE user_id=123456;
Use ShardingSphere and other middleware to achieve transparent sharding. Pay attention to selecting high-base digital fields for sharding keys to avoid data tilt
3. Index depth optimization
-- Create an overlay index ALTER TABLE orders ADD INDEX idx_cover (user_id, order_date, amount); -- Forced index usage SELECT /*+ INDEX(orders idx_cover) */ order_date, amount FROM orders WHERE user_id=123 AND order_date > '2024-01-01';
Reduce back-table operations by overwriting indexes and use them regularlyANALYZE TABLE
Update statistics. It is recommended that no more than 5 single table indexes be exceeded
4. Separation of hot and cold data
-- Archive historical data CREATE TABLE orders_archive LIKE orders; INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < '2023-01-01'; DELETE FROM orders WHERE order_date < '2023-01-01'; -- Create a joint view CREATE VIEW orders_union AS SELECT * FROM orders UNION ALL SELECT * FROM orders_archive;
It is recommended to migrate data from 3 years ago to the archive table, and use partition table automatic management to replace this solution.
5. Query rewrite optimization
-- Inefficient paging transformation SELECT * FROM orders WHERE id > 1000000 LIMIT 10; -- Avoid full table scanning SELECT user_id FROM orders FORCE INDEX(primary) WHERE create_time BETWEEN '2024-01-01' AND '2024-03-01';
CooperateEXPLAIN
Analyze the execution plan, disableSELECT *
, add to range queryFORCE INDEX
hint
6. Column storage engine
-- Create columnar storage tables CREATE TABLE logs ( id BIGINT, log_time DATETIME, content TEXT ) ENGINE=Columnstore;
Suitable for OLAP scenarios, the query speed of ClickHouse columnar engine can be increased by more than 10 times (need to cooperate with the data synchronization mechanism)
7. Server parameter tuning
# Core parameters[mysqld] innodb_buffer_pool_size=128G innodb_flush_log_at_trx_commit=2 innodb_io_capacity=20000 innodb_read_io_threads=16 query_cache_type=0
The memory configuration is recommended to be 70%-80% of physical memory. The SSD hard disk needs to adjust the IO-related parameters.
8. Asynchronous batch processing mechanism
-- Batch insert optimization INSERT INTO orders (user_id,amount) VALUES (1,100),(2,200),(3,300); -- Delayed update SET GLOBAL innodb_flush_log_at_trx_commit=2; START TRANSACTION; ...Batch operation... COMMIT;
Batch operations reduce the number of transaction commits, cooperateLOAD DATA INFILE
Implement high-speed import
9. Distributed architecture upgrade
-- TiDBDistributed Query SELECT /*+ READ_FROM_STORAGE(tikv[orders]) */ * FROM orders WHERE user_id IN (SELECT user_id FROM vip_users);
When stand-alone performance reaches a bottleneck, migrating to TiDB cluster can achieve automatic sharding and elastic scaling
10. Field type optimization
-- IPAddress storage optimization ALTER TABLE access_log MODIFY ip INT UNSIGNED, ADD INDEX idx_ip (ip); -- Enumeration type transformation ALTER TABLE users MODIFY gender ENUM('M','F') NOT NULL DEFAULT 'M';
useINT
Storage IP (INET_ATON() conversion), useENUM
Alternative string fields to reduce storage space by 50%
Suggestions for optimizing implementation routes:
- Priority for architectural design optimization (distribution and table/partition)
- Implement index refactoring and query rewrite
- Configure reasonable server parameters
- Establish a data archiving mechanism
- Finally consider a distributed solution
Regular useSHOW GLOBAL STATUS
Monitor key indicators such as QPS, TPS, and cache hit rate. When QPS is > 100,000, it is recommended to use NewSQL solutions such as TiDB and other
For more implementation details, please refer to the official MySQL documentation and technical white papers such as ShardingSphere and TiDB. Actual optimization requires a combination of solutions based on business characteristics.
This is the end of this article about 10 methods for query optimization of billion-level MySQL large tables. For more related content on query optimization of MySQL large tables, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!