A detailed introduction to MySQL partitioning performance

1. Partition concept

Partitioning lets the parts of a single table be distributed across the file system according to rules you specify. Each part of the table is stored as a separate table in a different location. MySQL has supported partitioning since version 5.1.3.

Comparison of partitioning and manual table splitting

Manual table splitting	Partitioning
Multiple data tables	A single data table
Risk of duplicate data	No risk of duplicate data
Writes go to multiple tables	Writes go to one table
No unified constraints	Constraints enforced across the whole table

MySQL supports RANGE, LIST, HASH, and KEY partition types, among which RANGE is the most commonly used:

Range: rows are assigned to partitions according to ranges of a column value. For example, a table can be split into one partition per year.
Hash: a hash value is computed from an expression on one or more columns, and each row is placed in the partition that corresponds to its hash value. For example, you can partition a table by its primary key.
Key: an extension of Hash, except that the hash function is supplied by the MySQL server itself.
List: rows are assigned to partitions by matching a column value against predefined lists of values.
Composite: a combination of the above modes, for example RANGE partitions that are further subpartitioned by HASH.
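To make these routing rules concrete, here is a minimal Python sketch of how a row's partitioning value could be mapped to a partition under RANGE, LIST, and HASH. It is a simplified model, not MySQL's code: the function names are hypothetical, MAXVALUE is represented as float("inf"), and the HASH rule follows the MOD(expr, number_of_partitions) formula from the MySQL manual, assuming a non-negative expression. KEY is left out because its hash function is internal to the server.

```python
from bisect import bisect_right
from typing import Sequence


def range_partition(value: int, bounds: Sequence[float]) -> int:
    """RANGE: index of the first partition whose VALUES LESS THAN bound
    is strictly greater than value. bounds must be strictly increasing;
    float("inf") stands in for MAXVALUE."""
    idx = bisect_right(bounds, value)
    if idx == len(bounds):
        # No partition accepts the value; MySQL rejects such an INSERT.
        raise ValueError(f"no partition for value {value}")
    return idx


def list_partition(value: int, lists: Sequence[Sequence[int]]) -> int:
    """LIST: index of the partition whose VALUES IN list contains value."""
    for idx, members in enumerate(lists):
        if value in members:
            return idx
    raise ValueError(f"no partition for value {value}")


def hash_partition(value: int, num_partitions: int) -> int:
    """HASH: MOD(expr, num_partitions), assuming a non-negative expr."""
    return value % num_partitions


# Bounds mirroring a yearly RANGE table (hypothetical example values).
YEAR_BOUNDS = [2001, 2002, 2003, float("inf")]
```

With these bounds, range_partition(2000, YEAR_BOUNDS) returns 0 (the first partition) and range_partition(2001, YEAR_BOUNDS) returns 1, because a row goes to the first partition whose bound is strictly greater than its value.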

2. What partitioning can do

Split data logically
Speed up individual writes and reads
Speed up range queries on partitioned data
Store different parts of the data on different physical paths
Store historical data efficiently
Enforce constraints across the whole table
Use different partitioning schemes on master and slave, for example HASH partitioning on the master and RANGE partitioning on the slave

3. Partitioning restrictions (as of version 5.1.44)

• Only integer columns can be used for partitioning, or columns that a partitioning function converts to integers.

• A table can have at most 1024 partitions.

• If the table has a primary key or unique indexes, every column used for partitioning must be part of the primary key and of every unique index.

• Foreign keys are not supported.

• Full-text indexes are not supported.

Partitioning by date is a good fit because many date functions (such as YEAR() and TO_DAYS()) can serve as partitioning functions, but there are few suitable partitioning functions for strings.
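The unique-key restriction amounts to a simple set check. The sketch below (Python, with a hypothetical function name) reports which unique keys, the primary key included, fail to contain every column of the partitioning expression; MySQL would refuse to create a partitioned table with such a key.

```python
from typing import Iterable, List, Set


def violating_keys(partition_columns: Set[str],
                   unique_keys: Iterable[Set[str]]) -> List[Set[str]]:
    """Return the unique keys (primary key included) that do not contain
    every column used in the partitioning expression."""
    return [key for key in unique_keys if not partition_columns <= key]
```

For a table partitioned on a date column d, a primary key on (id) alone would be reported, while a composite key on (id, d) would pass.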

4. When to use partitions

• Very large tables

• Historical tables that must still be queried quickly, which can be handled with ARCHIVE + partitioning

• Tables whose indexes are larger than the server's available memory

• For large tables, especially when the index is much larger than the server's available memory, you can do without indexes altogether; partitioning is then the more effective way to speed up queries.

5. Partitioning experiments

Experiment 1:

The data is the public US Bureau of Transportation Statistics dataset (CSV format). It currently contains 113 million records, 7.5 GB of data plus 5.2 GB of indexes, covering 1987 to 2007.

The server has 4 GB of memory, so the data and indexes together exceed the memory size. 4 GB was chosen because a data warehouse is far larger than the memory it could realistically be given, possibly several TB. In an ordinary OLTP database the indexes are cached in memory and can be searched quickly; once the data exceeds memory, a different approach is needed.

The tables are first created with a primary key, since tables usually have one. A primary key that is too large for its index to fit in memory is generally inefficient: the server must access the disk frequently, and query speed depends entirely on the disk and processor. In fact, a common practice in large data warehouse designs is to use no indexes at all. The tests therefore compare tables with and without a primary key.

Test method:

Three storage engines were used: MyISAM, InnoDB, and Archive. For each engine, one unpartitioned table with a primary key (except for Archive) and two partitioned tables were created, one partitioned by month and one by year. The partitioned tables were defined as follows:

CREATE TABLE by_year (
    d DATE
)
PARTITION BY RANGE (YEAR(d))
(
    PARTITION P1 VALUES LESS THAN (2001),      -- years before 2001
    PARTITION P2 VALUES LESS THAN (2002),      -- 2001
    PARTITION P3 VALUES LESS THAN (2003),      -- 2002
    PARTITION P4 VALUES LESS THAN (MAXVALUE)   -- 2003 and later
);

CREATE TABLE by_month (
    d DATE
)
PARTITION BY RANGE (TO_DAYS(d))
(
    PARTITION P1 VALUES LESS THAN (TO_DAYS('2001-02-01')), -- January 2001 and earlier
    PARTITION P2 VALUES LESS THAN (TO_DAYS('2001-03-01')), -- February 2001
    PARTITION P3 VALUES LESS THAN (TO_DAYS('2001-04-01')), -- March 2001
    PARTITION P4 VALUES LESS THAN (MAXVALUE)               -- April 2001 and later
);
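The by_month table relies on TO_DAYS() turning a DATE into an integer. Here is a minimal Python model of that routing, assuming that for Gregorian dates MySQL's TO_DAYS() equals Python's date.toordinal() plus 365 (the MySQL manual gives TO_DAYS('2007-10-07') = 733321, which this formula reproduces). BY_MONTH_BOUNDS, PARTITION_NAMES, and by_month_partition are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from datetime import date


def to_days(d: date) -> int:
    # Assumption: for Gregorian dates, MySQL TO_DAYS() = Python ordinal + 365.
    return d.toordinal() + 365


# Upper bounds of the by_month table; float("inf") stands in for MAXVALUE.
BY_MONTH_BOUNDS = [
    to_days(date(2001, 2, 1)),
    to_days(date(2001, 3, 1)),
    to_days(date(2001, 4, 1)),
    float("inf"),
]
PARTITION_NAMES = ["P1", "P2", "P3", "P4"]


def by_month_partition(d: date) -> str:
    # First partition whose LESS THAN bound is strictly greater than TO_DAYS(d).
    return PARTITION_NAMES[bisect_right(BY_MONTH_BOUNDS, to_days(d))]
```

The model also makes one detail of the definition visible: P1 receives every date before February 2001, not only January, so by_month_partition(date(1999, 5, 5)) returns "P1".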

Each table was tested in a separate MySQL server instance, with one database containing one table per instance. For each engine, the server was started, the queries were run and their results recorded, and the server was shut down. The instances were created with MySQL Sandbox.

Data loading results:

ID	Engine	Partitioning	Records	Size	Notes	Loading time (*)
1	MyISAM	none	113 million	13 GB	with PK	37 min
2	MyISAM	by month	113 million	8 GB	without PK	19 min
3	MyISAM	by year	113 million	8 GB	without PK	18 min
4	InnoDB	none	113 million	16 GB	with PK	63 min
5	InnoDB	by month	113 million	10 GB	without PK	59 min
6	InnoDB	by year	113 million	10 GB	without PK	57 min
7	Archive	none	113 million	1.8 GB	no keys	20 min
8	Archive	by month	113 million	1.8 GB	no keys	21 min
9	Archive	by year	113 million	1.8 GB	no keys	20 min

(*) On a dual-Xeon server

To compare the effect of partitioning on large and small data sets, another 9 instances were created, each containing slightly less than 2 GB of data.

Two kinds of queries were used.

Aggregate query:

SELECT COUNT(*)
FROM table_name
WHERE date_column BETWEEN start_date AND end_date;

Specific-record query:

SELECT column_list
FROM table_name
WHERE column1 = x AND column2 = y AND column3 = z;
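The aggregate query can benefit from partitioning through pruning: the server only needs to read the partitions whose range can overlap the BETWEEN interval. The sketch below models that decision for the by_year table in Python. It is a simplification that assumes pruning on YEAR(d) works at whole-year granularity, and BY_YEAR and partitions_for_between are hypothetical names.

```python
from datetime import date
from typing import List, Optional, Tuple

# Mirror of the by_year table: (name, LESS THAN bound on YEAR(d)); None = MAXVALUE.
BY_YEAR: List[Tuple[str, Optional[int]]] = [
    ("P1", 2001), ("P2", 2002), ("P3", 2003), ("P4", None),
]


def partitions_for_between(start: date, end: date) -> List[str]:
    """Partitions a 'd BETWEEN start AND end' query would have to read."""
    lo, hi = start.year, end.year
    selected = []
    lower: Optional[int] = None  # inclusive lower bound of the current partition
    for name, upper in BY_YEAR:
        # The partition holds years in [lower, upper); keep it if it overlaps [lo, hi].
        if (upper is None or lo < upper) and (lower is None or hi >= lower):
            selected.append(name)
        lower = upper
    return selected
```

For a range inside 2001 only P2 has to be scanned, while a range from 2000 through 2002 touches P1, P2, and P3; the unpartitioned table without an index would have to scan everything.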

For the first query type, statements were written with different date ranges, and for each range an additional set of queries over the same dates was created. The first query in each date range is a cold query, meaning it is the first hit; the following queries on the same range are warm queries, meaning the data is at least partially cached. The queries are available on the Forge.
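The cold/warm distinction can be captured with a small timing harness. In this Python sketch, run_query is a hypothetical callable that executes one of the queries above (for example through a MySQL client library); the first call is recorded as the cold time and the mean of the following calls as the warm time. The harness only measures: a first call is truly cold only if caches were emptied beforehand, for example by restarting the server as the experiment does for each instance.

```python
import time
from statistics import mean
from typing import Callable, Tuple


def cold_and_warm(run_query: Callable[[], object],
                  warm_runs: int = 3) -> Tuple[float, float]:
    """Return (cold_seconds, mean_warm_seconds) for one query."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")
    # First call: the cold query.
    start = time.perf_counter()
    run_query()
    cold = time.perf_counter() - start
    # Repeated calls on the same range: warm queries.
    warm_times = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()
        warm_times.append(time.perf_counter() - start)
    return cold, mean(warm_times)
```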

Results:

1. Partitioned tables with a primary key

The first test used a composite primary key, just like the original data table. The primary key index file reached 5.5 GB. Here partitioning not only failed to improve performance; the primary key actually slowed operations down, because queries that go through the primary key index perform very poorly when the index cannot be read into memory. The lesson is that partitioning is useful, but only when it is used properly.

+--------+-----------------+-----------------+-----------------+
| cold   | 2.6574570285714 | 2.9169642       | 3.0373419714286 |
| warm   | 2.5720722571429 | 3.1249698285714 | 3.1294000571429 |
+--------+-----------------+-----------------+-----------------+