The general order of deleting duplicate data in the data table is: create backups, determine duplicate records, and write deletion statements
Method 1: With the help of inner join
DELETE t1 FROM table t1 INNER JOIN table t2 WHERE < AND t1.column1 = t2.column1;
-
table
It's the table name,id
is the unique identifier of the table.column1
is the column that wants to delete duplicate records based on it. This SQL statement will be deletedcolumn1
All records with the same value in the column exceptid
The biggest record. - The way it works is that it will be repeated for every duplicate in the table
column1
Value foundid
Record with the smallest value and delete it. In this way, onlyid
Records with the largest value will be retained.
Note: Be sure to back up the data before performing an operation, just in case.
Method 2: With the help of group by
DELETE FROM your_table WHERE id NOT IN ( SELECT MAX(id) FROM your_table GROUP BY column1, column2, ... /* List the columns used to judge duplicates */ );
- This statement retains the largest number of duplicate records in each set of
id
and delete other duplicate records.column1
,column2
To determine duplicate column names.
Note: Be sure to back up the data before performing an operation, just in case.
Method 3: With the help of temporary tables
If the amount of data is large, deleting duplicate data can be created by creating temporary tables, which is more efficient.
step:
1. Create a new temporary table and insert the deduplicated data into the new table.
CREATE TABLE new_table AS SELECT * FROM old_table GROUP BY column_to_be_deduplicated;
Here,new_table
is the name of the newly created temporary table.old_table
It's the original table name.column_to_be_deduplicated
It is a column name that needs to be deduplicated.
2. Delete the original table/modify the original table name
DROP TABLE old_table;
RENAME TABLE old_table TO old_table_bak;
Rename the new temporary table to the original table name.
RENAME TABLE new_table TO old_table;
This method is to first insert the deduplicated data into a new temporary table, then delete the old table/modify the old table name, and finally rename the temporary table to the old table name. The advantage is that it will be more efficient when processing a large amount of data.
In the second step, it is not recommended to directly delete old tables. Anyone involving data changes must be extremely cautious.
Summarize
The above is personal experience. I hope you can give you a reference and I hope you can support me more.