SoFunction
Updated on 2025-03-09

How to delete duplicate data in mysql data table in MySQL

The general order of deleting duplicate data in the data table is: create backups, determine duplicate records, and write deletion statements

Method 1: With the help of inner join

DELETE t1 FROM table t1
INNER JOIN table t2 
WHERE  <  AND t1.column1 = t2.column1;
  • tableIt's the table name,idis the unique identifier of the table.column1is the column that wants to delete duplicate records based on it. This SQL statement will be deletedcolumn1All records with the same value in the column exceptidThe biggest record.
  • The way it works is that it will be repeated for every duplicate in the tablecolumn1Value foundidRecord with the smallest value and delete it. In this way, onlyidRecords with the largest value will be retained.

Note: Be sure to back up the data before performing an operation, just in case.

Method 2: With the help of group by

DELETE FROM your_table
WHERE id NOT IN (
    SELECT MAX(id)
    FROM your_table
    GROUP BY column1, column2, ... /* List the columns used to judge duplicates */
);
  • This statement retains the largest number of duplicate records in each set ofidand delete other duplicate records.column1, column2To determine duplicate column names.

Note: Be sure to back up the data before performing an operation, just in case.

Method 3: With the help of temporary tables

If the amount of data is large, deleting duplicate data can be created by creating temporary tables, which is more efficient.

step:

1. Create a new temporary table and insert the deduplicated data into the new table.

CREATE TABLE new_table AS
SELECT * FROM old_table
GROUP BY column_to_be_deduplicated;

Here,new_tableis the name of the newly created temporary table.old_tableIt's the original table name.column_to_be_deduplicatedIt is a column name that needs to be deduplicated.

2. Delete the original table/modify the original table name

DROP TABLE old_table;
RENAME TABLE old_table TO old_table_bak;

Rename the new temporary table to the original table name.

RENAME TABLE new_table TO old_table;

This method is to first insert the deduplicated data into a new temporary table, then delete the old table/modify the old table name, and finally rename the temporary table to the old table name. The advantage is that it will be more efficient when processing a large amount of data.

In the second step, it is not recommended to directly delete old tables. Anyone involving data changes must be extremely cautious.

Summarize

The above is personal experience. I hope you can give you a reference and I hope you can support me more.