SoFunction
Updated on 2025-03-04

Three ways to delete duplicate data in MySQL

To delete duplicate rows in MySQL and keep only one row per group, you can use any of the following methods (copy the SQL and adjust the table names and conditions before using it).

Method 1: Use left join + subquery to delete duplicate data (recommended)

Note: on a table with about 5 million rows, this SQL takes roughly 15 to 30 seconds to run.

Use LEFT JOIN to delete the duplicates. This is the recommended approach when you want to remove duplicate data and then add a composite unique index, and it also works with large data volumes.

-- Delete the historical duplicates first, before adding the composite unique index
DELETE u1
FROM uf_cs_record_batch_detail u1
LEFT JOIN (
    -- The row with the smallest id in each (cs_contact_name, cs_safe_remark) group is kept
    SELECT MIN(id) AS min_id
    FROM uf_cs_record_batch_detail
    GROUP BY cs_contact_name, cs_safe_remark
) u2
ON u1.id = u2.min_id
WHERE u2.min_id IS NULL;
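
Before running a DELETE like the one above, it can be worth checking what it will touch. The following is a minimal sketch using the same table and columns; the alias copies is made up for illustration. The first query lists the duplicate groups, and the second turns the DELETE into a SELECT with the same join so you can preview the rows that would be removed.

```sql
-- 1) Which (cs_contact_name, cs_safe_remark) groups have more than one row?
SELECT cs_contact_name, cs_safe_remark, COUNT(*) AS copies
FROM uf_cs_record_batch_detail
GROUP BY cs_contact_name, cs_safe_remark
HAVING COUNT(*) > 1;

-- 2) Preview: same join as the DELETE, but only selecting.
--    Rows returned here are the ones that are NOT the smallest id of their group.
SELECT u1.*
FROM uf_cs_record_batch_detail u1
LEFT JOIN (
    SELECT MIN(id) AS min_id
    FROM uf_cs_record_batch_detail
    GROUP BY cs_contact_name, cs_safe_remark
) u2
ON u1.id = u2.min_id
WHERE u2.min_id IS NULL;
```

If the first query returns no rows, there is nothing to delete and the unique index can be added directly.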

The SQL for adding the composite unique index afterwards (very practical):

-- Add a composite unique index
ALTER TABLE uf_cs_record_batch_detail ADD UNIQUE KEY idx_uni_contact_safe_stat (cs_contact_name, cs_safe_remark);
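
Once the unique index exists, a plain INSERT of a (cs_contact_name, cs_safe_remark) pair that is already present fails with a duplicate-key error. The sketch below shows one standard way to tolerate that at write time. The literal values are hypothetical, and it assumes the table's other columns are nullable or have defaults.

```sql
-- Hypothetical values; assumes the remaining columns have defaults.
-- INSERT IGNORE skips the row (with a warning) if the pair already exists,
-- instead of aborting the statement with a duplicate-key error.
INSERT IGNORE INTO uf_cs_record_batch_detail (cs_contact_name, cs_safe_remark)
VALUES ('example contact', 'example remark');

-- Optional: inspect what was skipped right after the insert.
SHOW WARNINGS;
```

Two caveats: INSERT IGNORE also downgrades some other errors to warnings, and a MySQL unique index still allows several rows whose indexed columns contain NULL, so NULL values are not deduplicated by the index.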

Method 2: Create a temporary table (requires several steps; the logic is clear, but the ID values may change)

This method assumes that you have a table your_table and that you want to identify duplicates based on certain columns.
For example, to delete records that have the same values in column1 and column2 and keep only one record per group, follow these steps:

  • Use a CREATE TABLE statement to create a temporary table that will hold the unique records.
  • Use an INSERT INTO ... SELECT statement to insert the unique records into the temporary table.
  • Delete all records in the original table.
  • Use an INSERT INTO ... SELECT statement to insert the records from the temporary table back into the original table.
  • Drop the temporary table.

Here is a complete SQL example:

-- Create an empty temporary table with the same structure as the source table (for reference)
CREATE TABLE temp_table LIKE your_table;

-- Store the deduplicated rows in temp_table, keeping the row with the smallest id in each group
INSERT INTO temp_table
SELECT * FROM your_table t1
WHERE t1.id = (
    SELECT MIN(t2.id)
    FROM your_table t2
    WHERE t1.column1 = t2.column1
    AND t1.column2 = t2.column2
);

-- Then delete the data in the source table (all rows, as in step 3 above)
DELETE FROM your_table;

-- Then write the deduplicated data in the temporary table back to the source table
INSERT INTO your_table
SELECT * FROM temp_table;

-- Finally, drop the temporary table
DROP TABLE temp_table;

In this way, you successfully deleted the duplicate records in the original table, leaving only one unique record.

Note: this method may change the original row IDs (for instance, when the rows are re-inserted without their original id values), so decide whether it suits your situation before using it.
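
The riskiest moment in this method is between emptying the source table and refilling it. A minimal sketch of guarding those two steps with a transaction follows. It assumes your_table uses a transactional engine such as InnoDB and that temp_table has already been filled; the aliases rows_in_source and rows_in_temp are made up for illustration.

```sql
-- Assumes InnoDB and an already populated temp_table.
START TRANSACTION;

DELETE FROM your_table;

INSERT INTO your_table
SELECT * FROM temp_table;

-- Sanity check before committing: the two counts should match.
SELECT
    (SELECT COUNT(*) FROM your_table) AS rows_in_source,
    (SELECT COUNT(*) FROM temp_table) AS rows_in_temp;

COMMIT;   -- or ROLLBACK; if the counts look wrong
```

Run CREATE TABLE and DROP TABLE outside this transaction, because DDL statements in MySQL commit implicitly. Also note that rows with NULL in column1 or column2 never satisfy the equality test in the deduplication subquery, so they would not be copied into temp_table; check for such rows first if the columns are nullable.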

Method 3: Use a self-join (pay attention to performance)

To avoid changing the original IDs, we can take a different approach: use a self-join to find the duplicates and delete the unwanted rows. This deletes the duplicate records while keeping the original IDs.

Suppose your table structure is as follows:

  • Table name: your_table
  • Columns: id (primary key), column1, column2, and other columns.

You can use the following SQL to delete duplicate records, keeping only one (usually the one with the smallest ID):

-- Step 1: Delete the duplicate records (every row that has a duplicate with a smaller id)
DELETE t1
FROM your_table t1
INNER JOIN your_table t2
WHERE
    t1.id > t2.id
    AND t1.column1 = t2.column1
    AND t1.column2 = t2.column2;

-- Step 2: Confirm that the deletion succeeded by viewing the remaining data
SELECT * FROM your_table;

Explanation:

  • Delete the duplicate records: the INNER JOIN self-join finds the duplicate records, and the condition t1.id > t2.id ensures that only the records with larger ids are deleted, so the record with the smallest id in each group is kept.
  • Confirm that the deletion succeeded: view the remaining data with the SELECT statement to make sure the delete did what you intended.

The advantage of this method is that the original IDs are not changed: the record with the smallest ID in each group of duplicates is kept. The statement itself is simple, but watch the cost of the self-join on large tables.
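
Since this method compares every row against the rest of the table, two precautions can help. The sketch below is an illustration under assumptions, not part of the original recipe: the index name idx_dedupe_cols is hypothetical, and the SELECT simply mirrors the DELETE conditions so the affected rows can be reviewed first.

```sql
-- Hypothetical index name. A composite index on the compared columns lets
-- MySQL look up matching rows instead of scanning the whole table per row.
CREATE INDEX idx_dedupe_cols ON your_table (column1, column2);

-- Preview: the rows the DELETE would remove, i.e. every row that has a
-- duplicate with a smaller id. DISTINCT avoids listing a row once per match.
SELECT DISTINCT t1.*
FROM your_table t1
INNER JOIN your_table t2
    ON  t1.column1 = t2.column1
    AND t1.column2 = t2.column2
    AND t1.id > t2.id
ORDER BY t1.column1, t1.column2, t1.id;
```

Keep in mind that = never matches NULL, so rows with NULL in column1 or column2 are not treated as duplicates by either the preview or the DELETE. If they should be, MySQL's NULL-safe comparison operator <=> can replace = in those conditions.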

A brief summary

  • LEFT JOIN delete (recommended): suitable for large data volumes, with acceptable performance.
  • Temporary table: suitable when the table needs to be rebuilt and the data volume is moderate, but it is fairly cumbersome.
  • Self-join: keeps the row with the smallest ID; suitable for deleting duplicate data without changing IDs.
