SoFunction
Updated on 2024-10-30

How to de-duplicate a list in Python

How Python handles de-duplication

Custom function de-duplication

Approach:

  • 1. Determine which sequence needs to be de-duplicated
  • 2. Create an empty list to receive the de-duplicated elements
  • 3. Iterate through the sequence that needs to be de-duplicated, and filter out the duplicate data
  • 4. Print the data after de-duplication
l = [1,1,3,2,2,3,4,2,5]
new = []
for i in l:
    if i not in new:
        new.append(i)
print(new)

Output results:

[1, 3, 2, 4, 5]
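
As a hedged sketch, the same steps can be wrapped in a reusable function. The name dedupe_keep_order is hypothetical (it does not appear in the original), and the sketch assumes every element is hashable, because it tracks the values already seen in a set rather than scanning the result list on each iteration.

```python
def dedupe_keep_order(items):
    """Return a new list with duplicates removed, keeping first occurrences.

    Hypothetical helper for illustration; assumes all items are hashable.
    """
    seen = set()    # values that have already been emitted
    result = []     # de-duplicated output, in original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


l = [1, 1, 3, 2, 2, 3, 4, 2, 5]
print(dedupe_keep_order(l))  # [1, 3, 2, 4, 5]
```

For short lists the plain loop above is just as good; the set mainly helps when the input is long.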

Built-in function de-duplication

l = [1,1,3,2,2,3,4,2,5]
b = list(set(l))
print(b)

Output results:

[1, 2, 3, 4, 5]

It can be seen that de-duplicating through set() does not keep the original order of the sequence, because a set is unordered. To restore the original order, sort the result by each element's index in the original list.

The code is as follows:

l = [1,1,3,2,2,3,4,2,5]
a = list(set(l))
a.sort(key=l.index)
print(a)

Output results:

[1, 3, 2, 4, 5]
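
Sorting with the list's index method as the key searches the original list once per element. A possible variation, shown here only as a sketch, records the first position of each value in one pass and sorts by that lookup. The name first_pos is an assumption introduced for this example.

```python
l = [1, 1, 3, 2, 2, 3, 4, 2, 5]

# Record the position of the first occurrence of each value.
first_pos = {}
for pos, value in enumerate(l):
    first_pos.setdefault(value, pos)  # keeps the earliest position only

# Sort the unique values by where they first appeared in l.
a = sorted(set(l), key=first_pos.__getitem__)
print(a)  # [1, 3, 2, 4, 5]
```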

5 common ways to de-duplicate Python lists

List de-duplication comes up very often in practical Python work and is basic, essential knowledge.

The following summarizes 5 common list de-duplication methods.

I. De-duplicating a list with a for loop

  • This method preserves the original order of the elements.
# for loop for list de-duplication
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
list2 = []
for l1 in list1:
    if l1 not in list2:
        list2.append(l1)
print(list2)

Result: ['a', 'b', 1, 3, 9]

II. De-duplicating with a list comprehension

  • This method preserves the original order of the elements.
# Use a list comprehension for de-duplication
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
res = []
[res.append(i) for i in list1 if i not in res]
print(res)

Result: ['a', 'b', 1, 3, 9]
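
The comprehension above is run only for its side effect on res, and the list it builds (a list of None values) is thrown away. One alternative, shown as a sketch that assumes hashable elements, lets the comprehension itself produce the result while a helper set named seen (an assumption of this example) remembers what has already been kept.

```python
list1 = ['a', 'b', 1, 3, 9, 9, 'a']

seen = set()
# seen.add(i) returns None, so the condition is True only the first time
# a value is met; on that first visit the value is also recorded in seen.
res = [i for i in list1 if not (i in seen or seen.add(i))]
print(res)  # ['a', 'b', 1, 3, 9]
```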

III. De-duplicating a list with the set() conversion function

  • Principle: A set cannot contain duplicate elements
# set() list de-duplication
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
list2 = list(set(list1))
print(list2)

Result (one possible ordering): [1, 3, 9, 'b', 'a']

Problem: A set is unordered, so after de-duplicating with set() the order of the original list is not preserved

There are 2 solutions:

  • The first method: use the sort() method
# The first method, sort()
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
list2 = list(set(list1))
list2.sort(key=list1.index)
print(list2)

Result: ['a', 'b', 1, 3, 9]

Note: The sort() method has no return value and sorts the list elements in place

  • The second method: use the sorted() function
# The second method, sorted()
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
list2 = sorted(list(set(list1)), key=list1.index)
print(list2)

Result: ['a', 'b', 1, 3, 9]

Note: The Python built-in function sorted() returns a new list and does not modify its argument.
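
The difference between the two notes can be checked directly. The following is a minimal sketch; the variable names in_place, returned, unique and new_list are assumptions made for this example.

```python
list1 = ['a', 'b', 1, 3, 9, 9, 'a']

# list.sort() reorders the list it is called on and returns None.
in_place = list(set(list1))
returned = in_place.sort(key=list1.index)
print(returned)  # None
print(in_place)  # ['a', 'b', 1, 3, 9]

# sorted() accepts any iterable (a set here) and returns a new list.
unique = set(list1)
new_list = sorted(unique, key=list1.index)
print(new_list)  # ['a', 'b', 1, 3, 9]
```

Because sorted() accepts a set directly, the intermediate list() call is optional in the second method.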

IV. De-duplicating a list with a new dictionary

  • Principle: Dictionary keys cannot be duplicated.
  • This method preserves the original order (dictionaries keep insertion order in Python 3.7 and later).
# Implement list de-duplication using a new dictionary
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
dic = {}
# fromkeys() builds a new dict keyed by the list elements; duplicate keys collapse
dic = dic.fromkeys(list1).keys()
print(list(dic))

Result: ['a', 'b', 1, 3, 9]
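
The dictionary approach can also be written more compactly by calling dict.fromkeys directly. This is a sketch that assumes Python 3.7 or later (where dict preserves insertion order) and hashable elements; the function name unique_in_order is hypothetical.

```python
def unique_in_order(items):
    """Hypothetical helper: de-duplicate via dict keys, keeping first occurrences."""
    # dict.fromkeys() creates one key per distinct item, in first-seen order.
    return list(dict.fromkeys(items))


list1 = ['a', 'b', 1, 3, 9, 9, 'a']
print(unique_in_order(list1))  # ['a', 'b', 1, 3, 9]
```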

V. Removing every value that has duplicates

  • The four de-duplication methods above all keep one copy and remove the others
  • The following method, by contrast, keeps no copy at all of any value that appears more than once
# Delete values that have duplicates; keep none of them
list1 = ['a', 'b', 1, 3, 9, 9, 'a']
list2 = [i for i in list1 if list1.count(i) == 1]
print(list2)

Result: ['b', 1, 3]
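
Calling count() inside the comprehension rescans the whole list for every element. A possible alternative, given only as a sketch that assumes hashable elements, counts everything once with collections.Counter from the standard library and then filters on the stored counts.

```python
from collections import Counter

list1 = ['a', 'b', 1, 3, 9, 9, 'a']

# Count every value in a single pass over the list.
counts = Counter(list1)

# Keep only the values that occur exactly once, in original order.
list2 = [i for i in list1 if counts[i] == 1]
print(list2)  # ['b', 1, 3]
```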

That covers the 5 list de-duplication methods; choose whichever one fits your needs.

Summary

The above is based on personal experience. I hope it serves as a useful reference, and thank you for your support.