SoFunction
Updated on 2025-04-16

Three ways to count the occurrence of different integers in Python

1. Problem definition: What is "counting different integers"?

Suppose we have a list of integers that contains duplicates, [1, 2, 3, 4, 2, 3, 4, 5], and we need to count how many distinct integers it contains. The answer is 5 (1, 2, 3, 4, 5).

The problem looks simple, but real applications often add complications:

  • Huge amounts of data (millions or even billions of values)
  • Statistics are required in real time
  • Memory resources are limited
  • The number of occurrences of each value is needed as well
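To make the "real-time" and "number of occurrences" requirements concrete, here is a minimal sketch of a counter that is fed one value at a time. The class name RunningDistinctCounter and its methods are hypothetical, introduced only for illustration. It is an exact counter, so its memory use still grows with the number of distinct values.

```python
class RunningDistinctCounter:
    """Hypothetical helper: tracks distinct values and their frequencies as data arrives."""

    def __init__(self):
        self._counts = {}  # value -> number of times seen so far

    def add(self, value):
        # Update the tally for a single incoming value.
        self._counts[value] = self._counts.get(value, 0) + 1

    @property
    def distinct(self):
        # Number of different values seen so far.
        return len(self._counts)

    def occurrences(self, value):
        # How many times a given value has been seen (0 if never).
        return self._counts.get(value, 0)


tracker = RunningDistinctCounter()
for value in [1, 2, 3, 4, 2, 3, 4, 5]:
    tracker.add(value)

print(tracker.distinct)        # 5
print(tracker.occurrences(2))  # 2
```

The running totals can be read at any point in the stream, not only after all data has been seen.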

2. Solution 1: Set deduplication method (suitable for basic scenarios)

Core idea: use the uniqueness of set elements to automatically deduplicate.

my_list = [1, 2, 3, 4, 2, 3, 4, 5]
unique_values = set(my_list)  # Convert to a set
count = len(unique_values)    # Get the size of the set
print(count)  # Output: 5

Principle description:

  • The set() function converts the list into a set, automatically discarding duplicate elements
  • Set membership lookup is O(1) on average, which makes sets suitable for quickly checking whether an element exists
  • Finally, len() returns the size of the set, which is the number of different integers
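The points above can be sketched as follows. The helper name count_distinct is hypothetical; it wraps the same set() and len() calls so that they also work on generators, not only on lists.

```python
def count_distinct(values):
    """Return the number of distinct items in any iterable of hashable values."""
    return len(set(values))


my_list = [1, 2, 3, 4, 2, 3, 4, 5]
unique_values = set(my_list)

# Membership tests on a set are O(1) on average.
print(3 in unique_values)  # True
print(9 in unique_values)  # False

print(count_distinct(my_list))                   # 5
print(count_distinct(x % 3 for x in range(10)))  # 3 (the remainders 0, 1, 2)
```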

Performance Features:

  • Time complexity: O(n) (building the set)
  • Space complexity: O(n) (storing the unique values)
  • Advantages: Concise code and fast execution
  • Disadvantages: Cannot report how many times each value occurs

3. Solution 2: Dictionary Counting Method (suitable for scenarios where frequency is required)

Core idea: Use dictionary to store the number of occurrences of each integer, and finally count the number of dictionary keys.

my_list = [1, 2, 3, 4, 2, 3, 4, 5]
count_dict = {}
 
for num in my_list:
    count_dict[num] = count_dict.get(num, 0) + 1  # Add 1 to the current count (0 if num is new)
count = len(count_dict)
print(count)  # Output:5

Principle description:

  • When traversing the list, use the get() method to safely obtain the current count value.
  • count_dict.get(num, 0) means: if num exists, return the count value, otherwise return 0
  • Finally, the number of different integers is obtained through the number of keys in the dictionary
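For comparison, here is a sketch of two equivalent ways to write the same loop: an explicit membership check, which is what get(num, 0) saves you from writing, and collections.defaultdict, which supplies the default automatically. The variable names are illustrative only.

```python
from collections import defaultdict

my_list = [1, 2, 3, 4, 2, 3, 4, 5]

# Explicit membership check: the long form of count_dict.get(num, 0) + 1.
explicit = {}
for num in my_list:
    if num in explicit:
        explicit[num] += 1
    else:
        explicit[num] = 1

# defaultdict(int) returns 0 for missing keys, so += 1 works directly.
with_default = defaultdict(int)
for num in my_list:
    with_default[num] += 1

print(explicit == dict(with_default))  # True
print(len(explicit))                   # 5
```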

Extended application:

  • Get the number of occurrences of each value: print(count_dict) outputs {1: 1, 2: 2, 3: 2, 4: 2, 5: 1}
  • Find the most frequent integer: max(count_dict, key=count_dict.get)
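A small sketch of these frequency queries on the sample list. Note that max() returns only the first key with the highest count when several keys tie; the list comprehensions below are one possible way to collect all tied keys or the values that occur exactly once.

```python
my_list = [1, 2, 3, 4, 2, 3, 4, 5]
count_dict = {}
for num in my_list:
    count_dict[num] = count_dict.get(num, 0) + 1

# max() returns the first maximal key in insertion order when counts tie.
most_frequent = max(count_dict, key=count_dict.get)
print(most_frequent)  # 2

# Collect every key that shares the highest count.
highest = max(count_dict.values())
tied = [num for num, freq in count_dict.items() if freq == highest]
print(tied)  # [2, 3, 4]

# Values that occur exactly once.
once = [num for num, freq in count_dict.items() if freq == 1]
print(once)  # [1, 5]
```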

Performance Features:

  • Time complexity: O(n) (single traversal)
  • Space complexity: O(n) (stores all key-value pairs)
  • Advantages: You can obtain detailed frequency information
  • Disadvantages: Uses more memory than the set method

4. Solution 3: collections.Counter (professional statistical tool)

Core idea: Use the Counter class in the Python standard library, designed specifically for counting.

from collections import Counter
 
my_list = [1, 2, 3, 4, 2, 3, 4, 5]
counter = Counter(my_list)  # Automatic frequency counting
count = len(counter)        # Get the number of unique values
print(count)  # Output: 5

Advanced usage:

# Get the 3 integers with the most occurrences
print(counter.most_common(3))  # Output: [(2, 2), (3, 2), (4, 2)]

# Arithmetic and set-style operations (addition, subtraction, intersection, and union)
counter2 = Counter([5, 6, 6, 7])
print(counter + counter2)  # Merge statistics (counts are added)
print(counter & counter2)  # Intersection statistics (minimum of each count)
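The snippet above demonstrates only + and &. As a sketch of the remaining two operators on the same sample data, subtraction (-) drops any result that is zero or negative, and union (|) keeps the larger of the two counts for each key. The results are printed as sorted (value, count) pairs to keep the output order obvious.

```python
from collections import Counter

counter = Counter([1, 2, 3, 4, 2, 3, 4, 5])
counter2 = Counter([5, 6, 6, 7])

# Subtraction: counts are subtracted; 5 drops out because 1 - 1 = 0.
difference = counter - counter2
print(sorted(difference.items()))  # [(1, 1), (2, 2), (3, 2), (4, 2)]

# Union: the maximum count per key across both counters.
union = counter | counter2
print(sorted(union.items()))  # [(1, 1), (2, 2), (3, 2), (4, 2), (5, 1), (6, 2), (7, 1)]

# Intersection: the minimum count per key; only 5 appears in both.
print(counter & counter2)  # Counter({5: 1})
```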

Performance Features:

  • Time complexity: O(n) (equivalent to dictionary method)
  • Space complexity: O(n)
  • Advantages: Rich built-in statistical methods and the most concise code
  • Disadvantages: Requires an import from the standard library

5. Performance comparison and selection suggestions

Method                      Time complexity   Space complexity   Applicable scenarios
Set deduplication method    O(n)              O(n)               Just a simple count
Dictionary counting method  O(n)              O(n)               Small and medium-sized data that requires frequency information
Counter class               O(n)              O(n)               Large data that requires complex statistics

Selection suggestions:

  • Small data volume and no frequency information is required → Set deduplication method
  • Need frequency but medium data volume → Dictionary counting method
  • Professional data analysis/big data scenarios → Counter class
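Since all three methods share the same asymptotic complexity, the practical difference comes down to constant factors, which are easy to measure on your own data. The following is a rough sketch using timeit; the function names, data size, and value range are arbitrary assumptions, and the timings will vary by machine and Python version.

```python
import random
import timeit
from collections import Counter


def with_set(values):
    return len(set(values))


def with_dict(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return len(counts)


def with_counter(values):
    return len(Counter(values))


# Arbitrary test data: 100,000 integers drawn from 10,000 possible values.
random.seed(0)
data = [random.randint(0, 9_999) for _ in range(100_000)]

# All three methods must agree on the number of distinct values.
results = {f.__name__: f(data) for f in (with_set, with_dict, with_counter)}
assert len(set(results.values())) == 1

for func in (with_set, with_dict, with_counter):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__:<12} {seconds:.4f}s for 5 runs")
```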

6. Practical cases: IP statistics in log analysis

Requirements: count the number of requests from each IP address in the server log and find the 10 IPs with the most frequent access.

from collections import Counter
 
# Simulate log data (each line starts with the IP address)
log_lines = [
    "192.168.1.1 - - [timestamp] \"GET / HTTP/1.1\"",
    "10.0.0.5 - - [timestamp] \"POST /api\"",
    "192.168.1.1 - - [timestamp] \"GET /css/\"",
    # ...(Millions of log lines)
]

# Extract the IP address (the first whitespace-separated field of each line)
ips = [line.split()[0] for line in log_lines]

# Count and output the results
ip_counter = Counter(ips)
print("Different IP count:", len(ip_counter))
print("Top10 IP:", ip_counter.most_common(10))

Code explanation:

  • A list comprehension extracts the IP addresses concisely
  • Counter tallies the whole list in a single O(n) pass
  • most_common(10) returns the 10 highest-frequency IPs directly
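For logs with millions of lines, one possible refinement is to feed Counter from a generator so that the intermediate list of IPs is never built. The sketch below assumes the IP is the first whitespace-separated field; the helper name iter_ips is hypothetical, and io.StringIO stands in for a real log file opened with open().

```python
import io
from collections import Counter


def iter_ips(lines):
    """Yield the first whitespace-separated field of each non-blank line."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines instead of raising IndexError
            yield fields[0]


# Stand-in for a real log file; a file object is iterated line by line the same way.
fake_log = io.StringIO(
    '192.168.1.1 - - [timestamp] "GET / HTTP/1.1"\n'
    '10.0.0.5 - - [timestamp] "POST /api"\n'
    '\n'
    '192.168.1.1 - - [timestamp] "GET /css/"\n'
)

ip_counter = Counter(iter_ips(fake_log))
print(len(ip_counter))            # 2
print(ip_counter.most_common(1))  # [('192.168.1.1', 2)]
```

Memory use then depends on the number of distinct IPs rather than on the number of log lines.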

7. Summary: The Counting Tool in the Intelligent Era

The problem of counting different integers seems simple, but in fact it contains multiple solutions. In the Python ecosystem:

  • Sets provide the most basic ability to deduplicate
  • Dictionaries cover the basic requirements of frequency statistics
  • Counter is a professional statistical tool

As the data size grows, it is particularly important to choose the data structure rationally. For developers in the intelligent era, mastering these counting methods is like having an accurate digital microscope that can efficiently understand the rules behind data.
