SoFunction
Updated on 2025-03-02

How to find files with identical content in Python

This article shows, by example, how to use Python to find files whose contents are identical. The details are as follows:

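The Python script below finds files with identical content and can scan several directories in one run. It first groups files by size, then computes the SHA digest of files that share a size and reports those whose digests match.
Usage: python c:\;d:\;e:\ >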

# Hello, this script is written in Python - 
#  1.0p
import os, sys, sha
message = """
 1.0p
This script will search for files that are identical
(whatever their name/date/time).
 Syntax : python %s <directories>
   where <directories> is a directory or a list of directories
   separated by a semicolon (;)
Examples : python %s c:\windows
      python %s c:\;d:\;e:\ > 
      python %s c:\program files > 
This script is public domain. Feel free to reuse and tweak it.
The author of this script Sebastien SAUVAGE <sebsauvage at sebsauvage dot net>
/python/
""" % ((sys.argv[0], )*4)
def fileSHA ( filepath ) :
  """ Compute SHA (Secure Hash Algorithm) of a file.
    Input : filepath : full path and name of file (eg. 'c:\windows\')
    Output : string : contains the hexadecimal representation of the SHA of the file.
             returns '0' if file could not be read (file not found, no read rights...)
  """
  try:
    f = open(filepath,'rb')
    digest = sha.new()
    # Read the file in 65536-byte chunks so large files are never loaded whole
    data = f.read(65536)
    while len(data) != 0:
      digest.update(data)
      data = f.read(65536)
    f.close()
  except (IOError, OSError):
    return '0'
  else:
    return digest.hexdigest()
def detectDoubles( directories ):
  fileslist = {}
  # Group all files by size (in the fileslist dictionary)
  for directory in directories.split(';'):
    directory = os.path.abspath(directory)
    sys.stderr.write('Scanning directory '+directory+'...')
    os.path.walk(directory,callback,fileslist)
    sys.stderr.write('\n')
  sys.stderr.write('Comparing files...')
  # Remove keys (filesize) in the dictionary which have only 1 file
  for (filesize,listoffiles) in fileslist.items():
    if len(listoffiles) == 1:
      del fileslist[filesize]
  # Now compute SHA of files that have the same size,
  # and group files by SHA (in the filessha dictionary)
  filessha = {}
  while len(fileslist)>0:
    (filesize,listoffiles) = fileslist.popitem()
    for filepath in listoffiles:
      sys.stderr.write('.')
      filesha = fileSHA(filepath)
      if filesha in filessha:
        filessha[filesha].append(filepath)
      else:
        filessha[filesha] = [filepath]
  # Drop the files that could not be read (fileSHA returned '0')
  if '0' in filessha:
    del filessha['0']
  # Remove keys (sha) in the dictionary which have only 1 file
  for (filesha,listoffiles) in filessha.items():
    if len(listoffiles) == 1:
      del filessha[filesha]
  sys.stderr.write('\n')
  return filessha
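def callback(fileslist,directory,files):
  # Called once per visited directory; records each regular file under its size
  sys.stderr.write('.')
  for fileName in files:
    filepath = os.path.join(directory,fileName)
    if os.path.isfile(filepath):
      filesize = os.stat(filepath)[6]  # index 6 of the stat tuple is the size in bytes
      if filesize in fileslist:
        fileslist[filesize].append(filepath)
      else:
        fileslist[filesize] = [filepath]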
if len(sys.argv)>1 :
  doubles = detectDoubles(" ".join(sys.argv[1:]))
  print 'The following files are identical:'
  print '\n'.join(["----\n%s" % '\n'.join(doubles[filesha]) for filesha in doubles.keys()])
  print '----'
else:
  print message
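
The listing above is written for Python 2 (it imports the `sha` module and uses the `print` statement), so it will not run unchanged on Python 3. As a rough sketch of the same two-stage idea under Python 3, assuming `hashlib.sha1` and `os.walk` as stand-ins for the Python 2 facilities, something like the following could be used. The names `file_sha1` and `find_duplicates` are illustrative and are not part of the original script, and this sketch takes a list of directories instead of a semicolon-separated string.

```python
import hashlib
import os
from collections import defaultdict


def file_sha1(filepath, chunk_size=65536):
    """Return the hex SHA-1 of a file, or None if it cannot be read.

    Hypothetical helper: plays the role of fileSHA in the listing above,
    but signals failure with None instead of the string '0'.
    """
    digest = hashlib.sha1()
    try:
        with open(filepath, "rb") as handle:
            # Read in fixed-size chunks so large files are never loaded whole.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def find_duplicates(directories):
    """Map each SHA-1 digest to the list of two or more files sharing it."""
    # Stage 1: group files by size; a file with a unique size has no twin.
    by_size = defaultdict(list)
    for directory in directories:
        for dirpath, _dirnames, filenames in os.walk(directory):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.isfile(path):
                        by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # entry vanished or is unreadable: skip it

    # Stage 2: hash only the files whose size is shared with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = file_sha1(path)
            if digest is not None:
                by_hash[digest].append(path)

    # Keep only the digests that really occur more than once.
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates` with a list of directory paths returns a dictionary whose values are the groups of identical files, which can then be printed group by group as the original script does. Comparing sizes first keeps the expensive step, reading and hashing file contents, limited to files that could plausibly be duplicates.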

I hope this article helps with your Python programming.