SoFunction
Updated on 2024-10-30

How Python can quickly find the differences between the data in two spreadsheets

I've only recently started learning Python, and I'm looking for small tasks to practice on. I hope to keep sharpening my problem-solving skills along the way.

Here is a common scenario at work. One spreadsheet is shared by two, three, or even more departments. Staff in each department update it from time to time to keep their own department's data current. Over time, everyone's copies start to conflict, which makes the data hard to manage. How can you quickly find the differences between the data in two or more spreadsheets?

Solutions:

1. Use Excel's built-in features (search online if you're interested)

2. Write a small Python script

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Import the openpyxl module and the style classes used for highlighting
import openpyxl
from openpyxl.styles import PatternFill
from openpyxl.styles import Font

# Read the two Excel files
# The strings in parentheses are the paths to the two Excel files you want to compare; use forward slashes "/"
wb_a = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/')
wb_b = openpyxl.load_workbook('d:/BAKFILE/d046532/Desktop/check excel/')
# Define a function that reads one column of the active sheet and returns its values as a list
# In my table the IP address is unique, so I use it to tell rows apart; the IP column is column "G"
def getIP(wb):
  sheet = wb.active  # get_active_sheet() is deprecated; use the .active property instead
  ip = []
  for cellobj in sheet['G']:
    ip.append(cellobj.value)

  return ip
# Get the IP lists
ip_a = getIP(wb_a)
ip_b = getIP(wb_b)
# Convert the two lists into sets
aa = set(ip_a)
bb = set(ip_b)
# The symmetric difference (^) keeps values found in only one of the two tables; convert it back to a list
difference = list(aa ^ bb)
# Print each differing value
# At this point, the data that differs between the two tables has been found
for i in difference:
  print(i)

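# Highlight the differing cells (only the IP cell in column G is styled)
print("Start the first table." + "----" * 10)
a = wb_a.active['G']
for cellobj in a:
  if cellobj.value in difference:
    print(cellobj.value)
    # Assumed font color (ARGB red); the original color value was lost
    cellobj.font = Font(color="FFFF0000", italic=True, bold=True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")
print("Start the second table." + "----" * 10)
b = wb_b.active['G']
for cellobj in b:
  if cellobj.value in difference:
    print(cellobj.value)
    # Assumed font color (ARGB red); the original color value was lost
    cellobj.font = Font(color="FFFF0000", italic=True, bold=True)
    cellobj.fill = PatternFill("solid", fgColor="DDDDDD")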

wb_a.save('d:/BAKFILE/d046532/Desktop/')
wb_b.save('d:/BAKFILE/d046532/Desktop/')

This saves a copy of each Excel file. In each copy, the values that differ between the two tables are marked with a cell fill color and a font color.
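The script prints the symmetric difference `aa ^ bb`, which mixes values from both files, so you can't tell which table each one came from. Here is a minimal sketch in plain Python (no openpyxl needed) that splits the same sets with `-` instead. The function name `split_differences` and the sample IPs are hypothetical. Skipping empty cells (`None`) and stripping surrounding whitespace are my own assumptions about what should count as "the same" IP.

```python
# Sketch: split the symmetric difference into "only in A" and "only in B".
# The sample lists below are hypothetical stand-ins for what getIP()
# reads from column G of each workbook (header row included).

def split_differences(values_a, values_b):
    """Return (only_in_a, only_in_b) as sorted lists of strings.

    Empty cells (None) are ignored, and surrounding whitespace is stripped
    so that "10.0.0.2 " and "10.0.0.2" are treated as the same value.
    """
    set_a = {str(v).strip() for v in values_a if v is not None}
    set_b = {str(v).strip() for v in values_b if v is not None}
    # A - B: values present only in the first table
    # B - A: values present only in the second table
    # Together, (A - B) | (B - A) is exactly A ^ B
    return sorted(set_a - set_b), sorted(set_b - set_a)


ip_a = ["IP", "10.0.0.1", "10.0.0.2 ", None, "10.0.0.3"]
ip_b = ["IP", "10.0.0.2", "10.0.0.4"]

only_a, only_b = split_differences(ip_a, ip_b)
print("Only in the first table:", only_a)
print("Only in the second table:", only_b)
```

Because both sample lists share the header value "IP", it cancels out and never shows up as a difference.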

Unresolved:

1. How to merge the differing rows into one table to form a complete combined table

2. How to optimize and streamline the code
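For question 1, here is one possible approach, sketched under assumptions. Treat the IP in column G as the key, keep every row from the first table, then append the rows from the second table whose IP the first table lacks. The function `merge_rows` and the sample rows are hypothetical. The row tuples are assumed to come from openpyxl's `ws.iter_rows(min_row=2, values_only=True)`, which starts after the header row.

```python
# Sketch: merge two tables into one "full set" keyed on a unique column.
# Rows are tuples, such as those yielded by openpyxl's
# ws.iter_rows(min_row=2, values_only=True); the sample rows are hypothetical.

def merge_rows(rows_a, rows_b, key_index):
    """Return the first row seen for each distinct key, scanning A before B.

    When a key appears in both tables, the row from A is kept.
    Rows with an empty key (None) are skipped.
    """
    merged = []
    seen = set()
    for row in list(rows_a) + list(rows_b):
        key = row[key_index]
        if key is None or key in seen:
            continue  # skip empty keys and keys already taken
        seen.add(key)
        merged.append(row)
    return merged


# Column G is the 7th column, so its tuple index is 6
rows_a = [("web01", None, None, None, None, None, "10.0.0.1"),
          ("db01", None, None, None, None, None, "10.0.0.2")]
rows_b = [("db01-new", None, None, None, None, None, "10.0.0.2"),
          ("cache01", None, None, None, None, None, "10.0.0.4")]

for row in merge_rows(rows_a, rows_b, key_index=6):
    print(row[0], row[6])
```

To save the result, you could create a new workbook with `openpyxl.Workbook()` and call `ws.append(row)` for each merged row. Letting the first table win on conflicts is an arbitrary choice. Rows whose IP matches but whose other columns differ may deserve a manual review instead.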
