SoFunction
Updated on 2024-10-28

Three Ways to Compare Two CSV Files in Python and Print the Differences

In this article, we discuss several ways to compare two CSV files, including the most "Pythonic" approach and an external Python module that simplifies the task.

Finally, we show how to find the differences between CSV files using pandas DataFrames.

Let's assume that the two CSV files to be compared are titled and . You can rename the files as needed.

Remember to replace the filenames in the code snippets below as well.

For this example, the two files contain the following rows:

:

1,2,3,4,5,6
4,5,6,7,8,9
1,3,4,5,6,1

:

1,2,3,4,5,6
4,5,6,7,8,9
2,3,1,4,1,5

Method 1: Compare two CSV files using the most Pythonic solution available

In this method, we read the contents of each file into a list of lines, then walk through each list and check whether every line also appears in the other list. Logically, this is a very simple solution.

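Despite its simplicity, this approach works well for small files. Keep in mind that each `in` check on a list scans the whole list, so the running time grows with the product of the two file lengths; for large files, set lookups are much faster.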

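# Replace '' with the paths of your two CSV files.
with open('', 'r') as file1, open('', 'r') as file2:
    # readlines() returns a list of lines, each keeping its trailing '\n'
    f1_contents = file1.readlines()
    f2_contents = file2.readlines()
# Lines in the first file that are missing from the second
for line in f1_contents:
    if line not in f2_contents:
        print(line)
# Lines in the second file that are missing from the first
for line in f2_contents:
    if line not in f1_contents:
        print(line)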

The code above prints every line that appears in only one of the two files. Because each line keeps its trailing newline, print() leaves a blank line after each one.

In our test case, we get the following output.

1,3,4,5,6,1

2,3,1,4,1,5
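To avoid the repeated list scans mentioned above, the membership tests can use sets instead. The following is a minimal sketch of that variant, assuming both files fit in memory; the function name `diff_lines` and the temporary files `first.csv` and `second.csv` are hypothetical. It strips trailing newlines so that a last line without '\n' still matches, and, like Method 1, it ignores how many times a line repeats.

```python
import os
import tempfile


def diff_lines(path1, path2):
    """Return (lines only in path1, lines only in path2), each in file order."""
    with open(path1, "r") as file1, open(path2, "r") as file2:
        # Strip newlines so a last line without '\n' still matches.
        f1_lines = [line.rstrip("\n") for line in file1]
        f2_lines = [line.rstrip("\n") for line in file2]
    # Set membership is O(1) on average, unlike scanning a list.
    f1_set, f2_set = set(f1_lines), set(f2_lines)
    only_in_1 = [line for line in f1_lines if line not in f2_set]
    only_in_2 = [line for line in f2_lines if line not in f1_set]
    return only_in_1, only_in_2


# Recreate the article's two example files in a temporary directory.
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "first.csv")
path_b = os.path.join(tmpdir, "second.csv")
with open(path_a, "w") as f:
    f.write("1,2,3,4,5,6\n4,5,6,7,8,9\n1,3,4,5,6,1\n")
with open(path_b, "w") as f:
    f.write("1,2,3,4,5,6\n4,5,6,7,8,9\n2,3,1,4,1,5\n")

only_a, only_b = diff_lines(path_a, path_b)
print(only_a)  # ['1,3,4,5,6,1']
print(only_b)  # ['2,3,1,4,1,5']
```

Keeping the original lists alongside the sets preserves the order in which the differing lines appear in each file.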

Method 2: Compare two CSV files using the external csv-diff module

First, install the module using the following command in the terminal.

python3 -m pip install csv-diff

Once installed, you do not need to write a Python script. You can run it directly from the terminal using the following command.

csv-diff   --key=id

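Running this command displays the differences in your terminal. The --key option names a column that uniquely identifies each row (here, a column called id). If your files have no such column, you can omit --key, and csv-diff then compares entire rows.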

csv-diff treats the first line of each file as the header, so in our test case the columns are named 1 through 6. We get the following output.

1 row added, 1 row removed

1 row added

1: 2
2: 3
3: 1
4: 4
5: 1
6: 5

1 row removed

1: 1
2: 3
3: 4
4: 5
5: 6
6: 1

To use this module as part of a Python script, you can write a script similar to the following.

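from csv_diff import load_csv, compare
# Replace "" with the paths of your two CSV files.
with open("") as previous, open("") as current:
    # compare() returns a dict of added, removed and changed rows
    difference = compare(load_csv(previous), load_csv(current))
print(difference)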

The output is as follows.

{'added': [{'1': '2', '2': '3', '3': '1', '4': '4', '5': '1', '6': '5'}], 'removed': [{'1': '1', '2': '3', '3': '4', '4': '5', '5': '6', '6': '1'}], 'changed': [], 'columns_added': [], 'columns_removed': []}
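To make the idea of a key column concrete, here is a minimal standard-library sketch that produces a result shaped like the dictionary above. It is not csv-diff's implementation; the function name `diff_rows`, the `id`/`name`/`age` columns and the sample data are hypothetical, and it assumes Python 3.8+ (where `csv.DictReader` yields plain dicts). With a key, rows sharing the same key value are reported as changed; without one, a row is identified by its full contents, so a modified row shows up as one removal plus one addition, as in the output above.

```python
import csv
import io


def diff_rows(old_text, new_text, key=None):
    """Compare two CSV texts whose first line is a header row."""
    old_rows = list(csv.DictReader(io.StringIO(old_text)))
    new_rows = list(csv.DictReader(io.StringIO(new_text)))

    def row_id(row):
        # With a key column, rows are matched by that column's value;
        # without one, a row is identified by its full contents.
        return row[key] if key else tuple(sorted(row.items()))

    old_by_id = {row_id(r): r for r in old_rows}
    new_by_id = {row_id(r): r for r in new_rows}

    added = [r for k, r in new_by_id.items() if k not in old_by_id]
    removed = [r for k, r in old_by_id.items() if k not in new_by_id]
    changed = []
    for k, old_row in old_by_id.items():
        new_row = new_by_id.get(k)
        if new_row is None:
            continue
        # Record only the columns whose values differ.
        diffs = {c: [old_row[c], new_row.get(c)]
                 for c in old_row if old_row[c] != new_row.get(c)}
        if diffs:
            changed.append({"key": k, "changes": diffs})
    return {"added": added, "removed": removed, "changed": changed}


old_csv = "id,name,age\n1,Ann,30\n2,Bob,25\n"
new_csv = "id,name,age\n1,Ann,31\n3,Cy,40\n"
result = diff_rows(old_csv, new_csv, key="id")
print(result["added"])    # [{'id': '3', 'name': 'Cy', 'age': '40'}]
print(result["removed"])  # [{'id': '2', 'name': 'Bob', 'age': '25'}]
print(result["changed"])  # [{'key': '1', 'changes': {'age': ['30', '31']}}]
```

Calling `diff_rows` without a key on the article's two example files reports one added and one removed row, matching the csv-diff output shown above.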

Method 3: Compare Two CSV Files Using Pandas DataFrames

The following script merges the two DataFrames with an outer join and keeps the rows that appear in only one of them.

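import pandas as pd
def dataframe_difference(df1: pd.DataFrame, df2: pd.DataFrame, which=None):
    """Find rows that differ between two DataFrames."""
    # Outer-merge on all shared columns; indicator=True adds a '_merge'
    # column whose value is 'left_only', 'right_only' or 'both'.
    comparison_df = df1.merge(
        df2,
        indicator=True,
        how='outer'
    )
    if which is None:
        # Keep every row that is not present in both DataFrames
        diff_df = comparison_df[comparison_df['_merge'] != 'both']
    else:
        # Keep only the rows from one side, e.g. which='left_only'
        diff_df = comparison_df[comparison_df['_merge'] == which]
    return diff_df
if __name__ == "__main__":
    # Replace "" with your file paths; header=None because the files have no header row
    df1 = pd.read_csv("", header=None)
    df2 = pd.read_csv("", header=None)
    print(dataframe_difference(df1, df2))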
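The optional `which` argument restricts the result to one side of the merge. The sketch below is self-contained: it repeats the function and, as a simplifying assumption, types the article's example rows in directly instead of reading them from CSV files. It shows `which='left_only'` (rows only in the first file) and `which='right_only'` (rows only in the second).

```python
import pandas as pd


def dataframe_difference(df1, df2, which=None):
    """Rows of an outer merge that are not in both frames (or only `which`)."""
    comparison_df = df1.merge(df2, indicator=True, how="outer")
    if which is None:
        return comparison_df[comparison_df["_merge"] != "both"]
    return comparison_df[comparison_df["_merge"] == which]


# The article's example rows, entered inline rather than read from CSV files.
df1 = pd.DataFrame([[1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8, 9], [1, 3, 4, 5, 6, 1]])
df2 = pd.DataFrame([[1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8, 9], [2, 3, 1, 4, 1, 5]])

only_first = dataframe_difference(df1, df2, which="left_only")
only_second = dataframe_difference(df1, df2, which="right_only")
# Drop the indicator column to get back plain rows of data.
print(only_first.drop(columns="_merge").values.tolist())   # [[1, 3, 4, 5, 6, 1]]
print(only_second.drop(columns="_merge").values.tolist())  # [[2, 3, 1, 4, 1, 5]]
```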

Note that header=None is passed to read_csv because our test files have no header row. If your files do have a header, omit the parameter, as in pd.read_csv(""), replacing the empty string with the name of the file you are working with.
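To see what `header=None` changes, the sketch below reads the first example file's contents twice from an in-memory buffer (`io.StringIO` stands in for a file on disk here). With `header=None`, pandas numbers the columns 0, 1, 2, ... and keeps every line as data; without it, the first line becomes the column names and is no longer treated as a row.

```python
import io

import pandas as pd

text = "1,2,3,4,5,6\n4,5,6,7,8,9\n1,3,4,5,6,1\n"

no_header = pd.read_csv(io.StringIO(text), header=None)
with_header = pd.read_csv(io.StringIO(text))

print(list(no_header.columns))    # [0, 1, 2, 3, 4, 5]
print(len(no_header))             # 3
print(list(with_header.columns))  # ['1', '2', '3', '4', '5', '6']
print(len(with_header))           # 2
```

Reading a header-less file without `header=None` would therefore silently drop its first row from the comparison.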

If your files are not in the same directory as the script, please provide the full path to the CSV file.

The Python script above should produce the following output:

   0  1  2  3  4  5      _merge
2  1  3  4  5  6  1   left_only
3  2  3  1  4  1  5  right_only

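The rows marked left_only and right_only contain all the differences: the _merge column shows that a left_only row appears only in the first file and a right_only row only in the second. The number at the start of each line is just the row index of the merged DataFrame.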
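If you prefer a report closer to Method 1's output, one option is to translate the `_merge` values back into the file each row came from. This is a minimal sketch, assuming the article's example rows are entered inline instead of read from CSV files; the label strings are hypothetical.

```python
import pandas as pd

# The article's example rows, entered inline instead of read from CSV files.
df1 = pd.DataFrame([[1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8, 9], [1, 3, 4, 5, 6, 1]])
df2 = pd.DataFrame([[1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8, 9], [2, 3, 1, 4, 1, 5]])

merged = df1.merge(df2, how="outer", indicator=True)
diff = merged[merged["_merge"] != "both"]

# Hypothetical labels: left_only rows come from df1, right_only rows from df2.
labels = {"left_only": "only in first file", "right_only": "only in second file"}
report = []
for row in diff.itertuples(index=False):
    *values, side = row  # the last field is the _merge indicator
    report.append(f"{labels[side]}: {','.join(str(v) for v in values)}")

print("\n".join(report))
```

Joining the values with commas reproduces each differing line in the same form it has in the CSV file.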
