
Excel column comparison using Python

Posted by: admin May 14, 2020

Questions:

I have an Excel file with several columns.

COL 1    | COL 2    | COL 3  

ABCD     |  ABC(D)  |   CDA  
AB CD    | ABC D    |   C D - (B)  
A B C D  | (ABCD)   |   ABCD  
ABC D    | ABDC     | ABC D  
A(BC ) D |  AD B - C|   AB CD

I want to compare every column with every other column and print the similarities and differences for each pair.

For example:

  1. Comparing COL 1 and COL 2

    Similarities:

    None

    Differences:

    ABCD
    AB CD
    A B C D
    A(BC ) D
    ABC(D)
    ABC D
    (ABCD)
    ABDC
    AD B - C

Then compare COL 2 with COL 3, and then COL 1 with COL 3.
I need exact string matches only: even a whitespace difference counts as a mismatch.
The number of columns may increase, and the comparison starts from the second row of each column (the first row holds the column headers).

How can I implement this repeated pairwise comparison in Python so that it runs fast?

Answers:

You can use xlrd. First, read the content of your file. Second, store the three columns in three dictionaries, since looking up a key in a dict is fast (it is hash-based). Third, do the comparison and output the result.
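A minimal sketch of the first two steps, assuming the data sits on the first worksheet, row 0 holds the headers, and the file is a legacy .xls (xlrd 2.0 and later no longer open .xlsx files; openpyxl is the usual alternative for those). `columns_to_dicts` and `load_columns` are hypothetical helper names, not part of xlrd; only `open_workbook`, `sheet_by_index`, `ncols` and `col_values` come from xlrd itself.

```python
def columns_to_dicts(raw_columns, start_row=1):
    """Map each column (a list of cell values) to {row index: cell text}.

    Rows before `start_row` (the header by default) are skipped.
    """
    result = []
    for values in raw_columns:
        # str() keeps the later comparison an exact string match; note that
        # xlrd returns numeric cells as floats, so 7 becomes "7.0"
        result.append({row: str(values[row]) for row in range(start_row, len(values))})
    return result


def load_columns(path, sheet_index=0, start_row=1):
    """Read every column of one worksheet with xlrd (hypothetical helper)."""
    import xlrd  # third-party; xlrd >= 2.0 opens only legacy .xls files

    sheet = xlrd.open_workbook(path).sheet_by_index(sheet_index)
    raw_columns = [sheet.col_values(col) for col in range(sheet.ncols)]
    return columns_to_dicts(raw_columns, start_row)
```

Keeping the xlrd call in a thin wrapper means the dict-building part works the same if you later read the sheet with a different library.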

I suggest you check the xlrd API and write the code yourself. Here is the link.

If you have any questions, feel free to ask.

EDIT:

Here is an example.

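#!/usr/bin/env python3
# Compare two columns row by row; each dict maps a row number to a cell value.

name = {1: 'a', 2: 'b', 3: 'c'}
lname = {1: 'g', 2: 'b', 3: 'v'}
common = {}       # rows where both columns hold exactly the same string
diff_name = {}    # mismatching values taken from `name`
diff_lname = {}   # mismatching values taken from `lname`

for key in name:
    # Exact string comparison: any whitespace difference is a mismatch
    if name[key] == lname[key]:
        common[key] = name[key]
    else:
        diff_name[key] = name[key]
        diff_lname[key] = lname[key]

print('common part is:', common)
print('diff_name is:', diff_name)
print('diff_lname is:', diff_lname)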
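The example above pairs values by row. If "similarity" should instead mean a string that occurs anywhere in both columns, which is one possible reading of the expected output in the question, Python sets offer the same hash-based lookups as dicts and reduce the work to set operators. This sketch assumes that interpretation; `compare_as_sets` is a hypothetical helper, and the sample values are taken from the question's table with the surrounding padding trimmed.

```python
def compare_as_sets(col_a, col_b):
    """Compare two columns as sets of exact strings, ignoring row positions.

    Returns (common, only_a, only_b), each sorted for stable output.
    """
    a, b = set(col_a), set(col_b)
    common = sorted(a & b)   # values present in both columns
    only_a = sorted(a - b)   # values found only in the first column
    only_b = sorted(b - a)   # values found only in the second column
    return common, only_a, only_b


col1 = ["ABCD", "AB CD", "A B C D", "ABC D", "A(BC ) D"]
col3 = ["CDA", "C D - (B)", "ABCD", "ABC D", "AB CD"]
common, only_1, only_3 = compare_as_sets(col1, col3)
print("similarities:", common)
print("differences:", only_1 + only_3)
```

Duplicates within a column collapse into one entry here, which is usually what a "which values do the columns share" report wants.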

Answer:

A simple algorithm is to compare each column with every column to its right, so that each pair is compared exactly once:

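def compare_all(columns, compare):
    N = len(columns)
    for colA in range(N):
        # The upper bound must be N, not N - 1, or the last column is never compared
        for colB in range(colA + 1, N):
            compare(colA, colB)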
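Instead of managing the two loop bounds by hand, the standard library's `itertools.combinations(range(n), 2)` yields every unordered pair of column indices exactly once, so nothing changes when more columns are added. This sketch pairs it with a row-by-row exact comparison like the one in the first answer's example; `compare_rows` and `compare_each_pair` are hypothetical names, and each column is assumed to be a dict keyed by row number (row 1 being the first data row), with sample data from the question's table, padding trimmed.

```python
from itertools import combinations


def compare_rows(col_a, col_b):
    """Exact, row-by-row comparison of two {row: value} dicts."""
    common, diff_a, diff_b = {}, {}, {}
    # Only rows present in both columns are compared
    for row in sorted(col_a.keys() & col_b.keys()):
        if col_a[row] == col_b[row]:  # whitespace differences count as mismatches
            common[row] = col_a[row]
        else:
            diff_a[row] = col_a[row]
            diff_b[row] = col_b[row]
    return common, diff_a, diff_b


def compare_each_pair(columns, names):
    """Compare every column with every later column, once per pair."""
    results = {}
    for i, j in combinations(range(len(columns)), 2):
        results[(names[i], names[j])] = compare_rows(columns[i], columns[j])
    return results


names = ["COL 1", "COL 2", "COL 3"]
table = [
    ["ABCD", "AB CD", "A B C D", "ABC D", "A(BC ) D"],
    ["ABC(D)", "ABC D", "(ABCD)", "ABDC", "AD B - C"],
    ["CDA", "C D - (B)", "ABCD", "ABC D", "AB CD"],
]
columns = [dict(enumerate(values, start=1)) for values in table]

for (a, b), (common, diff_a, diff_b) in compare_each_pair(columns, names).items():
    print(f"{a} vs {b}")
    print("  similarities:", common or "None")
    print("  differences:", list(diff_a.values()) + list(diff_b.values()))
```

With n columns this performs n(n-1)/2 comparisons, each linear in the number of rows, which is as little work as an all-pairs report can do.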