How to Merge Several Excel Sheets With Different Table Columns in Python?

Posted by: admin, May 14, 2020

Questions:

I have several Excel sheets with different columns, as follows:

Table A: Col1 Col2 Col3

Table B: Col2 Col4 Col5

Table C: Col1 Col6 Col7

My Final Table should be like:

Table Final: Col1 Col2 Col3 Col4 Col5 Col6 Col7

If there is no data for a particular column, it should remain blank. I have successfully merged two tables at a time, but I want to merge all the tables together.

This is the code that merges two sheets:

    import pandas as pd
    import numpy as np
    import glob
    df = pd.read_excel('C:/Users/Am/Downloads/sales-mar-2014.xlsx')
    status = pd.read_excel('C:/Users/Am/Downloads/customer-status.xlsx')
    all_data_st = pd.merge(df, status, how='outer') 
    all_data_st.to_excel('C:/Users/Am/Downloads/a1.xlsx',header=True)

This is the code I have written for merging more than two sheets:

    import pandas as pd
    import numpy as np
    import glob
    all_data = pd.DataFrame()
    for f in glob.glob('C:/Users/Am/Downloads/*.xlsx'):
        all_data = all_data.merge(pd.read_excel(f), how='outer')
    writer = pd.ExcelWriter('merged.xlsx', engine='xlsxwriter')
    all_data.to_excel(writer,sheet_name='Sheet1')
    writer.save()

This is the error I am getting:

    Traceback (most recent call last):
      File "E:/allfile.py", line 7, in <module>
        all_data = all_data.merge(pd.read_excel(f), how='outer')
      File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 6868, in merge
        copy=copy, indicator=indicator, validate=validate)
      File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 47, in merge
        validate=validate)
      File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 524, in __init__
        self._validate_specification()
      File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 1033, in _validate_specification
        lidx=self.left_index, ridx=self.right_index))
    pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

Answers:

You can do this with the sample code below, which merges three .xlsx files with the columns you described. If you have more than three files and know the columns on which you want to merge them, put this logic in a function. The function should take two datasets and a merge column name as inputs and return the merged dataset. You can then iterate over the list of Excel files and call this function to build the final merged dataset.

Here is the sample code:

    import pandas as pd

    # Load each workbook into its own DataFrame
    data_A = pd.read_excel('a.xlsx')
    data_B = pd.read_excel('b.xlsx')
    data_C = pd.read_excel('c.xlsx')
    print("File A Data:")
    print(data_A)
    print("File B Data:")
    print(data_B)
    print("File C Data:")
    print(data_C)

    # A and B share Col2; the combined result shares Col1 with C
    data_AB = pd.merge(left=data_A, right=data_B, on='Col2', how='outer')
    data_ABC = pd.merge(left=data_AB, right=data_C, on='Col1', how='outer')
    print("Merged Data:")
    print(data_ABC)

The output will be the merged data from all three tables, with all columns.
Hope this helps you solve your problem.
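
The function-and-loop idea described at the top of this answer could look roughly like the sketch below. It is only an illustration under assumptions: merge_two, the steps list, and the small in-memory tables are made-up stand-ins for the real Excel files (which would be loaded with pd.read_excel), and the key column for each step is assumed to be known in advance.

```python
import pandas as pd


def merge_two(left, right, on):
    """Outer-merge two DataFrames on a single named column (hypothetical helper)."""
    return pd.merge(left, right, on=on, how="outer")


# Made-up stand-ins for the three sheets; real code would use pd.read_excel(...)
table_a = pd.DataFrame({"Col1": ["k1", "k2"], "Col2": ["x", "y"], "Col3": [10, 20]})
table_b = pd.DataFrame({"Col2": ["x", "z"], "Col4": [0.1, 0.2], "Col5": ["p", "q"]})
table_c = pd.DataFrame({"Col1": ["k1", "k3"], "Col6": ["m", "n"], "Col7": [1, 2]})

# Each step pairs the next table with the column it shares with the running result
steps = [(table_b, "Col2"), (table_c, "Col1")]

merged = table_a
for frame, key in steps:
    merged = merge_two(merged, frame, key)

print(merged)
```

Rows that have no match in another table keep NaN in that table's columns.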

Answer:

The code for two sheets is also not working, right? The on argument is missing. I would recommend saving the different types of Excel sheets in a new folder and then creating one file for each type of Excel sheet, based on the following help:
Loading multiple csv files of a folder into one dataframe

Then you can run the merge:

    all_data_st = pd.merge(A, B, how='outer', on='Col2')
    all_data_st = pd.merge(all_data_st, C, how='outer', on='Col1')
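
One likely reason the loop in the question fails is its starting point: an empty pd.DataFrame() has no columns, so the very first merge has nothing in common to join on. A minimal sketch reproducing this with a made-up one-row table (sample is hypothetical data, not one of the real files):

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-in for the first sheet read inside the loop
sample = pd.DataFrame({"Col1": ["k1"], "Col2": ["x"]})

try:
    # Same call shape as in the question: empty frame merged with real data
    pd.DataFrame().merge(sample, how="outer")
    raised = False
except MergeError:
    raised = True

print(raised)
```

Starting from the first real table instead of an empty frame, and passing on= explicitly as in the merge calls above, avoids this.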

Alternatively, try concat:

    import glob
    import pandas as pd
    all_data = pd.DataFrame()
    for f in glob.glob('C:/Users/Am/Downloads/*.xlsx'):
        df = pd.read_excel(f)
        # Stack the rows; columns missing from a file are filled with NaN
        all_data = pd.concat([all_data, df], axis=0, ignore_index=True)
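
As a rough illustration of what concat does with differing columns, here is a sketch using small made-up tables in place of the Excel files (all values are hypothetical). Note that concat stacks rows rather than matching them on a key, so the result differs from the merge approach:

```python
import pandas as pd

# Made-up stand-ins for the three sheets; real code would use pd.read_excel(...)
table_a = pd.DataFrame({"Col1": ["k1", "k2"], "Col2": ["x", "y"], "Col3": [10, 20]})
table_b = pd.DataFrame({"Col2": ["x", "z"], "Col4": [0.1, 0.2], "Col5": ["p", "q"]})
table_c = pd.DataFrame({"Col1": ["k1", "k3"], "Col6": ["m", "n"], "Col7": [1, 2]})

# Rows are appended one table after another; the columns become the union
# of all input columns, with NaN wherever a table lacks a column
all_data = pd.concat([table_a, table_b, table_c], axis=0, ignore_index=True, sort=False)

print(list(all_data.columns))
print(len(all_data))
```

When the result is written with to_excel, NaN cells are left empty by default.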