I have some Excel sheets with different columns, as follows:
Table A: Col1 Col2 Col3
Table B: Col2 Col4 Col5
Table C: Col1 Col6 Col7
My Final Table should be like:
Table Final: Col1 Col2 Col3 Col4 Col5 Col6 Col7
If there is no data for a particular column, it should remain blank. I have successfully merged two tables at a time, but I want to merge all the tables together.
This is the code that merges two sheets:
import pandas as pd
import numpy as np
import glob

df = pd.read_excel('C:/Users/Am/Downloads/sales-mar-2014.xlsx')
status = pd.read_excel('C:/Users/Am/Downloads/customer-status.xlsx')
all_data_st = pd.merge(df, status, how='outer')
all_data_st.to_excel('C:/Users/Am/Downloads/a1.xlsx', header=True)

This is the code I have written for merging more than two sheets:
import pandas as pd
import numpy as np
import glob

all_data = pd.DataFrame()
for f in glob.glob('C:/Users/Am/Downloads/*.xlsx'):
    all_data = all_data.merge(pd.read_excel(f), how='outer')

writer = pd.ExcelWriter('merged.xlsx', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Sheet1')
writer.save()

This is the error I am getting:
Traceback (most recent call last):
  File "E:/allfile.py", line 7, in <module>
    all_data = all_data.merge(pd.read_excel(f), how='outer')
  File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 6868, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 47, in merge
    validate=validate)
  File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 524, in __init__
    self._validate_specification()
  File "C:\Users\Am\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\merge.py", line 1033, in _validate_specification
    lidx=self.left_index, ridx=self.right_index))
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
You can do this with the sample code below, which merges three .xlsx files with the columns you described. If you have more than three files and you know the columns to merge on, put this logic in a function that takes two datasets and a merge column name as inputs and returns the merged dataset. You can then iterate over the list of Excel files and call this function to build the final merged dataset.
Here is the sample code:
import pandas as pd

data_A = pd.read_excel('a.xlsx')
data_B = pd.read_excel('b.xlsx')
data_C = pd.read_excel('c.xlsx')

print("File A Data:")
print(data_A)
print("File B Data:")
print(data_B)
print("File C Data:")
print(data_C)

data_AB = pd.merge(left=data_A, right=data_B, on='Col2', how='outer')
data_ABC = pd.merge(left=data_AB, right=data_C, on='Col1', how='outer')

print("Merged Data:")
print(data_ABC)
The output will be the merged data from all three tables, with all columns.
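As a rough sketch of the function-based approach described above: the helper name merge_on_shared and the small in-memory frames are made up for illustration and stand in for pd.read_excel(...) calls on your files. Instead of passing the merge column explicitly, this variant infers it from the column names the two tables share, which is an assumption on my part; it also assumes each new table shares at least one column with what has been merged so far. Starting from the first table rather than an empty pd.DataFrame() avoids the "No common columns" MergeError, because an empty frame has no columns to join on.

```python
from functools import reduce

import pandas as pd


def merge_on_shared(left, right):
    """Outer-merge two frames on the column names they have in common.

    Hypothetical helper for illustration; not part of pandas.
    """
    shared = [col for col in left.columns if col in right.columns]
    if not shared:
        raise ValueError("The two tables have no column in common to merge on.")
    return pd.merge(left, right, on=shared, how="outer")


# Made-up sample data standing in for pd.read_excel('a.xlsx') and so on.
table_a = pd.DataFrame({"Col1": [1, 2], "Col2": ["x", "y"], "Col3": [10, 20]})
table_b = pd.DataFrame({"Col2": ["x", "y"], "Col4": [0.1, 0.2], "Col5": ["p", "q"]})
table_c = pd.DataFrame({"Col1": [1, 3], "Col6": ["m", "n"], "Col7": [7.0, 7.5]})

# Fold the list of tables into one, starting from the first table.
final = reduce(merge_on_shared, [table_a, table_b, table_c])
print(final)
```

Cells with no matching data come out as NaN, which pandas writes as blank cells with to_excel. With real files you would build the list with something like [pd.read_excel(f) for f in glob.glob(...)], keeping in mind that the order of the files matters when the shared column differs from table to table.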
Hope this helps you solve your problem.
The code for two sheets isn't working either, right? The on argument (the column to merge on) is missing. I would recommend saving the different types of Excel sheets in a new folder and then creating one file for each type of Excel sheet, based on the following help:
Loading multiple csv files of a folder into one dataframe
Then you can run the merge:
all_data_st = pd.merge(A, B, how='outer', on='Col2')
all_data_st = pd.merge(all_data_st, C, how='outer', on='Col1')

Alternatively, try concat:
import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob('C:/Users/Am/Downloads/*.xlsx'):
    df = pd.read_excel(f)
    # Stack this file's rows under the rows collected so far
    all_data = pd.concat([all_data, df], axis=0, ignore_index=True)
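To show what the concat route gives you, here is a small sketch with made-up in-memory frames in place of the Excel files. Note that concat stacks the rows of each table under one another and fills the missing columns with NaN; it does not match rows on Col1 or Col2 the way merge does, so whether this fits depends on what you want the final table to look like.

```python
import pandas as pd

# Made-up one-row tables standing in for the three Excel sheets.
table_a = pd.DataFrame({"Col1": [1], "Col2": ["x"], "Col3": [10]})
table_b = pd.DataFrame({"Col2": ["x"], "Col4": [0.1], "Col5": ["p"]})
table_c = pd.DataFrame({"Col1": [1], "Col6": ["m"], "Col7": [7.0]})

# Rows are appended, columns are the union of all column names.
stacked = pd.concat([table_a, table_b, table_c], axis=0, ignore_index=True, sort=False)
print(stacked)
```

Here the result has three rows (one per input row) and all seven columns, even though the three inputs describe the same Col1/Col2 values; an outer merge on those keys would combine them into a single row instead.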