Excel file overwritten instead of concat – Python – Pandas

Posted by: admin, May 14, 2020

Questions:

I’m trying to concatenate all Excel files, and all the worksheets in them, into one file using the script below. It mostly works, but c.xlsx is overwritten on every iteration, so only the last Excel file ends up in the output. I’m not sure why.

import pandas as pd
import os
import ntpath
import glob

dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)
cdf = None
for excel_names in glob.glob('*.xlsx'):
    print(excel_names)
    df = pd.read_excel(excel_names, sheet_name=None, ignore_index=True)
    cdf = pd.concat(df.values())
    cdf.to_excel("c.xlsx", header=False, index=False)
Answer:

The idea is to build a list with a list comprehension. Because read_excel with sheet_name=None returns a dict of DataFrames (one per worksheet), you need to concat the sheets of each file first, and then concat those per-file results again into one big final DataFrame:

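import glob
import pandas as pd

# sheet_name=None returns a dict of {sheet name: DataFrame} for each file
cdf = [pd.read_excel(excel_names, sheet_name=None).values()
       for excel_names in glob.glob('files/*.xlsx')]

# concat the sheets of each file, then concat the per-file results
df = pd.concat([pd.concat(x) for x in cdf], ignore_index=True)
#print (df)

# write once, after everything has been combined
df.to_excel("c.xlsx", index=False)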
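
To see the two levels of concatenation without needing any files on disk, here is a minimal sketch that uses in-memory DataFrames. The `workbooks` list, its sheet names, and the column `a` are made up for illustration; each dict stands in for what `pd.read_excel(path, sheet_name=None)` returns for one file.

```python
import pandas as pd

# Hypothetical stand-ins for two workbooks. Each dict maps sheet name -> DataFrame,
# the same shape that pd.read_excel(path, sheet_name=None) returns.
workbooks = [
    {"Sheet1": pd.DataFrame({"a": [1, 2]}), "Sheet2": pd.DataFrame({"a": [3]})},
    {"Sheet1": pd.DataFrame({"a": [4, 5]})},
]

# Inner concat: stack the sheets of each workbook into one DataFrame per file.
per_file = [pd.concat(sheets.values()) for sheets in workbooks]

# Outer concat: stack the per-file results and renumber the index from 0.
combined = pd.concat(per_file, ignore_index=True)

print(combined["a"].tolist())  # [1, 2, 3, 4, 5]
```

Passing `ignore_index=True` to the outer `pd.concat` is what gives the final frame a clean 0..n-1 index; the inner results keep each sheet's own index until then.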

Answer:

I just tested the code below. It merges data from all Excel files in a folder into a single Excel file.

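import glob
import pandas as pd

# Raw strings keep the backslashes in Windows paths literal
# ("\f" in a normal string would turn into a form feed character)
frames = []
for f in glob.glob(r"C:\your_path\*.xlsx"):
    # reads only the first worksheet of each workbook
    frames.append(pd.read_excel(f))

# DataFrame.append was removed in pandas 2.0; concat the collected frames instead
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
print(all_data.shape)
all_data.to_excel(r"C:\your_path\final.xlsx", sheet_name='Sheet1')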
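
As for why the script in the question kept only the last file: it called to_excel inside the loop, and every write to the same path replaces the previous file. The sketch below contrasts that with collecting first and writing once. It is an illustration under assumptions, not the asker's exact setup: it writes CSV to a temporary directory instead of Excel so it runs without an Excel engine installed, and the `frames` data is invented.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-file DataFrames standing in for the workbooks being merged.
frames = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]})]

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "c.csv")

    # Pattern from the question: write inside the loop.
    # Each call replaces the file, so only the last frame survives.
    for frame in frames:
        frame.to_csv(out, index=False)
    overwritten = pd.read_csv(out)

    # Pattern from the answers: collect everything, then write once.
    pd.concat(frames, ignore_index=True).to_csv(out, index=False)
    accumulated = pd.read_csv(out)

print(len(overwritten), len(accumulated))  # 1 3
```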

Answer:

I got it working with the script below, which builds on @ryguy72’s answer but reads all worksheets and keeps the header row.

import glob
import pandas as pd

frames = []
for f in glob.glob("my_path/*.xlsx"):
    # sheet_name=None reads every worksheet into a dict of DataFrames
    sheets = pd.read_excel(f, sheet_name=None)
    frames.append(pd.concat(sheets.values()))

# DataFrame.append was removed in pandas 2.0; concat once after the loop
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
print(all_data.shape)
all_data.to_excel("my_path/final.xlsx", sheet_name='Sheet1')