I have an Excel workbook with many tabs.
Each tab has the same set of headers as all others.
I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I’ve tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I use something for the parse argument that means “all sheets”?
Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a = xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it’s not “working”.
This is one way to do it: load all sheets into a dictionary of dataframes, then concatenate all the values in the dictionary into one dataframe.
import pandas as pd

# Set sheet_name to None to load all sheets into a dict of dataframes
# (older pandas versions spelled this argument sheetname)
df = pd.read_excel('tmp.xlsx', sheet_name=None)

# Then concatenate all dataframes, ignoring the per-sheet index
# to avoid overlapping index values (see comment by @bunji)
cdf = pd.concat(df.values(), ignore_index=True)
print(cdf)
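As for why the loop in the question's update did not work: DataFrame.append returned a new dataframe rather than modifying b in place, so each result was thrown away (and the method was removed in pandas 2.0). Here is a rough sketch of the loop-style version that collects the pieces in a list and concatenates once. The dict below is a made-up stand-in for what reading every sheet would give you, and the extra "sheet" column is just my own addition to show which tab each row came from.

```python
import pandas as pd

# Hypothetical stand-in for a dict of {sheet name: dataframe}, as returned
# when all sheets are loaded at once. Names and data are invented.
sheets = {
    "Sheet1": pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}),
    "Sheet2": pd.DataFrame({"id": [3], "value": ["c"]}),
}

# Collect the per-sheet frames in a plain list (list.append works in place).
frames = []
for name, frame in sheets.items():
    # Tag each row with the tab it came from (optional).
    frames.append(frame.assign(sheet=name))

# Concatenate once at the end; ignore_index gives a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Concatenating once at the end also avoids copying the growing dataframe on every iteration, which matters if the workbook has a lot of tabs.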