I have a VERY large CSV file with 250,000+ records that takes a while to analyze in Excel, so I wanted to split it into multiple worksheets based on a specific calculated column that I created in pandas.
The column is called “Period” and is a string column in my dataframe in the form MMM_YYYY (e.g., Jan_2016, Feb_2016).
I am trying to produce a workbook (let’s call it data_by_month.xlsx) with one worksheet for every unique period in the dataframe column “Period,” with all matching rows written into the respective worksheet.
This is the logic that I tried:
    for row in df:
        for period in unique_periods:
            if row == period:
                with pd.ExcelWriter("data_by_month.xslx") as writer:
                    df.to_excel(writer, sheet_name = period)
The idea behind this is for every row in the dataframe, go through every period in a list of unique periods, and if the row — which is the index of Period — is equal to a period, write it into the data_by_month.xlsx workbook into a specific worksheet.
I know that my code is completely incorrect right now, but it’s the general logic that I’ve been trying to implement. I’m pretty sure I’m referring to the location of the “Period” column in the dataframe incorrectly, since it keeps saying it’s out of range. Any advice would be welcome!
Thank you so much!
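You should be able to achieve this using a groupby in pandas. Iterating over df.groupby('Period') gives you one (period, sub-dataframe) pair per unique period, so you can open the writer once and write each group to its own worksheet. Opening the writer inside the loop, as in your attempt, would recreate the file on every pass. For example: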
    with pd.ExcelWriter("data_by_month.xlsx") as writer:
        for period, data in df.groupby('Period'):
            data.to_excel(writer, sheet_name = period)
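If it helps to see what is going on, here is a minimal sketch on made-up data. The toy dataframe, the "Amount" column, and the helper name write_sheets are all assumptions for illustration, not part of your actual file. It shows that looping over a dataframe directly yields column labels rather than rows (likely why the comparison against a period never matched), and that groupby hands back one sub-dataframe per period. Passing sort=False keeps periods in order of first appearance, since the default alphabetical sort would put Feb_2016 before Jan_2016.

```python
import pandas as pd

# Toy data standing in for the real file (values are made up).
df = pd.DataFrame({
    "Period": ["Jan_2016", "Feb_2016", "Jan_2016", "Mar_2016"],
    "Amount": [10, 20, 30, 40],
})

# Iterating over a DataFrame yields its column labels, not its rows.
columns_seen = [item for item in df]

# groupby yields (key, sub-DataFrame) pairs. sort=False keeps the groups
# in order of first appearance instead of sorting the keys alphabetically.
sheets = {period: data for period, data in df.groupby("Period", sort=False)}

print(columns_seen)       # the column labels
print(list(sheets))       # one key per unique period
print(sheets["Jan_2016"]) # only the Jan_2016 rows


def write_sheets(sheets, path="data_by_month.xlsx"):
    """Hypothetical helper: write each sub-DataFrame to its own worksheet.

    Needs an Excel engine such as openpyxl to be installed.
    """
    # Open the writer once, outside the loop, so every sheet lands in the same file.
    with pd.ExcelWriter(path) as writer:
        for period, data in sheets.items():
            # index=False leaves the original row index out of the sheet.
            data.to_excel(writer, sheet_name=period, index=False)
```

Calling write_sheets(sheets) would then produce the workbook. Whether to drop the index with index=False is a matter of taste. Keep it if the original row numbers matter for your analysis.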