I have an Excel file with a merged header that I read into a DataFrame using pandas. It looks like this after pd.read_excel():
Unnamed: 0 Pair Unnamed: 1 Type ... Unnamed: 23
cabinet_name group pair caller_id ... result
value1 value1 value1 value1 ... value1
value2 value2 value2 value2 ... value2
So it’s like I have two header rows. One is the row with Unnamed and the other is my desired header row.
This is my desired output:
cabinet_name group pair caller_id ... result
value1 value1 value1 value1 ... value1
value2 value2 value2 value2 ... value2
I am trying to remove the row with Unnamed:
df.drop(df.index[[0]])
and also using header=None in pd.read_excel('file.xlsx', header=None)
But nothing I found returned my expected output. I searched for how to delete rows with Unnamed but all I found was about deleting columns.
I also tried
df.drop(df.head(0))
but it returned me:
KeyError: '[\'Unnamed: 0\' \'Pair'\ ... \'Unnamed: 23\']'
What is the best way to do it?
I believe you need to skip the first row with the parameter skiprows=1 or header=1 and then remove all columns that contain only NaNs:
# sheet_name replaces the older sheetname keyword in current pandas
df = (pd.read_excel('UF_AGT702-M.xlsx', skiprows=1, sheet_name='Report')
        .dropna(how='all', axis=1))
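If the DataFrame has already been loaded with the Unnamed labels, re-reading the file is not the only option. The sketch below shows one possible way to promote the first data row to the header in memory. The sample frame and the helper name promote_first_row are made up for illustration and only mimic the layout shown in the question.

```python
import pandas as pd

# Hypothetical frame mimicking what pd.read_excel() returned in the question:
# the real header ended up as the first data row.
df = pd.DataFrame(
    [["cabinet_name", "group", "pair"],
     ["value1", "value1", "value1"],
     ["value2", "value2", "value2"]],
    columns=["Unnamed: 0", "Pair", "Unnamed: 1"],
)


def promote_first_row(frame):
    """Use the first data row as column labels and drop it from the data."""
    out = frame.iloc[1:].reset_index(drop=True)  # everything below the header row
    out.columns = frame.iloc[0].tolist()         # first row becomes the labels
    return out


fixed = promote_first_row(df)
print(fixed)
```

Note that every column keeps the object dtype after this step, so numeric columns may need converting afterwards; passing header=1 when reading avoids that.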
Answer:
Take, for instance, an Excel file in which the data table is preceded by header information and followed by footer information.
To exclude the header and footer information from the data, you could use the header/skiprows parameter for the former and skipfooter for the latter. Here is an MWE:
import pandas as pd
energy = pd.read_excel('your_excel_file.xls', header=9, skipfooter=8)
header : int, list of int, default 0
Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.
skipfooter : int, default 0
Rows at the end to skip (0-indexed).
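To see how header and skipfooter interact without needing an Excel file, here is a small sketch that uses pd.read_csv on in-memory text as a stand-in for pd.read_excel. This is a substitution for illustration only: both readers accept header and skipfooter, but read_csv needs engine="python" for skipfooter. The sample contents are invented.

```python
import io

import pandas as pd

# Hypothetical contents: 2 preamble lines, the header row,
# 2 data rows, then 1 footer line.
raw = "\n".join([
    "Report title,,",
    "Generated by export tool,,",
    "cabinet_name,group,pair",
    "value1,value1,value1",
    "value2,value2,value2",
    "Total rows: 2,,",
])

# header=2: the third line (0-indexed) holds the column labels,
#           and the lines above it are discarded.
# skipfooter=1: drop the last line of the input.
df = pd.read_csv(io.StringIO(raw), header=2, skipfooter=1, engine="python")
print(df)
```

With a real workbook, the equivalent call would presumably be pd.read_excel(path, header=2, skipfooter=1), with the numbers adjusted to the actual number of preamble and footer rows.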
Check out latest read_excel documentation for further details.
Tags: excel, excelpython, pandas