python 3.x – Remove header row in Excel using pandas

Posted by: admin May 14, 2020

Questions:

I have an Excel file with a merged header that I read as a DataFrame using pandas. It looks like this after pd.read_excel():

Unnamed: 0     Pair    Unnamed: 1      Type      ...  Unnamed: 23
cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

So it’s like I have two header rows. One is the row with Unnamed and the other is my desired header row.

This is my desired output:

cabinet_name   group     pair          caller_id ...  result
value1         value1    value1        value1    ...  value1
value2         value2    value2        value2    ...  value2

I am trying to remove the row with Unnamed:

df.drop(df.index[[0]])

and also using header=None in pd.read_excel('file.xlsx', header=None)

But all of what I found did not return my expected output. I searched on how to delete rows with Unnamed but all I found was deleting columns.

I also tried

df.drop(df.head(0))

but it returned me:

KeyError: '[\'Unnamed: 0\' \'Pair\' ... \'Unnamed: 23\']'

What is the best way to do it?

Answers:

I believe you need to skip the first row with the parameter skiprows=1 or header=1, and then remove the columns that contain only NaNs:

# skiprows=1 drops the "Unnamed" row, so the next row becomes the header
df = (pd.read_excel('UF_AGT702-M.xlsx', skiprows=1, sheet_name='Report')
        .dropna(how='all', axis=1))
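
If the file has already been loaded and re-reading it is not convenient, the same result can probably be reached by promoting the first data row to the column labels. The following is only a minimal sketch: the raw frame is a hypothetical stand-in for the DataFrame shown in the question, and promote_first_row is an illustrative helper name, not a pandas function.

```python
import pandas as pd

# Hypothetical stand-in for what pd.read_excel() returned in the question:
# the real header ended up as the first data row.
raw = pd.DataFrame(
    [
        ["cabinet_name", "group", "pair", "caller_id"],
        ["value1", "value1", "value1", "value1"],
        ["value2", "value2", "value2", "value2"],
    ],
    columns=["Unnamed: 0", "Pair", "Unnamed: 1", "Type"],
)


def promote_first_row(df):
    """Use the first data row as column labels and drop it from the data."""
    out = df.iloc[1:].reset_index(drop=True)  # everything below the header row
    out.columns = df.iloc[0].tolist()         # header row becomes the labels
    return out


fixed = promote_first_row(raw)
print(fixed)
```

Reading the file with header=1 is usually the simpler route. One reason is that in a frame fixed up after loading, the column dtypes may stay as object, because the header strings were mixed into the data when it was parsed.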

Answer:

Let’s take, for instance, the Excel file layout below.

[image: Excel file layout]

To exclude the header and footer information from the data, you could use the header/skiprows parameters for the former and skipfooter for the latter. Here is an MWE:

import pandas as pd

energy = pd.read_excel('your_excel_file.xls', header=9, skipfooter=8)

header : int, list of int, default 0
Row (0-indexed) to use for the column labels of the parsed DataFrame. If a list of integers is passed those row positions will be combined into a MultiIndex. Use None if there is no header.

skipfooter : int, default 0
Rows at the end to skip (0-indexed).
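
To see how header and skipfooter interact without needing an Excel file, here is a small sketch that uses pd.read_csv on in-memory text instead of pd.read_excel. This is a deliberate substitution: read_csv accepts header and skipfooter with the same meaning, although it only supports skipfooter with engine="python". The report contents are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical report: two title lines, the real header, two data rows,
# and one footer line.
lines = [
    "Report title,,",
    "Generated by some tool,,",
    "name,group,result",
    "a,g1,ok",
    "b,g2,fail",
    "Total rows: 2,,",
]
text = "\n".join(lines)

# header=2    -> the third line (0-indexed) supplies the column labels;
#                the lines above it are discarded.
# skipfooter=1 -> the last line is dropped.
df = pd.read_csv(io.StringIO(text), header=2, skipfooter=1, engine="python")
print(df)
```

With read_excel the call would take the same header and skipfooter arguments, with no engine="python" needed.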

Check out the latest read_excel documentation for further details.