python – Pandas read_excel: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

Posted by: admin March 9, 2020

Questions:

I am trying to read an MS Excel 2016 file with pandas. The file contains several sheets of data. It was downloaded from a database and opens correctly in MS Office. In the example below I changed the file name.

EDIT: the file contains Russian and English words. It most probably uses the Latin-1 encoding, but encoding='latin-1' does not help.

import pandas as pd
with open('1.xlsx', 'r', encoding='utf8') as f:
    data = pd.read_excel(f)

Result:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

Without encoding='utf8':

'charmap' codec can't decode byte 0x9d in position 622: character maps to <undefined>

P.S. The task is to process 52 files, merging the data in every sheet with the corresponding sheets across all 52 files. So please, no advice to do it by hand.

How to & Answers:

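Most probably the problem is in the Russian characters.

When no encoding is given, open() uses the platform's default encoding. On Windows that is typically a code page implemented with the 'charmap' codec, which is why the second error mentions it.

As I see it, if utf-8 and latin-1 do not help, then try reading this file not with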

pd.read_excel(f)

but

pd.read_table(f)

or even just

f.readline()

in order to check which character raises the exception, and then delete that character or characters.
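To illustrate where the 'charmap' message comes from, here is a minimal sketch using only the standard library. It assumes the asker's default encoding was cp1252, which is common on Windows but is not stated in the question. Byte 0x9d has no entry in the cp1252 table, so decoding it fails, while latin-1 maps every byte to some character and therefore never raises (it just produces the wrong text). The helper name try_decode is made up for this example.

```python
# Assumption: cp1252 stands in for the asker's default locale encoding.
raw = b"\x9d"  # the byte reported in the second error message


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return str(err)


cp1252_result = try_decode(raw, "cp1252")    # fails: 0x9d is unassigned in cp1252
latin1_result = try_decode(raw, "latin-1")   # succeeds, but yields a control character

print(cp1252_result)
print(repr(latin1_result))
```

This suggests that hunting for a single bad character is unlikely to help: the bytes are not text in any of these encodings.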

Answer:

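The problem is that the original poster is calling read_excel with a file handle opened in text mode ('r') as the first argument. An .xlsx file is a binary ZIP archive, so Python fails while trying to decode it as text before pandas can parse anything. As demonstrated by the last responder, the simplest fix is to pass a string containing the filename as the first argument.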

I ran into this same error using:

df = pd.read_excel(open("file.xlsx",'r'))

but the correct call is:

df = pd.read_excel("file.xlsx")
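The difference between text mode and binary mode can be reproduced without pandas. The sketch below is only an illustration: fake_xlsx_bytes is a made-up byte string that starts with the ZIP signature and has byte 0xa8 at position 14, not a real workbook, and read_both_ways is a hypothetical helper.

```python
import os
import tempfile

# Made-up stand-in for the start of an .xlsx file (a ZIP archive):
# the "PK" signature, ten more header-like bytes, then a non-UTF-8 byte.
fake_xlsx_bytes = b"PK\x03\x04" + b"\x14\x00\x06\x00\x08\x00\x00\x00\x21\x00" + b"\xa8"


def read_both_ways(data):
    """Write data to a temp file, then read it in text mode and in binary mode."""
    fd, path = tempfile.mkstemp(suffix=".xlsx")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            # Text mode: Python tries to decode the bytes as UTF-8.
            with open(path, "r", encoding="utf8") as f:
                text_result = f.read()
        except UnicodeDecodeError as err:
            text_result = type(err).__name__
        # Binary mode: the bytes are returned untouched.
        with open(path, "rb") as f:
            binary_result = f.read()
    finally:
        os.remove(path)
    return text_result, binary_result


text_result, binary_result = read_both_ways(fake_xlsx_bytes)
print(text_result)
print(binary_result[:4])
```

With pandas, the same reasoning means passing either the path itself or a handle opened with 'rb' to read_excel, never a handle opened with 'r'.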

Answer:

Depending on the pandas version, read_excel may accept an encoding keyword (recent releases do not document one).
In your case you can try:

df=pd.read_excel('your_file.xlsx',encoding='utf-8')

or, if you want something more system-specific without any surprises, you can use:

import sys
df=pd.read_excel('your_file.xlsx',encoding=sys.getfilesystemencoding())
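Whatever function receives it, an encoding argument has to be the name of a codec, so sys.getfilesystemencoding() needs to be called rather than written inside quotes. A minimal sketch of the difference, using only the standard library (is_known_codec is a hypothetical helper, not part of pandas):

```python
import codecs
import sys


def is_known_codec(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


called = sys.getfilesystemencoding()      # a real codec name for this system
quoted = "sys.getfilesystemencoding()"    # just a string, not a codec name

print(called, is_known_codec(called))
print(quoted, is_known_codec(quoted))
```

Note that this only shows how to obtain a valid encoding name. It does not change the fact that an .xlsx file is binary, so an encoding setting alone is unlikely to resolve the original error.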