Home » excel » Reading Excel file to Python fails due to number formatted as text

Reading Excel file to Python fails due to number formatted as text

Posted by: admin April 23, 2020 Leave a comment

Questions:

I have a ton of Excel files that each have a column where numbers are formatted as text. Excel gives the error “the number in this cell is formatted as text or preceded by an apostrophe” – see the third column where the cells have a green triangle.

enter image description here

My goal is to open all of these files in Pandas without having to manually open each of them and convert the column to number. However, pd.read_excel() fails with the following xlrd error:

XLRDError: ZIP file contents not a known type of workbook

Unsurprisingly when I use xlrd directly: wb = xlrd.open_workbook(filename) I get the same error.

I also tried openpyxl: wb = openpyxl.load_workbook(filename), and it gives me this:

KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"

I confirmed that the file is openable by both pandas (xlrd) and openpyxl if I manually convert the column to number in excel and re-save the workbook.

Does anyone have any ideas?

How to&Answers:

Use “converters” while reading excel.
Docs.

Eg.:

df = pd.read_excel('yourfile.xlsx',sheetname='sheetname',header=0,converters={ "% Chg" : str })