There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven’t seen any solutions to this problem:
I have an Excel spreadsheet that someone else gave me, so I did not create it. When I open the file in Excel, certain values like “5E12” (clone numbers, if anyone cares) appear to display correctly, but there's a little green arrow next to each one warning me that “This appears to be a number stored as text”. Excel then offers to convert it to a number, and if I say yes, I get 5000000000000, which Excel automatically displays in scientific notation as 5E12 again. The difference is that the cell now holds a number, so a text output would show the full number with all its zeroes. Note that before the conversion this really is text, even to Excel; I'm only being warned and offered a conversion.
So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.
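To see why a string like “5E12” gets turned into a number at all, here is a simplified sketch of the kind of "infer the type from the value" step described above. This is not openpyxl's actual code; the function name `infer_value` is hypothetical. The point it illustrates is that `5E12` is valid floating-point syntax, so any inference that tries numeric parsing before falling back to text will accept it.

```python
def infer_value(raw):
    """Hypothetical, simplified type inference: try int, then float, else keep text.

    It mimics the *idea* of inferring a cell's type from its content;
    it is not openpyxl's implementation.
    """
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass  # not parseable with this converter, try the next one
    return raw  # nothing numeric matched, so keep it as a string


for text in ["5E12", "42", "clone-7"]:
    print(repr(text), "->", repr(infer_value(text)))
```

Running it shows `'5E12'` becoming `5000000000000.0`, while `'clone-7'` stays a string: the original spelling of the text is lost as soon as the numeric parse succeeds.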
How can I prevent this from happening? I do not want text that looks like a “number stored as text” to be converted to a number. It is text unless I say otherwise.
So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it’s manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don’t always know where this problem might occur (I’m reading millions of lines per day, so I don’t want to have to do anything by hand).
I think this is a problem with openpyxl. There is a Google Groups discussion from early 2011 that mentions this problem, but it assumes the case is too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk
So, any suggestions?
If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:
    diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
    --- a/openpyxl/reader/worksheet.py
    +++ b/openpyxl/reader/worksheet.py
    @@ -134,8 +134,10 @@
             data_type = element.get('t', 'n')
             if data_type == Cell.TYPE_STRING:
                 value = string_table.get(int(value))
    -
    -        ws.cell(coordinate).value = value
    +            ws.cell(coordinate).set_value_explicit(value=value,
    +                                                   data_type=Cell.TYPE_STRING)
    +        else:
    +            ws.cell(coordinate).value = value
     
             # to avoid memory exhaustion, clear the item after use
             element.clear()
Cell.value is a property, and assigning to it calls Cell._set_value, which in turn calls Cell.bind_value, whose docstring says: “Given a value, infer type and display options”. Since the types of the values are already stored in the XML file, those should be used (here I only do that for strings) instead of doing something ‘smart’.
As you can see from the code, the test whether it is a string was already there.
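The same principle, trusting the type stored in the file instead of inferring it, can be sketched without openpyxl. The example below is a minimal sketch using only the standard library: it reads cell values from a worksheet XML fragment and dispatches on each `<c>` element's `t` attribute, where `t="s"` means the `<v>` holds an index into the shared-string table and a missing `t` means a number. The XML fragment, the `SHARED_STRINGS` list and the `read_cells` helper are hypothetical stand-ins for what lives in `xl/worksheets/sheet1.xml` and `xl/sharedStrings.xml` inside a real .xlsx file, and only a few of the type codes are handled.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet XML parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical minimal sheet fragment: A1 is a shared string, B1 is a number.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1"><v>5000000000000</v></c>
    </row>
  </sheetData>
</worksheet>"""

# Hypothetical shared-string table, as it might be loaded from sharedStrings.xml.
SHARED_STRINGS = ["5E12"]


def read_cells(sheet_xml, shared_strings):
    """Return {coordinate: value}, using the stored `t` attribute instead of guessing."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for c in root.iterfind(".//m:c", NS):
        coord = c.get("r")
        data_type = c.get("t", "n")  # 'n' (number) is the default when t is absent
        v = c.find("m:v", NS)
        raw = v.text if v is not None else None
        if raw is None:
            cells[coord] = None
        elif data_type == "s":
            # Shared string: <v> is an index into the table; keep the text verbatim.
            cells[coord] = shared_strings[int(raw)]
        elif data_type in ("str", "e"):
            # Formula string result or error code: keep the text as-is.
            cells[coord] = raw
        elif data_type == "b":
            cells[coord] = raw == "1"
        else:
            # Numeric cell: only here is the text parsed as a number.
            cells[coord] = float(raw) if any(ch in raw for ch in ".eE") else int(raw)
    return cells


cells = read_cells(SHEET_XML, SHARED_STRINGS)
print(cells)  # {'A1': '5E12', 'B1': 5000000000000}
```

Because the string cell is looked up by index and never passed through a numeric parser, “5E12” comes back exactly as it was typed, while the genuinely numeric cell is still converted.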