Is there a memory-efficient Java library to read large Microsoft Excel files (both .xls and .xlsx)? I have very limited experience with Apache POI, and it seemed to be a huge memory hog from what I recall (though perhaps this was just for writing and not for reading). Is there something better? Or am I misremembering and/or misusing POI?
It would be important for it to have a “friendly” open-source license as well.
Apache’s POI library has an event-based API with a smaller memory footprint. It was originally available only for HSSF (Horrible Spreadsheet Format, the binary .xls files); XSSF (XML Spreadsheet Format, for OOXML .xlsx files) has since gained a comparable SAX-based event API.
The Excel file formats are (both) huge and extremely complicated, and anything that reads all of their possible contents is going to be equally huge and complicated. Remember they can contain ranges, macros, links, embedded stuff etc.
However, if you are reading something simple like a grid of numbers, I recommend first converting the spreadsheet to a simpler format such as CSV and then reading that.
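To illustrate the CSV route, here is a minimal sketch that streams an exported file one line at a time, so memory use stays roughly constant regardless of file size. It assumes the spreadsheet has already been saved as a plain CSV of numbers with no quoted fields or embedded commas; the class name `CsvGridReader` and the method `columnSums` are made up for this example and are not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of POI or JExcel.
final class CsvGridReader {
    private CsvGridReader() {}

    /**
     * Streams the CSV file line by line and returns the sum of each of the
     * first {@code columns} columns. Only one line is held in memory at a time.
     */
    static double[] columnSums(Path csv, int columns) throws IOException {
        double[] sums = new double[columns];
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: simple numeric grid, so a plain split is enough.
                // Quoted fields would need a real CSV parser instead.
                String[] cells = line.split(",", -1);
                for (int i = 0; i < columns && i < cells.length; i++) {
                    String cell = cells[i].trim();
                    if (!cell.isEmpty()) {
                        // Empty cells are skipped; non-numeric text would throw
                        // NumberFormatException here.
                        sums[i] += Double.parseDouble(cell);
                    }
                }
            }
        }
        return sums;
    }
}
```

Calling `CsvGridReader.columnSums(Paths.get("export.csv"), 3)` (with a hypothetical file name) would return one running total per column. The same loop shape works for any per-row processing; the point is only that nothing forces the whole grid into memory at once.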
Take a look at JExcel:
I can’t vouch for its memory footprint, but obviously with large spreadsheets you’re going to consume a lot of memory for processing.
You should be able to use it for xls and xlsx: