Home » excel » Getting java.lang.OutOfMemoryError: GC overhead limit exceeded while trying to reader a big excel file(.xlsx)

Getting java.lang.OutOfMemoryError: GC overhead limit exceeded while trying to reader a big excel file(.xlsx)

Posted by: admin May 14, 2020 Leave a comment

Questions:

I am simply trying to read a big excel file(.xlsx) file, using the POI API and getting this exception.

This is the code I am using:

public static void main(String args[]) throws FileNotFoundException {
        File f = new File("E:\Downloads\6038_Open_AR.XLSX");
        if (f.exists()){
            System.out.println("Attempting");
            try {
                Workbook workBoo = WorkbookFactory.create(f);
                System.out.println("done");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

And this is the exception I get.

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at org.apache.xerces.xni.XMLString.toString(Unknown Source)
    at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:137)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:115)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:184)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:176)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.parseSheet(XSSFWorkbook.java:428)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:393)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:260)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:263)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:201)
    at Main.main(Main.java:180)

I know this is a very popular topic, I could not find any direct solution that works.

Some of the things that I tried:

File f = new File("E:\Downloads\6038_Open_AR.XLSX");
File fileInputStream = new FileInputStream(f);
Workbook workBoo = new HSSFWorkbook(fileInputStream);

This gives the following exception

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
How to&Answers:

Without going into specific proprietary information that we used to solve this at my workplace, We used the Apache POI XSSF library to read in one sheet of the .XLSX file at a time. Our source file was 25MB and had 4 separate worksheets.

Apache POI will essentially unzip the XLSX, which expands its size by about 10 times (up to over 200MB in this case). Once unzipped, you have access to the Styles and SharedStrings. We used the XSSFSheetXMLHandler.SheetContentsHandler to then start processing each sheet individually.

We created an AbstractSheetParserService (which reads in the file via XMLReader), sets a ContentHandler using the XSSFSheetXMLHandler.SheetContentsHandler mentioned above, and finally parses the sheet and its values via the AbstractParserHandlerService.