Home » excel » excel – Write 1 Million record from XLS file to CVS file in Java

excel – Write 1 Million record from XLS file to CVS file in Java

Posted by: admin May 14, 2020 Leave a comment

Questions:

I have a scenario in my project where user uploads XLS file with 1Million record and I need to conver this xls file to csv file and then sql server job will process the csv file.

I have a process to convert xls file to csv – but when I deploy this code in PCF it starts throwing heap memory erorr. I am also getting the same error in my local environment.

Code Snapshot:

String inputFileName = "UserInput.xls";

FileInputStream input_document = new FileInputStream(new File(inputFileName));
Workbook my_xls_workbook = StreamingReader.builder().open(input_document);//this line throws out of memory error.
Sheet sheet = my_xls_workbook.getSheetAt(0);

…… remaing code reads the sheet object and perform conversion.

How to&Answers:

The method you’re using to load the file into a Workbook seems to be eager , i.e. it will just read the whole document into memory, parse it, and then return the result if it didn’t run out of memory first.

However, on the README page of the project, they show what you should be doing instead to avoid that:

import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
Workbook workbook = StreamingReader.builder()
    .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
    .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
    .open(is);            // InputStream or File for XLSX file (required)

After this, iterate over the workbooks, but do not use the get(0) method as that seems to require loading all workbooks into memory at once.

for (Sheet sheet : workbook){
    System.out.println(sheet.getSheetName());
    for (Row r : sheet) {
        for (Cell c : r) {
            System.out.println(c.getStringCellValue());
        }
    }
}

Have you tried this? If you did, you should file a bug because clearly it shouldn’t use up all memory available as that’s the whole point of the library existing.

Answer:

It is always better to attach to the question a bit of context (e.g. how much memory you have, how much memory you give to the JVM, the error you get)

Having said that, my shot in the dark is that you should give more memory to JVM, something like

java .... -Xms4g -Xmx4g ...

See this question [1] for more details about those parameters.


[1] What are the -Xms and -Xmx parameters when starting JVM?

Answer:

Use the following code:

InputStream is = new FileInputStream(new File("UserInput.xls"));
Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .open(is);            // InputStream or File for XLSX file (required)

More information at https://github.com/monitorjbl/excel-streaming-reader

Answer:

use easypoi do

try {
        FileOutputStream fos    = new FileOutputStream("D:/home/excel/ExcelToCsv.test.csv");
        ImportParams     params = new ImportParams();
        params.setTitleRows(1);
        CsvExportParams csvExportParams = new CsvExportParams();
        IWriter ce = CsvExportUtil.exportCsv(csvExportParams, MsgClient.class, fos);
        ExcelImportUtil.importExcelBySax(
                new FileInputStream(
                        new File(FileUtilTest.getWebRootPath("import/BigDataExport.xlsx"))),
                MsgClient.class, params, new IReadHandler<MsgClient>() {

                    private List<MsgClient> list = new ArrayList<>();

                    @Override
                    public void handler(MsgClient o) {
                        list.add(o);
                        if (list.size() == 10000) {
                            ce.write(list);
                            list.clear();
                        }
                    }

                    @Override
                    public void doAfterAll() {
                        System.out.println("succcess--------------------------------");
                    }
                });
    } catch (Exception e) {

    }

address https://github.com/jueyue/easypoi