java – How to parse UTF-8 characters in Excel files using POI

Posted by: admin, March 9, 2020


I have been using POI to parse XLS and XLSX files successfully. However, I am unable to correctly extract non-ASCII text, such as Chinese or Japanese characters, from an Excel spreadsheet. I have figured out how to extract such data from a UTF-8 encoded CSV or tab-delimited file, but have had no luck with Excel files. Can anyone help?

(Edit: Code snippet from comments)

HSSFSheet sheet = workbook.getSheet(worksheet);
HSSFEvaluationWorkbook ewb = HSSFEvaluationWorkbook.create(workbook);
while (rowCtr <= lastRow && !rowBreakOut) {
    Row row = sheet.getRow(rowCtr); // rows.next();
    for (int col = firstCell; col < lastCell && !breakOut; col++) {
        // RETURN_BLANK_AS_NULL yields null for blank or missing cells
        Cell cell = row.getCell(col, Row.RETURN_BLANK_AS_NULL);
        if (cell != null && cell.getCellType() == Cell.CELL_TYPE_STRING) {
            sValue = cell.getStringCellValue();
            log.warn("String value = " + sValue);
            // URL-encoding shows the UTF-8 bytes of the value as %XX escapes
            String encoded = URLEncoder.encode(sValue, "UTF-8");
            log.warn("URL-encoded with UTF-8: " + encoded);
        }
        // ... (the snippet is cut off here in the original)
    }
}

Answers:

I had the same problem while extracting Persian text from an Excel file. I was using Eclipse, and simply going to Project -> Properties and changing the “text file encoding” to UTF-8 solved the problem.
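
One likely reason this setting helps is that the text was read correctly all along and was only damaged when printed through a stream whose charset cannot represent it. The sketch below reproduces that effect with the JDK alone; the class name `ConsoleEncodingDemo`, the method `printWith`, and the sample word are made up for illustration, and whether this is what happens in a given Eclipse setup is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI or Eclipse.
final class ConsoleEncodingDemo {

    // Prints the text through a PrintStream that uses the given charset,
    // then decodes the produced bytes as UTF-8, the way a UTF-8 viewer would.
    static String printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample Persian word written with Unicode escapes so that the
        // source file encoding does not matter.
        String persian = "\u0633\u0644\u0627\u0645";
        // A charset that cannot encode the letters replaces each with '?'.
        System.out.println(printWith(persian, "US-ASCII"));
        // With UTF-8 the text survives the round trip unchanged.
        System.out.println(printWith(persian, "UTF-8").equals(persian));
    }
}
```

If the first call shows question marks while the second reports an intact round trip, the data itself is fine and only the output encoding needs to change.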


In POI you can set a font for the cell like this:

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it. Rows are 0 based.
Row row = sheet.createRow(1);

// Create a new font and alter it.
Font font = wb.createFont();
font.setFontName("B Nazanin");

// Fonts are set into a style so create a new one to use.
CellStyle style = wb.createCellStyle();
style.setFont(font);

// Create a cell and put a value in it (the text here is only a sample).
Cell cell = row.createCell(1);
cell.setCellValue("\u0633\u0644\u0627\u0645");
cell.setCellStyle(style);

// Write the output to a file
try (FileOutputStream fileOut = new FileOutputStream("workbook.xls")) {
    wb.write(fileOut);
}

You can also select a different character set for the font from FontCharset.


Get the bytes of the string using UTF-8.
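
The code that accompanied this answer is not available, so the following is only a guess at what was meant: a minimal sketch that converts a string (for instance one returned from a cell) to UTF-8 bytes and back with the standard JDK calls. The class `Utf8BytesDemo`, its method names, and the sample text are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate the JDK calls.
final class Utf8BytesDemo {

    // Encodes the text as UTF-8, independent of the platform default charset.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a Java string.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes to avoid
        // depending on the source file encoding.
        String sample = "\u65E5\u672C\u8A9E";
        byte[] utf8 = toUtf8(sample);
        // Each of these characters takes three bytes in UTF-8.
        System.out.println(utf8.length);
        System.out.println(fromUtf8(utf8).equals(sample));
    }
}
```

Passing the charset explicitly matters because `getBytes()` without arguments uses the platform default, which may not be UTF-8.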



The solution is simple: to read cell string values in any encoding (including non-English characters), just use the following method:

sValue = cell.getRichStringCellValue().getString();

instead of:

sValue = cell.getStringCellValue();

This applies to UTF-8 encoded characters like Chinese, Arabic or Japanese.
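
To check whether a given call really returns intact text, it can help to look at the string's code points rather than at console output, which may itself be mis-encoded. The sketch below is one way to do that with the JDK only; `CodePointDump` and `dump` are hypothetical names, and the string passed in stands in for whatever the cell returned.

```java
// Hypothetical diagnostic helper, not part of POI.
final class CodePointDump {

    // Renders every code point as U+XXXX, so the result is plain ASCII
    // and cannot be damaged by the console or log encoding.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a value read from a cell: two Chinese characters.
        System.out.println(dump("\u4E2D\u6587"));
        // A value that was already lost would show up as question marks.
        System.out.println(dump("??"));
    }
}
```

If the dump shows `U+003F` where Chinese, Arabic or Japanese characters were expected, the text was lost before printing; otherwise only the display is at fault.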

P.S. If anybody is using the command-line utility nullpunkt/excel-to-json, which utilizes the “Apache POI” library, modify the file converter/ExcelToJsonConverter.java by replacing the occurrences of “getStringCellValue()” with “getRichStringCellValue().getString()” to avoid reading non-English characters as “???”.