
java – Apache POI Read xlsx NPE

Posted by: admin, April 23, 2020


I am trying to read an Excel spreadsheet using the exact code in the question here. Everything works fine, but sometimes I get a NullPointerException (NPE) for some cells on this line of code:

String value = cell.toString();

Why is that? How can some cell be null in the middle of my worksheet? Admittedly, the cells on which I get the NPE don’t contain data. But not all the empty cells cause NPEs.

Likewise, if I wrap this line in a null check, I eventually hit some rows that are themselves null:

XSSFRow row = ws.getRow(i);

Again, these are rows in the middle of my spreadsheet that contain no data. But not all empty rows cause NPEs.

Obviously, checking for null in both of these cases solves my immediate problem. I’m just wondering why these objects are sometimes null. There must be some logic to it; I just don’t see it.


Answers:

There doesn’t have to be a Row object for every row. Think about it: a new spreadsheet in Excel can have up to 1,048,576 rows, yet saving an empty spreadsheet produces a small file. Storing a reference for every possible row, including the ones that were never used, would make the file absolutely huge. So a row is only stored if there is some kind of content associated with it: cell values, formatting, borders, etc. A row may appear blank but still have some formatting, or it may have had content that is now gone.

A similar argument applies to the Cells in a row. There’s no reason to have a Cell object for cells that were never used. But you can remove the content of a Cell without removing the Cell itself; what remains is a blank cell (CELL_TYPE_BLANK in older POI versions, CellType.BLANK in newer ones).

If a row or cell never existed, it will be null. If it has no content but has formatting that needs to be represented, it won’t be null. If it used to have content or formatting, it won’t be null unless someone explicitly deleted it, either in Excel with Right Click -> Delete or in POI with removeCell or removeRow.
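To make the "only stored once touched" idea concrete, here is a toy model in plain Java. It is not POI code and not necessarily how POI stores things internally; SparseSheet and all of its methods are made-up names, used only to show why a lookup can return null in the middle of a sheet while a cleared cell does not.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sparse sheet that stores a row or cell once something touches it.
// This is NOT Apache POI code; the class and method names are invented for illustration.
class SparseSheet {
    // Row index -> (column index -> cell text). Untouched rows have no entry at all.
    private final Map<Integer, Map<Integer, String>> rows = new TreeMap<>();

    // Creates the row and the cell on demand, like typing a value into a cell.
    void setCell(int row, int col, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, value);
    }

    // Clears the content but keeps the cell entry, leaving a "blank" cell behind.
    void clearCell(int row, int col) {
        Map<Integer, String> cells = rows.get(row);
        if (cells != null && cells.containsKey(col)) {
            cells.put(col, "");
        }
    }

    // Deletes the cell entry itself, so later lookups see null again.
    void removeCell(int row, int col) {
        Map<Integer, String> cells = rows.get(row);
        if (cells != null) {
            cells.remove(col);
        }
    }

    // Returns null for a row that was never created.
    Map<Integer, String> getRow(int row) {
        return rows.get(row);
    }

    // Returns null when the row or the cell was never created, or was removed.
    String getCell(int row, int col) {
        Map<Integer, String> cells = rows.get(row);
        return cells == null ? null : cells.get(col);
    }

    public static void main(String[] args) {
        SparseSheet sheet = new SparseSheet();
        sheet.setCell(0, 0, "header");
        sheet.setCell(2, 1, "value");
        sheet.clearCell(2, 1);                                // blank, but still present

        System.out.println(sheet.getRow(1));                  // null: row never created
        System.out.println(sheet.getCell(0, 5));              // null: cell never created
        System.out.println("[" + sheet.getCell(2, 1) + "]"); // []: cleared, not removed
        sheet.removeCell(2, 1);
        System.out.println(sheet.getCell(2, 1));              // null: explicitly removed
    }
}
```

Row 1 sits between two populated rows and is still null, which is the same situation as the NPE in the question.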

So if a row doesn’t have any content, it makes sense that it could be null. The same goes for cells: imagine storing 16,384 Cells for every Row when usually only a few are needed. As you mentioned, you can always check whether the Row returned by getRow is null before accessing it, and likewise for the Cell returned by getCell. You can also pass a Row.MissingCellPolicy to getCell to control that method’s behavior: CREATE_NULL_AS_BLANK creates the Cell for you if it didn’t already exist.
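A minimal sketch of both approaches, assuming a recent Apache POI release (with poi-ooxml on the classpath) where the policies are constants of the Row.MissingCellPolicy enum. NullSafeRead, cellText and cellTextCreating are hypothetical names, and returning an empty string for missing rows and cells is just one possible convention.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Sketch only; assumes a recent Apache POI (poi-ooxml) on the classpath.
// NullSafeRead and its methods are hypothetical helper names.
class NullSafeRead {

    // Returns "" when the row or the cell was never stored in the sheet.
    static String cellText(Sheet sheet, int rowIndex, int colIndex) {
        Row row = sheet.getRow(rowIndex);   // null if the row was never created
        if (row == null) {
            return "";
        }
        Cell cell = row.getCell(colIndex);  // null if the cell was never created
        return cell == null ? "" : cell.toString();
    }

    // Same result, but lets POI create the missing cell as a blank one.
    // Note that this modifies the in-memory workbook.
    static String cellTextCreating(Sheet sheet, int rowIndex, int colIndex) {
        Row row = sheet.getRow(rowIndex);
        if (row == null) {
            return "";                      // the policy covers cells only, not rows
        }
        Cell cell = row.getCell(colIndex, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
        return cell.toString();             // never null with this policy
    }

    public static void main(String[] args) throws Exception {
        // Build a small workbook in memory so the example needs no input file.
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet ws = wb.createSheet("demo");
            ws.createRow(0).createCell(0).setCellValue("hello");

            System.out.println("[" + cellText(ws, 0, 0) + "]");         // existing cell
            System.out.println("[" + cellText(ws, 0, 3) + "]");         // missing cell
            System.out.println("[" + cellText(ws, 5, 0) + "]");         // missing row
            System.out.println("[" + cellTextCreating(ws, 0, 3) + "]"); // created as blank
        }
    }
}
```

The row still needs its own null check in both variants, because the missing cell policy only applies to getCell.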

(There are other missing cell policies. RETURN_BLANK_AS_NULL does the opposite: if the Cell exists but is blank, null is returned. The default, RETURN_NULL_AND_BLANK, just returns whatever is there without any other action.)
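The three policies can be compared side by side on a row that has one blank cell and one cell that was never created. This is a sketch under the same assumption of a recent Apache POI with poi-ooxml on the classpath; PolicyDemo and lookup are hypothetical names.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Row.MissingCellPolicy;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Sketch only; assumes a recent Apache POI (poi-ooxml) on the classpath.
class PolicyDemo {

    // Hypothetical helper: reports whether getCell returned a Cell or null.
    static String lookup(Row row, int col, MissingCellPolicy policy) {
        Cell cell = row.getCell(col, policy);
        return cell == null ? "null" : "cell";
    }

    public static void main(String[] args) throws Exception {
        try (Workbook wb = new XSSFWorkbook()) {
            Row row = wb.createSheet("demo").createRow(0);
            row.createCell(0).setCellValue("x"); // column 0: has content
            row.createCell(1);                   // column 1: exists but is blank
                                                 // column 2: never created

            // CREATE_NULL_AS_BLANK goes last because it adds cells to the row.
            MissingCellPolicy[] policies = {
                MissingCellPolicy.RETURN_NULL_AND_BLANK,
                MissingCellPolicy.RETURN_BLANK_AS_NULL,
                MissingCellPolicy.CREATE_NULL_AS_BLANK
            };
            for (MissingCellPolicy policy : policies) {
                System.out.println(policy
                        + ": blank=" + lookup(row, 1, policy)
                        + ", missing=" + lookup(row, 2, policy));
            }
        }
    }
}
```

With this setup, RETURN_NULL_AND_BLANK should report a cell for the blank column and null for the missing one, RETURN_BLANK_AS_NULL should report null for both, and CREATE_NULL_AS_BLANK should report a cell for both.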