I’m reading an XLSX (Microsoft Excel XML file) using the Excel Data Reader from http://exceldatareader.codeplex.com/ and am having a problem with missing data. Data which is in the source Excel spreadsheet is missing from the data set returned by the library.
Here’s a bit more detail of what I’m doing:
- Created a simple test spreadsheet in Excel with one sheet, a header row and two data rows. Save and close Excel.
- Open the file and pass the stream into the CreateOpenXmlReader() method and get back an IExcelDataReader.
- Call the AsDataSet() method on the IExcelDataReader and get back a DataSet.
- Get the ItemArray from row 1 of table 0.
- Loop through the ItemArray. Discovered there is data missing (i.e. there are System.DBNull members where I expected System.string members).
Here’s a bit more analysis…
I debugged the code and looked inside the ExcelDataReader object model. Found a non-public string array called “SST” which appears to contain the data from the spreadsheet as a single linear (one-dimensional) array.
On closer inspection, I found that the data I was looking for was also missing from this array. In this raw data, the member does not exist at all.
My guess is that for some reason the parser is not picking up the data from the OOXML and concluding that the cell is empty. Looking at the OOXML itself, the data seems to be split across the sharedStrings.xml and sheet1.xml files, so perhaps the parser is having a tough time putting all this together.
Saving the file in binary format (Excel 97 to 2003) and reading that in solves the problem so on the surface that seems to confirm my suspicion is with reading the OOXML format.
As a stop gap I’m converting all files to binary format, but that seems like a kludge. Is there some way to get my OOXML formatted Excel files to read in properly with Excel Data Reader?
To retrieve an Excel spreadsheet (.xlsx) and load it into a DataSet, you don’t need to mess with XML readers or a separate library like Excel Data Reader. The code for reading an entire spreadsheet into a DataSet is pretty simple when using the normal OleDb functions in .NET:
Sub readInMyExcelFile Dim xlsFile as string = "myexcelfile" Dim conStr As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & xlsFile & ";Extended Properties=""Excel 12.0 Xml;HDR=YES""" Dim dtSheets As New DataTable Dim tmp As String Dim sqlText as Sting Using cn As New OleDbConnection(conStr) cn.Open() dtSheets = cn.GetSchema("Tables") End Using //Dataset for the spreadsheet Dim ds as New DataSet /Loop through the names of all the worksheets in the file. For Each rw as DataRow in dtSheets.Rows tmp = rw("TABLE_NAME") tmp = "[" & tmp & "]" Dim dt as New DataTable Using cn as New OleDbConnection(conStr) cn.Open /Retrieve all the records from the worksheet. sqlText = "SELECT * FROM " & tblName Dim adp As New OleDbDataAdapter(sqlText, cn) /Fill the data table with the all the data. adp.Fill(dt) End Using ds.Tables.Add(dt) Next End Sub
It seems there is a bug in Excel Data Reader (it is first time I have heard about it). Do you have to use it? If not, EPPlus would be a better choice.
excel datareader from codeplex is used for reading data from the excel file directly on web application without any sort of caching on the server.the above code only stands when we can store the excel file somewhere.I have faced similar problems with exceldatareader where some of the data are missing.Most importanly i coludnt find any specific trend.All i cud see that if all the rows have values then there is no problem. Best chance is to convert xlsx to xls.