
How do you use the OpenXML API to read a Table from an Excel spreadsheet?

Posted by: admin, April 2, 2020

Questions:

I’ve read a bunch of stuff on the web about how to get at cell data using the OpenXML API. But there’s really not much out there that’s particularly straightforward. Most seems to be about writing to SpreadsheetML, not reading… but even that doesn’t help much.
I’ve got a spreadsheet that has a table in it. I know what the table name is, and I can find out what sheet it’s on, and what columns are in the table. But I can’t figure out how to get a collection of rows back that contain the data in the table.

I’ve got this to load the document and get a handle to the workbook:

SpreadsheetDocument document = SpreadsheetDocument.Open("file.xlsx", false);
WorkbookPart workbook = document.WorkbookPart;

I’ve got this to find the table/sheet:

Table table = null;
foreach (Sheet sheet in workbook.Workbook.GetFirstChild<Sheets>())
{
    WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id);
    foreach (TableDefinitionPart tableDefinitionPart in worksheetPart.TableDefinitionParts)
    {
        if (tableDefinitionPart.Table.DisplayName == this._tableName)
        {
            table = tableDefinitionPart.Table;
            break;
        }
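    }
    if (table != null) break; // the inner break only leaves the inner loop, so stop scanning sheets here
}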

And I can iterate over the columns in the table by foreaching over table.TableColumns.

Answers:

Reading an Excel 2007/2010 spreadsheet with the OpenXML API is really easy, in some ways even simpler than using OleDB, which was always the quick & dirty solution. It is simple but verbose, though, and posting all the code here would not be useful without comments and explanation, so I’ll just write a summary and link to a good article. Read this article on MSDN; it explains how to read XLSX documents in a very easy way.

To summarize, you’ll do this:

  • Open the SpreadsheetDocument with SpreadsheetDocument.Open.
  • Get the Sheet you need with a LINQ query from the WorkbookPart of the document.
  • Get (finally!) the WorksheetPart (the object you need) using the Id of the Sheet.

In code, stripping comments and error handling:

using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, false))
{
   // Find the sheet by name; sheet is null if no sheet matches.
   Sheet sheet = document.WorkbookPart.Workbook
      .Descendants<Sheet>()
      .FirstOrDefault(s => s.Name == sheetName);

   // The sheet's relationship Id identifies the part that holds its contents.
   WorksheetPart sheetPart =
      (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id);
}

Now (but inside the using!) all you have to do to read a cell value is:

Cell cell = sheetPart.Worksheet.Descendants<Cell>().
    Where(c => c.CellReference == addressName).FirstOrDefault();

If you have to enumerate the rows (and there are a lot of them), you first have to obtain a reference to the SheetData object:

SheetData sheetData = sheetPart.Worksheet.Elements<SheetData>().First();

Now you can ask for all the rows and cells:

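foreach (Row row in sheetData.Elements<Row>())
{
   foreach (Cell cell in row.Elements<Cell>())
   {
      // CellValue is null for empty cells. For shared-string cells the
      // text is an index into the shared string table, not the string itself.
      string text = cell.CellValue == null ? null : cell.CellValue.Text;
      // Do something with the cell value
   }
}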
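
The value read in the loop above is the raw stored value, so for text cells it is usually an index into the workbook's shared string table rather than the text itself. Here is a minimal sketch of resolving it, assuming you have already fetched the SharedStringTable (for example from document.WorkbookPart.SharedStringTablePart, which can be null). CellText and Read are hypothetical names used for illustration, not part of the SDK, and inline strings are not handled:

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class CellText
{
    // Hypothetical helper (not an SDK method): returns the displayable text
    // of a cell, resolving shared-string indexes when a table is supplied.
    public static string Read(Cell cell, SharedStringTable sharedStrings)
    {
        // Empty cells have no <v> element, so CellValue is null.
        if (cell == null || cell.CellValue == null)
        {
            return null;
        }

        string raw = cell.CellValue.Text;

        // DataType is null for plain numbers; only t="s" cells need a lookup.
        bool isShared = cell.DataType != null
            && cell.DataType.Value == CellValues.SharedString;
        if (!isShared || sharedStrings == null)
        {
            return raw;
        }

        // The raw value is the zero-based position of the shared string item.
        int index = int.Parse(raw);
        return sharedStrings.Elements<SharedStringItem>().ElementAt(index).InnerText;
    }
}
```

Inside the inner loop you would then call CellText.Read(cell, sharedStrings) instead of reading cell.CellValue.Text directly. Dates and other styled numbers still come back as raw numbers with this approach.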

To simply enumerate a normal spreadsheet, you can call Descendants<Row>() on the Worksheet object (sheetPart.Worksheet).

If you need more resources about OpenXML, take a look at OpenXML Developer; it contains a lot of good tutorials.

Answer:

There are probably many better ways to code this up, but I slapped this together because I needed it, so hopefully it will help some others.

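using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;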

    private static DataTable genericExcelTable(FileInfo fileName)
    {
        DataTable dataTable = new DataTable();
        try
        {
            using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName.FullName, false))
            {
                Workbook wkb = doc.WorkbookPart.Workbook;
                Sheet wks = wkb.Descendants<Sheet>().FirstOrDefault();
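                // SharedStringTablePart is absent when the workbook contains no strings.
                SharedStringTablePart sstp = doc.WorkbookPart.SharedStringTablePart;
                List<SharedStringItem> allSSI = sstp == null
                    ? new List<SharedStringItem>()
                    : sstp.SharedStringTable.Elements<SharedStringItem>().ToList();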
                WorksheetPart wksp = (WorksheetPart)doc.WorkbookPart.GetPartById(wks.Id);

                foreach (TableDefinitionPart tdp in wksp.TableDefinitionParts)
                {
                    Table excelTable = tdp.Table;
                    // Columns are appended in table order, so no explicit ordinal is needed.
                    foreach (TableColumn col in excelTable.TableColumns)
                    {
                        dataTable.Columns.Add(col.Name);
                    }

                    SheetData data = wksp.Worksheet.Elements<SheetData>().First();

                    foreach (DocumentFormat.OpenXml.Spreadsheet.Row row in data)
                    {
                        if (isInTable(row.Descendants<Cell>().FirstOrDefault(), excelTable.Reference, true))
                        {
                            int cellcount = 0;
                            DataRow dataRow = dataTable.NewRow();
                            foreach (Cell cell in row.Elements<Cell>())
                            {

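                                if (cell.CellValue == null)
                                {
                                    // Empty cell: there is no value to read.
                                    dataRow[cellcount] = DBNull.Value;
                                }
                                else if (cell.DataType != null && cell.DataType.InnerText == "s")
                                {
                                    // Shared strings store an index into the shared string table.
                                    dataRow[cellcount] = allSSI[int.Parse(cell.CellValue.InnerText)].InnerText;
                                }
                                else
                                {
                                    dataRow[cellcount] = cell.CellValue.Text;
                                }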
                                cellcount++;
                            }
                            dataTable.Rows.Add(dataRow);
                        }
                    }
                }
            }
            //do whatever you want with the DataTable
            return dataTable;
        }
        catch (Exception ex)
        {
            //handle an error
            return dataTable;
        }
    }
    private static Tuple<int, int> returnCellReference(string cellRef)
    {
        int startIndex = cellRef.IndexOfAny("0123456789".ToCharArray());
        string column = cellRef.Substring(0, startIndex);
        int row = Int32.Parse(cellRef.Substring(startIndex));
        return new Tuple<int,int>(TextToNumber(column), row);
    }
    private static int TextToNumber(string text)
    {
        return text
            .Select(c => c - 'A' + 1)
            .Aggregate((sum, next) => sum * 26 + next);
    }
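    private static bool isInTable(Cell testCell, string tableRef, bool headerRow)
    {
        // Rows without cells have no first cell; treat them as outside the table.
        if (testCell == null || testCell.CellReference == null) { return false; }
        Tuple<int, int> cellRef = returnCellReference(testCell.CellReference.Value);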
        if (tableRef.Contains(":"))
        {
            int header = 0;
            if (headerRow)
            {
                header = 1;
            }
            string[] tableExtremes = tableRef.Split(':');
            Tuple<int, int> startCell = returnCellReference(tableExtremes[0]);
            Tuple<int, int> endCell = returnCellReference(tableExtremes[1]);
            if (cellRef.Item1 >= startCell.Item1
                && cellRef.Item1 <= endCell.Item1
                && cellRef.Item2 >= startCell.Item2 + header
                && cellRef.Item2 <= endCell.Item2) { return true; }
            else { return false; }
        }
        else if (cellRef.Equals(returnCellReference(tableRef)))
        {
            return true;
        } 
        else 
        {
            return false;
        }
    }
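
One caveat with the code above: it fills the DataRow by counting cells, which assumes the table starts in the sheet's first used column and that no cell in a row is missing (empty cells are often omitted from the file entirely). A rough sketch of mapping each cell to its column inside the table instead is below. TableRange, ColumnNumber, ColumnOf and ColumnOffset are hypothetical names introduced for illustration, and the code assumes plain references such as "B2:E9" without "$" markers:

```csharp
using System;
using System.Linq;

public static class TableRange
{
    // Converts column letters such as "A", "Z" or "AB" to a 1-based number.
    public static int ColumnNumber(string letters)
    {
        return letters.ToUpperInvariant()
            .Aggregate(0, (sum, c) => sum * 26 + (c - 'A' + 1));
    }

    // Extracts the column part of a cell reference ("AB12" -> 28).
    public static int ColumnOf(string cellRef)
    {
        string letters = new string(cellRef.Where(c => char.IsLetter(c)).ToArray());
        return ColumnNumber(letters);
    }

    // Returns the 0-based column offset of cellRef inside tableRef,
    // or -1 when the cell lies outside the table's column span.
    public static int ColumnOffset(string cellRef, string tableRef)
    {
        string[] corners = tableRef.Split(':');
        int first = ColumnOf(corners[0]);
        // A single-cell table has only one corner, so reuse it as the end.
        int last = ColumnOf(corners[corners.Length - 1]);
        int column = ColumnOf(cellRef);
        return column >= first && column <= last ? column - first : -1;
    }
}
```

With this, the inner loop could index the row with TableRange.ColumnOffset(cell.CellReference.Value, excelTable.Reference.Value) and skip cells that return -1, rather than relying on cellcount.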