
How to extract data from an Excel file with C# / FileHelpers ExcelNPOIStorage

Posted by: admin April 23, 2020

Question:

I am trying to extract data from an Excel sheet with FileHelpers' ExcelNPOIStorage, so I created this class:

using System;
using FileHelpers.ExcelNPOIStorage;

public static class UalExcelReader
{
    public static UalShipmentRecord[] ReadInput(String pathToFile)
    {
        var provider = new ExcelNPOIStorage(typeof(UalShipmentRecord))
        {
            StartRow = 2,
            StartColumn = 1,
            FileName = pathToFile
        };
        var res = (UalShipmentRecord[]) provider.ExtractRecords();
        return res;
    }
}

And the model class:

using FileHelpers;

[DelimitedRecord("|")]
public class UalShipmentRecord
{
    public string contentofcol1;
    public string contentofcol2;
    ...
}

But calling ExtractRecords() throws an IndexOutOfRangeException:

System.IndexOutOfRangeException was unhandled
  HResult=-2146233080
  Message=Index was outside the bounds of the array.
  Source=FileHelpers
  StackTrace:
       at FileHelpers.RecordOperations.ValuesToRecord(Object[] values)
       at FileHelpers.DataLink.DataStorage.ValuesToRecord(Object[] values)
       at FileHelpers.ExcelNPOIStorage.ExcelNPOIStorage.ExtractRecords()
       at Test.Controller.UalExcelReader.ReadInput(String pathToFile) in c:\TEMP\test\Test\Test\Test\Controller\UalExcelReader.cs:line 17
       at Test.App.OnStartup(StartupEventArgs eventArgs) in c:\TEMP\test\Test\Test\Test\App.xaml.cs:line 23
       at System.Windows.Application.<.ctor>b__1(Object unused)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at MS.Internal.Threading.ExceptionFilterHelper.TryCatchWhen(Object source, Delegate method, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.DispatcherOperation.InvokeImpl()
       at System.Windows.Threading.DispatcherOperation.InvokeInSecurityContext(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Windows.Threading.DispatcherOperation.Invoke()
       at System.Windows.Threading.Dispatcher.ProcessQueue()
       at System.Windows.Threading.Dispatcher.WndProcHook(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndWrapper.WndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
       at MS.Win32.HwndSubclass.DispatcherCallbackOperation(Object o)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at MS.Internal.Threading.ExceptionFilterHelper.TryCatchWhen(Object source, Delegate method, Object args, Int32 numArgs, Delegate catchHandler)
       at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(DispatcherPriority priority, TimeSpan timeout, Delegate method, Object args, Int32 numArgs)
       at MS.Win32.HwndSubclass.SubclassWndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam)
       at MS.Win32.UnsafeNativeMethods.DispatchMessage(MSG& msg)
       at System.Windows.Threading.Dispatcher.PushFrameImpl(DispatcherFrame frame)
       at System.Windows.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
       at System.Windows.Threading.Dispatcher.Run()
       at System.Windows.Application.RunDispatcher(Object ignore)
       at System.Windows.Application.RunInternal(Window window)
       at System.Windows.Application.Run(Window window)
       at System.Windows.Application.Run()
       at Test.App.Main() in c:\TEMP\test\Test\Test\Test\obj\Debug\App.g.cs:line 0
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException:

Am I using it correctly? Is there an example that I could look at?

Answer:

1) This error can occur when your Excel sheet contains blank cells. It appears to be an underlying problem in the way ExcelNPOIStorage retrieves the values for a given row.

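ExcelNPOIStorage uses NPOI.CellWalk to traverse the cells in each row, and CellWalk seemingly skips blank cells. As a result, the values array is shorter than expected by the number of blank cells, so ValuesToRecord reads past its end, which is the IndexOutOfRangeException in the stack trace above.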
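To make that mechanism concrete, here is a plain C# model of it with no NPOI or FileHelpers involved. It assumes a row can be represented as a sorted dictionary from column index to cell text, with blank cells simply absent, and uses a hypothetical `ToRecordFields` step that, like `ValuesToRecord`, expects one value per record field. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the failure, not NPOI or FileHelpers code.
// A row is a sorted map of column index -> cell text; blank cells have no
// entry, which is how a walker that skips blanks effectively sees them.
public static class SkippedBlankCellsDemo
{
    // Collects only the cells that exist, in column order. Blank columns
    // vanish, so every later value shifts one slot to the left.
    public static object[] PackPresentCells(SortedDictionary<int, string> row)
    {
        return row.Values.Cast<object>().ToArray();
    }

    // Stand-in for a ValuesToRecord-style step that expects exactly one
    // value per record field.
    public static string[] ToRecordFields(object[] values, int fieldCount)
    {
        var fields = new string[fieldCount];
        for (int i = 0; i < fieldCount; i++)
        {
            // Throws IndexOutOfRangeException once i reaches values.Length.
            fields[i] = (string)values[i];
        }
        return fields;
    }
}
```

With column 1 blank in a three-column row, `PackPresentCells` returns only two values, and `ToRecordFields(values, 3)` throws `IndexOutOfRangeException`, the same exception type the question reports.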

A different approach seems to be needed, as described here: http://poi.apache.org/spreadsheet/quick-guide.html#Iterator
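The Apache POI quick guide describes iterating over a row by column number and requesting each cell explicitly with a missing-cell policy, instead of walking only the cells that exist. The sketch below models that idea in plain C#, assuming a row is a dictionary from column index to cell text with blank cells absent; `IndexedRowReader` is hypothetical and not part of NPOI or FileHelpers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of NPOI or FileHelpers.
// Models index-based traversal: each column in the record's window is
// requested by number, so a missing cell becomes a null slot instead of
// silently shifting later values to the left.
public static class IndexedRowReader
{
    // row: column index -> cell text; blank cells have no entry.
    // startColumn: 0-based first column; columnCount: number of record fields.
    public static object[] ReadRow(IReadOnlyDictionary<int, string> row, int startColumn, int columnCount)
    {
        if (columnCount < 0)
            throw new ArgumentOutOfRangeException(nameof(columnCount));

        var values = new object[columnCount];
        for (int i = 0; i < columnCount; i++)
        {
            // Missing cell -> null, so values[i] always belongs to field i.
            values[i] = row.TryGetValue(startColumn + i, out var text) ? text : null;
        }
        return values;
    }
}
```

For a three-column record with column 1 blank, `ReadRow(row, 0, 3)` returns three values with `null` in the middle, so every field keeps its position and the array length always matches the field count.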

2) Incorrect StartRow and StartColumn values can also produce blank cells where none exist.

Contrary to the IntelliSense comments for StartRow and StartColumn, these properties are 0-based in ExcelNPOIStorage (whereas in ExcelStorage they are 1-based).
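Assuming the question's `StartRow = 2, StartColumn = 1` were meant in the 1-based sense (data starting at cell A2), the 0-based values for ExcelNPOIStorage would be `StartRow = 1, StartColumn = 0`. The hypothetical helper below, which is not part of FileHelpers, performs that conversion for any A1-style reference.

```csharp
using System;

// Hypothetical helper, not part of FileHelpers or NPOI.
// Converts an A1-style reference such as "A2" or "AA10" into the 0-based
// (Row, Column) pair that ExcelNPOIStorage's StartRow/StartColumn expect.
public static class ExcelAddress
{
    public static (int Row, int Column) ToZeroBased(string a1)
    {
        if (string.IsNullOrWhiteSpace(a1))
            throw new ArgumentException("Empty cell reference.", nameof(a1));

        int i = 0;
        int column = 0;
        // Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27.
        while (i < a1.Length)
        {
            char c = char.ToUpperInvariant(a1[i]);
            if (c < 'A' || c > 'Z')
                break;
            column = column * 26 + (c - 'A' + 1);
            i++;
        }

        // Whatever follows the letters must be a positive 1-based row number.
        if (i == 0 || !int.TryParse(a1.Substring(i), out int row) || row < 1)
            throw new ArgumentException($"'{a1}' is not an A1-style reference.", nameof(a1));

        // Excel numbers rows and columns from 1; subtract one for 0-based indices.
        return (row - 1, column - 1);
    }
}
```

`ExcelAddress.ToZeroBased("A2")` returns `(1, 0)`, the pair to assign to `StartRow` and `StartColumn` when the first data cell is A2.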

Source: https://github.com/MarcosMeli/FileHelpers/blob/master/FileHelpers.ExcelNPOIStorage/Test/Program.cs

Given the issues I have encountered, I would second the comment above and use the old ExcelStorage class, which is more reliable, with the downside that it depends on Excel Interop.

Answer:

In the NPOI library, MissingCellPolicy is set to MissingCellPolicy.RETURN_NULL_AND_BLANK by default on both the HSSFWorkbook and XSSFWorkbook classes.

Changing the value to MissingCellPolicy.CREATE_NULL_AS_BLANK on the relevant Workbook instance resolves the issue.
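The three policies differ only in what a cell lookup returns for a column that has no cell at all or holds a blank cell. The sketch below is a plain C# model of those documented semantics, not NPOI's implementation; the enum and class names are hypothetical, and a blank cell is represented as an empty string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the MissingCellPolicy semantics, NOT NPOI code.
// Cells are strings; "" stands for a BLANK cell, an absent key for a missing one.
public enum CellPolicyModel
{
    ReturnNullAndBlank, // like RETURN_NULL_AND_BLANK: missing -> null, blank -> blank
    ReturnBlankAsNull,  // like RETURN_BLANK_AS_NULL: missing -> null, blank -> null
    CreateNullAsBlank   // like CREATE_NULL_AS_BLANK: missing -> new blank, blank -> blank
}

public static class MissingCellPolicyModel
{
    // row: column index -> stored cell value; columns never written are absent.
    public static string GetCell(IDictionary<int, string> row, int column, CellPolicyModel policy)
    {
        bool exists = row.TryGetValue(column, out var value);
        switch (policy)
        {
            case CellPolicyModel.ReturnNullAndBlank:
                return exists ? value : null;
            case CellPolicyModel.ReturnBlankAsNull:
                return exists && !string.IsNullOrEmpty(value) ? value : null;
            case CellPolicyModel.CreateNullAsBlank:
                if (!exists)
                {
                    // The real policy creates a blank cell in the row; mimic that.
                    value = "";
                    row[column] = value;
                }
                return value;
            default:
                throw new ArgumentOutOfRangeException(nameof(policy));
        }
    }
}
```

Under the create-as-blank policy a lookup never yields null, which is why the answer expects each row to produce a value for every column.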

The fix is in FileHelpers.ExcelNPOIStorage, file ExcelNPOIStorage.cs, method OpenWorkbook, on line 101:

private void OpenWorkbook(string filename)
{
    FileInfo info = new FileInfo(filename);
    if (info.Exists == false)
        throw new FileNotFoundException(string.Concat("Excel File '", filename, "' not found."), filename);

    using (FileStream file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        var extension = Path.GetExtension(filename).ToLowerInvariant();
        if (extension == ".xlsx" || extension == ".xlsm")
            mWorkbook = new XSSFWorkbook(file);
        else
            mWorkbook = new HSSFWorkbook(file);

        // Fix (line 101): return a blank cell instead of null for missing cells.
        mWorkbook.MissingCellPolicy = MissingCellPolicy.CREATE_NULL_AS_BLANK;

        if (String.IsNullOrEmpty(SheetName))
            mSheet = mWorkbook.GetSheetAt(mWorkbook.ActiveSheetIndex);
        else
        {
            try
            {
                mSheet = mWorkbook.GetSheet(SheetName);
                if (mSheet == null)
                {
                    throw new ExcelBadUsageException(string.Concat("The sheet '",
                        SheetName,
                        "' was not found in the workbook."));
                }

                var sheetIndex = mWorkbook.GetSheetIndex(mSheet);
                mWorkbook.SetActiveSheet(sheetIndex);
            }
            catch
            {
                throw new ExcelBadUsageException(string.Concat("The sheet '",
                    SheetName,
                    "' was not found in the workbook."));
            }
        }
    }
}

The fix was submitted to the GitHub repository as this pull request:

https://github.com/MarcosMeli/FileHelpers/pull/291

Answer:

I think empty cells are the problem. In my file, I simply added borders around the data, and that fixed it. Try selecting the data with the mouse and turning on borders around all the cells.