c# – Filtering of blank rows in LinqToExcel before parsing (transformation / mapping)

Posted by: admin May 14, 2020

Questions:

I am using LinqToExcel to map Excel rows to objects in a C# / .NET project.

I put validation code in my transformation functions so that they not only transform the data but also warn the user when data is missing.
Example:

excel.AddTransformation<PaymentObject>(x => x.PaymentPeriod, cellvalue =>
{
    if (cellvalue.Length == 0)
    {
        throw new Exception(String.Format(Errors.EmptyField, ColumnNames.PaymentPeriod, ColumnNames.EmployeeNumber, lastCheckedEmployeeNumber));
    }

    return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(cellvalue);
});

However, I do NOT want this validation to be triggered by the empty rows that Excel sometimes adds at the bottom (see LinqToExcel blank rows).

My problem is that I can’t use the solution mentioned there because I can’t access the raw row data when calling something like

excel.Worksheet<SomeType>("WorksheetName").Where(row => row.Any(cell => cell != null));

This is because the transformations are applied first, and the Where method is then applied to the transformed results.

Also, inside the transformation functions I have no access to the other values in the row, so I can’t tell whether a single cell is empty (a mistake) or the whole row is empty.

Is it possible to filter out the empty rows BEFORE applying the transformations?

Answer:

You can join the strongly typed worksheet with an untyped worksheet, then use the untyped worksheet to find the totally blank rows:

List<T> onlyNonBlankRows = _queryFactory.Worksheet<T>(firstWorksheetWithColumnHeaders)
    // Calling ToList here is a workaround to avoid Remotion.Data.Linq.Parsing.ParserException
    // for Select((item, index) => ...): "This overload of the method 'System.Linq.Queryable.Select'
    // is currently not supported, but you can register your own parser if needed."
    .ToList()
    .Select((typedRow, index) => new { typedRow, index })
    // Join the worksheet to an untyped projection of the same worksheet so that we can find totally blank rows
    .Join(
        _queryFactory.Worksheet(firstWorksheetWithColumnHeaders)
            // Calling ToList here is the same workaround for Select((item, index) => ...)
            .ToList()
            .Select((untypedRow, indexForUntypedRow) => new { untypedRow, indexForUntypedRow }),
        // Join on row index: row 1 matches row 1, etc.
        arg => arg.index,
        arg => arg.indexForUntypedRow,
        (a, b) => new { a.index, a.typedRow, b.untypedRow })
    // Exclude rows where all cells are empty
    .Where(x => x.untypedRow.Any(cell => cell.Value != DBNull.Value))
    .Select(joined => joined.typedRow)
    .ToList();
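
The pairing-by-position step in the query above can be tried out without an Excel file. The following is a minimal sketch on plain in-memory lists, not LinqToExcel code: BlankRowFilter, IsBlank and KeepNonBlank are hypothetical names, and each raw row is assumed to be a sequence of cell values. It uses Zip instead of Select with an index plus Join, which gives the same pairing when both lists are in the same row order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LinqToExcel.
public static class BlankRowFilter
{
    // A raw row counts as blank when every cell is null, DBNull,
    // or a string containing only whitespace.
    public static bool IsBlank(IEnumerable<object> cells) =>
        cells.All(cell => cell is null
                          || cell is DBNull
                          || (cell is string s && string.IsNullOrWhiteSpace(s)));

    // Pairs each typed row with the raw row at the same position and keeps
    // the typed rows whose raw counterpart contains at least one value.
    // Zip stops at the shorter sequence, so both inputs should have equal length.
    public static List<T> KeepNonBlank<T>(
        IEnumerable<T> typedRows,
        IEnumerable<IEnumerable<object>> rawRows) =>
        typedRows
            .Zip(rawRows, (typed, raw) => new { typed, raw })
            .Where(pair => !IsBlank(pair.raw))
            .Select(pair => pair.typed)
            .ToList();
}
```

Unlike the query above, this sketch also treats empty and whitespace-only strings as blank. That is an assumption about how empty cells may show up, so drop that check if only DBNull.Value should count.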

Answer:

Is there some cell that is blank only when the whole row is blank?

For example, there’s usually an Id column that is always populated except for blank rows. If that is the case, then the following query should work for you.

// assuming the Id cell is only blank when the whole row is blank
excel.Worksheet<PaymentObject>().Where(x => x.Id != "");

// the Id cell might be null instead of blank, in which case use this Where clause instead
excel.Worksheet<PaymentObject>().Where(x => x.Id != null);
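
If it is unclear whether the Id cell arrives as null or as an empty string, both cases can be covered by one check once the rows are in memory. The sketch below is illustration only: PaymentRow and KeyColumnFilter are hypothetical stand-ins, and it makes no assumption about whether LinqToExcel's query provider can translate string.IsNullOrWhiteSpace, so it filters an ordinary IEnumerable.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the mapped row type; only Id matters here.
public class PaymentRow
{
    public string Id { get; set; }
    public string PaymentPeriod { get; set; }
}

// Hypothetical helper, not part of LinqToExcel.
public static class KeyColumnFilter
{
    // Keeps rows whose Id is neither null, empty, nor whitespace-only.
    // Assumes, as in the answer above, that Id is blank only on blank rows.
    public static List<PaymentRow> WithId(IEnumerable<PaymentRow> rows) =>
        rows.Where(row => !string.IsNullOrWhiteSpace(row.Id)).ToList();
}
```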