Home » excel » excel – Arrays, matching, sorting

excel – Arrays, matching, sorting

Posted by: admin April 23, 2020 Leave a comment


I have two datasets, each in 3 separate columns, within the same sheet.

The 1st data set contains data on invoices, and the 2nd contains data on the individual items that are included on the invoices.

  • 1st data set (Invoices): CustomerNumber (long), InvoiceDate (date), InvoiceAmount (double)
  • 2nd data set (Items): CustomerNumber (long), ItemPurchaseDate (date), ItemCost (double)

There is no unique identifier to relate the two data sets.

General rule is that:

  • Items where CustomerNumber=X, and ItemPurchaseDate < InvoiceDate should be related to Invoice where CustomerNumber=X and InvoiceDate > ItemPurchaseDate, but InvoiceDate < nextInvoiceDate for the same CustomerNumber (if there are multiple invoices)

(Try to imagine a situation where you’re trying to match the items to nearest invoices.)

For example like this

enter image description here

So, not all items can be related to an invoice, and sometimes we will not be able to find any items for an invoice.

I wrote code, but it is inefficient as it will go through all the items for all the invoices. In my case, there are 9000 invoices and 260k items.

But to demonstrate (I’ve left out the bits that are not important):

For Each InvCustNo In InvList 'look into every Invoice
    For Each ItemCustNo In ItemList 'look into every Item
        'check if we're looking at the new client,
        ' if yes - reset the date holder
        If prevInvCustNo <> InvCustNo Then
            prevInvDate = 0
        End If
        InvDate = InvCustNo.Offset(0, 1).Value 'define the InvoiceDate
        ItemDate = ItemCustNo.Offset(0, 1).Value 'define ItemDate
        If InvCustNo = ItemCustNo And InvDate > ItemDate And _
            InvDate >= prevInvDate And _
            prevInvDate <= ItemDate Then 'perform comparison
            TempSum = TempSum + ItemCustNo.Offset(0, 2)
        End If
    Next ItemCustNo
    prevInvDate = InvDate 'update the date-1
    prevInvCustNo = InvCustNo 'update the client-1
    InvCustNo.Offset(0, 7).Value = TempSum 'print values
    TempSum = 0 'reset sum
Next InvCustNo

The above code should do the work, but not on the amount of data I have, for two main reasons:

  • it will go through 9.000 * 260.000 rows = 2.34 billion operations
  • it will write 9000 results, one at a time

Two questions came to mind

  • can I write all the results in one go? (Arrays?)
  • can I limit my matching somehow, so that I don’t do 2.34 billion operations (I guess with sorting I can)

Any thoughts & examples (which I could use as starting points) would be very helpful.

How to&Answers:

I have a non-vba solution that I think may work for you.

Given the data in the screenshot (which mirrors your example)

enter image description here

Ensure that the data in columns A:C only is sorted

  • A to Z by CustomerNumber
  • Then Newest to Oldest by InvoiceDate

Then place the following formula in G1 (and drag down):

=IF(E2>MAX(IF(A:A=D2,B:B)),"No Invoice Yet",INDEX(B:B,(MATCH(E2,IF(A:A=D2,B:B),-1))))

as an Array (Press Ctrl + Shft + Enter) instead of just Enter.

The formula works in the following way

  1. Compares the ItemDate to the Max InvoiceDate for the given customer number. If the ItemDate is newer than the latest invoice date, it returns “No Invoice Yet”
  2. Otherwise It looks for the InvoiceDate that is closest to the ItemDate (think next possible InvoiceDate after ItemDate), for the respective customer number, and returns this date.