I have two datasets, each in 3 separate columns, within the same sheet.
The 1st data set contains data on invoices, and the 2nd contains data on the individual items that are included on the invoices.
- 1st data set (Invoices): CustomerNumber (long), InvoiceDate (date), InvoiceAmount (double)
- 2nd data set (Items): CustomerNumber (long), ItemPurchaseDate (date), ItemCost (double)
There is no unique identifier to relate the two data sets.
General rule is that:
- Items where CustomerNumber=X, and ItemPurchaseDate < InvoiceDate should be related to Invoice where CustomerNumber=X and InvoiceDate > ItemPurchaseDate, but InvoiceDate < nextInvoiceDate for the same CustomerNumber (if there are multiple invoices)
(Try to imagine a situation where you’re trying to match the items to nearest invoices.)
For example like this
So, not all items can be related to an invoice, and sometimes we will not be able to find any items for an invoice.
I wrote code, but it is inefficient as it will go through all the items for all the invoices. In my case, there are 9000 invoices and 260k items.
But to demonstrate (I’ve left out the bits that are not important):
For Each InvCustNo In InvList 'look into every Invoice For Each ItemCustNo In ItemList 'look into every Item 'check if we're looking at the new client, ' if yes - reset the date holder If prevInvCustNo <> InvCustNo Then prevInvDate = 0 End If InvDate = InvCustNo.Offset(0, 1).Value 'define the InvoiceDate ItemDate = ItemCustNo.Offset(0, 1).Value 'define ItemDate If InvCustNo = ItemCustNo And InvDate > ItemDate And _ InvDate >= prevInvDate And _ prevInvDate <= ItemDate Then 'perform comparison TempSum = TempSum + ItemCustNo.Offset(0, 2) End If Next ItemCustNo prevInvDate = InvDate 'update the date-1 prevInvCustNo = InvCustNo 'update the client-1 InvCustNo.Offset(0, 7).Value = TempSum 'print values TempSum = 0 'reset sum Next InvCustNo
The above code should do the work, but not on the amount of data I have, for two main reasons:
- it will go through 9.000 * 260.000 rows = 2.34 billion operations
- it will write 9000 results, one at a time
Two questions came to mind
- can I write all the results in one go? (Arrays?)
- can I limit my matching somehow, so that I don’t do 2.34 billion operations (I guess with sorting I can)
Any thoughts & examples (which I could use as starting points) would be very helpful.
I have a non-vba solution that I think may work for you.
Given the data in the screenshot (which mirrors your example)
Ensure that the data in columns A:C only is sorted
- A to Z by CustomerNumber
- Then Newest to Oldest by InvoiceDate
Then place the following formula in G1 (and drag down):
=IF(E2>MAX(IF(A:A=D2,B:B)),"No Invoice Yet",INDEX(B:B,(MATCH(E2,IF(A:A=D2,B:B),-1))))
as an Array (Press Ctrl + Shft + Enter) instead of just Enter.
The formula works in the following way
- Compares the ItemDate to the Max InvoiceDate for the given customer number. If the ItemDate is newer than the latest invoice date, it returns “No Invoice Yet”
- Otherwise It looks for the InvoiceDate that is closest to the ItemDate (think next possible InvoiceDate after ItemDate), for the respective customer number, and returns this date.