Home » excel » Finding uninterrupted sub-arrays in Excel – Kadane's algorithm variation?

Finding uninterrupted sub-arrays in Excel – Kadane's algorithm variation?

Posted by: admin May 14, 2020 Leave a comment

Questions:

Suppose you have an ordered, indexed list of positive values. These positive values are interrupted by 0 values. I want to determine if a consecutive sub-array exists which is not interrupted by 0 values and whose sum exceeds a certain threshold.

Simple example:

Index, Value
0   0
1   0
2   3
3   4
4   2
5   6
6   0
7   0
8   0
9   2
10  3
11  0

In the above example, the largest consecutive sub-array not interrupted by 0 is from index 2 to index 5 inclusive, and the sum of this sub-array is 15.

Thus, for the following thresholds 20, 10 and 4, the results should be FALSE, TRUE and TRUE respectively.

Note I don’t necessarily have to find the largest sub-array, I only have to know if any uninterrupted sub-array sum exceeds the defined threshold.

I suspect this problem is a variation of Kadane’s algorithm, but I can’t quite figure out how to adjust it.

The added complication is that I have to perform this analysis in Excel or Google Sheets, and I cannot use scripts to do it – only inbuilt formulas.

I’m not sure if this can even be done, but I would be grateful for any input.

How to&Answers:

Start with

=B2

in c2

then put

=IF(B3=0,0,B3+C2)

in C3 and copy down.

enter image description here

EDIT 1

If you were looking for a Google sheets solution, try something like this:

=ArrayFormula(max(sumif(A2:A,"<="&A2:A,B2:B)-vlookup(A2:A,{if(B2:B=0,A2:A),sumif(A2:A,"<="&A2:A,B2:B)},2)))

Assumes that numbers in column B start with zero: would need to add Iferror if not. It’s basically an array formula implementation of @Gary’s student’s method.

EDIT 2

Here is the Google Sheets formula translated back into Excel. It gives you an alternative if you don’t want to use Offset:

=MAX(SUMIF(A2:A13,"<="&A2:A13,B2:B13)-INDEX(SUMIF(A2:A13,"<="&A2:A13,B2:B13),N(IF({1},MATCH(A2:A13,IF(B2:B13=0,A2:A13)))))) 

(entered as an array formula).

Comment

Maybe the real challenge is to find a formula that works both in Excel and Google sheets because:

  • Vlookup doesn’t work the same way in Excel
  • The offset/subtotal combination doesn’t work in Google sheets
  • The index/match combination with n(if{1}… doesn’t work in Google sheets.

Answer:

With data in columns A and B, insure column B end with a 0. Then in C2 enter:

=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-MAX($C$1:C1),””)

and copy downwards:

enter image description here

Column C lists the sums of consecutive non-zeros. In another cell enter something like:

=MAX(C:C)>19

where 19 is the criteria value.

You can avoid the “helper” column by using a VBA UDF.

EDIT#1:

Use this instead:

=IF(AND(B3=0,B2<>0),SUM(B$1:$B2)-SUM($C$1:C1),"")

Answer:

Thanks to @Tom Sharpe and @Gary’s Student for answering the question.

While I admittedly did not specify this in the question, I would prefer to achieve the solution without a helper column because I have to do this operation on 30+ successive columns. I just didn’t think it was possible in Excel.

Full credit goes to user XOR LX on the Excelforum for coming up with this solution. It has blown my mind and took me the better part of an hour to wrap my head around, but it is certainly very creative. There is no way I could have come up with it myself. Re-posting it here for the benefit of everyone who is looking into this.

Copy and paste the table from my initial question into an empty Excel sheet such that the headers appear in (A1:B1) and the values appear in (A2:B13).

Then enter this formula as an array formula (ctrl+shift+enter), which gives the max of the sums of all the uninterrupted sub-arrays:

=MAX(SUBTOTAL(9,OFFSET(B2,A2:A14,,-FREQUENCY(IF(B2:B13,A2:A13),IF(B3:B14=0,A2:A13,0))-1)))

Note the deliberate offset to include one additional row below the end of the dataset.