Home » excel » How to count unique values per category in Excel

How to count unique values per category in Excel

Posted by: admin March 9, 2020 Leave a comment

Questions:

I have looked around on this site and online for an answer to this question, but I have not been sucessful yet so I will ask in a new thread.

I need to be able to count the number of ocurrences of a particular text per category in Excel using a formula (or a number of formulas). Preferably this can be done without the use of VBA, but if it is nessesary I will attempt to write a VBA command.

Here is what I am working with:

Cat.    Word
A       doo
A       foo
A       foo
A       loo
B       loo
B       goo
B       loo
A       roo
A       lol
B       rar
C       goo
...      ...

I would like to have a formula that counts the number of unique occurrences of each word per category. The categories are known and finite, and are included in a list within my excel sheet, but the words can be any amount and are not known. Each category can have words that other categories would have as well the end result should look something like this:

Cat.    Count of distinct ocurrences per category
A       5
B       3
C       1

Finally, I know that I can deduplicate the list, but I would like to keep the list intact for archival purposes, so a count that does not require deduping would be ideal.

Here is what I tried to do, but did not work in the end:

=IF(A:A=C1,SUM(IF(FREQUENCY(MATCH(B1:B11,B1:B11,0),MATCH(B1:B11,B1:B11,0))>0,1)))

Thank you so much for your help!

How to&Answers:

Personally, I prefer to do that using Pivot Tables. For anything larger than an occasional spreadsheet, SQL queries in Microsoft Access or other databases win the game.

Answer:

Try this “array formula”

=SUM(IF(FREQUENCY(IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0))),ROW(B$1:B$11)-ROW(B$1)+1),1))

confirmed with CTRL+SHIFT+ENTER

Where C1 is the first category to count for. Copy formula down for all categories

Explanation

The basic idea of the formula is that the first IF is a “filter” to ensure that you only look at B values if A value = C1, the second IF filters out blanks in B, otherwise the MATCH gives the position of the first match for that value in B (thus giving the same number for the same values)

The result of the IFs is the first part of FREQUENCY function. The “bins” in FREQUENCY are formed by the ROW-ROW+1 part which gives you a range of numbers 1 to n which will encompass the whole range of values that MATCH might give.

Now we just need to count the number of different numbers returned by the IFs (as that will be equal to the number of different values in B for your criterion), so the final 1 in the formula is assigned to each non-zero value in FREQUENCY and the result is summed.

In detail for your example where C1 = A this part

=IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0)))

results in this array

{1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE}

FALSE values where col A <> “A” otherwise numbers from MATCH giving the position of the first match for each value – notice 2 appears twice because foo appears twice, firstly in row 2

and the ROW(B$1:B$11)-ROW(B$1)+1 part gives you this array

{1;2;3;4;5;6;7;8;9;10;11}

so now the formula looks like this

=SUM(IF(FREQUENCY({1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE},{1;2;3;4;5;6;7;8;9;10;11}),1))

So FREQUENCY puts the values in the first array into the relevant “bins” and formula now resolves to this

=SUM(IF({1;2;0;1;0;0;0;1;1;0;0;0},1))

Note that the array returned by FREQUENCY sums to 6 (six rows with value “A”) but there are only 5 non-zero numbers, denoting the five different values for criterion “A”.

Now the last IF returns a 1 for each non-zero value and SUM function sums those to get 5

Answer:

I have been breaking my head on this as well. The most simple solution I found on this is a combination of a pivot table and a single formula.

  1. Create a pivot table
  2. Put Cat. in the column labels
  3. Put Word in the row labels
  4. Put the Word in the Values
  5. Use count formula in cell B2 with range B5:B100000 and substract 1 (for the totals below the pivot table:

    =COUNT(B5:B100000)-1

How easy is this?

Answer:

Easy way – just select Data Model option while inserting Pivot and select distinct value option from value field. You are done.