I have looked around on this site and online for an answer to this question, but I have not been successful yet, so I will ask in a new thread.
I need to be able to count the number of occurrences of a particular text per category in Excel using a formula (or a number of formulas). Preferably this can be done without the use of VBA, but if it is necessary I will attempt to write a VBA command.
Here is what I am working with:
Cat.  Word
A     doo
A     foo
A     foo
A     loo
B     loo
B     goo
B     loo
A     roo
A     lol
B     rar
C     goo
...   ...
I would like to have a formula that counts the number of unique occurrences of each word per category. The categories are known and finite, and are included in a list within my Excel sheet, but there can be any number of words and they are not known in advance. Each category can have words that other categories have as well. The end result should look something like this:
Cat.  Count of distinct occurrences per category
A     5
B     3
C     1
Finally, I know that I can deduplicate the list, but I would like to keep the list intact for archival purposes, so a count that does not require deduping would be ideal.
Here is what I tried to do, but did not work in the end:
Thank you so much for your help!
Personally, I prefer to do that using Pivot Tables. For anything larger than an occasional spreadsheet, SQL queries in Microsoft Access or other databases win the game.
Try this “array formula”
Where C1 is the first category to count for. Copy the formula down for all categories.
The basic idea of the formula is that the first IF is a “filter” to ensure that you only look at B values if the A value = C1. The second IF filters out blanks in B; otherwise MATCH gives the position of the first match for that value in B (thus giving the same number for the same values).
The result of the IFs is the first argument of the FREQUENCY function. The “bins” in FREQUENCY are formed by the ROW-ROW+1 part, which gives you a range of numbers 1 to n that will encompass the whole range of values that MATCH might give.
Now we just need to count the number of different numbers returned by the IFs (as that will be equal to the number of different values in B for your criterion), so the final 1 in the formula is assigned to each non-zero value in FREQUENCY and the result is summed.
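For anyone who finds the nested formula hard to follow, here is a rough Python sketch of the same idea. It is not Excel code, only a model of the logic described above, and the function name `distinct_words_in_category` and the variable names are made up for illustration; the sample lists are the data from the question.

```python
def distinct_words_in_category(categories, words, criterion):
    """Count distinct non-blank words whose category equals criterion."""
    # Plays the role of MATCH(word, words, 0): the 1-based position of the
    # first occurrence of each word, so equal words share one number.
    first_position = {}
    for position, word in enumerate(words, start=1):
        first_position.setdefault(word, position)

    # The two nested IFs: keep a position only when the category matches
    # the criterion and the word is not blank.
    kept = [
        first_position[word]
        for category, word in zip(categories, words)
        if category == criterion and word != ""
    ]

    # Plays the role of FREQUENCY with bins 1..n: how many kept positions
    # fall into each bin (index 0 is unused).
    counts = [0] * (len(words) + 1)
    for position in kept:
        counts[position] += 1

    # SUM(IF(frequency > 0, 1)): one point per non-empty bin.
    return sum(1 for count in counts if count > 0)


# Sample data from the question (hypothetical variable names).
categories = ["A", "A", "A", "A", "B", "B", "B", "A", "A", "B", "C"]
words = ["doo", "foo", "foo", "loo", "loo", "goo", "loo", "roo", "lol", "rar", "goo"]

for criterion in ["A", "B", "C"]:
    print(criterion, distinct_words_in_category(categories, words, criterion))
```

On the sample data this gives 5 for A, 3 for B and 1 for C, which matches the result the question asks for.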
In detail for your example where C1 = A this part
results in this array
FALSE values where col A <> “A”, otherwise numbers from MATCH giving the position of the first match for each value. Notice 2 appears twice because foo appears twice, first in row 2.
ROW(B$1:B$11)-ROW(B$1)+1 part gives you this array
so now the formula looks like this
FREQUENCY puts the values in the first array into the relevant “bins” and formula now resolves to this
Note that the array returned by FREQUENCY sums to 6 (six rows with value “A”) but there are only 5 non-zero numbers, denoting the five different values for criterion “A”.
Now the last IF returns a 1 for each non-zero value and the SUM function sums those to get 5.
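To check the intermediate arrays by hand, the walkthrough can be replayed outside Excel. The following is a small Python sketch, assuming the eleven sample rows from the question sit in rows 1 to 11; the variable names are hypothetical and the lists only imitate what MATCH and FREQUENCY would return, they are not Excel output.

```python
# Sample rows from the question as (category, word) pairs, in sheet order.
rows = [
    ("A", "doo"), ("A", "foo"), ("A", "foo"), ("A", "loo"),
    ("B", "loo"), ("B", "goo"), ("B", "loo"), ("A", "roo"),
    ("A", "lol"), ("B", "rar"), ("C", "goo"),
]
words = [word for _, word in rows]
criterion = "A"

# Nested IFs around MATCH: the 1-based first-match position where the
# category equals the criterion and the word is not blank, else False.
filtered = [
    words.index(word) + 1 if category == criterion and word != "" else False
    for category, word in rows
]

# ROW-ROW+1 part: the bins 1..n.
bins = list(range(1, len(words) + 1))

# FREQUENCY imitation: count of values per bin, plus the extra slot Excel
# returns for values above the last bin (always 0 here).
frequency = [
    sum(1 for value in filtered if value is not False and value == b)
    for b in bins
] + [0]

# Last IF plus SUM: one point for each non-zero bin.
distinct = sum(1 for count in frequency if count > 0)

print(filtered)
print(frequency)
print(sum(frequency), distinct)
```

The first list shows 2 twice (for the two foo rows), the frequency list sums to 6, and five of its entries are non-zero, in line with the explanation above.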
I have been racking my brain over this as well. The simplest solution I found is a combination of a pivot table and a single formula.
- Create a pivot table
- Put Cat. in the column labels
- Put Word in the row labels
- Put the Word in the Values
Use a count formula over B5:B100000 and subtract 1 (for the totals below the pivot table):
How easy is this?
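Why this works: each category column of the pivot only has a number in the rows of words that occur in that category, so counting the filled cells and subtracting the totals cell leaves the distinct count. Here is a rough Python sketch of that reasoning, using the sample data from the question; the variable names are hypothetical and the column list only imitates what one pivot column would contain.

```python
from collections import Counter

# Sample rows from the question as (category, word) pairs.
rows = [
    ("A", "doo"), ("A", "foo"), ("A", "foo"), ("A", "loo"),
    ("B", "loo"), ("B", "goo"), ("B", "loo"), ("A", "roo"),
    ("A", "lol"), ("B", "rar"), ("C", "goo"),
]

# Pivot values: count of Word for each (category, word) pair.
cell_counts = Counter(rows)
categories = sorted({category for category, _ in rows})  # column labels
row_labels = sorted({word for _, word in rows})          # row labels

distinct = {}
for category in categories:
    # One pivot column: a number where the word occurs in this category,
    # None where the pivot cell would be blank.
    column = [
        cell_counts[(category, word)] if (category, word) in cell_counts else None
        for word in row_labels
    ]
    # The totals cell at the bottom of the column.
    column.append(sum(value for value in column if value is not None))
    # Count the filled cells, then subtract 1 for the totals cell.
    distinct[category] = sum(1 for value in column if value is not None) - 1

print(distinct)
```

On the sample data this gives 5 for A, 3 for B and 1 for C.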
Easy way: when inserting the pivot table, tick the “Add this data to the Data Model” option, then choose Distinct Count for the value field. You are done.