Home » excel » excel – Separating values that are combined in one string

excel – Separating values that are combined in one string

Posted by: admin May 14, 2020 Leave a comment


I would like to solve this either in Excel or in SPSS:

I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)

enter image description here

How to&Answers:


First just creating some sample data to demonstrate on:

data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.

Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.

do repeat vr=diag1 to diag9/vl=1 to 9.
    compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.

Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.


Assuming that the number of columns is fairly regular, I would suggest using text to columns, and then using COUNTIF on the cells if they are the value wanted. However there is a more robust and reproducible solution that would involve using SQL. If you download the free version of SQL Express here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads

Then you can import your table of data, here’s how to do that: How to import an Excel file into SQL Server?

Then you could use the more friendly SQL database to get the answers you want. For example you can use a select statement that would say:

SELECT count(e_cerv_dis_state)
WHERE e_cerv_dis_state = '6'

It would also be possible to use a CASE WHEN statement to add-in the names of the diagnoses.