# r – How to determine unique years within date range?

Posted by: admin, May 14, 2020

Questions:

I’m trying to determine in what years clients make use of healthcare. The data:

``````
Clientnumber   Date start  Date end
1              01-03-2017  31-10-2017
1              01-02-2018  07-08-2018
1              01-11-2018  01-03-2019
1              25-03-2019  01-07-2020
``````

For this one client I want to know in which unique years he/she is registered. Thus, the result should be:
`2017, 2018, 2019, 2020` and additionally a count of unique years: `4`.

Is there a way to do this in either Excel or R?

Thanks in advance.

Answers:

In R, we can reshape the data to long format, convert the values to `Date`, and extract the year. For each client, we can then build a comma-separated string of the `unique` `Year` values and count the distinct years with `n_distinct`.

``````
library(dplyr)

df %>%
  # Stack Date_start and Date_end into a single `value` column
  tidyr::pivot_longer(cols = -Clientnumber) %>%
  mutate(value = as.Date(value, "%d-%m-%Y"),
         Year = format(value, "%Y")) %>%
  group_by(Clientnumber) %>%
  summarise(Un_year = toString(unique(Year)),
            count = n_distinct(Year))

#  Clientnumber  Un_year                count
#         <int> <chr>                  <int>
#1            1 2017, 2018, 2019, 2020     4
``````
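The answers here assume a data frame `df` whose date columns are named `Date_start` and `Date_end` and hold `dd-mm-yyyy` strings. The question only shows the data as text, so the construction below is a minimal sketch under that assumption (the underscore column names are taken from the answers, not from the question). It also cross-checks the expected result in base R.

```r
# Sample data from the question. The column names Date_start / Date_end are an
# assumption chosen to match the names used in the answers.
df <- data.frame(
  Clientnumber = c(1L, 1L, 1L, 1L),
  Date_start = c("01-03-2017", "01-02-2018", "01-11-2018", "25-03-2019"),
  Date_end   = c("31-10-2017", "07-08-2018", "01-03-2019", "01-07-2020"),
  stringsAsFactors = FALSE
)

# Base R cross-check for this single client: pool all dates, extract the years
all_dates <- as.Date(c(df$Date_start, df$Date_end), format = "%d-%m-%Y")
years <- sort(unique(format(all_dates, "%Y")))

toString(years)  # "2017, 2018, 2019, 2020"
length(years)    # 4
```

With `df` defined this way, the `dplyr` pipelines in the answers can be run as written.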

### Answer:

One `dplyr` and `purrr` option could be:

``````
library(dplyr)
library(purrr)

df %>%
  group_by(Clientnumber) %>%
  # Characters 7-10 of a dd-mm-yyyy string are the year
  summarise(Years = map_chr(list(c(Date_start, Date_end)),
                            ~ toString(unique(substr(., 7, 10)))))

#  Clientnumber Years
#         <int> <chr>
#1            1 2017, 2018, 2019, 2020
``````
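Note that `substr(., 7, 10)` works only because the dates are fixed-width `dd-mm-yyyy` strings. The short sketch below compares it with parsing the date first. The non-padded value `"1-3-2017"` is a hypothetical input, not part of the question's data.

```r
x <- "01-03-2017"
year_by_position <- substr(x, 7, 10)                              # "2017"
year_by_parsing  <- format(as.Date(x, format = "%d-%m-%Y"), "%Y") # "2017"

# Hypothetical date without zero padding: the fixed positions no longer line up
y <- "1-3-2017"
bad_year  <- substr(y, 7, 10)                                     # "17"
good_year <- format(as.Date(y, format = "%d-%m-%Y"), "%Y")        # "2017"
```

If the input format is not guaranteed, converting with `as.Date` first is the more defensive choice.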

If you also want the count, add `stringr`:

``````
library(dplyr)
library(purrr)
library(stringr)

df %>%
  group_by(Clientnumber) %>%
  summarise(Years = map_chr(list(c(Date_start, Date_end)),
                            ~ toString(unique(substr(., 7, 10)))),
            # Number of years = number of separating commas + 1
            n = str_count(Years, ",") + 1)

#  Clientnumber Years                      n
#         <int> <chr>                  <dbl>
#1            1 2017, 2018, 2019, 2020     4
``````

If the situation is slightly more complicated, meaning you want all years between the first and the last one, even those that do not appear in the data:

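``````
library(dplyr)
library(purrr)
library(stringr)

df %>%
  group_by(Clientnumber) %>%
  # range() gives the first and last year; reduce(..., `:`) expands them to a full sequence
  summarise(Years = map_chr(list(c(Date_start, Date_end)),
                            ~ toString(reduce(range(as.numeric(substr(., 7, 10))), `:`))),
            n = str_count(Years, ",") + 1)
``````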
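One caveat, offered as a sketch rather than as part of the original answers: the variants above use only the start and end years of each row. If a single registration spans three or more calendar years, the `unique` version skips the years in the middle, while the `range` version also fills in years that fall in a gap between registrations. Assuming "registered in a year" means that some interval overlaps that year, each interval can be expanded separately. The data frame `df_gap` and the helper `year_of` are hypothetical names used for illustration.

```r
# Hypothetical data: one interval spanning 2015-2017 and a gap year (2018)
df_gap <- data.frame(
  Clientnumber = c(1L, 1L),
  Date_start = c("01-03-2015", "01-06-2019"),
  Date_end   = c("31-10-2017", "01-02-2020"),
  stringsAsFactors = FALSE
)

# Helper (hypothetical name): parse dd-mm-yyyy strings and return the year
year_of <- function(x) as.integer(format(as.Date(x, format = "%d-%m-%Y"), "%Y"))
start_year <- year_of(df_gap$Date_start)
end_year   <- year_of(df_gap$Date_end)

# For each client, expand every interval to its full run of years, then deduplicate
years_by_client <- lapply(
  split(seq_len(nrow(df_gap)), df_gap$Clientnumber),
  function(rows) sort(unique(unlist(Map(seq, start_year[rows], end_year[rows]))))
)

result <- data.frame(
  Clientnumber = as.integer(names(years_by_client)),
  Years = unname(vapply(years_by_client, toString, character(1))),
  n = unname(lengths(years_by_client))
)
result
```

For this hypothetical client the result includes 2016 (inside the first interval) and leaves out 2018 (no registration), giving five years in total.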