
# How to calculate the 95th percentile of values with a grouping variable in R or Excel

Questions:

I'm trying to calculate the 95th percentile of multiple water quality (WQ) values, grouped by watershed. For example:

```
Watershed   WQ
50500101    62.370661
50500101    65.505046
50500101    58.741477
50500105    71.220034
50500105    57.917249
```

I reviewed the previously posted question "Percentile for Each Observation w/r/t Grouping Variable". It is very close to what I want, but it returns a percentile for each observation, and I need one value per group. So ideally:

```
Watershed   WQ - 95th
50500101    x
50500105    y
```

thanks

This can be done with the `plyr` package. We specify `Watershed` as the grouping variable and ask for the 95th percentile (the 0.95 quantile) of `WQ`.

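```r
library(plyr)

# Random seed for reproducibility
set.seed(42)
# Sample data: two watersheds, 100 observations
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
# plyr call: split by Watershed, then summarise each group's 95th percentile
ddply(dat, "Watershed", summarise, WQ95 = quantile(WQ, .95))
```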

And the results:

```
  Watershed     WQ95
1         a 1.353993
2         b 1.461711
```
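`ddply` follows the split-apply-combine pattern. For readers without `plyr` installed, here is a rough base-R sketch of the same three steps (an illustration of the idea, not how `plyr` is implemented internally). It regenerates the same simulated `dat`, so in the same R session it should agree with the `ddply` result:

```r
# Same simulated data as the plyr example
set.seed(42)
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))

# Split: one numeric vector of WQ values per watershed
pieces <- split(dat$WQ, dat$Watershed)

# Apply: 95th percentile of each piece; quantile() returns a named
# length-1 vector, so unname() keeps the result a plain number
wq95 <- vapply(pieces, function(x) unname(quantile(x, probs = 0.95)), numeric(1))

# Combine: back into a data frame with one row per watershed
result <- data.frame(Watershed = names(wq95), WQ95 = unname(wq95))
result
```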

I hope I understand your question correctly. Is this what you’re looking for?

```r
my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))
```

```
  Group.1         x
1       1 0.6913747
2       2 0.8067847
3       3 0.9643744
```
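Applied to the example data from the question, the formula interface of `aggregate` keeps the original column names, so the result already has the requested one-row-per-watershed layout. A minimal sketch, assuming the watershed IDs are stored as text so they are not treated as numbers:

```r
# Example data from the question (IDs kept as character on purpose)
wq <- data.frame(
  Watershed = c("50500101", "50500101", "50500101", "50500105", "50500105"),
  WQ        = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# Summarise WQ by Watershed; extra arguments (probs) are passed on to quantile()
wq95 <- aggregate(WQ ~ Watershed, data = wq, FUN = quantile, probs = 0.95)
wq95
# 95th percentiles: about 65.19 for 50500101 and 70.55 for 50500105
```

With only two or three values per watershed the 95th percentile is an interpolation between the largest observations, so it is worth checking the group sizes before relying on it.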

EDIT

```r
aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)
```

also works (there is more than one way to skin a cat). As a side note, you can pass a vector of probabilities to get several percentiles at once, for example `probs = seq(0.1, 0.9, by = 0.1)` for deciles. Or you can use `summary` for a predefined set of statistics (minimum, quartiles, mean and maximum):

```r
aggregate(my.df$var, by = list(my.df$group), FUN = summary)
```
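To get several percentiles per group in one call, pass a vector to `probs`. A sketch assuming the same `my.df` shape (the `set.seed` call is added here only so the run is reproducible). Note that `aggregate` stores the results as a single matrix column, which `do.call(data.frame, ...)` spreads into ordinary columns:

```r
# Same toy data shape: 3 groups of 5 uniform values
set.seed(1)
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Deciles (10%, 20%, ..., 90%) per group
deciles <- aggregate(var ~ group, data = my.df, FUN = quantile,
                     probs = seq(0.1, 0.9, by = 0.1))

# deciles$var is a 3 x 9 matrix; flatten it into separate columns
deciles_flat <- do.call(data.frame, deciles)
dim(deciles_flat)  # 3 rows: 1 group column + 9 decile columns
```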

Use a combination of the `tapply` and `quantile` functions. For example, if your dataset looks like this:

```r
DF <- data.frame(watershed = sample(c('a', 'b', 'c', 'd'), 1000, replace = TRUE), wq = rnorm(1000))
```

Use this:

```r
with(DF, tapply(wq, watershed, quantile, probs = 0.95))
```
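`tapply` returns a named array rather than a data frame. If you want the two-column layout from the question, a small sketch (the column name `WQ_95th` is just an illustrative choice, and the data are simulated as above):

```r
# Simulated data in the same shape as the tapply example
set.seed(42)
DF <- data.frame(watershed = sample(c("a", "b", "c", "d"), 1000, replace = TRUE),
                 wq = rnorm(1000))

# One 95th percentile per watershed, as a named 1-d array
p95 <- with(DF, tapply(wq, watershed, quantile, probs = 0.95))

# Reshape into one row per watershed
out <- data.frame(Watershed = names(p95), WQ_95th = as.vector(p95))
out
```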

In Excel, an array formula makes this easy. I suggest the following:

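```
{=PERCENTILE(IF($A$2:$A$6 = D2, $B$2:$B$6), 0.95)}
```

Column A holds the watershed IDs and column B holds the WQ values. D2 is just an example cell containing the watershed ID to look up; put one ID per row in column D and fill the formula down. Both ranges are fully anchored with `$` so they do not shift when the formula is copied.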

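Also, be sure to enter it as an array formula by pressing Ctrl+Shift+Enter; Excel adds the curly braces itself, so don't type them. In Excel versions with dynamic arrays (such as Microsoft 365), pressing Enter alone also works.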
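Excel's `PERCENTILE` (and `PERCENTILE.INC` in newer versions) uses the same linear interpolation as R's default `quantile()` (`type = 7`), so the R and Excel answers should agree. A small sketch of that rule, written out by hand only to make the arithmetic visible; `pct_inc` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: linear interpolation as in quantile(type = 7),
# which matches Excel's PERCENTILE / PERCENTILE.INC
pct_inc <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1     # 1-based fractional position in the sorted data
  lo <- floor(h)
  hi <- min(lo + 1, n)     # stay in range when p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# Values for watershed 50500101 from the question
ws1 <- c(62.370661, 65.505046, 58.741477)
pct_inc(ws1, 0.95)                    # about 65.19
unname(quantile(ws1, probs = 0.95))   # same value from quantile()
```

For these three values the position is h = 2 × 0.95 + 1 = 2.9, so the result lies 90% of the way from the second- to the third-largest value.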

``````set.seed(42)