I was wondering how I would reproduce Excel's percentile rank exclusive function (PERCENTRANK.EXC) in R. I found a technique here, which looks like this:
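true_df <- data.frame(some_column = c(24516, 7174, 13594, 33838, 40000))
percentilerank <- function(x) {
  rx <- rle(sort(x))  # distinct values and how often each occurs
  # number of elements strictly smaller than each distinct value
  smaller <- cumsum(c(0, rx$lengths))[seq(length(rx$lengths))]
  # number of elements strictly larger than each distinct value
  larger <- rev(cumsum(c(0, rev(rx$lengths))))[-1]
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]  # map the ranks back to the original order
}
dfr <- percentilerank(true_df$some_column)
# output matches =PERCENTRANK.INC, not =PERCENTRANK.EXC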
#[1] 0.50 0.00 0.25 0.75 1.00
But that is the R equivalent of =PERCENTRANK.INC. According to the info popup in Excel, =PERCENTRANK.INC takes (array, x-value to rank, [optional significance]) and returns the percentage rank inclusive of the first (0%) and last (100%) values in the array.
=PERCENTRANK.EXC is similar to its counterpart, but it returns the percentage rank exclusive of the first and last values in the array, meaning it never returns 0% or 100%.
Here is a small example using Excel to show the difference:
When I apply the R function above, its output matches the PERCENTRANK.INC($A$32:$A$36,A32) column. How can I get the PERCENTRANK.EXC result instead? I'm new to R.
Using dplyr:
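library(dplyr)
# x is the numeric vector to rank, e.g. the column from the question
x <- true_df$some_column
# inclusive (matches PERCENTRANK.INC)
percent_rank(x)
# exclusive (matches PERCENTRANK.EXC): pad with -Inf and Inf, then drop them
percent_rank(c(-Inf, Inf, x))[-(1:2)]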
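To see why the padding trick works: percent_rank() rescales the minimum rank as (rank - 1) / (n - 1). Adding -Inf and Inf raises every rank by one and n by two, which gives rank / (n + 1), the exclusive form. Below is a minimal base R sketch of that arithmetic. The helper names percent_rank_inc and percent_rank_exc are hypothetical (they are not part of dplyr or base R), and the sketch assumes x has no missing values.

```r
# Hypothetical helpers, written with base R's rank() only.
# Assumption: x contains no NA values.
percent_rank_inc <- function(x) {
  (rank(x, ties.method = "min") - 1) / (length(x) - 1)
}
percent_rank_exc <- function(x) {
  rank(x, ties.method = "min") / (length(x) + 1)
}

x <- c(24516, 7174, 13594, 33838, 40000)
percent_rank_inc(x)
# [1] 0.50 0.00 0.25 0.75 1.00
percent_rank_exc(x)
# [1] 0.5000000 0.1666667 0.3333333 0.6666667 0.8333333

# Padding with -Inf and Inf, then dropping those two entries,
# turns the inclusive formula into the exclusive one.
padded <- percent_rank_inc(c(-Inf, Inf, x))[-(1:2)]
all.equal(padded, percent_rank_exc(x))
# [1] TRUE
```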
Answer:
I messed around with the code and got this:
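true_df <- data.frame(some_column = c(24516, 7174, 13594, 33838, 40000))
percentilerank <- function(x) {
  rx <- rle(sort(x))  # distinct values and how often each occurs
  # 1 + number of elements strictly smaller (!0 is TRUE, which counts as 1)
  smaller <- cumsum(c(!0, rx$lengths))[seq(length(rx$lengths))]
  # number of elements >= each distinct value, followed by a trailing 0
  larger <- rev(cumsum(c(0, rev(rx$lengths))))
  rxpr <- smaller / (smaller + larger)
  rxpr[match(x, rx$values)]  # map the ranks back to the original order
}
dfr <- percentilerank(true_df$some_column)
# output now matches =PERCENTRANK.EXC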
#[1] 0.5000000 0.1666667 0.3333333 0.6666667 0.8333333
Since 0% and 100% are not included in the exclusive percentile rank, I changed c(0, ... to c(!0, ... in the smaller line, so the counts start at 1 instead of 0 (!0 is TRUE, which R treats as 1). Similarly, to get rid of the 100%, I took out the [-1] at the end of the larger line.
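One caveat with the function above: without the [-1], larger has one more element than smaller, so with more than one distinct value R recycles the shorter vector and warns that the "longer object length is not a multiple of shorter object length". The returned values are unaffected, because match() only indexes the first entries. A warning-free sketch of the same idea follows. The name percentilerank_exc is hypothetical, and it assumes x has no missing values and that Excel gives tied values the lowest rank.

```r
# Hypothetical variant; a sketch, not the function from the answer above.
# Assumption: x contains no NA values.
percentilerank_exc <- function(x) {
  rx <- rle(sort(x))                 # distinct values and their counts
  k <- length(rx$lengths)
  # Number of elements strictly smaller than each distinct value.
  smaller <- cumsum(c(0, rx$lengths))[seq_len(k)]
  # Exclusive rank: (smaller + 1) / (n + 1), so 0 and 1 are never reached.
  rxpr <- (smaller + 1) / (length(x) + 1)
  rxpr[match(x, rx$values)]          # back to the original order
}

percentilerank_exc(c(24516, 7174, 13594, 33838, 40000))
# [1] 0.5000000 0.1666667 0.3333333 0.6666667 0.8333333

percentilerank_exc(c(1, 2, 2, 3))    # tied values share the lower rank
# [1] 0.2 0.4 0.4 0.8
```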