Home » Python » How do I calculate percentiles with python/numpy?

How do I calculate percentiles with python/numpy?

Questions:

Is there a convenient way to calculate percentiles for a sequence or single-dimensional numpy array?

I am looking for something similar to Excel’s percentile function.

I looked in NumPy’s statistics reference, and couldn’t find this. All I could find is the median (50th percentile), but not something more specific.

You might be interested in the SciPy Stats package. It has the percentile function you’re after and many other statistical goodies.

percentile() is available in numpy too.

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, e.g median.
print p
3.0

This ticket leads me to believe they won’t be integrating percentile() into numpy anytime soon.

Questions:

By the way, there is a pure-Python implementation of percentile function, in case one doesn’t want to depend on scipy. The function is copied below:

## {{{ http://code.activestate.com/recipes/511478/ (r1)
import math
import functools

def percentile(N, percent, key=lambda x:x):
"""
Find the percentile of a list of values.

@parameter N - is a list of values. Note N MUST BE already sorted.
@parameter percent - a float value from 0.0 to 1.0.
@parameter key - optional key function to compute value from each element of N.

@return - the percentile of the values
"""
if not N:
return None
k = (len(N)-1) * percent
f = math.floor(k)
c = math.ceil(k)
if f == c:
return key(N[int(k)])
d0 = key(N[int(f)]) * (c-k)
d1 = key(N[int(c)]) * (k-f)
return d0+d1

# median is 50th percentile.
median = functools.partial(percentile, percent=0.5)
## end of http://code.activestate.com/recipes/511478/ }}}

Questions:
import numpy as np
a = [154, 400, 1124, 82, 94, 108]
print np.percentile(a,95) # gives the 95th percentile

Questions:

The definition of percentile I usually see expects as a result the value from the supplied list below which P percent of values are found… which means the result must be from the set, not an interpolation between set elements. To get that, you can use a simpler function.

def percentile(N, P):
"""
Find the percentile of a list of values

@parameter N - A list of values.  N must be sorted.
@parameter P - A float value from 0.0 to 1.0

@return - The percentile of the values.
"""
n = int(round(P * len(N) + 0.5))
return N[n-1]

# A = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# B = (15, 20, 35, 40, 50)
#
# print percentile(A, P=0.3)
# 4
# print percentile(A, P=0.8)
# 9
# print percentile(B, P=0.3)
# 20
# print percentile(B, P=0.8)
# 50

If you would rather get the value from the supplied list at or below which P percent of values are found, then use this simple modification:

def percentile(N, P):
n = int(round(P * len(N) + 0.5))
if n > 1:
return N[n-2]
else:
return N

Or with the simplification suggested by @ijustlovemath:

def percentile(N, P):
n = max(int(round(P * len(N) + 0.5)), 2)
return N[n-2]

Questions:

check for scipy.stats module:

scipy.stats.scoreatpercentile

Questions:

Here’s how to do it without numpy, using only python to calculate the percentile.

import math

def percentile(data, percentile):
size = len(data)
return sorted(data)[int(math.ceil((size * percentile) / 100)) - 1]

p5 = percentile(mylist, 5)
p25 = percentile(mylist, 25)
p50 = percentile(mylist, 50)
p75 = percentile(mylist, 75)
p95 = percentile(mylist, 95)

Questions:

To calculate the percentile of a series, run:

from scipy.stats import rankdata
import numpy as np

def calc_percentile(a, method='min'):
if isinstance(a, list):
a = np.asarray(a)
return rankdata(a, method=method) / float(len(a))

For example:

a = range(20)
print {val: round(percentile, 3) for val, percentile in zip(a, calc_percentile(a))}
>>> {0: 0.05, 1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3, 6: 0.35, 7: 0.4, 8: 0.45, 9: 0.5, 10: 0.55, 11: 0.6, 12: 0.65, 13: 0.7, 14: 0.75, 15: 0.8, 16: 0.85, 17: 0.9, 18: 0.95, 19: 1.0}