I have data extracted from a pdf graph, stored in a csv file, where x represents incubation times and y is the density. I would like to calculate percentiles, such as the 95th. I'm a bit confused: should I calculate the percentile using the x values only, i.e., using np.percentile(x, 0.95)?
4
-
You want percentiles for x & y values ? Or you want to annotate it in the plot ? – Kaustubh Lohani Jun 17 '20 at 07:20
-
@Zeek I want the percentile of x, but shouldn't I consider the pdf (y values)? – sakurami Jun 17 '20 at 08:48
-
Percentiles are based on the pdf values. The 95th percentile is the x-value that has 95% of the area under the pdf to the left of it (or 5% to the right). In calculus terms, the integral from -infinity to x is 0.95. – pjs Jun 17 '20 at 13:01
-
@pjs, thank you for your answer. So how can I calculate it when I have a 2-dimensional array with both the pdf and its corresponding x-values? Shouldn't np.percentile(x, 0.95) give me the right value? – sakurami Jun 17 '20 at 13:51
-
@sakurami Not being a pythonista, I have no idea. That's why I put the info relevant to your comment in a comment rather than attempt an answer. – pjs Jun 17 '20 at 14:06
-
@pjs, thank you for the info. and sorry for that – sakurami Jun 17 '20 at 14:08
-
@sakurami I saw this through the `distribution` tag. Not a problem, that's an appropriate tag and I found your question interesting. There's certainly no need to apologize! – pjs Jun 17 '20 at 14:13
-
After poking around a bit, it looks like [numpy.trapz](https://numpy.org/doc/1.18/reference/generated/numpy.trapz.html) may be what you want. – pjs Jun 17 '20 at 16:54
-
Suppose you have the 2-D array as `arr` and have the `x` values as the 2nd 1-D array inside. You could do something like this to get the percentile: `np.percentile(arr[1] , 0.95)`. Hope this helps! – Kaustubh Lohani Jun 17 '20 at 17:02
-
Thank you @pjs, I did not get the idea of trapz actually and how it can be used for percentile calculation – sakurami Jun 17 '20 at 18:43
-
Thank you @Zeek, this is exactly what I used for now. I have read a couple of resources, and they said I need to sort the observations and then calculate the percentile position by multiplying n (the sample size) by the percentile, and based on that I get the percentile value of x. However, I also saw some that use the mean and standard deviation, and I was wondering whether the np.percentile() function is enough or whether I should use another way. – sakurami Jun 17 '20 at 18:47
-
@sakurami that depends on your use case, i.e. your endgame for all of this. – Kaustubh Lohani Jun 17 '20 at 20:12
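The definition pjs gives in the comments above can be written compactly. The symbols $f$ (the density, i.e. the y values) and $x_p$ (the $p$-th percentile) are notation assumed here for illustration; they do not appear in the thread.

$$
\int_{-\infty}^{x_p} f(t)\,dt = \frac{p}{100},
\qquad\text{so for the 95th percentile}\qquad
\int_{-\infty}^{x_{95}} f(t)\,dt = 0.95 .
$$

For tabulated data the integral can only be approximated (for example with the trapezoid rule), and $f$ has to be normalized first so that its total area is 1.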
2 Answers
3
Here is some code which uses np.trapz (as proposed by @pjs). We take the x and y arrays and assume y is a PDF, so first we normalize it to 1, and then search backward from the right until the integral drops to the 0.95 point. I've made up a multi-peak function for the example:
import numpy as np
import matplotlib.pyplot as plt

N = 1000
x = np.linspace(0.0, 6.0*np.pi, N)
y = np.sin(x/2.0)/x # construct some multi-peak function
y[0] = y[1]
y = np.abs(y)

plt.plot(x, y, 'r.')
plt.show()

# normalization
norm = np.trapz(y, x)
print(norm)
y = y/norm
print(np.trapz(y, x)) # after normalization

# now compute integral cutting right limit down by one
# with each iteration, stop as soon as we hit 0.95
for k in range(0, N):
    if k == 0:
        xx = x
        yy = y
    else:
        xx = x[0:-k]
        yy = y[0:-k]
    v = np.trapz(yy, xx)
    print(f"Integral {k} from {xx[0]} to {xx[-1]} is equal to {v}")
    if v <= 0.95:
        break

Severin Pappadeux
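A possible variation on the backward loop above, offered as a sketch rather than as part of the answer: build the cumulative trapezoid integral once and invert it with linear interpolation, so the result can fall between grid points. The function name `percentile_from_pdf` is hypothetical, and the sketch assumes x is sorted in increasing order and y is non-negative.

```python
import numpy as np

def percentile_from_pdf(x, y, q):
    """Return the x-value with a fraction q (0..1) of the area under y to its left.

    Hypothetical helper: assumes x is increasing and y >= 0 everywhere.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Trapezoid area of each interval, accumulated into an unnormalized CDF.
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    cdf /= cdf[-1]  # normalize so the total area is 1
    # Invert the CDF by linear interpolation between grid points.
    return np.interp(q, cdf, x)

# Sanity check on a uniform density over [0, 10], where the
# 95th percentile should come out very close to 9.5.
x = np.linspace(0.0, 10.0, 101)
y = np.ones_like(x)
print(percentile_from_pdf(x, y, 0.95))
```

Interpolating the cumulative integral avoids recomputing the full integral at every step and does not round the answer to the nearest grid point, which matters when the grid is coarse.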
-
Hello @Severin Pappadeux, many thanks for the code. I ran it and compared it with the result of np.percentile; both gave me the same result. – sakurami Jun 22 '20 at 19:12
-
@sakurami If you have alternative version with np.percentile, please publish it as another answer, and I'll endorse it - it will definitely benefit people looking for answers. – Severin Pappadeux Jun 22 '20 at 19:20
-
@sakurami You could even pick Y as function of x from my code if you don't want to share actual data – Severin Pappadeux Jun 22 '20 at 19:27
0
I have tested both @Severin Pappadeux's method and np.percentile, and both gave me the same result for the 95th percentile.
Here is @Severin Pappadeux's code, but with the data I used:
import numpy as np
import matplotlib.pyplot as plt

x = [ 5.  ,  5.55,  6.1 ,  6.65,  7.2 ,  7.75,  8.3 ,  8.85,  9.4 ,
      9.95, 10.5 , 11.05, 11.6 , 12.15, 12.7 , 13.25, 13.8 , 14.35,
     14.9 , 15.45, 16.  ]
y = [0.03234577, 0.03401444, 0.03559847, 0.03719304, 0.03890566,
     0.04084201, 0.04309067, 0.04570878, 0.04871024, 0.05205822,
     0.05566298, 0.05938525, 0.06304516, 0.06643575, 0.06933978,
     0.07154828, 0.07287886, 0.07319211, 0.0724044 , 0.0704957 ,
     0.0675117 ]
N = len(x)
y[0] = y[1]
y = np.abs(y)

plt.plot(x, y, 'r.')
plt.show()

# normalization
norm = np.trapz(y, x)
print(norm)
y = y/norm
print(np.trapz(y, x)) # after normalization

# now compute integral cutting right limit down by one
# with each iteration, stop as soon as we hit 0.95
for k in range(0, N):
    if k == 0:
        xx = x
        yy = y
    else:
        xx = x[0:-k]
        yy = y[0:-k]
    v = np.trapz(yy, xx)
    print(f"Integral {k} from {xx[0]} to {xx[-1]} is equal to {v}")
    if v <= 0.95:
        break

# Outputs =
# 0.6057000785
# 1.0
# Integral 0 from 5.0 to 16.0 is equal to 1.0
# Integral 1 from 5.0 to 15.45 is equal to 0.9373418687777172
And when I used np.percentile() on x, as @Zeek suggests:
np.percentile(x, 95)
# Output= 15.45
So both methods gave me 15.45 as the 95th percentile of x.

sakurami
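One point worth checking before relying on this agreement: `np.percentile(x, 95)` treats x as a sample of observations and never sees y, so on an evenly spaced grid it returns the point 95% of the way along the grid regardless of the density. In the output above, the integral is 1.0 at 16.0 and about 0.937 at 15.45, so the area-based 95th percentile lies somewhere between those two grid points, and the loop reports 15.45 because it stops at the first grid point at or below 0.95. The sketch below illustrates the difference with two made-up densities on the same grid; `y_left`, `y_right`, and `cdf_percentile` are hypothetical names introduced for this illustration.

```python
import numpy as np

# Same evenly spaced grid as in the answer above (5.0 to 16.0 in steps of 0.55).
x = np.linspace(5.0, 16.0, 21)

# Two hypothetical densities on that grid, for illustration only.
y_left = np.exp(-(x - 7.0) ** 2)    # mass concentrated near x = 7
y_right = np.exp(-(x - 14.0) ** 2)  # mass concentrated near x = 14

def cdf_percentile(x, y, q):
    """x-value with a fraction q of the trapezoid area under y to its left."""
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    # Interpolate at q times the total area, which avoids normalizing y first.
    return np.interp(q * cdf[-1], cdf, x)

grid_only = np.percentile(x, 95)         # depends on x alone
p_left = cdf_percentile(x, y_left, 0.95)
p_right = cdf_percentile(x, y_right, 0.95)

print("np.percentile on x:     ", grid_only)
print("area-based, left peak:  ", p_left)
print("area-based, right peak: ", p_right)
```

The value from `np.percentile` is the same for both densities, while the area-based values follow where the mass actually sits. If the CSV held raw incubation-time observations rather than (x, density) pairs, `np.percentile` on those observations would be the appropriate tool.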