I have a data frame df with some basic web stats ranked by Page Views (PVs):
URL PVs
1 1500
2 1200
3 900
4 700
:
100 25
I am trying to filter and count the URLs that contribute a given percentage of total page views (PVs). Say, I want to know how many URLs, and which ones, brought in 90% of PVs (or 10%).
I calculated percentiles:
df.quantile(np.linspace(.1, 1, 9, endpoint=False))
And I know I can iterate through rows like this (so I can sum them up):
for index, row in df.iterrows():
    print(row['PVs'])
But I cannot figure out how to stop when a certain threshold is reached. Will appreciate your help!
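To make the goal concrete, here is a minimal sketch of the kind of result I am after, written with a cumulative sum instead of an explicit loop. The sample data and the helper name urls_for_share are made up for illustration, and it assumes the threshold means "the smallest set of top URLs whose combined PVs reach the given share of the total".

```python
import pandas as pd

# Hypothetical sample data; the real df has 100 rows.
df = pd.DataFrame({"URL": [1, 2, 3, 4], "PVs": [1500, 1200, 900, 700]})

def urls_for_share(frame, share):
    """Return the top rows whose combined PVs reach `share` of the total."""
    ranked = frame.sort_values("PVs", ascending=False)
    # Running fraction of total PVs covered up to and including each row.
    cum_share = ranked["PVs"].cumsum() / ranked["PVs"].sum()
    # Keep a row if the rows before it had not yet reached the threshold,
    # so the row that crosses the threshold is included.
    reached_before = cum_share.shift(fill_value=0.0) < share
    return ranked[reached_before]

top = urls_for_share(df, 0.5)
print(len(top), top["URL"].tolist())
```

Whether the row that crosses the threshold should be included or excluded is a choice; the sketch includes it.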