I have a dataframe with normalized percentage info, e.g.:
wordCount number Percent
2.0 1282 0.267345
1.0 888 0.185213
3.0 1124 0.170791
4.0 1250 0.152877
5.0 554 0.084864
6.0 333 0.058904
7.0 160 0.024290
8.0 111 0.016851
The percentages sum to 1. The dataframe has 6000 entries, and I want to take a sample of 2000 from it. The sample should be as balanced as possible.
It should include as much as possible of the data from groups with a small percentage and as little as possible from groups with a large percentage.
I don't know how to do it.
E.g. the 2000 rows would contain all the data for wordCount 8.0 and the minimum amount of data for wordCount 2.0.
When I plot the gamma distribution of the sample, the curve should be as flat as possible.
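To make the goal concrete, here is a minimal sketch of the kind of sampling I have in mind, not a definitive method. It assumes the full data is a pandas dataframe with one row per entry and a `wordCount` column; the function names `balanced_quota` and `balanced_sample` and the `seed` parameter are hypothetical, made up for illustration. The idea is that small groups are taken whole and the remaining budget is split equally among the larger groups.

```python
import pandas as pd


def balanced_quota(counts: pd.Series, n: int) -> pd.Series:
    """Decide how many rows to take from each group.

    Groups are visited from smallest to largest. Each group gets an equal
    share of what is still left, or all of its rows if it has fewer.
    """
    quota = {}
    remaining = n
    groups_left = len(counts)
    for key, size in counts.sort_values().items():
        take = min(int(size), remaining // groups_left)
        quota[key] = take
        remaining -= take
        groups_left -= 1
    return pd.Series(quota, dtype=int)


def balanced_sample(df: pd.DataFrame, col: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw about n rows so that the groups in `col` are as even as possible."""
    quota = balanced_quota(df[col].value_counts(), n)
    parts = [
        group.sample(n=int(quota.loc[key]), random_state=seed)
        for key, group in df.groupby(col)
    ]
    return pd.concat(parts)
```

With this sketch, `balanced_sample(df, "wordCount", 2000)` would keep every row of the rare word counts and cap the common ones at a shared limit. If the dataframe has fewer than `n` rows in total, it simply returns all of them.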