I have a large DataFrame with 2 million observations. For further analysis, I want to work with a smaller sample (around 15-20% of the original DataFrame). While sampling, I also want to preserve the proportions of the categorical values in one of the columns.

For example, suppose one column has 5 categories with these shares of the total observations: red (20%), green (10%), blue (15%), white (25%), yellow (30%). I would like that column in the sample to show the same proportions.
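What is described here is usually called stratified sampling. As a minimal sketch of one possible approach, assuming pandas 1.1 or later (where `DataFrameGroupBy.sample` is available): the column name `color`, the `value` column, and the synthetic data are all hypothetical stand-ins for the real DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 2-million-row DataFrame.
rng = np.random.default_rng(0)
colors = ["red", "green", "blue", "white", "yellow"]
probs = [0.20, 0.10, 0.15, 0.25, 0.30]
df = pd.DataFrame({
    "color": rng.choice(colors, size=100_000, p=probs),
    "value": rng.normal(size=100_000),
})

# Draw 20% of the rows *within each category*, so every category keeps
# (up to rounding) the same share in the sample as in the original.
sample = df.groupby("color").sample(frac=0.20, random_state=0)

# Compare category proportions side by side.
comparison = pd.DataFrame({
    "original": df["color"].value_counts(normalize=True),
    "sample": sample["color"].value_counts(normalize=True),
})
print(comparison)
```

Because the fraction is applied per group, the proportions can differ from the original only by rounding within each category. If scikit-learn is already in use, `train_test_split(df, train_size=0.2, stratify=df["color"])` should give a comparable split.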

Please assist!

AmanAgrwl

0 Answers