My pandas DataFrame looks like this:

| id | name  | condition | category | brand_name | price |
|----|-------|-----------|----------|------------|-------|
| 1  | Shirt | 3         | Men Clot | Easy       | 250   |

More information about the dataset: the brand_name column has 150 unique brand names and the category column has over 89 unique categories. The dataset has over 1000k rows.

I need only 10k rows from the dataset, so I want to use the built-in pandas.DataFrame.sample. However, it samples rows purely at random. I want to make sure that my 10k rows (a subset of the main dataset) contain every unique brand name and every unique category. On top of that, I need 10 rows for each unique brand name.

Tried solutions: I tried the pandas groupby function, but it only gave me the unique values. I need not only the unique values but also 10 rows for each value. I searched Stack Overflow but did not find a solution.
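
To make the requirement concrete, here is a rough sketch of the kind of result I am after, not a confirmed solution. It assumes "10 rows for each brand" means at least 10 (fewer if a brand has fewer rows), that the DataFrame index is unique, and that the column names match the table above. The function name `sample_with_coverage` and its parameters are made up for illustration.

```python
import pandas as pd

def sample_with_coverage(df, n_total=10_000, per_brand=10, seed=0):
    """Hypothetical helper: random subset that still covers every brand and category."""
    # Shuffle once so that taking the first rows of any group is a random pick.
    shuffled = df.sample(frac=1, random_state=seed)

    # Up to `per_brand` rows for every brand (fewer if the brand is rare).
    by_brand = shuffled.groupby("brand_name").head(per_brand)

    # One extra row for each category the brand pick did not cover.
    uncovered = ~shuffled["category"].isin(by_brand["category"])
    by_category = shuffled[uncovered].groupby("category").head(1)

    core = pd.concat([by_brand, by_category])

    # Fill up to n_total with rows not picked yet (assumes a unique index).
    rest = shuffled.drop(index=core.index)
    n_fill = max(n_total - len(core), 0)
    return pd.concat([core, rest.head(n_fill)])
```

With 150 brands the guaranteed part is at most 1,500 rows plus a few category rows, so most of the 10k would still be plain random rows. If the guaranteed part alone exceeds `n_total`, this sketch returns more than `n_total` rows rather than dropping a brand or category.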

Niyamat Ullah