How to make sure certain items are included when sampling data in pandas

Asked Jan 14 '18 at 05:45

Active Jan 14 '18 at 05:45

Viewed 32 times

My pandas DataFrame looks like this:

| id | name  | condition | category | brand_name | price |
|----|-------|-----------|----------|------------|-------|
| 1  | Shirt | 3         | Men Clot | Easy       | 250   |

More information about the dataset: the brand_name column has 150 unique brand names, and the category column has over 89 unique categories. The dataset has roughly 1,000k rows.

I need only 10k rows from the dataset, so I want to use the built-in pandas.DataFrame.sample method. However, it samples rows at random, and I want to make sure that my 10k rows (a subset of the main dataset) contain every unique brand name and every unique category. In addition, I need 10 rows for each unique brand name.

Tried solutions: I tried the pandas groupby function, but it only gives me the unique values. I need not just the unique values but also 10 rows for each of them. I searched Stack Overflow but could not find a solution.

asked Jan 14 '18 at 05:45

Niyamat Ullah


See https://stackoverflow.com/questions/36390406/pandas-sample-each-group-after-groupby - group by brand and category with `groupby(['brand','category'])`, then apply sample to each group – Bharath M Shetty Jan 14 '18 at 05:58
I see, misunderstood the question. – cs95 Jan 14 '18 at 06:04
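
One way to combine the group-wise idea from the comment above with the 10k target might look like the sketch below. It is not a posted answer, only a minimal illustration under stated assumptions: the column names `brand_name` and `category` are taken from the question, the DataFrame index is assumed to be unique, every brand is assumed to have at least 10 rows, and the function name `sample_with_coverage` and its parameters are hypothetical. Instead of `groupby(...).apply(sample)`, it shuffles the frame once and takes the first rows of each group, which also copes with groups smaller than the requested size.

```python
import pandas as pd


def sample_with_coverage(df, n_total=10_000, per_brand=10, seed=0):
    """Hypothetical helper: random subset that covers every brand and category.

    Assumes df has a unique index and columns 'brand_name' and 'category'.
    """
    # Shuffle once; taking the first rows of a shuffled frame is a random draw.
    shuffled = df.sample(frac=1, random_state=seed)

    # Up to `per_brand` rows for every brand (fewer if a brand is smaller).
    brand_rows = shuffled.groupby("brand_name").head(per_brand)

    # At least one row for every category.
    category_rows = shuffled.groupby("category").head(1)

    # Merge the two required sets and drop rows that were picked twice.
    required = pd.concat([brand_rows, category_rows])
    required = required[~required.index.duplicated()]

    # Fill the rest of the quota with random rows that are not already chosen.
    n_extra = max(n_total - len(required), 0)
    extra = shuffled.drop(index=required.index).head(n_extra)

    return pd.concat([required, extra])


# Synthetic stand-in for the real data: 150 brands, 89 categories.
n_rows = 50_000
df = pd.DataFrame({
    "id": range(n_rows),
    "brand_name": [f"brand_{i % 150}" for i in range(n_rows)],
    "category": [f"cat_{i % 89}" for i in range(n_rows)],
    "price": [100 + i % 400 for i in range(n_rows)],
})

subset = sample_with_coverage(df)
print(subset.shape)
print(subset["brand_name"].value_counts().min())
```

Note that the required rows (at most 150 × 10 for brands plus 89 for categories) are well below 10k here, so the remaining slots are filled at random; if the required rows ever exceeded `n_total`, this sketch would return more than `n_total` rows rather than drop coverage.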
