My pandas DataFrame looks like this:
| id | name  | condition | category | brand_name | price |
|----|-------|-----------|----------|------------|-------|
| 1  | Shirt | 3         | Men Clot | Easy       | 250   |
More information about the dataset: the `brand_name` column has 150 unique brand names, and the `category` column has over 89 unique categories. The dataset has over 1000k rows (roughly one million).
I need only 10k rows from the dataset, so I want to use the pandas built-in function `pandas.DataFrame.sample`. But it samples rows purely at random. I want to make sure that my 10k rows (a subset of the main dataset) contain every unique brand name and every unique category. On top of that, I also need at least 10 rows for each unique brand name.
Tried solutions: I tried the pandas `groupby` function, but it only gave me the unique values. I need not just the unique values but also 10 rows for each value. I searched all over Stack Overflow but couldn't find a solution.
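
To make the requirement concrete, here is a rough sketch of the kind of result I am after, not a tested solution for the real data. It assumes the column names `brand_name` and `category` from the table above and a unique DataFrame index. The function name `sample_with_coverage`, its parameters, and the synthetic `demo` frame are all made up for illustration. The idea: shuffle once, take the first 10 rows per brand with `groupby(...).head`, add one row for any category still missing, then fill up to 10k with random rows.

```python
import numpy as np
import pandas as pd


def sample_with_coverage(df, total=10_000, per_brand=10, seed=0):
    """Hypothetical helper: `total` rows, up to `per_brand` rows per brand,
    and at least one row for every category. Assumes a unique index."""
    # Shuffle once so that "first n rows per group" is a random pick.
    shuffled = df.sample(frac=1, random_state=seed)

    # Step 1: up to `per_brand` rows for each brand (fewer if a brand is rare).
    picked = shuffled.groupby("brand_name").head(per_brand)

    # Step 2: one row for each category that step 1 did not cover.
    uncovered = ~shuffled["category"].isin(picked["category"])
    extra = shuffled[uncovered].groupby("category").head(1)

    core = pd.concat([picked, extra])

    # Step 3: fill the rest of the budget with random rows not already chosen.
    remaining = shuffled.drop(index=core.index)
    n_fill = min(max(total - len(core), 0), len(remaining))
    filler = remaining.sample(n=n_fill, random_state=seed)

    return pd.concat([core, filler])


# Synthetic stand-in for the real dataset (made-up values, same column names).
rng = np.random.default_rng(0)
n_rows = 50_000
demo = pd.DataFrame({
    "id": np.arange(n_rows),
    "brand_name": rng.integers(0, 150, n_rows).astype(str),
    "category": rng.integers(0, 89, n_rows).astype(str),
    "price": rng.integers(1, 500, n_rows),
})

subset = sample_with_coverage(demo)
print(len(subset), subset["brand_name"].nunique(), subset["category"].nunique())
```

With 150 brands the guaranteed part is at most 1,500 rows plus at most one row per uncovered category, so it fits easily inside the 10k budget. The random fill in step 3 means common brands end up with more than 10 rows.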