I have this dataset:
text sentiment
randomstring positive
randomstring negative
randomstring netrual
random mixed
Then if I run a countmap
i have:
"mixed" -> 600
"positive" -> 2000
"negative" -> 3300
"netrual" -> 780
I want to random sample from this dataset in a way that I have records of all smallest class (mixed = 600)
and the same amount of each of other classes (positive=600, negative=600, neutral = 600)
I know how to do this in pandas:
df_teste = [data.loc[data.sentiment==i]\
.sample(n=int(data['sentiment']
.value_counts().nsmallest(1)[0]),random_state=SEED) for i in data.sentiment.unique()]
df_teste = pd.concat(df_teste, axis=0, ignore_index=True)
But I am having a hard time to do this in Julia.
Note: I don´t want to hardcode which of the class is the lowest one, so I am looking for a solution that infer that from the countmap
or freqtable
, if possible.