Groupby and create a new column by randomly assigning one of several strings to it in Pandas

Question

Let's say I have student information (id, age and class) as follows:

   id  age  class
0   1   23    a
1   2   24    a
2   3   25    b
3   4   22    b
4   5   16    c
5   6   16    d

I want to group by class and create a new column named major by randomly assigning one of math, art, business, science to each group, so that all rows in the same class get the same major.

We may need something like apply(lambda x: random.choice(...)) to do this, but I don't know how. Thanks for your help.

Output expected:

   id  age     major  class
0   1   23       art    a
1   2   24       art    a
2   3   25   science    b
3   4   22   science    b
4   5   16  business    c
5   6   16      math    d

Yes, sorry, I haven't found this. — ah bon, May 14 '20 at 12:44

jezrael · Accepted Answer · 2020-05-15T04:48:16.380

Use numpy.random.choice, with size set to the length of the DataFrame so that one value is drawn per row:

import numpy as np

df['major'] = np.random.choice(['math', 'art', 'business', 'science'], size=len(df))
print(df)
   id  age     major
0   1   23  business
1   2   24       art
2   3   25   science
3   4   22      math
4   5   16   science
5   6   16  business
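
The output above changes on every run because the draw is unseeded. A minimal, self-contained sketch of the same per-row assignment with a seeded generator follows; the DataFrame is rebuilt from the question's sample data, and the seed value 0 and the names majors and rng are arbitrary choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [23, 24, 25, 22, 16, 16],
    "class": ["a", "a", "b", "b", "c", "d"],
})

majors = ["math", "art", "business", "science"]

# Seeded generator so repeated runs give the same draw (seed 0 is arbitrary).
rng = np.random.default_rng(0)

# One independent draw per row, so rows in the same class may get different majors.
df["major"] = rng.choice(majors, size=len(df))
print(df)
```

Because each row is drawn independently, this does not yet guarantee one major per class.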

EDIT: to get the same major for every row in a group, draw one value per unique class and use Series.map with a dictionary:

# Draw one random major per distinct class
c = df['class'].unique()
vals = np.random.choice(['math', 'art', 'business', 'science'], size=len(c))

# Map each class label to the major drawn for it
df['major'] = df['class'].map(dict(zip(c, vals)))
print(df)
   id  age class     major
0   1   23     a  business
1   2   24     a  business
2   3   25     b       art
3   4   22     b       art
4   5   16     c   science
5   6   16     d      math
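
Note that np.random.choice samples with replacement by default, so two classes can end up with the same major. If each class should get a different major, as in the expected output of the question, one possible variant is to draw without replacement. This is a sketch under assumptions: it rebuilds the sample data, uses an arbitrary seed of 0, and only works when there are at least as many majors as classes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [23, 24, 25, 22, 16, 16],
    "class": ["a", "a", "b", "b", "c", "d"],
})

majors = ["math", "art", "business", "science"]
classes = df["class"].unique()

# Seed 0 is arbitrary; drop it for a different draw on each run.
rng = np.random.default_rng(0)

# replace=False draws each major at most once.
# This raises ValueError if len(classes) > len(majors).
vals = rng.choice(majors, size=len(classes), replace=False)

# Every row of a class receives the major drawn for that class.
df["major"] = df["class"].map(dict(zip(classes, vals)))
print(df)
```

With four classes and four majors, every major is used exactly once; with fewer classes, some majors are simply left out.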

Thanks, I have modified my question. Could you take another look, please? — ah bon, May 14 '20 at 14:46
