Let's say I have student information with the columns id, age and class, as follows:

   id  age  class
0   1   23    a
1   2   24    a
2   3   25    b
3   4   22    b
4   5   16    c
5   6   16    d

I want to group by class and create a new column named major by randomly assigning one of math, art, business or science to each group, so that all rows in the same class get the same major.

We may need something like apply(lambda x: random.choice(...)) to do this, but I don't know how. Thanks for your help.

Output expected:

   id  age     major  class
0   1   23       art    a
1   2   24       art    a
2   3   25   science    b
3   4   22   science    b
4   5   16  business    c
5   6   16      math    d
ah bon

1 Answer

Use numpy.random.choice, setting size to the length of the DataFrame so that one value is drawn per row:

import numpy as np

# Draw one major per row, independently of class
df['major'] = np.random.choice(['math', 'art', 'business', 'science'], size=len(df))
print(df)
   id  age     major
0   1   23  business
1   2   24       art
2   3   25   science
3   4   22      math
4   5   16   science
5   6   16  business
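
Because the draw is random, each run gives different majors. As a minimal sketch of how the per-row assignment could be made repeatable, the example below rebuilds the question's DataFrame and uses a seeded generator from np.random.default_rng; the seed value 0 and the variable names majors and rng are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [23, 24, 25, 22, 16, 16],
    "class": ["a", "a", "b", "b", "c", "d"],
})
majors = ["math", "art", "business", "science"]

# A seeded Generator repeats the same draw on every run (seed 0 is arbitrary)
rng = np.random.default_rng(0)

# One independent draw per row, so rows in the same class may differ
df["major"] = rng.choice(majors, size=len(df))
print(df)
```

Rerunning the script with the same seed should reproduce the same column, which helps when comparing results or writing tests.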

EDIT: To get the same major for every row in a group, draw one value per unique class and use Series.map with a dictionary:

# One random major per distinct class
c = df['class'].unique()
vals = np.random.choice(['math', 'art', 'business', 'science'], size=len(c))

# Look up each row's class in the class -> major dictionary
df['major'] = df['class'].map(dict(zip(c, vals)))
print(df)
   id  age class     major
0   1   23     a  business
1   2   24     a  business
2   3   25     b       art
3   4   22     b       art
4   5   16     c   science
5   6   16     d      math
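
Since the question asks about a groupby-based approach, here is a hedged alternative to the dictionary mapping above, using GroupBy.transform with a function that returns a single value per group. The seed and the names majors and rng are assumptions for illustration; the result should be equivalent in spirit to the map version, though the specific majors drawn will differ.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [23, 24, 25, 22, 16, 16],
    "class": ["a", "a", "b", "b", "c", "d"],
})
majors = ["math", "art", "business", "science"]
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# transform calls the function once per class; the single string it returns
# is broadcast to every row of that class
df["major"] = df.groupby("class")["class"].transform(
    lambda s: str(rng.choice(majors))
)

# Sanity check: each class should carry exactly one distinct major
per_class = df.groupby("class")["major"].nunique()
print(df)
print(per_class)
```

Two different classes can still receive the same major, because each draw is independent. If distinct majors per class are wanted and there are no more classes than majors, drawing the values in one call with replace=False is one option.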
jezrael