I want to drop the rows where the occupation has fewer than 2 unique names:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
d 30 doctor
d 20 doctor
e 70 plumber
e 71 plumber
f 30 plumber
g 50 tailor
I did:
df.groupby('occupation')['name'].nunique()
>>>>>>
occupation
mechanic 3
doctor 1
plumber 2
tailor 1
Name: name, dtype: int64
Is it possible to use something like df = df.drop(df[<some boolean condition>].index)?
Desired output:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
e 70 plumber
e 71 plumber
f 30 plumber
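
For reference, here is a minimal sketch of how the boolean condition might be built, assuming the data above is loaded into a DataFrame called df. The variable names unique_names, dropped and kept are only illustrative. The idea is to use transform('nunique') so the per-occupation count is repeated on every row, which makes it usable as a row mask:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": list("aabccddeefg"),
    "value": [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    "occupation": ["mechanic"] * 5 + ["doctor"] * 2 + ["plumber"] * 3 + ["tailor"],
})

# Count of distinct names per occupation, broadcast back onto every row
unique_names = df.groupby("occupation")["name"].transform("nunique")

# drop-based form, matching the pattern asked about
dropped = df.drop(df[unique_names < 2].index)

# Equivalent form that keeps rows with a boolean mask instead of dropping
kept = df[unique_names >= 2]

print(dropped)
```

Both forms should leave only the mechanic and plumber rows. The mask form avoids the round trip through the index, so it may be the simpler of the two.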