I have this dataset:

df = pd.DataFrame({'scientist':["Wendelaar Bonga"," Sjoerd E.", "Grätzel"," Michael", "Willett", "Walter C.",
                         "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
           'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                            "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                            "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                            "Cardiovascular System & Hematology", "Biomedical Engineering"]})

I want to count the number of scientists in each subject field and remove the subject fields that have fewer than 2 scientists from my data.

This is my code so far, but I don't know how to remove the mentioned rows:

x = df.groupby('SubjectField')['scientist'].count()
ans = x[x > 2]

nemo92world

2 Answers

You are already on the right track; I have just added the code to drop the rows that do not satisfy the condition.

import pandas as pd

df = pd.DataFrame({'scientist':["Wendelaar Bonga"," Sjoerd E.", "Grätzel"," Michael", "Willett", "Walter C.",
                         "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
           'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                            "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                            "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                            "Cardiovascular System & Hematology", "Biomedical Engineering"]})


x = df.groupby('SubjectField')['scientist'].count()

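You can use drop with the index argument to remove the groups that do not match the condition. The tilde (~) negates a boolean condition, so ~(x >= 2) selects the groups to drop. The question asks to remove fields with fewer than 2 scientists, so the condition to keep is x >= 2 rather than x > 2.

drop_idx = x[~(x >= 2)].index
x = x.drop(index=drop_idx)

x now contains only the subject fields with at least 2 scientists. Note that x is the Series of per-field counts, not the original df.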
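
Since the question asks to remove rows from the data itself, here is a minimal sketch of one way to carry the surviving group labels back to df with isin. The names counts, keep and filtered_df are illustrative and not part of the original answer, and the threshold assumes "fewer than 2" means keeping counts of at least 2.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# Number of scientists per subject field (a Series indexed by field name).
counts = df.groupby('SubjectField')['scientist'].count()

# Labels of the fields that meet the threshold (assumed: at least 2).
keep = counts[counts >= 2].index

# Keep only the rows of df whose field is in that set.
filtered_df = df[df['SubjectField'].isin(keep)]
print(filtered_df)
```

With this data only "Biomedical Engineering" appears more than once, so every other field is dropped.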

Vaebhav

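Try this. transform('count') returns the size of each row's group, aligned to the rows of df, so the comparison gives a boolean mask that filters df directly. The question asks to remove fields with fewer than 2 scientists, so rows are kept when the count is at least 2:

mask = df.groupby('SubjectField')['SubjectField'].transform('count') >= 2
filtered = df[mask]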
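
As a possible alternative to the boolean mask, pandas also provides GroupBy.filter, which keeps or drops whole groups based on a function of each group. The sketch below is not from the original answer; the name filtered_alt is illustrative, and the threshold again assumes that fields with at least 2 scientists should stay.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett",
                  "Walter C.", "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering",
                     "Microbiology", "Cardiovascular System & Hematology",
                     "Biomedical Engineering"],
})

# filter() calls the function once per group and keeps the rows of every
# group for which it returns True; here, groups with at least 2 rows.
filtered_alt = df.groupby('SubjectField').filter(lambda g: len(g) >= 2)
print(filtered_alt)
```

The transform version is usually faster on large frames because it avoids calling a Python function per group, while filter can read more directly when the condition is more complex than a count.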