Remove some rows from data after doing groupby

Question

I have this dataset:

df = pd.DataFrame({'scientist':["Wendelaar Bonga"," Sjoerd E.", "Grätzel"," Michael", "Willett", "Walter C.",
                         "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
           'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                            "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                            "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                            "Cardiovascular System & Hematology", "Biomedical Engineering"]})

and I want to count the number of scientists in each subject field and remove the subject fields that have fewer than 2 scientists from my data.

This is my code, but I don't know how to remove the mentioned rows:

x = df.groupby('SubjectField')['scientist'].count()
ans = x[x > 2]

Based on your code, `ans` will only have the rows with `count > 2`. What else do you need? — Vaebhav, Dec 31 '20 at 07:08

score 0 · Answer 1 · answered Dec 31 '20 at 07:16

You are already on the right track. I have just added the code to drop the rows that do not satisfy the condition:

import pandas as pd

df = pd.DataFrame({'scientist':["Wendelaar Bonga"," Sjoerd E.", "Grätzel"," Michael", "Willett", "Walter C.",
                         "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
           'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                            "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                            "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                            "Cardiovascular System & Hematology", "Biomedical Engineering"]})


x = df.groupby('SubjectField')['scientist'].count()

You can use `drop` with the `index` argument to drop the rows that do not match the condition.

The tilde `~` negates a boolean condition, so `~(x > 2)` selects the counts that are not greater than 2:

drop_idx = x[~(x > 2)].index.values
x = x.drop(index=drop_idx)

`x` will now contain only the subject fields with a count greater than 2.
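Note that `x` is the Series of per-field counts, not the original DataFrame. The sketch below is not part of the original answer; it assumes the goal is to carry the result back to `df` and keep only the rows whose subject field survived. The names `keep_fields` and `df_filtered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett", "Walter C.",
                  "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                     "Cardiovascular System & Hematology", "Biomedical Engineering"],
})

# Per-field counts, then drop the fields that fail the condition (as in the answer).
x = df.groupby('SubjectField')['scientist'].count()
drop_idx = x[~(x > 2)].index.values
x = x.drop(index=drop_idx)

# Assumption: the surviving index labels are the fields to keep in df.
keep_fields = x.index

# Keep only the rows of df whose SubjectField is one of the surviving fields.
df_filtered = df[df['SubjectField'].isin(keep_fields)]
print(df_filtered)
```

The question text asks to remove fields with fewer than 2 scientists, while the code uses `> 2`. If the wording is what is intended, the condition would be `x >= 2`. On this sample data both conditions keep the same rows.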

score 0 · Answer 2 · edited Dec 31 '20 at 07:37

Try this:

mask = df.groupby('SubjectField')['SubjectField'].transform('count') > 2
filtered = df[mask]
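This works because `transform('count')` returns a Series aligned with the index of `df`, where each row holds the size of its own group, so the comparison yields a row-level boolean mask. The sketch below is not part of the original answer. It makes that intermediate step visible and, as an alternative, shows `GroupBy.filter`, which should select the same rows here. The names `sizes` and `alternative` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'scientist': ["Wendelaar Bonga", " Sjoerd E.", "Grätzel", " Michael", "Willett", "Walter C.",
                  "Kessler", "Ronald C.", "Witten, Edward", "Wang, Zhong Lin"],
    'SubjectField': ["Biomedical Engineering", "Inorganic & Nuclear Chemistry",
                     "Organic Chemistry", "Biomedical Engineering", "Developmental Biology",
                     "Mechanical Engineering & Transports", "Biomedical Engineering", "Microbiology",
                     "Cardiovascular System & Hematology", "Biomedical Engineering"],
})

# One value per row of df: the number of rows sharing that row's SubjectField.
sizes = df.groupby('SubjectField')['SubjectField'].transform('count')

# Row-level mask, then boolean indexing on the original DataFrame.
mask = sizes > 2
filtered = df[mask]

# Alternative: keep whole groups whose length passes the same condition.
alternative = df.groupby('SubjectField').filter(lambda g: len(g) > 2)

print(filtered)
print(filtered.equals(alternative))
```

The `transform` version is vectorized, while `filter` calls a Python function once per group, so the mask approach tends to be the faster choice on data with many groups.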

answered Dec 31 '20 at 07:31 by Topilko Tomek · edited Dec 31 '20 at 07:37 by Sabito stands with Ukraine