I want to filter a dataframe based on values in a column. Here is how the df looks:

    lead_snp Set_1 Set_2 Set_3 Set_4 Set_5  ... Set_4995 Set_4996 Set_4997 Set_4998 Set_4999 Set_5000
0  1:2444414     8     7     1    10    17  ...       16        6       10       12        8       12
1  1:1865298     2     2    11    21     6  ...       16        3       13       17        8        3
2  1:1865298     2     2    11    21     6  ...       16        3       13       17        8        3
3  1:1865298     2     2    11    21     6  ...       16        3       13       17        8        3
4  1:1865298     2     2    11    21     6  ...       16        3       13       17        8        3

When I run `lead_chrom_only_df.groupby("lead_snp").nunique().drop("lead_snp", axis=1)`, I get the error below:

KeyError: "['lead_snp'] not found in axis"

Not sure if I'm missing something obvious, thanks in advance.

gokberk
    After `groupby` `"lead_snp"` is the index. `lead_chrom_only_df.groupby("lead_snp").nunique().reset_index(drop=True)` should work – Henry Ecker Nov 05 '21 at 00:12

1 Answer

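Try passing `as_index=False` to `groupby`, so that `lead_snp` stays a regular column (instead of becoming the index) and can then be dropped: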

out = lead_chrom_only_df.groupby("lead_snp", as_index=False).nunique().drop("lead_snp", axis=1)
BENY
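
To see why the original call fails and how the two suggested fixes compare, here is a minimal sketch. The frame `df` is a hypothetical miniature of the one in the question (two `Set_` columns, made-up values), and it assumes a reasonably recent pandas where `groupby(...).nunique()` does not return the grouping key as a column.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question (values are made up).
df = pd.DataFrame({
    "lead_snp": ["1:2444414", "1:1865298", "1:1865298", "1:1865298"],
    "Set_1": [8, 2, 2, 3],
    "Set_2": [7, 2, 2, 2],
})

# Default groupby: the key becomes the index, so it is not among the columns.
# Calling .drop("lead_snp", axis=1) on this result raises the KeyError.
by_index = df.groupby("lead_snp").nunique()
print("lead_snp" in by_index.columns)  # False
print(by_index.index.name)             # lead_snp

# Fix from the answer: keep the key as a regular column, then drop it.
out = df.groupby("lead_snp", as_index=False).nunique().drop("lead_snp", axis=1)

# Fix from the comment: discard the index that holds the key.
out_reset = by_index.reset_index(drop=True)

print(out)
print(out_reset)
```

Both variants should give the same per-group unique counts. Which one to use is mostly a matter of taste: `as_index=False` avoids creating the index in the first place, while `reset_index(drop=True)` works on a result you already have.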