
I have a table like this:

CustID  Purchase  Time
A       Item1     01/01/2011
B       Item2     01/01/2011   
C       Item1     01/02/2011   
A       Item2     03/01/2011   

I would like to select the rows whose CustID appears more than once in the table.
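For reference (assuming pandas, which the answers below use), the sample table can be rebuilt as a DataFrame:

```python
import pandas as pd

# Rebuild the sample table from the question
df = pd.DataFrame({
    'CustID':   ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time':     ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})
print(df)
```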

piRSquared
Hai Vu

3 Answers


This could work:

counts = df['CustID'].value_counts()
df[df['CustID'].isin(counts.index[counts > 1])]

Result:

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
languitar
  • This was my approach, plus one. To make it more efficient, you can pass `sort=False` to `value_counts` – piRSquared Apr 11 '17 at 14:09
  • Nice answer! Your brace/parenth on the second line are backwards, though. I'd edit myself but want to avoid the risk of destroying your formatting on my phone :) – miradulo Apr 11 '17 at 19:00
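For completeness, the `sort=False` variant mentioned in the comment might look like this (a sketch on the sample data, rebuilt inline):

```python
import pandas as pd

df = pd.DataFrame({
    'CustID': ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time': ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})

# sort=False skips sorting the counts by frequency, which we don't need here
counts = df['CustID'].value_counts(sort=False)
print(df[df['CustID'].isin(counts.index[counts > 1])])
```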
df[df['CustID'].duplicated(keep=False)]

This finds the rows in the DataFrame where duplicates exist in the `CustID` column. Passing `keep=False` tells `duplicated` to mark every duplicated row as True (rather than leaving the first or last occurrence unmarked):

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
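A quick sketch of how the `keep` argument changes which rows are flagged (sample data rebuilt inline):

```python
import pandas as pd

df = pd.DataFrame({
    'CustID': ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time': ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})

# keep='first' leaves the first occurrence unmarked; keep=False marks them all
print(df['CustID'].duplicated(keep='first').tolist())  # [False, False, False, True]
print(df['CustID'].duplicated(keep=False).tolist())    # [True, False, False, True]
```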

EDIT

Looking at the docs for `duplicated`, it looks like you can also do:

df[df.duplicated('CustID', keep=False)]

Though this seems to be about 100 µs slower than the original (458 µs vs. 545 µs, based on the example DataFrame).
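The timing comparison can be reproduced with `timeit` along these lines (absolute numbers will vary by machine and pandas version):

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'CustID': ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time': ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})

# Time each variant over 1000 runs and report microseconds per call
for label, stmt in [
    ('column.duplicated', lambda: df[df['CustID'].duplicated(keep=False)]),
    ('frame.duplicated', lambda: df[df.duplicated('CustID', keep=False)]),
]:
    t = timeit.timeit(stmt, number=1000)
    print(f'{label}: {t / 1000 * 1e6:.0f} µs per call')
```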

Whymarrh
bunji

Use `filter`:

df.groupby('CustID').filter(lambda x: len(x) > 1)

  CustID Purchase        Time
0      A    Item1  01/01/2011
3      A    Item2  03/01/2011
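A related sketch, not from the answer itself: the same selection via `groupby().transform('size')`, which avoids invoking a Python lambda once per group:

```python
import pandas as pd

df = pd.DataFrame({
    'CustID': ['A', 'B', 'C', 'A'],
    'Purchase': ['Item1', 'Item2', 'Item1', 'Item2'],
    'Time': ['01/01/2011', '01/01/2011', '01/02/2011', '03/01/2011'],
})

# Broadcast each group's size back onto its rows, then keep sizes > 1
mask = df.groupby('CustID')['CustID'].transform('size') > 1
print(df[mask])
```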
Whymarrh
piRSquared