With Pandas in Python, select only the rows where group by group count is 1

Question

I've filtered my data as suggested here: With Pandas in Python, select the highest value row for each group

    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7

Now I want to keep only the authors that appear exactly once in this data frame. I wrote this, but it doesn't work:

def where_just_one_exists(group):
        return group.loc[group.count() == 1]
most_expensive_single_category = most_expensive_for_each_model.groupby('author', as_index = False).apply(where_just_one_exists).reset_index(drop = True)
print most_expensive_single_category

Error:

  File "/home/mike/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1659, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

My desired output is:

    author        cat  val
0  author1  category2   15
1  author2  category4    9

http://stackoverflow.com/questions/18851216/pandas-drop-all-records-of-duplicate-indices — Padraic Cunningham, Jul 12 '15 at 13:33
I've added the desired output. Padraic's solution seems to be exactly what someone has suggested below. — Mike, Jul 12 '15 at 14:20

score 7 · Accepted Answer · answered Jul 12 '15 at 12:25

An easier way is to use groupby with filter:

df.groupby('author').filter(lambda x: len(x)==1)


     author        cat  val
id                         
0   author1  category2   15
1   author2  category4    9
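
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the names df and singles are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the table there)
df = pd.DataFrame({
    "author": ["author1", "author2", "author3", "author3"],
    "cat": ["category2", "category4", "category1", "category3"],
    "val": [15, 9, 7, 7],
})

# filter() calls the function once per group and keeps every row of the
# groups for which it returns True, here the groups with exactly one row
singles = df.groupby("author").filter(lambda g: len(g) == 1)
print(singles)
```

The lambda receives each author's sub-frame, so len(g) is that author's row count. Note that this frame uses the default integer index, not the index named id shown in the output above.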

Gecko

1,379
11
14

How would I sort it after applying count and mean? http://stackoverflow.com/questions/31368918/with-pandas-in-python-how-do-i-sort-by-two-columns-which-are-created-by-the-agg – Mike Jul 12 '15 at 15:03

score 2 · Answer 2 · answered Jul 12 '15 at 13:24

My solution is a bit more complex, but it still works:

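import pandas as pd

def groupbyOneOccurrence(df):
    # Iterating a groupby yields (key, sub-frame) pairs; keep one-row groups
    singles = [group for _, group in df.groupby("author") if len(group) == 1]
    # Concatenate once at the end; return an empty frame if nothing matched
    return pd.concat(singles) if singles else df.iloc[0:0]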


    author        cat  val
0  author1  category2   15
1  author2  category4    9
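
If the explicit loop becomes slow on a large frame, the same selection can probably be expressed as a boolean mask. This is not taken from either answer; it is a sketch that assumes the frame from the question, and group_size, by_size and by_dupes are illustrative names.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "author": ["author1", "author2", "author3", "author3"],
    "cat": ["category2", "category4", "category1", "category3"],
    "val": [15, 9, 7, 7],
})

# Size of each author's group, broadcast back onto every row
group_size = df.groupby("author")["author"].transform("size")
by_size = df[group_size == 1]

# Alternative: drop every row whose author occurs more than once
# (keep=False marks all occurrences of a duplicate, not just the later ones)
by_dupes = df[~df["author"].duplicated(keep=False)]

print(by_size)
```

Both masks keep the original row order and index, and neither calls a Python function per group.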
