6

I am working with the Wine Review Data from Kaggle. I am able to return the number of occurrences by variety using value_counts().

However, I am trying to find a quick way to limit the results to varieties and their counts where there is more than one occurrence.

Trying df.loc[df['variety'].value_counts()>1].value_counts() and df['variety'].loc[df['variety'].value_counts()>1].value_counts() both return errors.

The results can be turned into a DataFrame and the constraint applied there, but something tells me there is a more elegant way to achieve this.
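For reference, the DataFrame route mentioned above might look roughly like this. It is only a sketch on made-up data: the sample rows and the column name `n` are assumptions, not taken from the Kaggle set.

```python
import pandas as pd

# Made-up stand-in for the wine review data (assumption).
df = pd.DataFrame(
    {"variety": ["Pinot Noir", "Chardonnay", "Pinot Noir",
                 "Riesling", "Chardonnay", "Pinot Noir"]}
)

# Turn the counts into a two-column DataFrame ("variety", "n"),
# then filter on the count column.
counts_df = (
    df["variety"]
    .value_counts()
    .rename_axis("variety")
    .reset_index(name="n")
)
result = counts_df[counts_df["n"] > 1]

print(result)
```

This works, but it takes an extra reshaping step just to apply a single condition.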

Michael
  • 6
    try df['variety'].value_counts().loc[lambda x : x>1] – BENY May 09 '18 at 16:53
  • @Wen That did the trick. Do you have a link to a resource on using lambda in this way? Or, I should ask: can you use a lambda expression with loc as a constraint on the results any time you are using an aggregate function? – Michael May 09 '18 at 17:00

1 Answer

15

@Wen answered this in the comments.

df['variety'].value_counts().loc[lambda x : x>1] 
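A short sketch of why this works, using made-up data in place of the Kaggle set (the sample rows are an assumption). value_counts() returns a Series indexed by variety, and .loc accepts a callable that is called with that Series and returns a boolean mask. The mask is indexed by variety rather than by the rows of df, which is most likely why passing it to df.loc in the question fails.

```python
import pandas as pd

# Made-up stand-in for the wine review data (assumption).
df = pd.DataFrame(
    {"variety": ["Pinot Noir", "Chardonnay", "Pinot Noir",
                 "Riesling", "Chardonnay", "Pinot Noir"]}
)

# Series: index = variety, values = number of occurrences.
counts = df["variety"].value_counts()

# .loc calls the lambda with `counts`; the lambda returns a boolean mask
# aligned to the same index, so only counts above 1 are kept.
repeated = counts.loc[lambda x: x > 1]

# Equivalent two-step form with an explicit boolean mask.
repeated_explicit = counts[counts > 1]

print(repeated)
```

The callable form is mainly a convenience for method chaining, since it avoids naming the intermediate Series.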
Michael
  • 1
    This answer could benefit from a little bit of explanation. – André Kool May 09 '18 at 20:07
  • I think this answers the question. The author was trying to find a quick one-liner to filter out the varieties that occur only once and see only those with more than one occurrence. This does the trick nicely! Thanks – addicted Sep 17 '18 at 07:10
  • How do you group this set and create a value like "others"? – sai kamal Nov 12 '22 at 14:19
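
On that last question, one possible approach is to relabel the rare varieties before counting again. This is only a sketch on made-up data; the label "Other" and the threshold of one occurrence are assumptions.

```python
import pandas as pd

# Made-up stand-in for the wine review data (assumption).
df = pd.DataFrame(
    {"variety": ["Pinot Noir", "Chardonnay", "Pinot Noir", "Riesling",
                 "Chardonnay", "Pinot Noir", "Malbec"]}
)

counts = df["variety"].value_counts()

# Varieties that occur only once.
rare = counts[counts <= 1].index

# Keep frequent varieties as they are, replace rare ones with "Other",
# then count again.
is_frequent = ~df["variety"].isin(rare)
grouped = df["variety"].where(is_frequent, "Other").value_counts()

print(grouped)
```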