
I have a dataframe from which I want to extract values from two columns, keyed on the unique values of one of them. In the image below, I want to extract the unique values of 'education' along with the corresponding values from 'education-num'. I can easily get the unique values with df['education'].unique(), but I am stuck on extracting the matching 'education-num' values.

image of the dataframe.

(Originally the task was to count the people whose education is Bachelors, Masters or Doctorate, and I assumed this would be easier by comparing 'education-num' than by applying logical operators to strings. But if there is a way to do it directly from 'education', that would also be helpful.

Edit: It turns out that DataFrame.isin helps to select rows by a list of strings, as given in the solution here.)
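A minimal sketch of that `isin` approach, using a small made-up frame in place of the real data. The rows, the `degrees` list and the variable names are assumptions for illustration; the label strings must match whatever actually appears in the 'education' column.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe (values are illustrative only).
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate", "HS-grad"],
    "education-num": [13, 9, 14, 13, 16, 9],
})

# Labels of interest; adjust to the exact strings in your data.
degrees = ["Bachelors", "Masters", "Doctorate"]

# Boolean mask: True where the row's 'education' is one of the labels.
mask = df["education"].isin(degrees)

# Rows matching any of the labels, and how many there are.
degree_rows = df[mask]
n_degree_holders = int(mask.sum())
```

Here `n_degree_holders` counts rows directly from the string column, so no comparison on 'education-num' is needed.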

P.S. Stack Overflow didn't allow me to post the image directly and posted a link to it instead.

HoppyHOP
    The code you provided gave the same result as `df[['education','education-num']]`, the only difference being that the column names were removed – HoppyHOP Jun 10 '21 at 05:53

1 Answer


Select columns by subset and call DataFrame.drop_duplicates:

df1 = df[['education', 'education-num']].drop_duplicates()
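As a rough illustration of what this returns, here is the same call on a small made-up frame (the rows and the `mapping` name are assumptions, not the asker's data). `drop_duplicates` keeps the first occurrence of each distinct ('education', 'education-num') pair.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe (values are illustrative only).
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate", "HS-grad"],
    "education-num": [13, 9, 14, 13, 16, 9],
})

# One row per distinct (education, education-num) pair.
df1 = df[["education", "education-num"]].drop_duplicates()

# Optional: turn the pairs into a label -> number lookup.
# This assumes each label maps to exactly one number.
mapping = df1.set_index("education")["education-num"].to_dict()
```

If a label could map to more than one number in the real data, the dictionary would silently keep only the last one, so it may be worth checking `df1["education"].is_unique` first.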

If you need to count the population in each group, use:

df2 = df.groupby(['education', 'education-num']).size().reset_index(name='count')
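A short sketch of how the counts could then answer the original Bachelors/Masters/Doctorate question, again on a made-up frame. The rows, the `degrees` list and `total` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe (values are illustrative only).
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate", "HS-grad"],
    "education-num": [13, 9, 14, 13, 16, 9],
})

# Number of rows per (education, education-num) pair, as a flat dataframe.
df2 = df.groupby(["education", "education-num"]).size().reset_index(name="count")

# Sum the counts for the labels of interest.
degrees = ["Bachelors", "Masters", "Doctorate"]
total = int(df2.loc[df2["education"].isin(degrees), "count"].sum())
```

Filtering on the labels rather than on a numeric threshold avoids accidentally including other categories whose 'education-num' happens to fall in the same range.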
jezrael