How to delete rows that match a condition on an attribute from a dataframe, keeping the first row of each category?

Question

I have a dataframe called housing. One of its attributes is "ocean_proximity", which is a categorical attribute. I filtered the dataframe on the condition that median_house_value equals 450k. Now I want to keep only one record for each category of "ocean_proximity" and delete all the other records.

I am using pandas and Python 3.0.

>>> housing[housing.median_house_value==450000][['median_income','median_house_value','ocean_proximity']]
     median_income  median_house_value ocean_proximity
993           6.1023            450000.0          INLAND
4265          1.7306            450000.0       <1H OCEAN
4623          0.8804            450000.0       <1H OCEAN
4676          5.8632            450000.0       <1H OCEAN
4685          3.6111            450000.0       <1H OCEAN
4717          2.7824            450000.0       <1H OCEAN
5427          2.2402            450000.0       <1H OCEAN
5506          3.6667            450000.0       <1H OCEAN
5890          4.0893            450000.0       <1H OCEAN
6555          7.7108            450000.0          INLAND
8314          2.1579            450000.0          ISLAND
8317          2.7361            450000.0          ISLAND

What I want to end up with:

>>> housing
     median_income  median_house_value ocean_proximity
993           6.1023            450000.0          INLAND
4265          1.7306            450000.0       <1H OCEAN
8317          2.7361            450000.0          ISLAND

Thanks. But I also want it to be reflected in my original dataframe "housing" – Deadpool, May 22 '19 at 20:45

score 1 · Answer 1 · answered May 22 '19 at 19:05

The simplest way is to pass the single column name to drop_duplicates on the filtered dataframe (df here), which by default keeps the first row for each distinct value:

df.drop_duplicates('ocean_proximity')

      median_income  median_house_value ocean_proximity
993          6.1023            450000.0          INLAND
4265         1.7306            450000.0       <1H OCEAN
8314         2.1579            450000.0          ISLAND
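
For anyone who wants to try this, here is a minimal sketch using a few of the rows from the question. The keep argument is not mentioned in the answer above; it is shown here only to illustrate that drop_duplicates can keep either the first or the last row of each category.

```python
import pandas as pd

# A subset of the filtered rows shown in the question (same index labels).
df = pd.DataFrame(
    {
        "median_income": [6.1023, 1.7306, 0.8804, 7.7108, 2.1579, 2.7361],
        "median_house_value": [450000.0] * 6,
        "ocean_proximity": [
            "INLAND", "<1H OCEAN", "<1H OCEAN", "INLAND", "ISLAND", "ISLAND",
        ],
    },
    index=[993, 4265, 4623, 6555, 8314, 8317],
)

# keep="first" is the default: the first row seen for each category survives.
first_per_category = df.drop_duplicates(subset="ocean_proximity", keep="first")

# keep="last" keeps the last row seen for each category instead.
last_per_category = df.drop_duplicates(subset="ocean_proximity", keep="last")

print(first_per_category)
print(last_per_category)
```

Note that drop_duplicates returns a new dataframe; df itself is unchanged unless the result is assigned back.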

G. Anderson


I am using the line housing[housing.median_house_value==450000].drop_duplicates('ocean_proximity', inplace=True) but it gives me a SettingWithCopyWarning: "A value is trying to be set on a copy of a slice from a DataFrame" – Deadpool May 22 '19 at 19:39
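
The warning appears because housing[...] produces a new object, so inplace=True only modifies that temporary and housing itself never changes. One possible way to get the change into the original dataframe is sketched below: find the repeated rows inside the filtered subset, then drop them from housing by index label. The small housing frame here is a hypothetical stand-in (the rows with other house values are made up for illustration), and the approach assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical stand-in for the real housing data: the 450000 rows come from
# the question, the rows with other house values are invented for illustration.
housing = pd.DataFrame(
    {
        "median_income": [6.1023, 3.0, 1.7306, 0.8804, 7.7108, 5.0, 2.1579, 2.7361],
        "median_house_value": [
            450000.0, 120000.0, 450000.0, 450000.0,
            450000.0, 300000.0, 450000.0, 450000.0,
        ],
        "ocean_proximity": [
            "INLAND", "INLAND", "<1H OCEAN", "<1H OCEAN",
            "INLAND", "ISLAND", "ISLAND", "ISLAND",
        ],
    },
    index=[993, 1000, 4265, 4623, 6555, 7000, 8314, 8317],
)

# Rows that satisfy the condition.
at_cap = housing[housing["median_house_value"] == 450000]

# True for every matching row except the first one in each category.
is_repeat = at_cap.duplicated(subset="ocean_proximity", keep="first")

# Drop the repeats from the original frame by label and assign the result
# back, instead of calling an inplace method on a temporary selection.
housing = housing.drop(index=is_repeat[is_repeat].index)

print(housing)
```

Rows that do not match the condition are left alone; only the second and later 450000 rows of each category are removed.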

score 0 · Answer 2 · answered May 22 '19 at 19:02

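We can use pandas groupby and apply to group the rows by ocean_proximity and keep only the first row of each group.

# select the rows that match the condition and the columns of interest
df = housing[housing.median_house_value == 450000][['median_income', 'median_house_value', 'ocean_proximity']]
# take the first row of each group; the result is indexed by ocean_proximity,
# and housing is overwritten so it holds only these rows
housing = df.groupby('ocean_proximity').apply(lambda x: x.iloc[0])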
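
As a possible variation on the groupby idea (not what the answer above uses), groupby(...).head(1) also returns the first row of each group, but it keeps the original index labels and row order instead of re-indexing by category. A minimal sketch with a few rows from the question:

```python
import pandas as pd

# A subset of the filtered rows shown in the question (same index labels).
df = pd.DataFrame(
    {
        "median_income": [6.1023, 1.7306, 0.8804, 7.7108, 2.1579, 2.7361],
        "median_house_value": [450000.0] * 6,
        "ocean_proximity": [
            "INLAND", "<1H OCEAN", "<1H OCEAN", "INLAND", "ISLAND", "ISLAND",
        ],
    },
    index=[993, 4265, 4623, 6555, 8314, 8317],
)

# head(1) selects the first row of every group without changing the index.
first_rows = df.groupby("ocean_proximity").head(1)

print(first_rows)
```

Because the original labels survive, the result can be matched back against the source dataframe if needed.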
