
I want to drop duplicate values in "Nit", but keep the row where "date" is 31-12-2018.

Nit       sales    date
12345      56    31-12-2018
12345      45    31-06-2018
23346      87    31-12-2018
76553      93    31-12-2018
44556      34    31-06-2018
44556      52    31-12-2018
jimmy

1 Answer


Let's try:

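(df.assign(valid_date=df['date']=='31-12-2018')                  # flag the rows to prefer
   .sort_values('valid_date', ascending=False, kind='stable')    # flagged rows first; ties keep their order
   .drop_duplicates('Nit')                                       # keep the first row per Nit
   .sort_index()                                                 # restore the original row order
   .drop('valid_date', axis=1)                                   # remove the helper column
)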

Output:

     Nit  sales        date
0  12345     56  31-12-2018
2  23346     87  31-12-2018
3  76553     93  31-12-2018
5  44556     52  31-12-2018
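An alternative that avoids sorting the whole frame is to pick, for each Nit, the index label of the first row carrying the preferred date. This is a minimal sketch using `groupby(...).idxmax()` on an integer flag (a swapped-in technique, not the chain above); it assumes the "date" column holds strings in the DD-MM-YYYY form shown in the question.

```python
import pandas as pd

# The question's table; dates kept as strings exactly as printed
df = pd.DataFrame({
    'Nit':   [12345, 12345, 23346, 76553, 44556, 44556],
    'sales': [56, 45, 87, 93, 34, 52],
    'date':  ['31-12-2018', '31-06-2018', '31-12-2018',
              '31-12-2018', '31-06-2018', '31-12-2018'],
})

# 1 where the row has the preferred date, 0 otherwise
flag = (df['date'] == '31-12-2018').astype(int)

# idxmax returns the label of the first maximum in each group:
# the first 31-12-2018 row if the Nit has one, otherwise its first row
keep_idx = flag.groupby(df['Nit']).idxmax()

# Select those rows and put them back in the original order
result = df.loc[keep_idx].sort_index()
print(result)
```

On the sample data this selects the same rows (index 0, 2, 3, 5) as the sort-based chain.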

Note: for your sample data, where every Nit has a 31-12-2018 row, a simple

df[df['date']=='31-12-2018']

may do what you want.
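The plain filter and the de-duplication chain only agree when every Nit has a 31-12-2018 row. The sketch below adds a hypothetical Nit 99999 (invented for illustration, not in the question) that only has a 31-06-2018 row: the filter drops it entirely, while the de-duplication keeps its single row.

```python
import pandas as pd

df = pd.DataFrame({
    'Nit':   [12345, 12345, 23346, 76553, 44556, 44556, 99999],
    'sales': [56, 45, 87, 93, 34, 52, 10],
    'date':  ['31-12-2018', '31-06-2018', '31-12-2018', '31-12-2018',
              '31-06-2018', '31-12-2018', '31-06-2018'],
})
# Nit 99999 is hypothetical: it has no 31-12-2018 row

is_target = df['date'] == '31-12-2018'

# Plain boolean filter: only 31-12-2018 rows survive, so 99999 disappears
filtered = df[is_target]

# De-duplication with a preference: one row per Nit, 99999 keeps its only row
deduped = (df.assign(valid_date=is_target)
             .sort_values('valid_date', ascending=False, kind='stable')
             .drop_duplicates('Nit')
             .sort_index()
             .drop('valid_date', axis=1))

print(sorted(filtered['Nit'].tolist()))  # [12345, 23346, 44556, 76553]
print(sorted(deduped['Nit'].tolist()))   # [12345, 23346, 44556, 76553, 99999]
```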

Quang Hoang
  • would the second way modify the entire df? – jimmy Jul 01 '20 at 04:55
  • no, it slices a part of `df`. You can replace with `df = df[...]` or assign to a new dataframe `new_df = df[...].copy()`. – Quang Hoang Jul 01 '20 at 04:56
  • isn't there a way to pass a custom function to keep, e.g. "df.drop_duplicates(subset="Nit", keep="personalized function", inplace=True)", where that function would keep the values with the specific "date"? – jimmy Jul 01 '20 at 05:12
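Regarding the last comment: `drop_duplicates` only accepts `keep='first'`, `'last'` or `False`, so it cannot take a function. A custom "keep" rule can still be wrapped in a helper. The sketch below uses a hypothetical function `drop_duplicates_prefer` (not part of pandas) that takes a callable returning a boolean mask, then reuses the stable-sort trick from the answer.

```python
import pandas as pd


def drop_duplicates_prefer(df, subset, prefer):
    """Hypothetical helper: keep one row per `subset` key, preferring rows
    where prefer(df) is True. If no row in a group is preferred, the group's
    first row (in the original order) is kept."""
    flag = prefer(df).astype(bool)
    return (df.assign(_flag=flag)
              # stable descending sort: preferred rows first, ties keep order
              .sort_values('_flag', ascending=False, kind='stable')
              .drop_duplicates(subset, keep='first')
              .sort_index()
              .drop(columns='_flag'))


df = pd.DataFrame({
    'Nit':   [12345, 12345, 23346, 76553, 44556, 44556],
    'sales': [56, 45, 87, 93, 34, 52],
    'date':  ['31-12-2018', '31-06-2018', '31-12-2018',
              '31-12-2018', '31-06-2018', '31-12-2018'],
})

# The "personalized" rule is just a function of the frame
result = drop_duplicates_prefer(df, 'Nit', lambda d: d['date'] == '31-12-2018')
print(result)
```

The helper returns a new DataFrame rather than working in place, so assign it back (`df = drop_duplicates_prefer(...)`) if you want to replace the original.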