Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about removing unwanted duplicates.
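
As a minimal sketch of the two cases described above (identical strings in a list, and complex objects that compare equal), the Python snippet below drops later re-occurrences while keeping first-seen order. The `dedupe` helper and the `Point` class are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


def dedupe(items):
    """Return the items with later re-occurrences removed, first-seen order kept."""
    # dict keys are unique and keep insertion order, so they act as an ordered set.
    return list(dict.fromkeys(items))


@dataclass(frozen=True)  # frozen + default eq=True makes instances hashable by value
class Point:
    x: int
    y: int


names = dedupe(["a", "b", "a", "c", "b"])
points = dedupe([Point(1, 2), Point(1, 2), Point(3, 4)])

print(names)   # ['a', 'b', 'c']
print(points)  # [Point(x=1, y=2), Point(x=3, y=4)]
```

This approach only works for hashable items; objects that are "treated as the same" must define equality and hashing consistently.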

144 questions
1 vote, 3 answers

Pandas drop_duplicates() without empty rows

I have 2 equal columns in a pandas data frame. Each of the columns has the same duplicates. A B 1 1 1 1 2 2 3 3 3 3 4 4 4 4 I want to delete the duplicates only from column B so that the goal is like the following: A B 1 1 1 2 2 3 3 4 3 4 4 I…
David95
1 vote, 1 answer

Why is unique() not picking up some identical rows?

I am working with a large dataset of event logs that looks something like this: time user_id place key version 2023-02-13 06:28:54 30375 School 422i-dmank-ev2eia 2.023 2023-02-13 06:24:42 47127 School wjes-wtpi-byt2rl0 2.023 2023-02-13…
gribbling
1 vote, 1 answer

Python Drop duplicates to ignore case sensitive

I want to delete duplicates from the below df, preserving the case sensitivity. Input df df = pd.DataFrame({ 'company_name': ['Apple', 'apple','apple', 'BlackBerry', 'blackberry','Blackberry'] }) Expected df company_name 0 …
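
One possible reading of this question, sketched here under assumptions: rows are compared case-insensitively, and the first spelling encountered is kept with its original casing. The expected output is truncated in the excerpt, so this is not necessarily what the asker wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "company_name": ["Apple", "apple", "apple",
                     "BlackBerry", "blackberry", "Blackberry"]
})

# Compare on a lowercased key, but keep the original rows (and their casing).
# Assumption: "first occurrence wins" is the desired tie-break.
key = df["company_name"].str.lower()
deduped = df[~key.duplicated(keep="first")]

print(deduped["company_name"].tolist())  # ['Apple', 'BlackBerry']
```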
1 vote, 1 answer

Using Python, how do I remove duplicates in a PANDAS dataframe column while keeping/ignoring all 'nan' values?

I have a dataframe like this: import pandas as pd data1 = { "siteID": [1, 2, 3, 1, 2, 'nan', 'nan', 'nan'], "date": [42, 30, 43, 29, 26, 34, 10, 14], } df = pd.DataFrame(data1) But I want to delete any duplicates in siteID, keeping…
Bojan Milinic
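
A hedged sketch of one way to approach this, using the data from the excerpt: build a boolean mask that keeps a row if its `siteID` has not been seen before or if it is the `'nan'` placeholder. It assumes the first occurrence of each real `siteID` should be kept, which the truncated excerpt does not confirm.

```python
import pandas as pd

data1 = {
    "siteID": [1, 2, 3, 1, 2, "nan", "nan", "nan"],
    "date": [42, 30, 43, 29, 26, 34, 10, 14],
}
df = pd.DataFrame(data1)

# In this data the missing values are the literal string 'nan', not real NaN,
# so they are matched with eq("nan"). For real NaN values, isna() would be used.
is_placeholder = df["siteID"].eq("nan")
is_repeat = df["siteID"].duplicated(keep="first")

# Keep every placeholder row, and the first row of every other siteID.
result = df[~is_repeat | is_placeholder]

print(result["date"].tolist())  # [42, 30, 43, 34, 10, 14]
```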
1 vote, 2 answers

How not to add duplicate elements in the DOM

I know it is a very easy question but I am still struggling. I created a function which adds elements to an array, and after that I use a forEach loop to append them to the DOM. But I am unable to prevent duplicate elements from being added. const…
Kunal Tanwar
1 vote, 4 answers

How to remove duplicate rows with a condition in pandas

I want to drop duplicate pairs using col1 and col2 as the subset, but only if the values in col3 are opposite (one negative and one positive). This is similar to the drop_duplicates function, but I want to impose a condition and only want to remove the first…
bbaba
1 vote, 1 answer

remove-duplicates produces numbers that haven't been in the original list in NetLogo

I'm creating a list out of the patch variable "geb-id" (a 7-digit integer number) with the following line: set geb-id-list [geb-id] of patches with [geb-id >= 0 AND residents != 0] When I look at the produced list, all looks fine. Then I'm…
misch
1 vote, 1 answer

Drop duplicates when for a group a string present more than once in a column-pandas

Is there a way to groupby based on 2 columns (Id, Name) in a dataframe and if the presence of a certain string "x_1" in the column "Name" is more than once, then just keep the first row (first occurrence)? Id Name Value 1 x_1 23 1 x_2 24 1 x_1 …
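
A rough sketch of one interpretation: within each (Id, Name) pair, repeated rows are dropped only when Name is "x_1", and all other rows are left alone. The sample values beyond those visible in the excerpt are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the first rows resemble the truncated excerpt.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Name": ["x_1", "x_2", "x_1", "x_2", "x_2"],
    "Value": [23, 24, 25, 26, 27],
})

# A row is removed only if it repeats an earlier (Id, Name) pair
# AND its Name is the string of interest.
repeat_of_target = df["Name"].eq("x_1") & df.duplicated(subset=["Id", "Name"])
result = df[~repeat_of_target]

print(result["Value"].tolist())  # [23, 24, 26, 27]
```

Repeated "x_2" rows survive here on purpose, since the condition applies only to "x_1".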
1 vote, 1 answer

Pandas drop_duplicates returns NoneType when inplace=True and doesn't drop duplicates when inplace=False

I want to remove duplicates from a certain dataframe. With inplace=True it does remove the duplicates, but the call returns None. With inplace=False the dataframe is not modified, even when I assign the result to a new variable. This works…
Jaxsss
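
For context, a small sketch of how the `inplace` parameter of `drop_duplicates` behaves. The frame and variable names are made up, and this only illustrates the documented return values; it does not diagnose the asker's actual code, which is truncated above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# inplace=False (the default): a new frame is returned and df is left untouched,
# so the result has to be bound to a name.
deduped = df.drop_duplicates()

# inplace=True: the frame itself is modified and the call returns None,
# so assigning the return value would replace the frame with None.
other = df.copy()
result = other.drop_duplicates(inplace=True)

print(len(df), len(deduped), len(other), result)  # 3 2 2 None
```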
1 vote, 1 answer

Pandas dataframe drop duplicates based in another column value

I have a dataframe with duplicates: timestamp id ch is_eval. c 12. 1. 1. False. 2 13. 1. 0. False. 1 12. 1. 1. True. 4 13. 1 0. False. 3 When there are duplicated, it is always when I want to drop_duplicates with…
Cranjis
1 vote, 1 answer

Want to drop duplicate based on one column but want to keep first two rows

Hi, I am dropping duplicates from a dataframe based on one column, i.e. "ID". Until now I have been dropping the duplicates and keeping the first occurrence, but I want to keep the first (top) two occurrences instead of only one, so I can compare the values of the first two…
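
A minimal sketch of keeping the first two rows per "ID" rather than one. It swaps `drop_duplicates` for `groupby(...).head(2)`, since `drop_duplicates` itself only offers keeping the first, the last, or none. The `value` column and its contents are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "value": [10, 11, 12, 20, 21, 30],
})

# head(2) on a groupby returns the first two rows of each group,
# in the original row order and with the original index.
first_two = df.groupby("ID").head(2)

print(first_two["value"].tolist())  # [10, 11, 20, 21, 30]
```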
1 vote, 1 answer

pandas drop_duplicates works but when saved using .to_csv it still shows all

I'm simply trying to remove duplicates from a csv and then make a new csv file with only the first column and no duplicates. My terminal shows it's working, but the new csv file still shows all of them. import pandas as pd import numpy as…
Pheng Vue
1 vote, 1 answer

drop_duplicates in pandas for a large data set

I am new to pandas, so sorry for my naiveté. I have two dataframes. One is out.hdf: 999999 2014 1 2 15 19 45.19 14.095 -91.528 69.7 4.5 0.0 0.0 0.0 603879074 999999 2014 1 2 23 53 57.58 16.128 -97.815 23.2 4.8 0.0 0.0 0.0…
1 vote, 2 answers

Python Pandas : Drop Duplicates Function - Unusual Behaviour

The error -> TypeError: unhashable type: 'list' disappears after saving the data frame and loading it again ... Both data frames [saved and loaded, generated] have the same dtypes ... Reproducible -> --> import pandas as pd --> l1 = [[1], [1], [1],…
Agnij
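
A sketch of one common workaround for the `unhashable type: 'list'` error: convert the list cells to tuples, which are hashable, before dropping duplicates. The column names and values are hypothetical, and this does not reproduce the asker's truncated example.

```python
import pandas as pd

df = pd.DataFrame({"col": [[1], [1], [2]], "n": [0, 0, 1]})

# Lists cannot be hashed, so each cell is converted to a tuple first.
# assign() returns a new frame; the original df keeps its list cells.
hashable = df.assign(col=df["col"].map(tuple))
deduped = hashable.drop_duplicates()

print(deduped["col"].tolist())  # [(1,), (2,)]
```

If the frame was saved as CSV and reloaded, the cells would come back as strings such as "[1]", which are hashable; that may be why the error disappears after a round trip, though the excerpt does not say which file format was used.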
1 vote, 2 answers

Drop Duplicate Rows Based on Target Class Conditions

I have a dataset with 3 target classes: ‘Yes’, ‘Maybe’, and ‘No’. Unique_id target 111 Yes 111 Maybe 111 No 112 No 112 Maybe 113 No I want to drop duplicate rows…
Roy