Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.
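A minimal sketch of both cases described above, in Python: identical strings in a list, and distinct objects that compare as equal. The helper name `drop_duplicates` and the `Point` class are hypothetical, chosen only for illustration, and the approach assumes the items are hashable.

```python
from dataclasses import dataclass


def drop_duplicates(items):
    """Return a new list without re-occurrences, keeping first-seen order.

    Assumes every item is hashable and that equal items hash equally.
    """
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Hypothetical example type: frozen dataclasses compare and hash by field
# values, so two separately created Point(1, 2) objects count as duplicates.
@dataclass(frozen=True)
class Point:
    x: int
    y: int


names = drop_duplicates(["a", "b", "a", "c", "b"])
points = drop_duplicates([Point(1, 2), Point(1, 2), Point(3, 4)])

print(names)   # ['a', 'b', 'c']
print(points)  # [Point(x=1, y=2), Point(x=3, y=4)]
```

Keeping the first occurrence is one possible policy; questions under this tag often ask for a different one, such as keeping the last or the "best" row.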

This tag may pertain to questions about removing unwanted duplicates.

144 questions
0
votes
1 answer

Removing duplicates between two workbooks

I need help removing duplicates between two workbooks: "Master" workbook and "Copy" workbook. I'm looking for values in these columns that are duplicates: Copy Column D = Master Column A; Copy Column O = Master Column C; Copy Column R = Master…
0
votes
0 answers

Python - Remove duplicate from dataframe for specific values stored in list

I am working with a dataset where the dataframe contains multiple duplicates for entries. The entries whose duplicates I need to remove are stored in a list. I can't seem to find a way to remove the duplicates in the dataframe. The methods I have…
Buddy
  • 1
0
votes
2 answers

What is a more efficient way to remove duplicates from a CSV file based on specific fields using a batch script (and gawk, if needed)?

I have two csv documents that contain lists of files from a source and destination in Google Drive generated by GAM. One is called copytoarchive.csv and lists all relevant files in the source. The other is alreadyinarchive.csv and lists all relevant…
0
votes
1 answer

drop nearly duplicates (pandas)

I have a dataframe with three columns: 'id', 'subject', 'delta'. I would like a function that treats rows where id and subject are repeated as duplicates, but delta, which is an integer, can be considered a duplicate if the difference between…
rafa.mf_
  • 3
  • 1
0
votes
2 answers

Remove float duplicates from a list of tuples created by zip

I create a list of tuples by zipping three lists together, data pairs: XYZip = list(zip(XaData, Y1aData, Y2aData)) [ (0.001625625, 4.782947316198166, -0.011032947316198166), (-2.5e-06, 4.783447358402665, 0.020216552641597337), …
casandra9
  • 1
  • 1
0
votes
1 answer

postgresql INSERT INTO all columns from a table

I am trying to write a method that removes duplicates from tables, without having to know the details of the table for generality (i.e., it should run on any table). I am using the following method from here (last method) through psycopg2: CREATE…
Aaron Bramson
  • 1,176
  • 3
  • 20
  • 34
0
votes
1 answer

drop_duplicates not dropping the duplicate records of the same dtype object

I have the following dataframe: DF1: col1 | col2 | col3 1 2 3 4 5 6 40 50 60. When I print the dtypes of these columns, all of them are objects. Now, I want to add a new row (input as a dataframe), so I…
Jay Patel
  • 49
  • 4
0
votes
2 answers

Duplicated float values in pandas even after drop it

I have a column of float values that behaves strangely: even after setting the variable type and dropping duplicates, I still have duplicated values. I have attached a screenshot with the code and the strange result. I tried using different types of variable and…
0
votes
0 answers

Drop_duplicates + groupby -->TypeError: sequence item 0: expected str instance, int found

My ex-colleague wrote code that imports an Excel file and makes some changes to it. During the process we started receiving this error. Do you have any idea how I can fix it? Here is the problematic part of the code. ### Concat LI related…
0
votes
0 answers

How can a duplicate row be dropped with some condition

I have a DF that looks like the following table Name Year Alice 2019 Bob 2020 John 2021 Bob 2022 I would like for each unique 'Name' to check which 'Year' is higher and drop the row with the lower 'Year'. For example can I drop the…
0
votes
1 answer

How to drop_duplicates in python

I have to compare two csv files, drop the duplicate rows, and generate another file. #here I'm comparing the csv files. The oldest_file and the newest_file different_data_type = newest_file.equals(other = oldest_file) #If they have…
Matheus
  • 13
  • 3
0
votes
2 answers

Using `drop_duplicates` on a Pandas dataframe isn't dropping rows

Situation: I have a dataframe similar to the one below (although I've removed many of the rows for this example, as evidenced in the 'index'…
dsx
  • 167
  • 1
  • 12
0
votes
2 answers

Pandas Drop Duplicates And Store Duplicates

I use pandas.DataFrame.drop_duplicates to find duplicates in a dataframe. This removes the duplicates from the dataframe, and it works great. However, I would like to know which data has been removed. Is there a way to save the data in a…
0
votes
1 answer

How to group by first column, select latest value of second column, and all respective values of third column

I have a df: {'ID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B', 6: 'C', 7: 'C', 8: 'C', 9: 'C'}, 'Date': {0: Timestamp('2020-03-02 00:00:00'), 1: Timestamp('2021-04-03 00:00:00'), 2: Timestamp('2021-04-03 00:00:00'), …
Shichimi
  • 71
  • 8
0
votes
1 answer

How do I drop only contiguous rows (all but one) in a pandas DataFrame according to column values?

I have a DataFrame that looks like this: Column1 Column2 0 cat A 1 cat B 2 cat C 3 dog D 4 dog E 5 cat F I want to drop all but one of the contiguous rows where Column 1 has…