Questions tagged [drop-duplicates]

questions related to removing (or dropping) unwanted duplicate values

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

Use this tag for questions about removing unwanted duplicates.
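
As a minimal sketch of the two cases described above (identical strings in a list, and complex objects that compare as equal), the following Python snippet drops later re-occurrences while keeping first-seen order. The `drop_duplicates` helper and the `User` record type are hypothetical names introduced only for illustration; the sketch assumes the items are hashable.

```python
from dataclasses import dataclass, field


def drop_duplicates(items):
    """Return a new list without later re-occurrences, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # relies on the item's __eq__ and __hash__
            seen.add(item)
            result.append(item)
    return result


# Hypothetical record type: two users are treated as the same object when
# their emails match, because `name` is excluded from comparison and hashing.
@dataclass(frozen=True)
class User:
    email: str
    name: str = field(compare=False)


# Simple case: identical strings in a list of strings.
names = drop_duplicates(["a", "b", "a", "c", "b"])

# Complex case: distinct objects that compare as equal.
users = drop_duplicates([
    User("ann@example.com", "Ann"),
    User("ann@example.com", "Ann B."),
    User("bob@example.com", "Bob"),
])
```

Here `names` keeps one copy of each string, and `users` keeps the first `User` seen for each email. Which occurrence to keep (first, last, or none) is a design choice that depends on the question being asked.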

144 questions
0
votes
2 answers

Dropping same rows in two pandas dataframe in python

I want to get the rows that are not common to two pandas dataframes. The two dataframes are df1 and wildone_df. When I check their type, both of them are "pandas.core.frame.DataFrame", but when I use the code below to omit their intersection: o =…
0
votes
2 answers

Remove duplicate rows based on values in every column using pandas

I have a pandas df of different permutations of values: (toy version below, but my actual df contains more columns and rows) My goal is to remove the rows that contain duplicate values across rows but critically with also checking all…
psychcoder
  • 543
  • 3
  • 14
0
votes
0 answers

returns me an error for dataframe with more than 100,000 rows

My dataframe has more than 100,000 rows, and when I run the code df_new['SvcAdd.Type'] = df_new.groupby(['Routing'])['SvcAdd.Type'].transform(lambda x : ' '.join(x)) df_new = df_new.drop_duplicates() df_new it returns the following error. I am…
0
votes
0 answers

Remove duplicates from dataframe, based on two columns A,B, keeping [list of values] in another column C

I have a pandas dataframe which contains duplicate values according to two columns (A and B):

A B C
1 2 1
1 2 4
2 7 1
3 4 0
3 4 8

I want to remove duplicates keeping the values in column C inside a list of len N values in C (example 2 values in…
0
votes
1 answer

Error while Dropping Duplicate Rows in MySQL table

--Table Schema
CREATE TABLE Test(
    ID INT,
    FirstName Varchar(100),
    LastName Varchar(100),
    Country Varchar(100)
);
Insert into Test (FirstName,LastName,Country)values('Raj','Gupta','India'), …
0
votes
2 answers

How to remove duplicates and keep values of all columns

I have a df like below

   Date        ID   Colour  ColourCode  Item    ItemCode
0  2020-01-02  245  Blue    Bl          Apple   NaN
1  2020-01-02  245  Blue    NaN         Apple   Ap
2  2020-01-03  245  Blue    Bl          Orange  NaN
3  2020-01-03  …
Shichimi
  • 71
  • 8
0
votes
4 answers

Most memory efficient way to remove duplicate lines in a text file using C++

I understand how to do this using std::string and std::unordered_set, however, each line and each element of the set takes up a lot of unnecessary, inefficient memory, resulting in an unordered_set and half the lines from the file being 5-10 times…
0
votes
4 answers

Is there a function to remove duplicates within a row without removing the entire row using Python?

import pandas as pd
data=[["John","Alzheimer's","Infection","Alzheimer's"],["Kevin","Pneumonia","Pneumonia","Tuberculosis"]]
df=pd.DataFrame(data,columns=['Name','Problem1','Problem2','Problem3'])

In this data frame, I would like to read through…
dawgtor
  • 11
  • 1
0
votes
1 answer

Get values of latest year and all its months in pandas

Below is the Raw Data.

Event   Month     Year
Event1  January   2012
Event1  February  2013
Event1  March     2014
Event1  April     2017
Event1  May       2017
Event1  June      2017
Event2  May       …
Prime coder
  • 91
  • 1
  • 10
0
votes
0 answers

In Python How to remove duplicates from a list/string using only for loop, if else statements and without using empty list, set function

I want to remove duplicates from a list using only for and if/else statements, without using set, enumerate functions, or fromkeys, and without using a new empty array/list. So in the given list, I wish to keep an item only if it appears one time, if…
Kunal
  • 1
0
votes
1 answer

I can't save the cleaned df to target directory

I am trying to remove duplicates from large files, but save those into a different directory. I ran the code below, but it saved them (overwrote) within the root directory. I know that if I switch to inplace='False' it won't overwrite those files in…
0
votes
1 answer

Two DELETE statements in Oracle to delete duplicates

We have a table with over 55k rows that have an identifying name duplicated. The name can vary and the number of duplicates with each name can vary. So I applied these 2 scripts for practice deleting duplicate records from a table. Is there a…
0
votes
1 answer

DataFrame drop Method is dropping all rows despite selecting subset

I have a df of invoices, but only the following two columns really matter:

OrderNum  Id
. . . .
586       270
588       270
590       270
590       270

where OrderNum is int64 and Id is also int64. I am trying to drop duplicate Order Numbers but for…
S44
  • 473
  • 2
  • 10
0
votes
0 answers

Drop duplicates based on two columns with an OR condition

I have an ordered table with multiple ambiguous relationships between old and new contracts.

Old Contract  New Contract
C1            C3
C1            C4
C2            C3
C2            C4

The goal is to have unambiguous relationships, i.e. each old contract should only be…
Jan
  • 1
  • 1
0
votes
1 answer

Dropping duplicate rows in data frame with pandas df.drop(), not df.drop_duplicates

all - I have been running in circles with this code. I have a data frame with data for 2018, 2019, 2020, and 2021. Sometimes there are duplicate rows, but since the index is different, pd.drop_duplicates does not work and after troubleshooting for a…