Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

Use this tag for questions about detecting and removing such unwanted duplicates.

144 questions
2 votes · 2 answers

How to drop duplicates in one column based on values in 2 other columns in DataFrame in Python Pandas?

I have a DataFrame in Python Pandas like below (data types: ID - int, TYPE - object, TG_A - int, TG_B - int): ID TYPE TG_A TG_B 111 A 1 0 111 B 1 0 222 B 1 0 222 A 1 0 333 B 0 1 333 A 0 1 And I need to drop duplicates in the above…
dingaro
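The rule that decides which duplicate survives is cut off in the excerpt above. As a minimal sketch, assuming the goal is one row per ID and that sorting controls which duplicate is kept (the data is taken from the question; the preference for TYPE "A" is an assumption):

```python
import pandas as pd

# Data from the question above.
df = pd.DataFrame({
    "ID":   [111, 111, 222, 222, 333, 333],
    "TYPE": ["A", "B", "B", "A", "B", "A"],
    "TG_A": [1, 1, 1, 1, 0, 0],
    "TG_B": [0, 0, 0, 0, 1, 1],
})

# Sorting first controls which duplicate survives; here we (arbitrarily)
# prefer TYPE "A" within each ID before dropping the later duplicates.
out = (df.sort_values(["ID", "TYPE"])
         .drop_duplicates(subset=["ID"], keep="first"))
print(out)
```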
2 votes · 2 answers

Pandas multiindex duplicated only for particular indices

Say I have a Pandas dataframe with multiple indices: arrays = [["UK", "UK", "US", "FR"], ["Firm1", "Firm1", "Firm2", "Firm1"], ["Andy", "Peter", "Peter", "Andy"]] idx = pd.MultiIndex.from_arrays(arrays, names = ("Country", "Firm", "Responsible")) df…
W. Walter
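A minimal sketch of one way to flag duplicates using only some index levels, built on the MultiIndex from the question (the `value` column and the choice of Country/Firm as the relevant levels are assumptions):

```python
import pandas as pd

arrays = [["UK", "UK", "US", "FR"],
          ["Firm1", "Firm1", "Firm2", "Firm1"],
          ["Andy", "Peter", "Peter", "Andy"]]
idx = pd.MultiIndex.from_arrays(arrays, names=("Country", "Firm", "Responsible"))
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)  # placeholder data

# Flag duplicates considering only the Country and Firm levels of the index.
dup_mask = idx.to_frame(index=False).duplicated(subset=["Country", "Firm"])
print(df[~dup_mask.to_numpy()])  # keeps the first row of each (Country, Firm) pair
```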
2 votes · 2 answers

Summing values of (dropped) duplicate rows Pandas DataFrame

For a time series analysis, I have to drop instances that occur on the same date, but keep some of the 'deleted' information and add it to the remaining 'duplicate' instance. Below is a short example of part of my dataset. z =…
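A minimal sketch of the usual pattern for this: instead of dropping the rows that share a date, aggregate them so the surviving row carries the summed information (the column names are placeholders, not the asker's):

```python
import pandas as pd

z = pd.DataFrame({
    "date":  ["2021-01-01", "2021-01-01", "2021-01-02"],  # duplicate dates
    "value": [10, 5, 7],
})

# One row per date; the values of the "dropped" duplicates are summed
# into the row that remains.
out = z.groupby("date", as_index=False)["value"].sum()
print(out)
```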
2 votes · 2 answers

Remove duplicates from all columns of a data frame with the condition value > 0

I need to remove duplicates from all of the columns. My data: id country publisher weak A B C 123 US X 1 6.77 0 0 123 US X 1 0 1.23 88.7 456 BZ Y …
Lili
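A hedged sketch of one interpretation: the duplicated (id, country, publisher) rows are collapsed into one row that keeps the positive value from whichever duplicate carries it (treating zeros as placeholders is an assumption, as are the exact key columns and the values in the last row):

```python
import pandas as pd

df = pd.DataFrame({
    "id":        [123, 123, 456],
    "country":   ["US", "US", "BZ"],
    "publisher": ["X", "X", "Y"],
    "weak":      [1, 1, 0],
    "A": [6.77, 0.0, 2.0],   # last row's A/B/C values are made up
    "B": [0.0, 1.23, 0.0],
    "C": [0.0, 88.7, 5.0],
})

# Collapse duplicate keys; max() keeps the positive value wherever the
# other duplicate holds a zero placeholder.
out = df.groupby(["id", "country", "publisher", "weak"], as_index=False).max()
print(out)
```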
2 votes · 2 answers

Drop almost-duplicate rows based on timestamp

I'm trying to remove some near-duplicate data. I'm looking for a way to detect the closest (edited_at) trips made by the user without losing information. So I want to solve this problem by calculating the difference between successive timestamps…
Adil Blanco
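A minimal sketch of the successive-timestamp idea described above: compute the gap to the previous row per user and drop rows that fall inside a threshold (the column names and the one-minute threshold are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id":   [1, 1, 1, 2],
    "edited_at": pd.to_datetime(["2020-01-01 10:00:00",
                                 "2020-01-01 10:00:30",
                                 "2020-01-01 12:00:00",
                                 "2020-01-01 10:00:00"]),
})

df = df.sort_values(["user_id", "edited_at"])
# Gap to the previous trip of the same user; rows closer than the threshold
# are treated as near-duplicates of the preceding trip and dropped.
gap = df.groupby("user_id")["edited_at"].diff()
out = df[gap.isna() | (gap > pd.Timedelta(minutes=1))]
print(out)
```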
2 votes · 2 answers

Python Dataframe: Dropping duplicates based on certain conditions

Dataframe with duplicate Shop IDs, where some Shop IDs occurred twice and some occurred thrice: I only want to keep unique Shop IDs based on the shortest Shop Distance assigned to its Area. Area Shop Name Shop Distance Shop ID 0 AAA Ly …
est
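A minimal sketch of the usual sort-then-drop pattern for "keep the row with the smallest value per key" (the sample data is made up to match the description):

```python
import pandas as pd

df = pd.DataFrame({
    "Area":          ["AAA", "AAA", "BBB", "BBB"],
    "Shop Name":     ["Ly", "Ly", "Mo", "Mo"],
    "Shop Distance": [5, 2, 7, 3],
    "Shop ID":       [101, 101, 202, 202],
})

# Sort so the shortest distance comes first, then keep the first row per Shop ID.
out = (df.sort_values("Shop Distance")
         .drop_duplicates(subset=["Shop ID"], keep="first")
         .sort_index())
print(out)
```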
2 votes · 2 answers

How to drop duplicate rows based on values of two columns?

I have a data frame like this: Category Date_1 Score_1 Date_2 Score_2 A 13/11/2019 5 13/11/2019 10 A 13/11/2019 5 14/11/2019 55 A 13/11/2019 5 15/11/2019 45 …
Sara
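A minimal sketch, assuming the two defining columns are Category and Date_1 (the excerpt is truncated, so that pairing is a guess); the data is taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "A"],
    "Date_1":   ["13/11/2019", "13/11/2019", "13/11/2019"],
    "Score_1":  [5, 5, 5],
    "Date_2":   ["13/11/2019", "14/11/2019", "15/11/2019"],
    "Score_2":  [10, 55, 45],
})

# Rows are duplicates when they share Category and Date_1; keep the first one.
out = df.drop_duplicates(subset=["Category", "Date_1"], keep="first")
print(out)
```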
2 votes · 0 answers

Pandas drop_duplicates -> Fatal Python error: deallocating None

I have code that checks an Excel sheet and, if it finds some changes, takes a snapshot (Pandas DataFrame) of the whole sheet and saves it to a CSV with a timestamp. It has been running all day doing its job correctly, but usually once or…
user10466538
2 votes · 1 answer

Drop all group rows when a condition is met?

I have a pandas data frame with a two-level grouping based on 'col10' and 'col1'. All I want to do is drop all of a group's rows if a specified value in another column is repeated, or if this value does not exist in the group (keep the group in which the specified value…
Sidhom
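A hedged sketch using groupby().filter(): keep only the groups in which a specified value occurs exactly once, and drop groups where it repeats or never appears (the column holding the value and the sample data are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "col10": [1, 1, 1, 2, 2],
    "col1":  ["a", "a", "a", "b", "b"],
    "col2":  ["x", "y", "x", "x", "y"],  # column holding the specified value
})

target = "x"
# Keep a group only if the target value appears exactly once in it;
# groups where it repeats, or is absent, are dropped entirely.
out = df.groupby(["col10", "col1"]).filter(lambda g: (g["col2"] == target).sum() == 1)
print(out)
```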
2 votes · 1 answer

pandas inclusive unique values from two columns

I can't find any elegant way to select unique rows from column A and column B, but not jointly and not in a sequence. This is in order to keep an "inclusive" intersection of unique values from these two columns. My aim is to keep as many unique values…
Alex
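One possible reading of "inclusive", sketched very tentatively: keep a row if it contributes a not-yet-seen value in column A or in column B, rather than requiring the (A, B) pair to be new (the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 1],
                   "B": ["x", "y", "y", "x"]})

# A row survives if its A value or its B value has not been seen before;
# only rows duplicated in both columns are dropped.
keep = ~df.duplicated(subset=["A"]) | ~df.duplicated(subset=["B"])
print(df[keep])
```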
2 votes · 1 answer

Custom logic for dropping duplicates

I have the following dataset that I'm hoping to apply some custom logic to: data = pd.DataFrame({'ID': ['A','B','B','C','C','D','D'], 'Date':…
Dfeld
1 vote · 0 answers

In PySpark, how do I avoid an error when using exceptAll after a dropDuplicates (with subset)?

I am working on a sequence of transformations in PySpark (version 3.3.1). At a certain point I have a dropDuplicates(subset=[X]) followed by an exceptAll, and I get an error. Here is a reproducible pipeline: from pyspark.sql import SparkSession spark…
PMHM
1 vote · 1 answer

Python Pandas groupby agg or transform on max value_counts to drop duplicate rows

I have this df and want to drop duplicates based on the max value counts of 'rating' (it's a binary field). None of my attempts combining drop_duplicates with groupby, max, and count is fetching the desired output. Any suggestion is highly appreciated. df =…
1 vote · 3 answers

Dropping duplicate rows in a Pandas DataFrame based on multiple column values

In a dataframe I need to drop/filter out duplicate rows based on the combined columns A and B. In the example DataFrame A B C D 0 1 1 3 9 1 1 2 4 8 2 1 3 5 7 3 1 3 4 6 4 1 4 5 5 5 1 4 6 4 rows…
leofer
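A minimal sketch using the data from the question: the `keep=` argument controls whether one representative of each duplicated (A, B) pair survives or whether every duplicated pair is removed outright:

```python
import pandas as pd

# Data from the question above.
df = pd.DataFrame({"A": [1, 1, 1, 1, 1, 1],
                   "B": [1, 2, 3, 3, 4, 4],
                   "C": [3, 4, 5, 4, 5, 6],
                   "D": [9, 8, 7, 6, 5, 4]})

first_only = df.drop_duplicates(subset=["A", "B"], keep="first")  # keep one of each pair
no_dups    = df.drop_duplicates(subset=["A", "B"], keep=False)    # drop every duplicated pair
print(first_only)
print(no_dups)
```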
1 vote · 3 answers

Remove duplicate values across columns in pandas dataframe, without removing entire row

I would like to drop all values which are duplicates across a subset of two or more columns, without removing the entire row. Dataframe: A B C 0 foo g A 1 foo g G 2 yes y B 3 bar y B Desired result: A B C 0 foo g …
btroppo
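A minimal sketch using the data from the question: blank out the values in the subset columns on rows whose (A, B) pair has already appeared, while keeping the rest of the row (blanking with an empty string rather than NaN is an assumption):

```python
import pandas as pd

# Data from the question above.
df = pd.DataFrame({"A": ["foo", "foo", "yes", "bar"],
                   "B": ["g", "g", "y", "y"],
                   "C": ["A", "G", "B", "B"]})

# Rows whose (A, B) combination has been seen before get those two cells
# cleared, but the row itself (and column C) stays in place.
dup = df.duplicated(subset=["A", "B"])
df.loc[dup, ["A", "B"]] = ""
print(df)
```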