Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about removing unwanted duplicates.
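
As a rough illustration of the two cases described above (identical strings, and complex objects that compare as equal), here is a minimal Python sketch. The `Customer` class, its fields, and the sample data are hypothetical and only serve the example; the helper keeps the first occurrence of each item and preserves order.

```python
from dataclasses import dataclass, field


def unique_in_order(items):
    """Return items with later re-occurrences removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order, so the first occurrence wins.
    return list(dict.fromkeys(items))


@dataclass(frozen=True)
class Customer:
    # Hypothetical record type: two customers count as "the same" if emails match.
    email: str
    # Excluded from equality and hashing, so it does not affect duplicate detection.
    display_name: str = field(compare=False)


names = ["ann", "bob", "ann", "cy", "bob"]
unique_names = unique_in_order(names)

customers = [
    Customer("a@example.com", "Ann"),
    Customer("a@example.com", "Ann B."),  # same email, treated as a duplicate
    Customer("b@example.com", "Bob"),
]
unique_customers = unique_in_order(customers)
```

The helper relies on items being hashable; for unhashable items, a loop with an explicit equality check would be needed instead.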

144 questions
1
vote
2 answers

problem with pandas drop_duplicates removing empty values

I'm using drop_duplicates to remove duplicates from my dataframe based on a column. The problem is that this column is empty for some entries, and those ended up being removed too. Is there a way to make the function ignore the empty values? Here is an example …
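
One possible approach to the situation described in this excerpt, sketched under assumptions: the column name `code` and the sample data are hypothetical, and "empty" is assumed to mean missing values (`NaN`/`None`). Empty strings would first have to be converted to missing values. The idea is to keep every row whose key is missing and drop repeats only among rows that have a key.

```python
import pandas as pd

# Hypothetical data: "code" is the key column, and some entries have no key.
df = pd.DataFrame({
    "code": ["A1", "A1", None, None, "B2"],
    "value": [1, 2, 3, 4, 5],
})

# Rows whose key is missing should never be treated as duplicates of each other.
is_missing = df["code"].isna()

# duplicated() flags every repeat after the first occurrence of a key.
is_repeat = df.duplicated(subset="code", keep="first")

# Keep a row if its key is missing or if it is the first row with that key.
result = df[is_missing | ~is_repeat]
```

Here the second `A1` row is dropped while both rows without a key are kept.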
1
vote
1 answer

How to join two rows that have the same keys and complementary values

My goal is to collapse the below table into one single column and this question deals specifically with the blue row below. The table has three categorical variables and 6 analysis/quantitative variables. Columns C1 and C2 are the only variables…
mcbridecaleb
  • 101
  • 1
  • 8
1
vote
1 answer

Dropna when another row has the missing data OR drop_duplicates with NaN matching all data

I have data like the following:

Index  ID   data1  data2  ...
0      123  0      NaN    ...
1      123  0      1      ...
2      456  NaN    0      ...
3      456  NaN    0      ...
...

I need to drop rows which have less than or equal to the…
Isaac
  • 361
  • 5
  • 18
1
vote
1 answer

Pandas drop_duplicates not working consistently between Jupyter notebook and python script

I am adding entries to an existing dataframe, where they can be new or updates to existing entries in the dataframe. Older and outdated entries will be deleted from the dataframe by using Pandas drop_duplicates, which worked as expected in Jupyter…
griffinleow
  • 93
  • 12
1
vote
3 answers

In pandas how to use drop_duplicates with one exception?

In Python 3 and pandas I need to eliminate duplicate rows from a dataframe based on repeated values in a column. For this I used: consolidado = df_processos.drop_duplicates(['numero_unico'], keep='last') The column "numero_unico" has codes in string…
Reinaldo Chaves
  • 965
  • 4
  • 16
  • 43
1
vote
1 answer

Dropping same words in reverse order as duplicates using Spark Dataframe

I am able to successfully drop duplicates using the Spark Dataframe method dropDuplicates, which considers only a 100% match in exact order to be a duplicate. So for example if we have two "red toys", one of them is considered a duplicate and gets filtered out. Now…
Anand
  • 20,708
  • 48
  • 131
  • 198
1
vote
1 answer

I can't figure out why I can't remove duplicates from a Pandas df

I am trying to update a Pandas Dataframe with data from an API and have it written to .csv. I need to be sure it does not contain duplicate rows. I have been checking on here to see what the problem might be (for example forgetting to add…
1
vote
2 answers

Using duplicate values from one column to remove entire rows in a pandas dataframe

I have the data in a .csv file uploaded at the following link (Click here for the data). In this file, I have the following columns: Team Group Model SimStage Points GpWinner GpRunnerup 3rd 4th There will be duplicates in the columns…
Zephyr
  • 1,332
  • 2
  • 13
  • 31
1
vote
1 answer

Looking for an analogue to pd.DataFrame.drop_duplicates() where order does not matter

I would like to use something similar to dropping the duplicates of a DataFrame. I would like columns' order not to matter. What I mean is that the function should consider a row consisting of the entries 'a', 'b' to be identical to a row consisting…
splinter
  • 3,727
  • 8
  • 37
  • 82
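
A minimal sketch of one way to treat rows as duplicates regardless of column order, as asked in the excerpt above. The column names and data are hypothetical, and the sketch assumes the values within each row can be sorted against each other (for example, all strings). It builds an order-independent key per row and drops rows whose key has already been seen.

```python
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 hold the same entries in different column order.
df = pd.DataFrame({
    "x": ["a", "b", "c", "a"],
    "y": ["b", "a", "d", "b"],
})

# Sort the values inside each row so that ('b', 'a') and ('a', 'b') give the same key.
row_key = pd.Series(
    [tuple(sorted(row)) for row in df.itertuples(index=False)],
    index=df.index,
)

# Keep only the first row for each order-independent key.
deduped = df[~row_key.duplicated()]
```

The original column order of the surviving rows is left untouched; only the comparison key is sorted.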
0
votes
1 answer

Extract String from duplicate row, remove duplicate, give count of strings

I'm relatively new to Python/pandas. Here is my problem: I have a df looking like this: df = pd.DataFrame({ 'ZIP Code': ['1234','1234', '5678', '9101'], 'City Name': ['City A', 'City A', 'City B', 'City C'], 'Newspaper': ['City A News',…
0
votes
0 answers

Python Logger In "for" Loop Producing Duplicate Outputs

I've created a function (part of a created class) which accepts the name of a pdf file then extracts its contents. Inside that function, I've placed a logger which will be sent to a file and the console to display what the current pdf is and if any…
0
votes
2 answers

DolphinDB: Find records that exist in only one table but not the other

I have two tables, new (shared table) and old (regular in-memory table), each with fewer than 10,000 rows. Both tables contain columns "a" and "b", and values in the "a" columns are unique. I want to compare the two columns to find records that only…
lulunolemon
  • 219
  • 3
0
votes
0 answers

Clean duplicates based on multiple conditions

I have a df of fruit purchases sorted by date. I want to drop duplicates by fruit. But the way to drop duplicates depends on the column. The solution needs to generalise to more columns. But the 3 types of operations remain the same: For each…
asd
  • 1,245
  • 5
  • 14
0
votes
1 answer

Pandas dataframe: drop_duplicates after converting to str compares truncated strings, not actual contents

I tried the suggestion in this answer, and it appears that the conversion to string before dropping duplicates results in the truncated representation being compared. It seems to me that the dataframe.astype(str) already has this truncation. How do…
palongsag
  • 23
  • 4
0
votes
1 answer

Finding duplicates across multiple sheets in an Excel file corresponding to the first column using Python

I have an Excel file with multiple sheets containing when and where employees went for a sale. The columns are different in all sheets. Example Sheet 1 Date/Place PlaceA PlaceB PlaceD PlaceE PlaceF PlaceG 2019-03-01 A B …
Zoey Nightshade
  • 126
  • 2
  • 5