Questions tagged [drop-duplicates]

Questions related to removing (or dropping) unwanted duplicate values.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or as involved as multiple complex objects that are treated as equal when compared to each other.

This tag may pertain to questions about removing unwanted duplicates.
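As a rough illustration of both cases in the description, here is a minimal Python sketch. The helper `drop_duplicates`, the `User` class, and the choice of a lowercased email as the equality key are all hypothetical and only meant to show the idea: plain strings are compared directly, while complex objects are compared through a key function.

```python
from dataclasses import dataclass


def drop_duplicates(items, key=lambda x: x):
    """Return items in their original order, keeping the first occurrence of each key.

    Hypothetical helper for illustration; keys must be hashable.
    """
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


@dataclass
class User:  # hypothetical example type
    name: str
    email: str


# Simple case: identical strings in a list of strings.
words = ["ABA", "AAB", "BAA", "BAA", "ABA", "AAB"]
unique_words = drop_duplicates(words)

# Complex case: objects treated as the same when their emails match (assumed rule).
users = [
    User("Ann", "ann@example.com"),
    User("A. Smith", "ANN@example.com"),
    User("Bob", "bob@example.com"),
]
unique_users = drop_duplicates(users, key=lambda u: u.email.lower())
```

Which occurrence to keep (first, last, or none) and what counts as "the same" depend on the question at hand, so the key function is the part most likely to change.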

144 questions
0 votes, 1 answer

Why do PySpark dropDuplicates and join give odd results?

PySpark gives me slightly odd results after dropDuplicates and joining datasets. There are two very large datasets: one with people's IDs and some variables, and a second one with their region_code first…
default_settings
-1 votes, 1 answer

Remove duplicate rows only if the page column is not the same

I am super new to Python. I am able to remove duplicate rows, but I need to remove rows only if they are from different pages. This is my…
Bstat
-1 votes, 1 answer

Delete duplicates based on varchar in another row. Eliminate terminated staff from active roster

I want to create a roster that shows only active staff. All staff are listed in a table; active staff with no termination have one row in the table, while terminated staff have two rows of data. Employee status is wages or term. How can I get one row…
Clay
-1 votes, 1 answer

Search for duplicates in each row and return which column has the duplicate?

So I have poll data that I am looking at, and I've been trying to create a script in R for it. Column 1 is the voter's name. The rest of the columns are the names of the people they voted for, across different categories. I have 70 voters so I have 70…
-1 votes, 1 answer

Dropping duplicates only if found twice

I have a dataframe with claim numbers, which are 12-digit numbers. I am trying to take out reversed claims, each of which appears as two claims: a paid claim and a reversed claim. There are instances where a claim was processed and reversed, but then it was…
Chris
-1 votes, 1 answer

Store duplicated rows while comparing two dataframes in pandas

Hello people (I am new to Python). Question: I have 2 dataframes, df1 and df2. I want to check whether there are duplicates based on the same (url, price, pourcent) and store them in a new dataframe, and also check whether there is a duplicated url with a changed price and store…
Eya Mila
-2 votes, 2 answers

Removing duplicate words from string

I have a string like ABA AAB BAA BAA ABA AAB. I want to remove duplicate words and thus get the output ABA AAB BAA. However, when I run the code below, the output is ABA AAB BAA BAA ABA AAB: // I'm continuously pushing the string int S = a.size()…
-2 votes, 4 answers

Python: Drop duplicate element within nested list, if element is an element within another nested list

How do we deduplicate elements within nested lists based on the elements within another nested list? Or, does it make more sense to iterate through a column and drop duplicates based on a list of elements in another column? Column 1 R1 = [foo, bar,…
NJT
-2 votes, 2 answers

drop_duplicates() stopped working in Python pandas

This code previously worked in Python 3 to remove duplicate values while keeping the first occurrence across an entire DataFrame. After coming back to my script, it no longer removes duplicates in a pandas DataFrame. df = df.apply(lambda x:…
C_psy