Questions tagged [duplicates]

The "duplicates" tag concerns detecting and/or dealing with multiple instances of items in collections.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about preventing, detecting, removing, or otherwise dealing with unwanted duplicates, or adapting to safely allow duplicates.

15777 questions
71
votes
5 answers

Python copy files to a new directory and rename if file name already exists

I've already read this thread but when I implement it into my code it only works for a few iterations. I'm using python to iterate through a directory (lets call it move directory) to copy mainly pdf files (matching a unique ID) to another…
GISHuman
  • 1,026
  • 1
  • 8
  • 27
70
votes
10 answers

Scala: Remove duplicates in list of objects

I've got a list of objects List[Object] which are all instantiated from the same class. This class has a field which must be unique Object.property. What is the cleanest way to iterate the list of objects and remove all objects(but the first) with…
parsa
  • 2,628
  • 3
  • 34
  • 44
70
votes
9 answers

MySQL remove duplicates from big database quick

I've got big (>Mil rows) MySQL database messed up by duplicates. I think it could be from 1/4 to 1/2 of the whole db filled with them. I need to get rid of them quick (i mean query execution time). Here's how it looks: id (index) | text1 | text2 |…
bizzz
  • 1,795
  • 2
  • 19
  • 21
69
votes
2 answers

RoR nested attributes produces duplicates when edit

I'm trying to follow Ryan Bates RailsCast #196: Nested model form part 1. There're two apparent differences to Ryans version: 1) I'm using built-in scaffolding and not nifty as he's using, and 2) I'm running rails 4 (I don't really know what version…
conciliator
  • 6,078
  • 6
  • 41
  • 66
67
votes
4 answers

Remove sequentially duplicate frames when using FFmpeg

Is there any way to detect duplicate frames within the video using ffmpeg? I tried -vf flag with select=gt(scene\,0.xxx) for scene change. But, it did not work for my case.
metlira
  • 873
  • 1
  • 8
  • 12
66
votes
9 answers

Removing elements that have consecutive duplicates

I was curios about the question: Eliminate consecutive duplicates of list elements, and how it should be implemented in Python. What I came up with is this: list = [1,1,1,1,1,1,2,3,4,4,5,1,2] i = 0 while i < len(list)-1: if list[i] ==…
Trufa
  • 39,971
  • 43
  • 126
  • 190
66
votes
9 answers

spark dataframe drop duplicates and keep first

Question: in pandas when dropping duplicates you can specify which columns to keep. Is there an equivalent in Spark Dataframes? Pandas: df.sort_values('actual_datetime', ascending=False).drop_duplicates(subset=['scheduled_datetime',…
ad_s
  • 1,560
  • 4
  • 15
  • 16
66
votes
6 answers

How to check if exists any duplicate in Java 8 Streams?

In java 8, what's the best way to check if a List contains any duplicate? My idea was something like: list.size() != list.stream().distinct().count() Is it the best way?
pedrorijo91
  • 7,635
  • 9
  • 44
  • 82
65
votes
2 answers

df.unique() on whole DataFrame based on a column

I have a DataFrame df filled with rows and columns where there are duplicate Id's: Index Id Type 0 a1 A 1 a2 A 2 b1 B 3 b3 B 4 a1 A ... When I use: uniqueId = df["Id"].unique() I get a list of unique…
WJA
  • 6,676
  • 16
  • 85
  • 152
64
votes
19 answers

How can I delete one of two perfectly identical rows?

I am cleaning out a database table without a primary key (I know, I know, what were they thinking?). I cannot add a primary key, because there is a duplicate in the column that would become the key. The duplicate value comes from one of two rows…
lofidevops
  • 15,528
  • 14
  • 79
  • 119
62
votes
5 answers

Find all duplicate documents in a MongoDB collection by a key field

Suppose I have a collection with some set of documents. something like this. { "_id" : ObjectId("4f127fa55e7242718200002d"), "id":1, "name" : "foo"} { "_id" : ObjectId("4f127fa55e7242718200002d"), "id":2, "name" : "bar"} { "_id" :…
frazman
  • 32,081
  • 75
  • 184
  • 269
62
votes
8 answers

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB where there are around (~3 million records). My sample record would look like, { "_id" = ObjectId("50731xxxxxxxxxxxxxxxxxxxx"), "source_references" : [ "_id" :…
user1518659
  • 2,198
  • 9
  • 29
  • 40
61
votes
15 answers

How to remove duplicates from a list?

I want to remove duplicates from a list but what I am doing is not working: List listCustomer = new ArrayList(); for (Customer customer: tmpListCustomer) { if (!listCustomer.contains(customer)) { …
Mercer
  • 9,736
  • 30
  • 105
  • 170
61
votes
14 answers

Fastest way to remove duplicate documents in mongodb

I have approximately 1.7M documents in mongodb (in future 10m+). Some of them represent duplicate entry which I do not want. Structure of document is something like this: { _id: 14124412, nodes: [ 12345, 54321 ], …
ewooycom
  • 2,651
  • 5
  • 30
  • 52
60
votes
20 answers

Delete duplicate records from a SQL table without a primary key

I have the below table with the below records in it create table employee ( EmpId number, EmpName varchar2(10), EmpSSN varchar2(11) ); insert into employee values(1, 'Jack', '555-55-5555'); insert into employee values (2, 'Joe',…
Shyju
  • 214,206
  • 104
  • 411
  • 497