Questions tagged [duplicates]

The "duplicates" tag concerns detecting and/or dealing with multiple instances of items in collections.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about preventing, detecting, removing, or otherwise dealing with unwanted duplicates, or adapting to safely allow duplicates.

15777 questions
136
votes
20 answers

Map implementation with duplicate keys

I want to have a map with duplicate keys. I know there are many map implementations (Eclipse shows me about 50), so I bet there must be one that allows this. I know it's easy to write your own map that does this, but I would rather use some existing…
IAdapter
  • 62,595
  • 73
  • 179
  • 242
132
votes
7 answers

How to find duplicates in 2 columns not 1

I have a MySQL database table with two columns that interest me. Individually they can each have duplicates, but they should never have a duplicate of BOTH of them having the same value. stone_id can have duplicates as long as for each upsharge…
JD Isaacks
  • 56,088
  • 93
  • 276
  • 422
127
votes
9 answers

How to find all duplicate from a List?

I have a List which has some words duplicated. I need to find all words which are duplicates. Any trick to get them all?
Steven Spielberg
120
votes
17 answers

Java: Detect duplicates in ArrayList?

How could I go about detecting (returning true/false) whether an ArrayList contains more than one of the same element in Java? Many thanks, Terry Edit Forgot to mention that I am not looking to compare "Blocks" with each other but their integer…
Terry
112
votes
4 answers

ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

I am trying to get a new dataset, or change the value of the current dataset columns to their unique values. Here is an example of what I am trying to get : A B ----- 0| 1 1 1| 2 5 2| 1 5 3| 7 9 4| 7 9 5| 8 9 Wanted Result Not Wanted Result …
Mayeul sgc
  • 1,964
  • 3
  • 20
  • 35
111
votes
4 answers

How to concatenate two dataframes without duplicates?

I'd like to concatenate two dataframes A, B to a new one without duplicate rows (if rows in B already exist in A, don't add): Dataframe A: I II 0 1 2 1 3 1 Dataframe B: I II 0 5 6 1 3 1 New Dataframe: I …
MJP
  • 5,327
  • 6
  • 18
  • 18
105
votes
7 answers

Does Distinct() method keep original ordering of sequence intact?

I want to remove duplicates from list, without changing order of unique elements in the list. Jon Skeet & others have suggested to use the following: list = list.Distinct().ToList(); Reference: How to remove duplicates from a List? Remove…
Nitesh
  • 2,681
  • 4
  • 27
  • 45
103
votes
10 answers

Delete duplicate records in SQL Server?

Consider a column named EmployeeName table Employee. The goal is to delete repeated records, based on the EmployeeName field. EmployeeName ------------ Anand Anand Anil Dipak Anil Dipak Dipak Anil Using one query, I want to delete the records which…
usr021986
  • 3,421
  • 14
  • 53
  • 64
103
votes
4 answers

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C

I have a pandas dataframe which contains duplicates values according to two columns (A and B): A B C 1 2 1 1 2 4 2 7 1 3 4 0 3 4 8 I want to remove duplicates keeping the row with max value in column C. This would lead to: A B C 1 2 4 2 7 1 3 4…
Elsalex
  • 1,177
  • 2
  • 8
  • 9
101
votes
5 answers

Filtering out duplicated/non-unique rows in data.table

Edit 2019: This question was asked prior to changes in data.table in November 2016, see the accepted answer below for both the current and previous methods. I have a data.table table with about 2.5 million rows. There are two columns. I want to…
Davy Kavanagh
  • 4,809
  • 9
  • 35
  • 50
100
votes
12 answers

Is there a no-duplicate List implementation out there?

I know about SortedSet, but in my case I need something that implements List, and not Set. So is there an implementation out there, in the API or elsewhere? It shouldn't be hard to implement myself, but I figured why not ask people here first?
Yuval
  • 7,987
  • 12
  • 40
  • 54
97
votes
7 answers

counting duplicates in a sorted sequence using command line tools

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within this sorted list. I need to find the…
letronje
  • 9,002
  • 9
  • 45
  • 53
93
votes
16 answers

How to delete duplicate entries?

I have to add a unique constraint to an existing table. This is fine except that the table has millions of rows already, and many of the rows violate the unique constraint I need to add. What is the fastest approach to removing the offending rows?…
gjrwebber
  • 2,658
  • 2
  • 22
  • 26
93
votes
34 answers

Algorithm: efficient way to remove duplicate integers from an array

I got this problem from an interview with Microsoft. Given an array of random integers, write an algorithm in C that removes duplicated numbers and return the unique numbers in the original array. E.g Input: {4, 8, 4, 1, 1, 2, 9} Output:…
ejel
  • 4,135
  • 9
  • 32
  • 39
90
votes
11 answers

Determining duplicate values in an array

Suppose I have an array a = np.array([1, 2, 1, 3, 3, 3, 0]) How can I (efficiently, Pythonically) find which elements of a are duplicates (i.e., non-unique values)? In this case the result would be array([1, 3, 3]) or possibly array([1, 3]) if…
ecatmur
  • 152,476
  • 27
  • 293
  • 366