Questions tagged [duplicates]

The "duplicates" tag concerns detecting and/or dealing with multiple instances of items in collections.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about preventing, detecting, removing, or otherwise dealing with unwanted duplicates, or adapting to safely allow duplicates.
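As a minimal illustration of the description above, the following Python sketch detects duplicates in a list of strings and then removes duplicates among records that count as "the same" under a chosen key. The function names `find_duplicates` and `dedupe_by_key` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def find_duplicates(items):
    """Return the items that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [item for item in counts if counts[item] > 1]


def dedupe_by_key(items, key=lambda item: item):
    """Keep the first item for each key, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


names = ["ann", "bob", "ann", "cid", "bob"]
duplicated_names = find_duplicates(names)
unique_names = dedupe_by_key(names)

# Records are treated as the same item when their e-mail matches, ignoring case.
users = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "A@example.com"},
    {"id": 3, "email": "b@example.com"},
]
unique_users = dedupe_by_key(users, key=lambda u: u["email"].lower())
```

The key function is where "treated as the same object when compared" gets decided; swapping it changes what counts as a duplicate without touching the rest of the logic.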

15777 questions
90
votes
14 answers

IntegrityError duplicate key value violates unique constraint - django/postgres

I'm following up on a question I asked earlier, in which I sought to convert a goofy, poorly written MySQL query to PostgreSQL. I believe I succeeded with that. Anyway, I'm using data that was manually moved from a…
89
votes
11 answers

MySQL delete duplicate records but keep latest

I have unique id and email fields. Emails get duplicated. Of all the duplicates, I want to keep only one email address: the one with the latest id (the last inserted record). How can I achieve this?
Khuram
89
votes
17 answers

php: check if an array has duplicates

I'm sure this is an extremely obvious question, and that there's a function that does exactly this, but I can't seem to find it. In PHP, I'd like to know if my array has duplicates in it, as efficiently as possible. I don't want to remove them like…
Mala
88
votes
4 answers

Left Join without duplicate rows from left table

Please look at the following query:
tbl_Contents
Content_Id  Content_Title   Content_Text
10002       New case Study  New case Study
10003       New case Study  New case Study
10004       New case Study  New case Study
10005       New case Study  New case…
urooj.org
87
votes
4 answers

Check for duplicate values in Pandas dataframe column

Is there a way in pandas to check if a dataframe column has duplicate values, without actually dropping rows? I have a function that will remove duplicate rows, however, I only want it to run if there are actually duplicates in a specific…
Jeff Mitchell
84
votes
3 answers

Find and delete duplicate rows with PostgreSQL

We have a table of photos with the following columns: id, merchant_id, url. This table contains duplicate values for the combination (merchant_id, url), so it's possible that one row appears several times. 234 some_merchant …
schlubbi
83
votes
10 answers

Finding duplicate files and removing them

I am writing a Python program to find and remove duplicate files from a folder. I have multiple copies of MP3 files, and some other files. I am using the SHA-1 algorithm. How can I find these duplicate files and remove them?
sanorita
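One possible approach to the question above, sketched under assumptions: group files by the SHA-1 hash of their contents and report every group with more than one member. The names `file_digest` and `find_duplicate_files` are hypothetical, and the sketch only finds duplicates; it deliberately does not delete anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents with SHA-1, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Group files under root by content hash; return groups of 2+ paths."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A caller could keep the first path in each group and remove the rest with `Path.unlink()` after reviewing the list. Reading in chunks keeps memory use flat for large audio files.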
83
votes
2 answers

Too much data duplication in mongodb?

I'm new to this whole NoSQL stuff and have recently been intrigued by MongoDB. I'm creating a new website from scratch and decided to go with MongoDB/NoRM (for C#) as my only database. I've been reading up a lot about how to properly design your…
mike
81
votes
8 answers

Remove duplicate records based on multiple columns?

I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows. Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection? In my…
sergserg
81
votes
4 answers

TypeError: unhashable type: 'list' when using built-in set function

I have a list containing multiple lists as its elements, e.g. [[1,2,3,4],[4,5,6,7]]. If I use the built-in set function to remove duplicates from this list, I get the error TypeError: unhashable type: 'list'. The code I'm using is TopP =…
ami91
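A short sketch of why the error above occurs and one common way around it, assuming the goal is to drop repeated inner lists. The third row in `nested` is added here for illustration and is not part of the question's data.

```python
nested = [[1, 2, 3, 4], [4, 5, 6, 7], [1, 2, 3, 4]]

# set(nested) raises TypeError because lists are mutable and so not hashable.
# Tuples are hashable, so convert each inner list before building the set.
unique_rows = set(tuple(row) for row in nested)

# dict.fromkeys keeps the first occurrence of each row in its original order.
ordered_unique = [list(row) for row in dict.fromkeys(tuple(row) for row in nested)]
```

If the intent is instead to deduplicate the individual numbers across all inner lists, the lists would need to be flattened first; the truncated excerpt does not say which is meant.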
81
votes
2 answers

Find indices of duplicated rows

The duplicated function in R finds duplicate rows. To remove them, we just write df[!duplicated(df),] and the duplicates are dropped from the data frame. But how do we find the indices of the duplicated data? If duplicated…
annndrey
80
votes
8 answers

How to select records without duplicate on just one field in SQL?

I have a table with 3 columns like this:
+------------+---------------+-------+
| Country_id | country_title | State |
+------------+---------------+-------+
There are many records in this table. Some of them have state and some other…
Mohammad Saberi
79
votes
6 answers

How can I mark/highlight duplicate lines in VI editor?

How would you go about marking all of the lines in a buffer that are exact duplicates of other lines? By marking them, I mean highlighting them or adding a character or something. I want to retain the order of the lines in the…
Brian Carper
78
votes
7 answers

How to repeat a Pandas DataFrame?

This is my DataFrame that should be repeated 5 times:
>>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
>>> x
   a  b
0  1  2
I want to have the result like this:
>>> x.append(x).append(x).append(x)
   a  b
0  1  2
0  1  2
0  1  2
0  1  …
lsheng
74
votes
19 answers

How to find a duplicate element in an array of shuffled consecutive integers?

I recently came across a question somewhere: Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array,…
SysAdmin
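The excerpt above is cut off, so the following is only a sketch under a stated assumption: the array holds every integer from 1 to 1000 once, plus exactly one repeated value. Under that assumption the repeat is the array's sum minus the sum of 1..1000. The function name `find_single_duplicate` and the repeated value 417 are hypothetical.

```python
import random


def find_single_duplicate(values, n):
    """Return the repeated value, assuming values holds 1..n plus one repeat."""
    expected = n * (n + 1) // 2  # sum of 1..n
    return sum(values) - expected


numbers = list(range(1, 1001)) + [417]  # 417 is an arbitrary illustrative repeat
random.shuffle(numbers)
duplicate = find_single_duplicate(numbers, 1000)
```

This runs in one pass and constant extra space, but it gives a meaningless answer if the assumption about the input does not hold.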