Questions tagged [duplicates]

The "duplicates" tag concerns detecting and/or dealing with multiple instances of items in collections.

A duplicate is any re-occurrence of an item in a collection. This can be as simple as two identical strings in a list of strings, or multiple complex objects which are treated as the same object when compared to each other.

This tag may pertain to questions about preventing, detecting, removing, or otherwise dealing with unwanted duplicates, or adapting to safely allow duplicates.

15777 questions
3
votes
1 answer

Why does accessing columns of a pandas dataframe with .loc[] produce duplicate rows?

Why is .loc[] producing duplicate rows in my DataFrame? I'm trying to select a few columns from m3, a DataFrame with 47 columns,to create a new DataFrame called output. The problem: after accessing m3's columns with .loc[], output has way more…
David
  • 606
  • 9
  • 19
3
votes
3 answers

removing duplicates - ** only when the duplicates occur in sequence

I would like to do something similar to the following, except I would only like to remove 'g' and'g' because they are the duplicates that occur one after each other. I would also like to keep the sequence the same. Any help would be appreciated!!! I…
jess
  • 33
  • 3
3
votes
3 answers

Fastest algorithm to detect duplicate files

In the process of finding duplicates in my 2 terabytes of HDD stored images I was astonished about the long run times of the tools fslint and fslint-gui. So I analyzed the internals of the core tool findup which is implemented as very well written…
Marcel
  • 137
  • 1
  • 8
3
votes
3 answers

Removal of adjacent duplicates by row - [R]

I have a data frame where each row represents interaction data per person. actions = read.table('C:/Users/Desktop/actions.csv', header = F, sep = ',', na.strings = '', stringsAsFactors = F) Each person can have one, or more of the following…
Sandy
  • 1,100
  • 10
  • 18
3
votes
4 answers

Find duplicate lines based on column and print both lines and their numbers with awk

I have a following file: userID PWD_HASH test 1234 admin 1234 user 6789 abcd 5555 efgh 6666 root 1234 Using AWK, I need to find both original lines and their duplicates with row numbers, so that get the output like: NR $0 1 test 1234 2 admin 1234 6…
skazichris
  • 95
  • 2
  • 12
3
votes
5 answers

SQL - Only select row that is not duplicated

I need to transfer data from one table to another. The second table got a primary key constraint (and the first one have no constraint). They have the same structure. What I want is to select all rows from table A and insert it in table B without…
Melursus
  • 10,328
  • 19
  • 69
  • 103
3
votes
1 answer

Choose smallest value from lists given same coordinates in 2d list

I have two lists: a = [[9, 5], [9, 10000], [9, 10000], [5, 10000], [5, 10000], [10001, 10], [10001, 10]] b = [19144.85, 8824.73, 26243.88, 23348.02, 40767.17, 55613.43, 40188.8] I am trying to remove the repeated coordinates in a and remove the…
mb567
  • 691
  • 6
  • 21
3
votes
2 answers

How to identify duplicate records with a plugin in Dynamics CRM 2011

I am looking to design some logic inside of my create plugin for the entity 'account'. What it does is basically check account Names and identifies account names which are duplicates on creation. So if there is an account Name, Barclays for example,…
Calibre2010
  • 3,729
  • 9
  • 25
  • 35
3
votes
8 answers

How to find duplicate values in a list and merge them

So basically for example of you have a list like: l = ['a','b','a','b','c','c'] The output should be: [['a','a'],['b','b'],['c','c']] So basically put together the values that are duplicated into a list, I tried: l =…
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
3
votes
2 answers

Python: Remove pair of duplicated strings in random order

I have a list as below [('generators', 'generator'), ('game', 'games'), ('generator', 'generators'), ('games', 'game'), ('challenge', 'challenges'), ('challenges', 'challenge')] Pairs ('game', 'games') and ('games', 'game') are kind of same but…
Raja G
  • 5,973
  • 14
  • 49
  • 82
3
votes
0 answers

How do I add this formatted string to a python object & ensure there aren't any duplicates

I have a series of strings defined in this way: print('{x}@{y}:{z}'.format(**config['name1'])) print('{x}@{y}:{z}'.format(**config['name2'])) and so on. I need to add them to a python object & ensure I check for duplicates so all the…
nlp
  • 109
  • 1
  • 4
  • 19
3
votes
1 answer

Duplicate/Copy same file N times

I have an R file that is stored in a directory on my computer. I would like to create 10 duplicates of this R file in an automatized way. The 10 duplicates of this R file should be stored in the same directory and each of them should have a…
Joachim Schork
  • 2,025
  • 3
  • 25
  • 48
3
votes
2 answers

Lisp program that duplicates an element of a list using a list of integers

I'm working on a program that takes a list of elements and each individual element is duplicated based on an integer contained in a 2nd list of integers. for example if I had a list of (A B C D) being duplicated by: (1 5 4 2) I would have (A B B…
joshbusiness
  • 128
  • 7
3
votes
1 answer

Python Pandas Remove Duplicate Cells - Keep the rows

I am trying to remove duplicates values of specific columns based on a single column, while keeping the rest of the row. df = pd.DataFrame({'A':[1,2,3,4],'B':[5,5,6,7],'C':['a','a','b',c'], D:['c','d','e','f']}) I want to delete the values in…
KMartik
  • 35
  • 1
  • 6
3
votes
2 answers

Python Pandas. Delete cells whose value is contained in another cell in the same column

I have a dataframe like this: A B exa 3 example 6 exam 4 hello 4 hell 3 I want to delete the rows that are substrings of another row and keep the longest one (Notice that B is already the length of A) I…
1 2 3
99
100