De-duplication is the process of removing duplicated or redundant data from a database.
Questions tagged [deduplication]
139 questions
0 votes, 2 answers
Rails controller action duplication
I have a controller show action which does some stuff and renders a view but due to some custom routing, I need a totally different action in a totally different controller to perform the same stuff and render the same view.
I don't really wish to…

rctneil
0 votes, 0 answers
How to remove duplicates from XML loaded from the web in iOS programming
Say I have the following profiles parsed from an XML file on the web and want to remove the duplicates; how would I do that? For example, there are 2 profiles with the same first name (Amin) and I only want to display 1. I am displaying all the…

azi_santos
0 votes, 2 answers
De-Duplicating sets of n-grams
I need to come up with a way to sort and display the most relevant data to users. Our data consists of multiple n-grams that are extracted from social media. We call these 'topics'.
The problem I am facing is that the data contains a lot of…

Kurtis
0 votes, 1 answer
Dedup using HiveQL
I have a Hive table with fields 'a' (int), 'b' (string), 'c' (bigint), 'd' (bigint) and 'e' (string).
I have data like:
a b c d e
---------------
1 a 10 18 i
2 b 11 19 j
3 c 12 20 k
4 d 13 21 l
1 e 14 22 m
4 f 15 23 n
2 g …

chanchal1987
0 votes, 2 answers
Opendedup does not decrease the storage space
I'm testing Opendedup. It seems to run correctly, but the real size of the files that I've put in the deduplicated partition is almost the same as the size effectively taken by this partition.
In the configuration file, deduplication is activated…

Gaël Barbin
0 votes, 0 answers
Deleting redundant information with regex
I would like to use regular expressions (.NET) for the following task.
A text file contains the following lines:
=650 \1$aPets$xFiction.
=650 \1$aApartment houses$xFiction.
=650 \0$aPets$xFiction.
=650 \0$aApartment houses$xFiction.
The…

whuffo15
0 votes, 2 answers
What's the best way to deduplicate with the info I have?
I need to find and remove duplicate files (.pst) and eventually get the unique emails. Currently, I am using PowerShell to recursively go through folders to find only .pst files and then export specific metadata into a .csv file. It has been…

User_1403834
0 votes, 1 answer
Deduplication of imported records in SQL Server
I have the following T-SQL stored procedure that is currently taking up 50% of the total time needed to run all processes on newly imported records into our backend analysis suite. Unfortunately, this data needs to be imported every time and is…

ChrisBint
0 votes, 2 answers
Importing CSV to database (duplicate entries)
My job requires that I look up information on a long spreadsheet that's updated and sent to me once or twice a week. Sometimes the newest spreadsheet leaves off information that was in the last spreadsheet, causing me to have to look through several…

cream
0 votes, 2 answers
VB.Net - Efficient way of de-duplicating data
I am dealing with a legacy application which is written in VB.Net 2.0 against a SQL 2000 database.
There is a single table which has ~125,000 rows and 2 pairs of fields with similar data.
i.e. FieldA1, FieldB1, FieldA2, FieldB2
I need to process a…

Shevek
-1 votes, 2 answers
VBA code to remove random blank cells from a sheet
What would be the VBA code to remove blank cells randomly placed in a spreadsheet?
Input
ColA ColB ColC ColD ColE
A B D
H J I
F B O
Output Should be like:
ColA ColB …

AriKari
-1 votes, 1 answer
Deduplication in RStudio
This is my first R code, and it is a very simple deduplication, but it runs so slowly I can't believe it! My question is: is it normal that it runs so slowly, or is my code just bad?
Here it is:
file1=c(read.delim("file.txt",…

sunwarr10r
-1 votes, 2 answers
Deduplicate across strings maintaining identity label
JavaScript problem. Can this be done?
I have an input array containing between 2 and 5 strings, each with a semicolon-delimited label to identify it. I need to de-duplicate such that the output removes the duplicates but also maintains the…

FTM
-1 votes, 1 answer
Tracing Large Source Code
I am trying to modify some open source code, but I am having trouble working out how to approach it.
The open source program that I am working with is called lessfs, and it has about four C files with up to 3000 lines of code. I am only concerned with one…

humblebeast
-1 votes, 2 answers
Rename duplicate record with PHP
I know you can select duplicate rows in MySQL with the following query:
SELECT ID, bedrijfsnaam, plaats, COUNT( * )
FROM profiles
GROUP BY bedrijfsnaam, plaats
HAVING COUNT( * ) >1
I want to be able to select the duplicates and rename them with PHP.…

user2704687