Questions tagged [deduplication]

De-duplication is the process of removing duplicated or redundant data from a database.
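The sketch below is a minimal Python illustration of the idea. It assumes records are compared on a key chosen by the caller. The `dedupe` function and the sample rows are hypothetical and do not come from any of the questions below. The function keeps the first record seen for each key, drops later repeats, and preserves the original order:

```python
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def dedupe(records: Iterable[T], key: Callable[[T], Hashable]) -> list[T]:
    """Return records with duplicates removed (hypothetical helper).

    Keeps the first occurrence of each key and preserves input order.
    """
    seen: set = set()
    unique: list = []
    for record in records:
        k = key(record)
        if k not in seen:  # first time this key appears: keep the record
            seen.add(k)
            unique.append(record)
    return unique


# Hypothetical sample rows: (id, name, city)
rows = [
    (1, "Amin", "Leiden"),
    (2, "Sara", "Delft"),
    (3, "Amin", "Leiden"),
]

# Treat rows with the same (name, city) as duplicates.
print(dedupe(rows, key=lambda r: (r[1], r[2])))
# [(1, 'Amin', 'Leiden'), (2, 'Sara', 'Delft')]
```

The key decides what counts as "the same" record. Keying on every field removes only exact copies. Keying on a subset of fields also collapses near-duplicates that differ elsewhere, such as in the `id` column above.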

139 questions
0 votes, 2 answers

Rails controller action duplication

I have a controller show action which does some stuff and renders a view, but due to some custom routing, I need a totally different action in a totally different controller to perform the same stuff and render the same view. I don't really wish to…
rctneil
0 votes, 0 answers

How to remove duplicates from XML loaded from the web in iOS programming

Say I have the following profiles parsed from an XML file on the web and want to remove the duplicates: how would I do that? For example, there are 2 profiles with the same first name (Amin) and I only want to display 1. I am displaying all the…
azi_santos
0 votes, 2 answers

De-Duplicating sets of n-grams

I need to come up with a way to sort and display the most relevant data to users. Our data consists of multiple n-grams that are extracted from Social Media. We call these 'topics'. The problem I am facing is that the data contains a lot of…
Kurtis
0 votes, 1 answer

Dedup using HiveQL

I have a Hive table with fields 'a' (int), 'b' (string), 'c' (bigint), 'd' (bigint) and 'e' (string). I have data like:

a  b  c   d   e
---------------
1  a  10  18  i
2  b  11  19  j
3  c  12  20  k
4  d  13  21  l
1  e  14  22  m
4  f  15  23  n
2  g  …
chanchal1987
0 votes, 2 answers

Opendedup does not decrease the storage space

I'm testing Opendedup. It seems to run correctly, but the real size of the files that I've put in the deduplicated partition is almost the same as the effective size taken by this partition. In the configuration file, deduplication is activated…
Gaël Barbin
0 votes, 0 answers

Deleting redundant information with regex

I would like to use regular expressions (.NET) for the following task. A text file contains the following lines:
=650 \1$aPets$xFiction.
=650 \1$aApartment houses$xFiction.
=650 \0$aPets$xFiction.
=650 \0$aApartment houses$xFiction.
The…
whuffo15
0 votes, 2 answers

What's the best way to deduplicate with the info I have?

I need to find and remove duplicate files (.pst) and eventually get the unique emails. Currently, I am using PowerShell to recursively go through folders to find only .pst files and then export specific metadata into a .csv file. It has been…
User_1403834
0 votes, 1 answer

Deduplication of imported records in SQL server

I have the following T-SQL stored procedure that is currently taking up 50% of the total time needed to run all processes on newly imported records into our backend analysis suite. Unfortunately, this data needs to be imported every time and is…
ChrisBint
0 votes, 2 answers

Importing CSV to database (duplicate entries)

My job requires that I look up information on a long spreadsheet that's updated and sent to me once or twice a week. Sometimes the newest spreadsheet leaves off information that was in the last spreadsheet, causing me to have to look through several…
cream
0 votes, 2 answers

VB.Net - Efficient way of de-duplicating data

I am dealing with a legacy application which is written in VB.Net 2.0 against a SQL 2000 database. There is a single table which has ~125,000 rows and 2 pairs of fields with similar data, i.e. FieldA1, FieldB1, FieldA2, FieldB2. I need to process a…
Shevek
-1 votes, 2 answers

VBA code to remove random blank cells from a sheet

What would be the VBA code to remove blank cells randomly placed in a spreadsheet? Input: ColA ColB ColC ColD ColE A B D H J I F B O Output should be like: ColA ColB …
AriKari
-1 votes, 1 answer

Deduplication in RStudio

This is my first R code. It is a very simple deduplication, but it runs so slowly I can't believe it! My question is: is it normal for it to run this slowly, or is my code just bad? Here it is: file1=c(read.delim("file.txt",…
sunwarr10r
-1 votes, 2 answers

Deduplicate across strings maintaining identity label

JavaScript problem. Can this be done? I have an input array containing anywhere between 2 and 5 strings, each with a semicolon-delimited label to identify it. I need to de-duplicate such that the output removes the duplicates but also maintains the…
FTM
-1 votes, 1 answer

Tracing Large Source Code

I am trying to modify some open-source code, but I am having trouble figuring out how to approach it. The open-source program that I am working with is called lessfs, and it has about four C files with up to 3000 lines of code. I am only concerned with one…
humblebeast
-1 votes, 2 answers

Rename duplicate record with PHP

I know you can select duplicate rows in MySQL with the following query: SELECT ID, bedrijfsnaam, plaats, COUNT( * ) FROM profiles GROUP BY bedrijfsnaam, plaats HAVING COUNT( * ) >1 I want to be able to select the duplicates and rename them with PHP.…
user2704687