Questions tagged [data-quality]

Data quality refers to the condition of data and to the techniques used to evaluate or improve that condition.

124 questions
0 votes, 1 answer

How to resolve this error: "Read: Data overflow/conversion error"

How do I resolve this error: "Read: Data overflow/conversion error for [some field]"? I get it after running the mapping in Informatica Data Quality 9.1.0.
sonaa
0 votes, 1 answer

Where is tMatchGroup located in Talend Open Studio for Big Data?

I'm learning data quality with Talend Open Studio for Big Data, version TOS_DQ-20141207_1530-V5.6.1.zip. For my problem I want to use tMatchGroup, but it doesn't appear in the Palette of Talend Studio. On help.talend.com,…
minh-hieu.pham
0 votes, 1 answer

Calculating and reporting Data Completeness

I have been measuring data completeness and creating actionable reports for our HRIS system for some time. Until now I have used Excel, but now that the reporting requirements have stabilized and the need for quicker response…
Nichlas H.
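A common starting point for completeness reporting is the share of records in which each field is present and non-blank. The sketch below assumes records arrive as dictionaries and treats `None`, a missing key, or a whitespace-only string as "not filled"; the field names in the usage are hypothetical HRIS columns, not taken from the question.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


def completeness(records: Iterable[Mapping[str, Any]], fields: Sequence[str]) -> Dict[str, float]:
    """Return, per field, the share (0.0 to 1.0) of records where it is filled.

    Assumption: a value counts as missing if the key is absent, the value is
    None, or it is a string containing only whitespace.
    """
    filled = {f: 0 for f in fields}
    total = 0
    for record in records:
        total += 1
        for f in fields:
            value = record.get(f)
            if value is not None and str(value).strip() != "":
                filled[f] += 1
    # Avoid division by zero on an empty input.
    return {f: (filled[f] / total if total else 0.0) for f in fields}


# Hypothetical HRIS-style records, for illustration only.
records = [
    {"employee_id": "1", "department": "HR", "manager_id": None},
    {"employee_id": "2", "department": "  ", "manager_id": "1"},
    {"employee_id": "3"},
]
report = completeness(records, ["employee_id", "department", "manager_id"])
for field_name, share in report.items():
    print(f"{field_name}: {share:.0%}")
```

Keeping the rule for "missing" in one function makes it easy to change the definition (for example, to treat placeholder values such as "N/A" as blank) without touching the reporting code.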
0 votes, 1 answer

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1. I am trying to find matches between two customer name columns: one is a lookup and the other contains dirty data. The job runs as expected when the customer names are in…
0 votes, 2 answers

Data preprocessing in R: removing a duplicate in a string

I am doing data preprocessing and am stuck on a problem. I have data like "Telma 2525 mg tablet" and I want it converted to "Telma 25 mg tablet". Can this be done? Thanks
user3171906
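One way to handle a doubled number such as "2525" is a regular expression with a backreference: capture a run of digits and require the same run to follow immediately. This is a minimal sketch in Python; it assumes the duplicated value is at least two digits long and stands alone as a word. The same pattern should carry over to R's `gsub()` with `perl = TRUE`.

```python
import re

# A run of 2+ digits immediately repeated once, as a whole word
# (e.g. "2525" -> "25"). The {2,} lower bound avoids collapsing "11" to "1".
DOUBLED_NUMBER = re.compile(r"\b(\d{2,})\1\b")


def collapse_doubled_numbers(text: str) -> str:
    """Replace a doubled digit run such as '2525' with a single copy '25'."""
    return DOUBLED_NUMBER.sub(r"\1", text)


print(collapse_doubled_numbers("Telma 2525 mg tablet"))
```

Be aware that a legitimate value such as "1010" also matches this pattern and would become "10", so it is worth reviewing the rows that change before applying it across a whole dataset.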
0 votes, 2 answers

Data quality with Ruby

I'm looking for libraries that can help match two words with misspellings. For instance, the gem should mark the following statements as true (it's just an example; there's no need to extend standard strings): 'Start' ==…
Misha Slyusarev
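Misspelling-tolerant comparison is often built on Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below is the standard dynamic-programming version in Python; the threshold `max_distance=1` is an arbitrary assumption to tune, and the loop translates directly to Ruby if no suitable gem is available.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def roughly_equal(a: str, b: str, max_distance: int = 1) -> bool:
    """Case-insensitive 'close enough' test; the threshold is an assumption."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(roughly_equal("Start", "Stat"))
print(roughly_equal("Start", "Stop"))
```

A fixed threshold treats short and long words the same; dividing the distance by the longer word's length is one way to make the tolerance scale with word length.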
0 votes, 1 answer

Best practices for managing workarounds (for broken data)

I have to work with government-provided data that is sometimes broken in strange ways. My code already contains snippets like: for row in governmental_data: # XXX Workaround for that one row among thousands # that was mislabeled by a clerk…
Krastanov
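One pattern for keeping row-specific workarounds manageable is to move them out of the main loop into a registry of documented fixes keyed by record ID, and to report fixes that never fire (a hint that the upstream data may have been corrected). This is a sketch under assumptions: each row is a dictionary with an `"id"` key, and the `Fix`/`FixRegistry` names are hypothetical, not from the question.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


@dataclass
class Fix:
    """One documented workaround for a known-bad record (hypothetical structure)."""
    row_id: str
    reason: str
    apply: Callable[[Row], Row]


@dataclass
class FixRegistry:
    fixes: Dict[str, Fix] = field(default_factory=dict)
    applied: List[str] = field(default_factory=list)

    def register(self, fix: Fix) -> None:
        # Refuse two competing workarounds for the same record.
        if fix.row_id in self.fixes:
            raise ValueError(f"duplicate fix for row {fix.row_id}")
        self.fixes[fix.row_id] = fix

    def clean(self, rows: Iterable[Row]) -> Iterator[Row]:
        """Yield rows, applying the registered fix (on a copy) where one exists."""
        for row in rows:
            fix = self.fixes.get(row["id"])
            if fix is not None:
                self.applied.append(fix.row_id)
                row = fix.apply(dict(row))
            yield row

    def unused(self) -> List[str]:
        """IDs of fixes that never fired during cleaning."""
        return sorted(set(self.fixes) - set(self.applied))


# Illustrative usage with made-up data.
registry = FixRegistry()
registry.register(Fix("2", "label mistyped by a clerk", lambda r: {**r, "county": "B"}))
registry.register(Fix("3", "row no longer present upstream", lambda r: r))
cleaned = list(registry.clean([{"id": "1", "county": "A"}, {"id": "2", "county": "Wrong"}]))
```

Each `reason` string doubles as documentation, and `unused()` can be checked after every import to retire stale workarounds.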
0 votes, 1 answer

Reconciliation of low-quality data: view vs (table + scheduled procedures)

TL;DR: How do I build a consistent, proper view from bad, changing data that is reconciled according to changing rules? Hello all :) I'm building a database where data has to be transformed, reconciled when possible, and heavily enriched. (By the way, if…
BenoitParis
-1 votes, 1 answer

Theoretically, are DATE and TIME two different variables?

I’m curious whether, in terms of tidy data principles, a column containing “date and time” (1/1/21 11:31) would be considered a single variable or two separate ones.
Jose
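Whichever way the question is answered, the two views are interconvertible: a single timestamp column can be split into date and time components without losing information. The Python sketch below assumes the values use month/day/two-digit-year with a 24-hour clock; that format string is an assumption, since "1/1/21" alone does not settle the day/month order.

```python
from datetime import date, datetime, time
from typing import Tuple

# Assumption: month/day/two-digit-year, 24-hour clock.
TIMESTAMP_FORMAT = "%m/%d/%y %H:%M"


def split_timestamp(value: str) -> Tuple[date, time]:
    """Parse one 'date and time' cell and return its date and time parts."""
    ts = datetime.strptime(value, TIMESTAMP_FORMAT)
    return ts.date(), ts.time()


d, t = split_timestamp("1/1/21 11:31")
print(d, t)
```

Storing the single timestamp and deriving date or time on demand avoids the two columns drifting out of sync.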
-1 votes, 2 answers

How to identify different data types within a single column?

Let us say that we have a column with the following values: Apple, Mango, Orange, 123, 987, Guava, 01/01/2020. Python automatically recognizes this column as an "object" data type. I have been given a task to count the number of data types in a…
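Because an "object" column can hold anything, counting "data types" usually means classifying each value by what it can be parsed as. The sketch below tries integer, then float, then a date in a small set of formats, and falls back to string; the category names and the `DATE_FORMATS` tuple are assumptions to adjust to the actual data.

```python
from collections import Counter
from datetime import datetime
from typing import Any

# Assumed date formats; extend as needed for the real column.
DATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def infer_kind(value: Any) -> str:
    """Classify one cell as 'integer', 'float', 'date' or 'string'."""
    text = str(value).strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "string"


values = ["Apple", "Mango", "Orange", "123", "987", "Guava", "01/01/2020"]
counts = Counter(infer_kind(v) for v in values)
print(counts)
print(len(counts))  # number of distinct kinds found
```

With pandas, the same function can be applied to the column with `series.map(infer_kind).value_counts()`.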
-1 votes, 1 answer

IDQ input parameter file error in windows

In an IDQ process, I have generated a workflow parameter in a server location. When I call the parameter file from a batch script with different source file names, I get an error saying the parameter file is not found. It was not…
-1 votes, 1 answer

Requirement to verify the input (from Event Hub) of Stream Analytics jobs for data quality

Is there a way to detect data quality issues in streaming inputs (Event Hub, JSON) in Azure Stream Analytics? Scenarios: 1) bad messages: blank records, NULLs/spaces in key columns; 2) values above the expected range, incorrect data types, etc.; 3)…
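The scenarios listed map onto three kinds of rule: blank or unparseable messages, missing key columns, and type or range violations. Azure Stream Analytics itself expresses such rules in its SQL-like query language, so the Python sketch below is not ASA code; it only illustrates the checking logic per JSON message. The column names, types, and ranges are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema for illustration.
KEY_COLUMNS = ("device_id", "event_time")
EXPECTED_TYPES = {"device_id": str, "event_time": str, "temperature": (int, float)}
RANGES = {"temperature": (-40.0, 125.0)}


def check_message(raw: str) -> List[str]:
    """Return a list of data quality problems found in one raw message."""
    if not raw.strip():
        return ["blank record"]
    try:
        msg: Dict[str, Any] = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(msg, dict):
        return ["not a JSON object"]

    problems: List[str] = []
    # 1) NULLs or whitespace-only values in key columns.
    for col in KEY_COLUMNS:
        value = msg.get(col)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing or blank key column: {col}")
    # 2a) Incorrect data types (only checked when a value is present).
    for col, expected in EXPECTED_TYPES.items():
        if msg.get(col) is not None and not isinstance(msg[col], expected):
            problems.append(f"wrong type for {col}")
    # 2b) Numeric values outside the expected range.
    for col, (low, high) in RANGES.items():
        value = msg.get(col)
        if isinstance(value, (int, float)) and not low <= value <= high:
            problems.append(f"{col} out of range")
    return problems


print(check_message('{"device_id": " ", "event_time": "t", "temperature": 500}'))
```

Messages with a non-empty problem list could then be routed to a separate "quarantine" output instead of being dropped silently.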
-1 votes, 1 answer

From free text to a list of values

I'm implementing a web application with a subscription form (using Java for the backend). In this form there is a field with an associated dropdown list. The user can (with an auto-completion functionality) select a value…
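A common way to turn free-text entries into a controlled list of values is to normalize the input, try an exact (case-insensitive) match, and otherwise fall back to the closest fuzzy match above a similarity cutoff. The sketch uses Python's `difflib.get_close_matches`; the allowed values and the `cutoff=0.8` default are assumptions, and the same approach could be ported to the Java backend with any string-similarity library.

```python
import difflib
from typing import Optional, Sequence


def to_list_value(free_text: str, allowed: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Map free text to one of the allowed values, or None if nothing is close."""
    # Compare case-insensitively but return the canonical spelling.
    canonical = {v.casefold(): v for v in allowed}
    key = free_text.strip().casefold()
    if key in canonical:
        return canonical[key]
    matches = difflib.get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
    return canonical[matches[0]] if matches else None


# Hypothetical dropdown values.
DEPARTMENTS = ["Engineering", "Marketing", "Finance"]
print(to_list_value("enginering", DEPARTMENTS))
print(to_list_value("xyz", DEPARTMENTS))
```

Returning `None` instead of forcing a match lets the form ask the user to pick from the list when the input is too far from every known value.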
-1 votes, 5 answers

Split Full Name with Format: {Last, First Middle} Comprehensive Cases

My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don't serve my purposes because…
Myles Baker
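For the basic "Last, First Middle" layout, splitting on the first comma separates the last name (which may contain spaces, as in multi-word surnames) from the given names, and the first given-name token becomes the first name. This is a minimal sketch; it does not handle suffixes such as "Jr." or titles, and it keeps a comma-less value whole as the last name, which is an assumption about how to treat ambiguous input.

```python
from typing import NamedTuple


class SplitName(NamedTuple):
    last: str
    first: str
    middle: str


def split_last_first_middle(full: str) -> SplitName:
    """Split 'Last, First Middle...' into its parts."""
    last, sep, rest = full.partition(",")
    if not sep:
        # No comma: we cannot tell which token is the last name.
        return SplitName(" ".join(full.split()), "", "")
    given = rest.split()
    first = given[0] if given else ""
    middle = " ".join(given[1:])  # everything after the first given name
    return SplitName(" ".join(last.split()), first, middle)


print(split_last_first_middle("van der Berg, Anna Maria Louise"))
```

Rows that fall outside this pattern (no comma, more than one comma, recognized suffixes) are good candidates to flag for manual review rather than guess at.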
-2 votes, 0 answers

How to determine text patterns in list of text values to validate new inputs?

How do I detect text patterns in a list of text values so that I can test a new value against that pattern? For example, given a list of text values like this: SKU-1242 SKU-5450 SKU-6532 SKU-2395 SKU-2393 SKU-9310 234321 I would like to…
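One simple technique for this is "shape" profiling: replace every letter with `A` and every digit with `9`, count the resulting shapes, and accept a new value only if its shape is among the frequent ones. The sketch below assumes a shape is "frequent" when it covers at least half of the sample (`min_share=0.5`); that threshold is an arbitrary assumption to tune.

```python
from collections import Counter
from typing import Iterable, Set


def shape(value: str) -> str:
    """Map letters to 'A' and digits to '9', keeping other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def learn_shapes(values: Iterable[str], min_share: float = 0.5) -> Set[str]:
    """Return the shapes that cover at least min_share of the sample."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    return {s for s, c in counts.items() if c / total >= min_share}


def is_valid(value: str, shapes: Set[str]) -> bool:
    return shape(value) in shapes


sample = ["SKU-1242", "SKU-5450", "SKU-6532", "SKU-2395", "SKU-2393", "SKU-9310", "234321"]
shapes = learn_shapes(sample)
print(shapes)
print(is_valid("SKU-0001", shapes), is_valid("234321", shapes))
```

Here the outlier "234321" is treated as noise because its shape is rare; the learned shapes can also be turned into regular expressions (for example `AAA-9999` as `[A-Za-z]{3}-\d{4}`) if a regex-based validator is needed.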