Questions tagged [data-quality]

Data quality refers to the condition of data and to the techniques used to evaluate or improve that condition.

124 questions
0 votes · 2 answers

What software is available for data quality checking?

I'm looking for software options that allow custom rules to be applied to bulk data files (.csv). For example: proper capitalization (allowing states to remain in capitals and preserving unique surnames), identifying the word count…
Phil
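Before picking a tool, it can help to pin the rules down precisely. The Python sketch below shows one way to express the two casing rules from the question. `KEEP_UPPER` and `SURNAME_OVERRIDES` are hypothetical lookup tables you would fill in yourself, and the state rule is applied only to a hypothetical `state` column so that names are never forced to uppercase.

```python
import csv
import io

# Hypothetical rule tables: state codes that must stay uppercase, and surnames
# whose correct casing cannot be produced by simple capitalization.
KEEP_UPPER = {"CA", "NY", "TX"}
SURNAME_OVERRIDES = {"mcdonald": "McDonald", "o'neil": "O'Neil", "macleod": "MacLeod"}


def fix_name(value: str) -> str:
    """Capitalize each word, but use the override table for known surnames."""
    words = []
    for word in value.split():
        words.append(SURNAME_OVERRIDES.get(word.lower(), word.capitalize()))
    return " ".join(words)


def fix_state(value: str) -> str:
    """Uppercase known state codes; leave anything else untouched for review."""
    code = value.strip().upper()
    return code if code in KEEP_UPPER else value


def word_count(value: str) -> int:
    """Count whitespace-separated words."""
    return len(value.split())


def clean_csv(text: str) -> list:
    """Apply the rules to the (hypothetical) name and state columns of a CSV."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["name"] = fix_name(row["name"])
        row["state"] = fix_state(row["state"])
        rows.append(row)
    return rows


sample = "name,state\nJOHN MCDONALD,ca\nmary o'neil,ny\n"
for row in clean_csv(sample):
    print(row["name"], row["state"], word_count(row["name"]))
```

Whichever tool is chosen, its rule configuration will need to encode the same kind of exception tables.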
0 votes · 1 answer

Finding clusters or coherent values in a dataset with Python/pandas

I am pretty new to Python and am trying to do an event analysis. I have two datasets: one with events and one with stock data. Now I need to construct equally weighted portfolios and 'refresh' the portfolio construction every month. Therefore I need…
0 votes · 3 answers

Detecting unit differences in data (SAS)

I have two sets of financial data that tend to contain differences due to unit errors, e.g. $10000 in one dataset may be $1000 in the other. I'm trying to code a check for such differences, but the only way I can think of is to divide the two…
Vinnie
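Dividing the two values is a reasonable start; the extra step is to ask whether the ratio is close to an integer power of ten. The sketch below is in Python rather than SAS and assumes that unit errors only ever appear as factors of 10, 100, 1000 and so on. The `tol` default is a hypothetical value to tune; the same test could be written in a SAS DATA step with the LOG10 and ROUND functions.

```python
import math
from typing import Optional


def unit_mismatch(a: float, b: float, tol: float = 0.01) -> Optional[int]:
    """Return k if a is approximately b * 10**k for a nonzero integer k, else None.

    tol is measured in log10 units: 0.01 accepts ratios within roughly 2.3%
    of an exact power of ten (a hypothetical default; tune it to your data).
    """
    if a == 0 or b == 0:
        return None  # a ratio is undefined or meaningless here
    ratio = a / b
    if ratio < 0:
        return None  # opposite signs are a different kind of discrepancy
    exponent = math.log10(ratio)
    k = round(exponent)
    if k != 0 and abs(exponent - k) <= tol:
        return k
    return None


pairs = [(10000, 1000), (1000, 1000), (1500, 1000), (2.5, 2500)]
for a, b in pairs:
    print(a, b, unit_mismatch(a, b))
```

Working in log space keeps the rule symmetric: a value that is 1000 times too large and one that is 1000 times too small are flagged the same way, with opposite signs of `k`.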
0 votes · 1 answer

Generalized Data Quality Checks on Datasets

I am pulling in a handful of different datasets daily, performing a few simple data quality checks, and then sending emails if a dataset fails the checks. My checks are as simple as checking for duplicates in the dataset, as well as checking if…
sanjayr
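One way to generalize this is to make each check a function that takes a dataset and returns a list of failure messages, and to keep a per-dataset list of checks. The sketch below assumes each dataset is loaded as a list of dicts; `CHECKS`, `no_duplicates` and `not_null` are hypothetical names, and the email step is left out.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Check = Callable[[Sequence[Row]], List[str]]


def no_duplicates(key_columns: Sequence[str]) -> Check:
    """Build a check that flags key values appearing more than once."""
    def check(rows: Sequence[Row]) -> List[str]:
        counts = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
        return [f"duplicate key {key} appears {n} times"
                for key, n in counts.items() if n > 1]
    return check


def not_null(columns: Sequence[str]) -> Check:
    """Build a check that flags missing or empty values in the given columns."""
    def check(rows: Sequence[Row]) -> List[str]:
        return [f"row {i}: {c} is null"
                for i, row in enumerate(rows)
                for c in columns
                if row.get(c) in (None, "")]
    return check


def run_checks(rows: Sequence[Row], checks: Sequence[Check]) -> List[str]:
    """Run every check and collect all failure messages."""
    failures: List[str] = []
    for check in checks:
        failures.extend(check(rows))
    return failures


# Hypothetical per-dataset configuration: the same check builders are reused.
CHECKS = {"customers": [no_duplicates(["id"]), not_null(["name"])]}

rows = [{"id": 1, "name": "a"}, {"id": 1, "name": ""}, {"id": 2, "name": "b"}]
failures = run_checks(rows, CHECKS["customers"])
if failures:
    print("\n".join(failures))  # in a real job, this is where the email would go
```

Adding a new dataset then means adding one entry to the configuration rather than writing new checking code.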
0 votes · 2 answers

Remove extra spaces from Arabic field

How do I remove leading, trailing, and repeated spaces between Arabic words? The spaces in Arabic fields are not like the space used in English: in Arabic, the spaces may be elongated characters rather than the ordinary blank space…
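Assuming the "elongated characters" are the Arabic tatweel (U+0640) and that the other stray gaps are non-standard Unicode spaces such as the no-break space, a Python sketch could delete the tatweel and invisible directional marks, then rebuild the string from its whitespace-separated words. If tatweel is being used in place of a word separator in your data, map it to a space instead of deleting it.

```python
# ARABIC TATWEEL (kashida): a purely typographic elongation character.
TATWEEL = "\u0640"
# Invisible directional marks (LRM, RLM) that often travel with Arabic text.
DIRECTION_MARKS = {"\u200e", "\u200f"}


def normalize_arabic_spaces(text: str) -> str:
    """Remove elongation and directional marks, then collapse all whitespace.

    str.split() with no arguments splits on every Unicode whitespace character,
    including the no-break space U+00A0, and drops leading/trailing runs.
    """
    cleaned = "".join(ch for ch in text if ch != TATWEEL and ch not in DIRECTION_MARKS)
    return " ".join(cleaned.split())


# "marhaban" + two no-break spaces + "bika" written with three tatweels
sample = "  \u0645\u0631\u062d\u0628\u0627\u00a0\u00a0 \u0628\u0640\u0640\u0640\u0643  "
print(repr(normalize_arabic_spaces(sample)))
```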
0 votes · 1 answer

How to check for data quality in SSIS?

When converting data during the transfer, I move all rejected (i.e. failed) conversions into a reject table. However, I only get an entry for the FIRST error. Example source data:

Name | Salary | Zipcode
------------------------
Paul | 12000 |…
Andre Doose
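The underlying requirement is to evaluate every column of a row and write one reject entry per failure, instead of stopping at the first one. The Python sketch below illustrates only that logic, not SSIS itself; the column names come from the question, while the validation rules (integer salary, five-digit zip code) are assumptions. In SSIS the equivalent logic would typically live in a Script Component or in a staging query, so check which approach fits your package.

```python
import re
from typing import Callable, Dict, List, Optional

Validator = Callable[[str], Optional[str]]


def integer(value: str) -> Optional[str]:
    """Return an error message if value is not an integer, else None."""
    try:
        int(value)
    except ValueError:
        return f"cannot convert {value!r} to integer"
    return None


def five_digit_zip(value: str) -> Optional[str]:
    """Hypothetical rule: a zip code is exactly five ASCII digits."""
    if re.fullmatch(r"[0-9]{5}", value):
        return None
    return f"invalid zip code {value!r}"


# Column -> validator; every column is checked, not just the first failure.
RULES: Dict[str, Validator] = {"Salary": integer, "Zipcode": five_digit_zip}


def collect_rejects(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return one reject record per failed column per row."""
    rejects = []
    for i, row in enumerate(rows):
        for column, validate in RULES.items():
            error = validate(row.get(column, ""))
            if error is not None:
                rejects.append({"row": i, "column": column, "error": error})
    return rejects


rows = [
    {"Name": "Paul", "Salary": "12000", "Zipcode": "1234x"},
    {"Name": "Anna", "Salary": "abc", "Zipcode": "99"},
]
for reject in collect_rejects(rows):
    print(reject)
```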
0 votes · 2 answers

How do I add a data quality check utility in AWS Glue?

How do I add a job in AWS Glue that only checks data quality, such as nulls and correct data types?
user5626966
0 votes · 0 answers

What are some of the most efficient workflows for processing "big data" (250+ GB) from PostgreSQL databases?

I am writing a script that will process well over 250 GB of data from a single PostgreSQL table. The table has roughly 150 columns and 74M rows. My goal is to sift through all the data and make sure that each cell entry…
Tom Hood
0 votes · 2 answers

I want to detect Latin characters with an umlaut anywhere in a given string using Informatica

I want to detect Latin characters with an umlaut anywhere in a given string using Informatica. The requirement is: whenever I find at least one Latin character with an umlaut anywhere in a string, the output is Fail; otherwise it is Pass.
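Informatica's own expression syntax is not shown here; the Python sketch below only illustrates one robust way to define the rule, assuming "umlaut" means any Latin letter carrying U+0308 COMBINING DIAERESIS (ä, ö, ü, ë, ï, ÿ and their capitals). In an Informatica Expression transformation, a simpler approach is REG_MATCH against an explicit character class of the precomposed letters, wrapped in IIF to return 'Fail' or 'Pass'; verify the regex support in your version.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # the mark used for both umlaut and diaeresis


def has_latin_umlaut(text: str) -> bool:
    """True if any Latin letter in text carries an umlaut/diaeresis.

    NFD normalization splits precomposed letters such as 'ü' into a base
    letter followed by combining marks, so one rule covers every such letter.
    """
    base = ""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            base = ch  # remember the most recent base character
        elif (ch == COMBINING_DIAERESIS and base
              and unicodedata.name(base, "").startswith("LATIN")):
            return True
    return False


def umlaut_check(text: str) -> str:
    """Return 'Fail' if the string contains a Latin umlaut, else 'Pass'."""
    return "Fail" if has_latin_umlaut(text) else "Pass"


# The last sample is Greek iota with dialytika, which is not a Latin letter.
for word in ["Müller", "Muller", "Zoë", "\u03ca"]:
    print(word, umlaut_check(word))
```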
0 votes · 0 answers

How do I use the Labeler transformation in Informatica Developer?

I am new to the IDQ tool. Can anyone give me step-by-step instructions, with some screenshots, on how to use the Labeler transformation with reference tables? Any video links related to the Labeler transformation using reference tables are also…
voli
0 votes · 1 answer

IDQ input parameter file error

Using infacmd in IDQ, I am trying to execute multiple workflows with a source input parameter file. The first infacmd call succeeds, but the second infacmd mapping fails because the input parameter takes the default value instead of the assigned value.
harish v
0 votes · 2 answers

How do I identify duplicate records using client name and address in SQL when both are free text?

I have a database with millions of client contacts, and many of them are duplicates. Could someone advise how to identify those duplicates using Oracle SQL, PL/SQL, or Excel? Following is the data…
E. L.
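A common approach is to normalize the free text first and then compare only records that fall into the same "block". The Python sketch below shows the idea; the abbreviation table, the blocking key and the 0.9 similarity threshold are all assumptions to tune, and difflib's ratio stands in for whatever similarity measure you choose. On the Oracle side, the UTL_MATCH package provides similarity functions such as UTL_MATCH.JARO_WINKLER_SIMILARITY that could play the same role in SQL.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical abbreviation table; extend it from your own data.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, expand abbreviations."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def name_key(name: str) -> str:
    """Sort name tokens so 'Smith, John' and 'John Smith' compare equal."""
    return " ".join(sorted(normalize(name).split()))


def likely_duplicates(records: List[Tuple[int, str, str]],
                      threshold: float = 0.9) -> List[Tuple[int, int]]:
    # Blocking: only compare records that share their first sorted name token,
    # which avoids comparing every pair among millions of rows.
    blocks: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for rec_id, name, address in records:
        key = name_key(name)
        block = key.split()[0] if key else ""
        blocks[block].append((rec_id, key + " | " + normalize(address)))

    pairs = []
    for members in blocks.values():
        for (id_a, a), (id_b, b) in combinations(members, 2):
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((id_a, id_b))
    return sorted(pairs)


records = [
    (1, "John Smith", "12 High St."),
    (2, "SMITH, John", "12 High Street"),
    (3, "Jane Doe", "4 Low Rd"),
]
print(likely_duplicates(records))
```

Blocking trades recall for speed: a typo in the blocking token (for example "Jon" vs "John") puts two records in different blocks, so they are never compared.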
0 votes · 1 answer

Informatica Developer (IDQ) stats

How can we capture mapping statistics (mapping name, source rows, target rows, start time, end time) from the Informatica Developer (IDQ) tool into a table?
0 votes · 1 answer

Data check: compare string values (input) to an existing language (Dutch dictionary) in R

I am trying to filter out junk open-ended answers (string variables) such as 'ffff' and 'fdaljfdlksajf' using an R script. I hoped there would be some kind of dictionary package in R that could do this, but I can't find one. Another…
SHW
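Whatever package is used, the core test is what share of an answer's words appear in a Dutch word list. The sketch below is in Python rather than R, and both the tiny `DUTCH_WORDS` set and the 50% threshold are hypothetical placeholders; a real check would load a complete Dutch word list and may need to tolerate spelling mistakes.

```python
import re

# Hypothetical mini word list; in practice, load a full Dutch word list from file.
DUTCH_WORDS = {"ik", "vind", "het", "product", "goed", "niet", "slecht", "de", "service"}


def is_meaningful(answer: str, vocabulary=DUTCH_WORDS, min_share: float = 0.5) -> bool:
    """True if at least min_share of the answer's words appear in vocabulary."""
    words = re.findall(r"[^\W\d_]+", answer.lower())  # runs of letters only
    if not words:
        return False
    known = sum(1 for w in words if w in vocabulary)
    return known / len(words) >= min_share


for answer in ["ffff", "fdaljfdlksajf", "Ik vind het product goed", "123"]:
    print(repr(answer), is_meaningful(answer))
```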
0 votes · 1 answer

Arranging objects in PowerCenter Designer

I have two questions. 1) I am trying to arrange the objects in the Source Analyzer view into an organized layout so that I can sort through them. However, the Source Analyzer option "Arrange All" is greyed out when I go…