Data quality refers to the condition of data and to the techniques used to evaluate or improve that condition.
Questions tagged [data-quality]
124 questions
2
votes
2 answers
Data quality framework definition concern
Can someone help me define a data quality framework for analyzing data? Just a high-level description of what it is supposed to do, and your thoughts on it.

user3738248
- 81
- 3
2
votes
1 answer
Are there .NET development tools out there that can help serve as a data issue reporter and tracker?
I need to build a system that generates reports on data exceptions (e.g. this value is stale because it hasn't been updated in x days). Once they have a daily report on data quality issues, my users would like to have a bunch of filtering capability…

MrMustard
- 21
- 1
2
votes
3 answers
How to use pure SQL for Exploratory Data Analysis?
I'm an ETL developer using different tools for ETL tasks. The same question arises in all our projects: the importance of data profiling before the data warehouse is built and before the ETL is built for data movement. Usually I have done data…

jrara
- 16,239
- 33
- 89
- 120
2
votes
2 answers
Regular Expression Equivalent for other data types used for Data Validation
I am creating a data quality framework for a database that looks at single cells of each data type and sees whether or not their values are acceptable.
For data type string:
I just use a regular expression to define what is valid
For other data…
user1483511
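
One way to read the question above is that each data type gets its own small "rule builder" that plays the role a regular expression plays for strings. The following is only a minimal sketch under that assumption; the helper names (`string_rule`, `number_rule`, `date_rule`, `validate_row`) and the example columns are hypothetical and do not come from the question.

```python
import re
from datetime import date, datetime


def string_rule(pattern):
    """Strings: valid when the whole value matches the regular expression."""
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def number_rule(minimum=None, maximum=None):
    """Numbers: valid when the value lies inside an optional closed range."""
    def check(v):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            return False
        if minimum is not None and v < minimum:
            return False
        if maximum is not None and v > maximum:
            return False
        return True
    return check


def date_rule(fmt="%Y-%m-%d", earliest=None, latest=None):
    """Dates: valid when the text parses with fmt and falls in the window."""
    def check(v):
        try:
            parsed = datetime.strptime(v, fmt).date()
        except (TypeError, ValueError):
            return False
        if earliest is not None and parsed < earliest:
            return False
        if latest is not None and parsed > latest:
            return False
        return True
    return check


# Hypothetical column rules, for illustration only.
RULES = {
    "zip_code": string_rule(r"\d{5}"),
    "age": number_rule(0, 130),
    "signup": date_rule(earliest=date(2000, 1, 1)),
}


def validate_row(row, rules=RULES):
    """Return the names of the columns whose cell fails its rule."""
    return [col for col, rule in rules.items() if not rule(row.get(col))]
```

Calling `validate_row` on a dictionary of cell values returns the list of failing columns, so an empty list means the row passed every rule.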
2
votes
1 answer
Matching values NOT in domain values of DQS SQL Server 2012
I am using SQL Server 2012 Data Quality Services. I would like to consider any value that does not exist under domain values as Invalid. For example, if I have the values of 'abc', 'def' listed as correct in my domain values tab under domain…

M_devera
- 95
- 1
- 9
1
vote
2 answers
Informatica Data Quality - Match Analysis
In our Duplicate analysis requirement the input data has 1418 records out of which 1380 records are duplicate records.
On using the Match Analysis (used Key Generator, Matcher, Associator, Consolidator) in IDQ integrated with PowerCenter except for…

Muthukumar
- 8,679
- 17
- 61
- 86
1
vote
2 answers
algorithm for data quality in a data warehouse
I'm looking for a good algorithm / method to check the data quality in a data warehouse.
Therefore I want to have some algorithm that "knows" the possible structure of the values and then checks if the values are a member of this structure and then…

Tyzak
- 2,430
- 8
- 38
- 52
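
A simple interpretation of an algorithm that "knows" the structure of the values is shape profiling: reduce every value to a pattern of digit and letter placeholders, learn the common patterns from a trusted sample, and flag values whose pattern was never seen. This is a rough sketch of that idea, not a method taken from the question; the function names and the `min_share` threshold are assumptions.

```python
from collections import Counter


def shape(value):
    """Map a value to its shape: digits become 9, letters become A."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def learn_shapes(trusted_values, min_share=0.05):
    """Collect the shapes that make up at least min_share of the sample."""
    counts = Counter(shape(v) for v in trusted_values)
    total = sum(counts.values())
    return {s for s, n in counts.items() if n / total >= min_share}


def find_suspects(values, known_shapes):
    """Return the values whose shape is not among the known shapes."""
    return [v for v in values if shape(v) not in known_shapes]
```

For example, after learning from product codes such as `AB-1234`, a value like `1234-AB` would be reported because its shape differs from the learned one.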
1
vote
0 answers
AWS DataQuality Rules should fail but passed for null value
I have a CSV file with 8 columns. Within the columns I purposely deleted some cells.
When I tried to run a Glue DataQuality job, for IsComplete, the result passed (which it is not supposed to) for one column, but the rest of the columns failed as…

khorjle
- 11
- 1
1
vote
0 answers
How to Use Python to Identify and Report Bad Data Instances?
This is a general question: is anyone aware of a library like sklearn that has a function to read data and report back any strange behavior or quality concerns within the data, after getting user input specifying the type of data such…

Ben C Wang
- 617
- 10
- 19
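
The question above asks about an existing library, and this sketch does not name one. It only illustrates, with the standard library, what such a report could look like when the user declares a type per column. The function name `report_bad_cells`, the `CASTERS` table and the type labels are hypothetical.

```python
import csv
import io

# Hypothetical mapping from a user-supplied type label to a parser.
CASTERS = {"int": int, "float": float, "str": str}


def report_bad_cells(csv_text, column_types):
    """List (line_number, column, problem) for missing or unparsable cells.

    Line numbers assume one record per line, with the header on line 1.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):
        for column, type_name in column_types.items():
            raw = (row.get(column) or "").strip()
            if raw == "":
                problems.append((line_number, column, "missing"))
                continue
            try:
                CASTERS[type_name](raw)
            except ValueError:
                problems.append((line_number, column, f"not {type_name}"))
    return problems
```

The result is a plain list of tuples, which can be printed as is or written to a report file.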
1
vote
1 answer
How to process multiple csv files for identifying null values in R?
I have various .csv files. Each file has multiple columns. I am using the given code in R to run a quality check that counts, for a particular column, how many rows have valid values and how many are null. The code works well for a single csv file. But I…

Michael_Brun
- 51
- 6
1
vote
0 answers
How to use DatabaseConnector to connect to Hive in R in CDSW
I am trying to connect to Hive using the DatabaseConnector but am unable to do so in R within CDSW. Can anyone please suggest how to accomplish this?
Please note that when using the driver and url, I am able to connect to Hive and query the same…

Fierymech
- 11
- 3
1
vote
1 answer
Using Great Expectations with Databricks Autoloader
I have implemented a data pipeline using Autoloader: bronze --> silver --> gold.
Now, while I do this, I want to perform some data quality checks, and for that I'm using the Great Expectations library.
However, I'm stuck with the error below when trying to…

Chhaya Vishwakarma
- 1,407
- 9
- 44
- 72
1
vote
1 answer
How to select all rows that have the same value in each column using tidyverse?
I'm working on data quality analysis for a questionnaire where respondents were asked to check mark every bit of food that they ate. Some respondents left the form blank so I'm trying to figure out a way to select or count all rows where each column…

Matt
- 25
- 4
1
vote
0 answers
Encryption in BigQuery
We have a local historical data source that we want to decommission and move to BigQuery for storage as well as for analysis. There are some sensitive fields that we don’t want to be exposed but still want to keep in BigQuery. We’ve read…

r61238t
- 13
- 3
1
vote
0 answers
How to perform the Rosner Test for outliers at the bottom of the distribution?
I am performing a Rosner Outlier Test and want to show the outliers at the bottom of the distribution. I can't see where I could use an argument like opposite = TRUE for the Rosner Outlier Test.
Code example for the Rosner Test:
test <-…

Mirko
- 19
- 2