Questions tagged [amazon-deequ]


57 questions
0 votes, 1 answer

Is it possible to run Deequ anomaly detection on multiple partitions separately, in parallel?

We have Spark DataFrames partitioned on multiple columns. For example, a partner column can be Google, Facebook, or Bing, and a channel column can be PLA or Text. We would like to run anomaly detection on Google-PLA,…
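One way to approach this, sketched under assumptions: a Deequ verification run works on a single DataFrame, so you can enumerate the distinct (partner, channel) pairs, filter the DataFrame once per pair, and give each pair its own metrics repository so that its anomaly history stays separate. Spark accepts jobs submitted from several driver threads, so the per-pair runs can be launched concurrently with Futures. The metrics base path, the per-pair file naming, the RelativeRateOfChangeStrategy threshold, and Size() as the monitored metric are illustrative assumptions, not details from the question.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.analyzers.Size
import com.amazon.deequ.anomalydetection.RelativeRateOfChangeStrategy
import com.amazon.deequ.checks.CheckStatus
import com.amazon.deequ.repository.ResultKey
import com.amazon.deequ.repository.fs.FileSystemMetricsRepository
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PartitionedAnomalyDetection {

  /** Runs one Size() anomaly check per (partner, channel) pair and returns each pair's status. */
  def run(spark: SparkSession, df: DataFrame, metricsBasePath: String): Map[(String, String), CheckStatus.Value] = {
    // Distinct partition keys are collected to the driver (assumed to be a small set).
    val keys = df.select("partner", "channel").distinct().collect()
      .map(r => (r.getString(0), r.getString(1)))
      .toSeq

    val runs = keys.map { case (partner, channel) =>
      Future {
        val slice = df.filter(col("partner") === partner && col("channel") === channel)
        // One metrics file per pair, so Google-PLA history never mixes with Bing-Text history.
        val repository = FileSystemMetricsRepository(spark, s"$metricsBasePath/$partner-$channel.json")
        val key = ResultKey(System.currentTimeMillis(), Map("partner" -> partner, "channel" -> channel))

        val result = VerificationSuite()
          .onData(slice)
          .useRepository(repository)
          .saveOrAppendResult(key)
          // Illustrative threshold: flag a pair whose row count more than doubles since its previous run.
          .addAnomalyCheck(RelativeRateOfChangeStrategy(maxRateIncrease = Some(2.0)), Size())
          .run()

        (partner, channel) -> result.status
      }
    }

    // Wait for all per-pair runs; each Future submitted its own Spark jobs.
    Await.result(Future.sequence(runs), Duration.Inf).toMap
  }
}
```

Calling df.cache() before run avoids rescanning the source once per pair.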
0 votes, 2 answers

Unable to run amazon deequ examples locally

I am trying to run and test the Amazon Deequ library locally, but I keep getting a class-not-found error for various examples. The exact error is java.lang.NoClassDefFoundError: scala/Product$class at…
ReyAhamed
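Some background that may help: scala/Product$class is one of the helper classes that Scala 2.11 and earlier generated for traits, and Scala 2.12 no longer produces them. This error therefore most often means a jar compiled for Scala 2.11 is running on a Scala 2.12 (or later) runtime, so the Deequ build has to match the Scala version of the Spark installation. The sketch below prints what the runtime actually provides; the object and method names are hypothetical.

```scala
import scala.util.Properties

object ScalaVersionCheck {
  /** Reduces a full version such as "2.12.15" to its binary version "2.12". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  /** True if the Scala 2.11-style trait helper class can be loaded. */
  def hasScala211TraitClasses: Boolean =
    try { Class.forName("scala.Product$class"); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    // Version of the scala-library that is actually on the runtime classpath.
    println(s"Runtime Scala binary version: ${binaryVersion(Properties.versionNumberString)}")
    // Prints false on Scala 2.12+, where jars compiled for 2.11 fail with NoClassDefFoundError.
    println(s"scala.Product$$class available: $hasScala211TraitClasses")
  }
}
```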
0 votes, 1 answer

Using reflections to access methods in Amazon Deequ

I plan on creating a user config file that I will later parse in order to run some checks from Amazon Deequ. I want to be able to pass the string names from the config file to get the methods; however, in my attempts to do so, I keep hitting…
dustin
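Reflection on Check tends to be brittle: many of its methods take Scala default arguments, which compile into separate synthetic methods, so a reflective call has to supply every parameter (hint and so on) explicitly. A sketch of an alternative, assuming the config yields (rule name, column) pairs: map each supported name to a function that calls the real Check method. The Rule case class and the config names are hypothetical; isComplete, isUnique and isNonNegative are existing Check methods.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object ConfigChecks {
  /** One rule parsed from the user config (hypothetical format). */
  final case class Rule(name: String, column: String)

  // Explicit lookup table instead of reflection; parameter types are annotated
  // because the lambdas appear inside tuple literals with no expected type.
  private val builders: Map[String, (Check, String) => Check] = Map(
    "isComplete"    -> ((check: Check, column: String) => check.isComplete(column)),
    "isUnique"      -> ((check: Check, column: String) => check.isUnique(column)),
    "isNonNegative" -> ((check: Check, column: String) => check.isNonNegative(column))
  )

  /** Folds the rules into one Check, or returns the first unknown rule name. */
  def build(description: String, rules: Seq[Rule]): Either[String, Check] =
    rules.foldLeft[Either[String, Check]](Right(Check(CheckLevel.Error, description))) {
      case (Right(check), Rule(name, column)) =>
        builders.get(name).map(add => add(check, column)).toRight(s"Unknown rule: $name")
      case (left, _) => left
    }
}
```

Unknown names surface as a Left instead of a reflection exception at run time.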
0 votes, 1 answer

Failed to load: com/amazon/deequ/checks/Check

I'm building a Spark application that loads two JSON files, compares them, and prints the differences. I also try to validate these files with the AWS Deequ library, but I get the exception below: WARNING: Use --illegal-access=warn to enable…
Arar
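A failure to load com/amazon/deequ/checks/Check at startup usually points to the Deequ jar being on the compile classpath but not on the runtime classpath, for example when a thin application jar is submitted without passing Deequ to spark-submit (via --packages or --jars) or shading it into an assembly jar. A small sketch to confirm what the driver actually sees; the object and method names are hypothetical.

```scala
object ClasspathProbe {
  /** True if className can be loaded and linked by the current class loader. */
  def isLoadable(className: String): Boolean =
    try { Class.forName(className); true }
    catch {
      // Missing class, or a class whose own dependencies are missing.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val deequCheck = "com.amazon.deequ.checks.Check"
    // Prints false when the Deequ jar did not reach the runtime classpath.
    println(s"$deequCheck loadable: ${isLoadable(deequCheck)}")
    if (isLoadable(deequCheck)) {
      // The jar the class came from, useful for spotting a wrong or duplicate Deequ version.
      val source = Option(Class.forName(deequCheck).getProtectionDomain.getCodeSource)
      println(source.map(_.getLocation.toString).getOrElse("unknown location"))
    }
  }
}
```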
0 votes, 2 answers

Azure Databricks - Deequ - Finding rows that failed a check

I followed https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/ and got the checks and verification running. However, I cannot find out exactly which rows my data is failing on. That is a very important part,…
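Deequ evaluates checks from aggregate metrics (counts and ratios), so a check result says how much of the data violated a constraint rather than which rows did. A common workaround, sketched here, is to apply the constraint's own predicate as a Spark filter. The object and method names are hypothetical, and treating a null predicate result as a violation is an assumption of this sketch.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col, expr, lit, not}

object FailingRows {
  /** Rows that violate an isComplete(column) constraint. */
  def incomplete(df: DataFrame, column: String): DataFrame =
    df.filter(col(column).isNull)

  /** Rows that violate a satisfies(condition, ...) constraint; a null result counts as a violation. */
  def violating(df: DataFrame, condition: String): DataFrame =
    df.filter(not(coalesce(expr(condition), lit(false))))
}
```

Reusing the exact SQL condition string from the check keeps the filter and the check in agreement.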
0 votes, 2 answers

Adding a Check based on a Compliance analyzer

Here is the sample data frame (df) I'm working with:
+---+----+--------+
| id|orig|scrubbed|
+---+----+--------+
|  1|   a|       a|
|  2|   B|       b|
|  3|   c|       c|
|  4|   D|       d|
|  5|   *|      XX|
|  6|   $|      XX|
|  7|  ZZ| …
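Check.satisfies is the check-level counterpart of the Compliance analyzer: it computes the fraction of rows for which a SQL condition holds and applies an assertion to that fraction. The condition below (scrubbed equals the lower-cased orig, or the placeholder XX) is only a guess at the intended rule based on the visible rows, and the requirement that every row comply is an assumption.

```scala
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object ScrubbedCheck {
  // Hypothetical rule inferred from the sample: letters are lower-cased, symbols become "XX".
  val condition = "scrubbed = lower(orig) OR scrubbed = 'XX'"

  val check: Check = Check(CheckLevel.Error, "scrubbing rules")
    // satisfies is backed by a Compliance analyzer; the assertion receives the compliant fraction.
    .satisfies(condition, "scrubbed matches orig", _ == 1.0)

  /** Runs the check and returns one row per constraint with its status and message. */
  def verify(spark: SparkSession, df: DataFrame): DataFrame = {
    val result = VerificationSuite().onData(df).addCheck(check).run()
    VerificationResult.checkResultsAsDataFrame(spark, result)
  }
}
```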
0 votes, 1 answer

How can I save Deequ constraint suggestions to a file for reuse?

I am using Amazon Deequ to generate a set of constraints for data quality checks on my data. I want to save the constraint suggestion object to HDFS so that I can load it and use it for verification whenever I run a data quality check. How can I save…
martinl
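The suggestion objects hold Constraint instances that wrap Scala functions, so they are awkward to serialize as-is. One sketch, assuming a JSON file is acceptable: persist each suggestion's column, description and codeForConstraint string. The SavedSuggestion case class and the output path are hypothetical; writing through Spark means the path can be local or an hdfs:// URI that the cluster supports.

```scala
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.{DataFrame, SparkSession}

object SaveSuggestions {
  final case class SavedSuggestion(column: String, description: String, code: String)

  def suggestAndSave(spark: SparkSession, df: DataFrame, path: String): Seq[SavedSuggestion] = {
    import spark.implicits._

    val result = ConstraintSuggestionRunner()
      .onData(df)
      .addConstraintRules(Rules.DEFAULT)
      .run()

    // Flatten Map[column -> suggestions] into plain, serializable rows.
    val saved = result.constraintSuggestions.toSeq.flatMap { case (column, suggestions) =>
      suggestions.map(s => SavedSuggestion(column, s.description, s.codeForConstraint))
    }

    saved.toDS().write.mode("overwrite").json(path)
    saved
  }
}
```

Reading the file back with spark.read.json returns the code strings; turning them into live constraints still requires mapping them to Check calls yourself.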
0 votes, 0 answers

Print the metrics directly without naming the columns explicitly (Scala)

I have the following code in Scala:
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import com.amazon.deequ.analyzers.{Compliance,…
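Because df.columns lists every column at run time, the per-column analyzers can be generated rather than written out by name. A sketch, assuming Completeness and ApproxCountDistinct are the metrics wanted for every column (swap in others as needed); successMetricsAsDataFrame then returns all of them in one table.

```scala
import com.amazon.deequ.analyzers.{ApproxCountDistinct, Completeness, Size}
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import org.apache.spark.sql.{DataFrame, SparkSession}

object AllColumnMetrics {
  def compute(spark: SparkSession, df: DataFrame): DataFrame = {
    // Start with a dataset-level metric, then add analyzers for every column found at run time.
    val base = AnalysisRunner.onData(df).addAnalyzer(Size())
    val runner = df.columns.foldLeft(base) { (builder, column) =>
      builder.addAnalyzer(Completeness(column)).addAnalyzer(ApproxCountDistinct(column))
    }
    val context: AnalyzerContext = runner.run()
    // Result columns: entity, instance (the column name), name (the metric), value.
    AnalyzerContext.successMetricsAsDataFrame(spark, context)
  }
}
```

AllColumnMetrics.compute(spark, df).show() then prints every metric without any column name in the code.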
0 votes, 1 answer

How to call Amazon Deequ hasDataType from Java

I am trying to use Amazon Deequ functionality from Java. I am trying to add data type constraints, but I am not able to pass the third parameter (the assertion) from Java:
com.amazon.deequ.constraints.Constraint constrains = Constraint.dataTypeConstraint("test",…
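The assertion parameter is a Scala function (Double => Boolean), which is awkward to build on the Java side. One workaround, sketched under assumptions, is a thin Scala wrapper that accepts a plain double threshold and builds the function itself. DeequJavaFacade and its methods are hypothetical; Check.hasDataType and ConstrainableDataTypes are Deequ's own. Because Scala emits static forwarders for top-level objects, Java should be able to call it as DeequJavaFacade.hasDataType(DeequJavaFacade.newCheck("types"), "test", ConstrainableDataTypes.Integral(), 1.0).

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}
import com.amazon.deequ.constraints.ConstrainableDataTypes

/** Hypothetical Java-friendly wrapper: Java passes a double instead of a Scala function. */
object DeequJavaFacade {
  /** Avoids calling the Scala Check constructor (with its default arguments) from Java. */
  def newCheck(description: String): Check =
    Check(CheckLevel.Error, description)

  /** Requires the fraction of values classified as dataType in column to be at least minFraction. */
  def hasDataType(check: Check, column: String,
                  dataType: ConstrainableDataTypes.Value, minFraction: Double): Check =
    check.hasDataType(column, dataType, (fraction: Double) => fraction >= minFraction)
}
```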
0 votes, 1 answer

Parse DQ rules from Excel in AWS Deequ

Does anyone have an example of how to parse data quality rules from an Excel sheet in AWS Deequ?
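Deequ itself does not read spreadsheets, so this sketch assumes the sheet is exported to CSV with a header row and the columns column,rule,threshold (reading .xlsx directly would need a separate library such as Apache POI). The layout, the rule names and the object are hypothetical; hasCompleteness, hasMin and hasMax are existing Check methods.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}
import scala.io.Source
import scala.util.Try

object SheetRules {
  /** One row of the exported sheet: column,rule,threshold (hypothetical layout). */
  final case class RuleRow(column: String, rule: String, threshold: Double)

  /** Parses CSV lines after the header; malformed lines become Left with their line number. */
  def parse(lines: Seq[String]): Seq[Either[String, RuleRow]] =
    lines.zipWithIndex.drop(1).map { case (line, idx) =>
      line.split(",").map(_.trim) match {
        case Array(column, rule, threshold) if Try(threshold.toDouble).isSuccess =>
          Right(RuleRow(column, rule.toLowerCase, threshold.toDouble))
        case _ => Left(s"line ${idx + 1}: cannot parse '$line'")
      }
    }

  /** Adds one constraint per row; unknown rule names are reported instead of silently skipped. */
  def toCheck(rows: Seq[RuleRow]): Either[String, Check] =
    rows.foldLeft[Either[String, Check]](Right(Check(CheckLevel.Error, "rules from sheet"))) {
      case (Right(check), RuleRow(col, "completeness", t)) => Right(check.hasCompleteness(col, (v: Double) => v >= t))
      case (Right(check), RuleRow(col, "min", t))          => Right(check.hasMin(col, (v: Double) => v >= t))
      case (Right(check), RuleRow(col, "max", t))          => Right(check.hasMax(col, (v: Double) => v <= t))
      case (Right(_), RuleRow(_, other, _))                => Left(s"unknown rule '$other'")
      case (left, _)                                       => left
    }

  /** Reads the exported CSV and builds the Check, failing on the first bad line. */
  def fromFile(path: String): Either[String, Check] = {
    val source = Source.fromFile(path)
    try {
      val parsed = parse(source.getLines().toList)
      parsed.collectFirst { case Left(err) => err } match {
        case Some(err) => Left(err)
        case None      => toCheck(parsed.collect { case Right(row) => row })
      }
    } finally source.close()
  }
}
```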
0 votes, 1 answer

How to filter rows with a column constraint in Deequ ColumnProfilerRunner?

I am new to Scala and Spark. I am exploring the Amazon Deequ library for data profiling. How do I get the count of rows that have a particular value while using ColumnProfilerRunner()? The AnalysisRunner has a "compliance" option; I am looking for a…
Ravi
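ColumnProfilerRunner profiles whole columns rather than rows matching a predicate, but an AnalysisRunner with Size and Compliance can run on the same DataFrame: Compliance gives the fraction of rows for which a SQL predicate holds, so fraction times size gives the count. Another option is to pass df.filter(...) to ColumnProfilerRunner so that it profiles only the matching rows. The object name and the predicate in the usage are hypothetical.

```scala
import com.amazon.deequ.analyzers.{Compliance, Size}
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MatchingRowCount {
  /** Number of rows for which the SQL predicate holds, derived from Size and Compliance metrics. */
  def count(spark: SparkSession, df: DataFrame, predicate: String): Long = {
    val context = AnalysisRunner.onData(df)
      .addAnalyzer(Size())
      .addAnalyzer(Compliance("matching rows", predicate))
      .run()

    // Metric names are "Size" (row count) and "Compliance" (fraction of matching rows).
    val values = AnalyzerContext.successMetricsAsDataFrame(spark, context)
      .select("name", "value").collect()
      .map(r => r.getString(0) -> r.getDouble(1)).toMap

    // Rounding removes floating-point error from fraction * size.
    math.round(values("Size") * values("Compliance"))
  }
}
```

For example, MatchingRowCount.count(spark, df, "partner = 'Google'") returns how many rows have that partner.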
0 votes, 1 answer

Scala Spark: how to add a list of generated methods to a function

I am using Amazon Deequ to generate test cases, which returns the following list of methods that I want to use in a further function instead of coding each one individually:
var rows = suggestionDataFrame.select("_3").collect().map(_.getString(0)).mkString("…
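Instead of collecting the generated code strings and trying to turn them back into method calls, the suggestion result already carries each suggestion's Constraint object, and these can be passed straight into a Check. A sketch; the object name, the check description and CheckLevel.Warning are arbitrary choices.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.DataFrame

object SuggestedChecks {
  /** Builds one Check from every suggested constraint, without going through code strings. */
  def suggestedCheck(profileData: DataFrame): Check = {
    val suggestions = ConstraintSuggestionRunner()
      .onData(profileData)
      .addConstraintRules(Rules.DEFAULT)
      .run()
      .constraintSuggestions

    // Map[column -> Seq[ConstraintSuggestion]] flattened to the underlying Constraint objects.
    val constraints = suggestions.values.flatten.map(_.constraint).toSeq
    Check(CheckLevel.Warning, "generated constraints", constraints)
  }

  /** Verifies new data against constraints suggested from the profiling data. */
  def verify(profileData: DataFrame, newData: DataFrame): CheckStatus.Value =
    VerificationSuite().onData(newData).addCheck(suggestedCheck(profileData)).run().status
}
```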