Questions tagged [amazon-deequ]

57 questions
1
vote
0 answers

Is it possible to load constraints from a file (CSV, TXT) into Deequ checks?

Is it possible to save suggested constraints to a file and then load them as checks? I was able to do it without saving them, using the following code: val allConstraints = suggestionResult.constraintSuggestions.flatMap { case (_, suggestions) => …
1
vote
1 answer

Deequ satisfies function not behaving as expected

I am using PyDeequ to run some checks on data, but it is not behaving as expected. One of my columns should contain values between 0 and 1. The data looks like this: |col 1 | | 0.5635412 | | 0.123 | | 1.0 | check =…
lr53
  • 67
  • 8
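
For context on the question above: as far as I understand Deequ's satisfies check, it computes the fraction of rows for which the condition holds and then applies an assertion to that fraction, rather than to each row. The plain-Python sketch below imitates that behavior without Spark. The names `compliance` and `satisfies` are hypothetical stand-ins, not PyDeequ functions, and the sample values are the ones shown in the excerpt.

```python
from typing import Callable, Iterable


def compliance(values: Iterable[float], predicate: Callable[[float], bool]) -> float:
    """Fraction of rows for which the predicate holds (hypothetical helper)."""
    rows = list(values)
    if not rows:
        return 0.0  # assumption: treat an empty column as zero compliance
    return sum(1 for v in rows if predicate(v)) / len(rows)


def satisfies(values, predicate, assertion=lambda fraction: fraction == 1.0) -> bool:
    """Apply the assertion to the compliance fraction, not to each row."""
    return assertion(compliance(values, predicate))


def in_range(v: float) -> bool:
    """Row-level condition: the value lies in the closed interval [0, 1]."""
    return 0.0 <= v <= 1.0


col_1 = [0.5635412, 0.123, 1.0]  # sample values from the question

print(satisfies(col_1, in_range))          # all rows are within [0, 1]
print(satisfies(col_1 + [1.7], in_range))  # one of four rows is out of range
```

If a check like this fails unexpectedly, it may be worth confirming what the assertion receives: in this sketch it is a single fraction between 0 and 1, so an assertion written against individual row values would not behave as intended.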
1
vote
1 answer

Amazon Deequ (Spark + Scala ) - java.lang.NoSuchMethodError: 'scala.Option org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAgg

Spark version: 3.0.1. Amazon Deequ version: deequ-2.0.0-spark-3.1.jar. I'm running the code below in the Spark shell on my local machine: import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext} import…
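
A NoSuchMethodError at runtime usually points to a binary mismatch between a library and the platform it runs on. The versions quoted in the excerpt (Spark 3.0.1 with an artifact built for Spark 3.1) suggest that may be the cause here, though the truncated excerpt does not confirm it. The sketch below is a small, hypothetical pre-flight check in plain Python; the helper names are illustrative, and it only assumes the `-spark-<major>.<minor>` naming pattern visible in the artifact name above.

```python
import re
from typing import Optional, Tuple


def spark_line_of_artifact(artifact: str) -> Optional[Tuple[int, int]]:
    """Extract the Spark major.minor from a name like 'deequ-2.0.0-spark-3.1.jar'."""
    match = re.search(r"-spark-(\d+)\.(\d+)", artifact)
    return (int(match.group(1)), int(match.group(2))) if match else None


def spark_line_of_runtime(version: str) -> Tuple[int, int]:
    """Reduce a runtime version such as '3.0.1' to its major.minor pair."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(artifact: str, runtime_version: str) -> bool:
    """True only when the artifact targets the same Spark major.minor as the runtime."""
    return spark_line_of_artifact(artifact) == spark_line_of_runtime(runtime_version)


# Versions taken from the question: the artifact targets Spark 3.1, the shell runs 3.0.1.
print(is_compatible("deequ-2.0.0-spark-3.1.jar", "3.0.1"))
```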
1
vote
1 answer

How to use hasUniqueness check in PyDeequ?

I'm using PyDeequ for data quality and I want to check the uniqueness of a set of columns. There is a Check method hasUniqueness but I can't figure out how to use it. I'm trying: check.hasUniqueness([col1, col2], ????) But what should we use here for…
ruy
  • 23
  • 3
1
vote
1 answer

What do the result dataframe's columns of a Deequ check signify?

So, I ran a simple Deequ check in Spark that went something like this: val verificationResult: VerificationResult = { VerificationSuite() .onData(dataset) .addCheck( Check(CheckLevel.Error, "Review Check") .isComplete("col1") …
1
vote
1 answer

Using Deequ on AWS Glue

I am using Deequ on AWS Glue. Surprisingly, when I run hasMaxLength, which is listed under the checks for VerificationSuite, I get the following error. Can someone help? All other checks are passing/running. It says the check hasMaxLength…
user3476582
  • 75
  • 1
  • 10
1
vote
1 answer

PySpark version of Amazon Deequ

I am working on AWS Glue and using the PySpark API for my ETL. I believe that if I need to use Amazon Deequ, I need to switch to Scala. However, I still want to continue using the PySpark APIs. Is there a way out? If yes, what are the steps I need to follow in…
1
vote
1 answer

Histogram in Anomaly detection Deequ library

Can we use the histogram analyzer in anomaly detection? Let's say I want to check for a change in the ratio of values in a specified column. For example, histogram analysis for a column with Male and Female as values is something like (Male - 0.6)…
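
To make the idea in the question concrete, here is a minimal plain-Python sketch of comparing per-value ratios between two runs and flagging large shifts. It does not use Deequ's anomaly detection API; `value_ratios`, `ratio_anomalies`, the `max_change` threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def value_ratios(values: Iterable[str]) -> Dict[str, float]:
    """Share of rows taken by each distinct value, like a normalized histogram."""
    rows = list(values)
    if not rows:
        return {}
    counts = Counter(rows)
    return {value: count / len(rows) for value, count in counts.items()}


def ratio_anomalies(previous: Dict[str, float],
                    current: Dict[str, float],
                    max_change: float = 0.1) -> Dict[str, float]:
    """Return the values whose ratio moved by more than max_change between runs."""
    changes = {
        key: current.get(key, 0.0) - previous.get(key, 0.0)
        for key in set(previous) | set(current)
    }
    return {key: delta for key, delta in changes.items() if abs(delta) > max_change}


# Hypothetical data: the Male share moves from 0.6 to 0.8 between two runs.
yesterday = value_ratios(["Male"] * 6 + ["Female"] * 4)
today = value_ratios(["Male"] * 8 + ["Female"] * 2)

print(ratio_anomalies(yesterday, today))
```

A value missing from one run is treated as having a ratio of 0.0, so a category that appears or disappears is flagged whenever its share exceeds the threshold.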
1
vote
1 answer

Adding a new suggestion rule in Deequ

I would like to add several new rules to the Deequ suggestions workflow. For example, Deequ offers a check for whether a column contains a URL (containsURL). I would like to create a corresponding suggestion rule. I would appreciate suggestions on how to do…
dejan
  • 196
  • 2
  • 11
1
vote
1 answer

Requesting advice on big data validation

I'm a newbie to big data validation and processing. I have a little understanding of datacompy, which I have used to compare two datasets (pandas). However, I couldn't find any source that can do data validations, i.e. column validations on emails,…
user157023
  • 11
  • 2
1
vote
1 answer

Building a function to add checks to the Amazon Deequ framework

Using the Amazon Deequ library, I'm trying to build a function that takes 3 parameters: the check object, a string telling which constraint needs to be run, and another string that provides the constraint criteria. I have a bunch of checks that I want to…
Riyan Mohammed
  • 247
  • 2
  • 6
  • 20
1
vote
2 answers

Compute Metrics by using Deequ with Scala

I am new to Scala and Amazon Deequ. I have been asked to write Scala code that computes metrics (e.g. Completeness, CountDistinct, etc.) on constraints by using Deequ on source CSV files stored on S3, and load the generated metrics in a Glue…
marie20
  • 723
  • 11
  • 30
0
votes
0 answers

Error using PyDeequ Profile in Databricks

I am new to Python, Databricks, and pydeequ. I'm trying to use pydeequ in Databricks. I installed the library via Maven using "com.amazon.deequ:deequ:2.0.4-spark-3.3". The analyzers are working, but not the profilerunner. I am trying to run this…
0
votes
0 answers

Amazon deequ does not run in container but works locally

I am unable to execute Deequ functionality when I try to run the job on k8s. However, it works correctly locally. I am using 2.0.0-spark-3.1 as the dependency. As a trivial test, I tried to run the following: val df =…
0
votes
0 answers

Unable to pass variable to Deequ Checks

I am trying to implement a Deequ check: the number of distinct date_start values should match the number of days between 2018-01-01 and $runDate. Here is what I do: Calculate the date diff: val min_dt = LocalDate.of(2018, 1, 1) // Adjusting max_dt to account for the Airflow…
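
The excerpt above is in Scala and is cut off, so the following is only a plain-Python sketch of the date arithmetic it describes, not the asker's code. The function name `expected_distinct_days`, the `inclusive` flag, and the sample run date are hypothetical; whether both endpoints should be counted is an assumption the excerpt does not settle.

```python
from datetime import date


def expected_distinct_days(start: date, run_date: date, inclusive: bool = True) -> int:
    """Number of calendar days from start to run_date.

    Assumption: with inclusive=True both endpoints are counted, so the same
    start and run date yields 1.
    """
    days = (run_date - start).days
    return days + 1 if inclusive else days


min_dt = date(2018, 1, 1)     # start date from the question
run_date = date(2018, 1, 31)  # hypothetical run date, for illustration only

expected = expected_distinct_days(min_dt, run_date)
print(expected)
```

Computing the expected count as a plain integer before building the check keeps the comparison simple: the value can then be captured by whatever assertion compares it with the distinct-count metric.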