
Is there a way to detect data quality issues in streaming JSON input (from Event Hub) in Azure Stream Analytics?

Scenarios: 1) Bad messages: blank records, NULLs/spaces in key columns; 2) values above the expected range, incorrect data types, etc.; 3) non-standard messages.

I have checked Anomaly Detection, but it does not provide these features.

NOTE: I am running a separate Data Quality job in parallel with the data processing job, to capture messages with data quality issues into Blob storage for investigation/reprocessing. The goal is to avoid performance issues in the processing pipeline.

Has anyone implemented a data quality framework in Azure?

Thanks, Mohan

  • You will have to code that out, and I don't think ASA is the best tool to do that. I'd say do it in the processing pipeline. Anomaly Detection is something entirely different: it detects anomalies in the data stream itself, like rapidly rising temperatures or spikes in outages, for example. – Peter Bons Nov 19 '17 at 17:30
  • Thanks Peter. Won't doing those checks in the processing pipeline impact performance? Also, would a JavaScript UDF in Stream Analytics be a good candidate for this? – Mohan S Nov 19 '17 at 18:37
  • We also do not have access to implement that logic in the source systems directly. – Mohan S Nov 19 '17 at 18:47
  • I don't think it will have any more impact doing it in the processing pipeline than it does using a JS UDF in Stream Analytics. How many messages per second are we talking about anyway? – Peter Bons Nov 20 '17 at 10:47

1 Answer


Using Azure Stream Analytics, you can add different filters to check whether the messages comply with your business logic:

  • Add filters in the WHERE clause for detecting: blank records, NULLS/Spaces in key columns, and values above expected ranges
  • Use the TRY_CAST function to detect incorrect data types

However, Azure Stream Analytics relies on well-formed messages, so it won't be able to read messages that are not valid JSON. It will therefore probably not fulfil all of your requirements.
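For the messages that are not valid JSON, the check has to happen outside the ASA query, for example in the processing pipeline as suggested in the comments. Below is a minimal sketch in Python of what such a check could look like. The field names (`deviceId`, `temperature`), the range limits and the function name `find_quality_issues` are all assumptions for illustration and do not come from the question; how the raw message is read from Event Hub is left out.

```python
import json
from typing import List

# Hypothetical schema, for illustration only: adjust to the real messages.
KEY_COLUMNS = ("deviceId", "temperature")
TEMPERATURE_RANGE = (-50.0, 150.0)


def find_quality_issues(raw_message: str) -> List[str]:
    """Return the data quality issues found in one raw message (empty list if none)."""
    # Blank record: nothing to parse.
    if raw_message is None or not raw_message.strip():
        return ["blank record"]

    # Non-standard message: not valid JSON, or valid JSON that is not an object.
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        return ["invalid JSON"]
    if not isinstance(record, dict):
        return ["not a JSON object"]

    issues = []

    # NULLs or whitespace-only strings in key columns.
    for column in KEY_COLUMNS:
        value = record.get(column)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing value in key column '{column}'")

    # Incorrect data type or out-of-range value for the numeric field.
    temperature = record.get("temperature")
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if is_number:
        low, high = TEMPERATURE_RANGE
        if not low <= temperature <= high:
            issues.append("temperature out of range")
    elif temperature is not None and str(temperature).strip():
        # Present but not numeric (missing values were already reported above).
        issues.append("temperature has an incorrect data type")

    return issues
```

Messages for which the function returns a non-empty list could then be written to Blob storage together with the list of issues, so they can be investigated or reprocessed later.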

Jean-Sébastien