Questions tagged [data-lineage]

62 questions
0
votes
0 answers

Data Lineage query using sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities in SQL Server

I'm in a Data Warehouse environment where tables are generated by stored procedures, the stored procedures reference other tables, and those referenced tables are created by other stored procedures, and so on. I'd like to, for a given procedure…
Oaty
0
votes
1 answer

Python lineage naming with clustered dataframe

I have a dataframe sample1 0 0 0 0 0 1 1 1 1 1 1 1 1 L1 sample2 0 0 0 0 0 1 1 1 1 1 0 0 0 L1-1 sample3 0 0 0 0 0 1 1 0 0 0 0 0 0 L1-1-1 sample4 0 0 0 0 0 1 0 0 0 0 0 0 0 L1-1-1-1 sample5 0 0 0 0 0 0 0 1 1 0 0 0 0 L1-1-2 sample6 0 0 0 0 0 0 0 1…
SG Kwon
0
votes
1 answer

What is the best way to represent data lineage in an image processing pipeline?

I am trying to determine the best way to represent data lineage for image processing. I have images stored in S3, and I want to process them and then place them back in S3. I would then want to be able to run a query so I can see all the images and…
0
votes
1 answer

Data Lineage in Purview insufficient

Azure Purview at the moment shows the data lineage from ADF only for Copy activities. Is this sufficient? In this article it is stated: "By pushing metadata from Azure Data Factory into Azure Purview a reliable and transparent lineage tracking is…
Blue Clouds
0
votes
1 answer

How is data lineage tracked in AWS Athena and Glue?

Atlas is the product of choice for data lineage on Hadoop. Is there a clear product for data lineage tracking on AWS Athena or Glue?
0
votes
1 answer

Does Purview show lineage for auto-created tables through dataflows by ADF pipelines?

I've debugged my ADF pipeline. The pipeline contains 4 copy activities and two DataFlows. After the debug run finished, I switched to Azure Purview to look at the changes made to the Data Factory, and I was able to see the pipeline. But when I go into the…
0
votes
1 answer

Determining relations hit by a query

I have a PostgreSQL query constructed by a templating mechanism. What I want to do is to determine the relations actually hit by the query when it is run and record them in a relation. So this is a very rudimentary lineage problem. Simply looking at…
sonat
0
votes
0 answers

Kind of groupByKey on RDD[(K, V)] returning List[(K, RDD[V])]

I would like to split an RDD[(K, V)] into buckets so that the output type is List[(K, RDD[V])]; here is my proposal. But I'm not satisfied, because it relies on keysNumber runs over the original RDD. Is there another way to process it requiring less run…
KyBe
0
votes
1 answer

How do you differentiate between QVD source files and target files when reading a QVW's XML MetaData?

I am currently trying to find an alternative to the Governance Dashboard that Rob Wunderlich (Qlik founder) created, since I am currently encountering errors when using it. How do you differentiate between a data source (QVD, aka source) that is…
C Murphy
0
votes
1 answer

Lineage feature in Cloudera Navigator

Does Lineage work in the Enterprise trial version of Cloudera? I see the lineage tab, but I don't get to see the lineage of the Hive table which I derived from another Hive table. Unfortunately, this information is also not very clear from the…
Manikandan Kannan
0
votes
1 answer

java.lang.StackOverflowError thrown in spark-submit but not when running in IDE

I have developed a Spark 2.2 application for collaborative filtering. It works fine in IntelliJ for running or debugging. I can enter Spark Web UI to check the process too. But when I tried to deploy it to EMR and test spark-submit locally the…
tom10271
0
votes
1 answer

Apache NiFi instance hangs on the "Computing FlowFile lineage..." window

My Apache NiFi instance just hangs on the "Computing FlowFile lineage..." window for a specific flow. Others work, but it won't show the lineage for this specific flow for any data files. The only error message in the log is related to an error in one of…
The Shoe Shiner
0
votes
1 answer

Is there a best practice guide and annotations for a data lineage diagram?

I am looking to create a data lineage diagram showing the source and movement of some of our data across different systems and processes, and found that no two data lineage diagrams look the same. I just wanted to know if there is…
user3165854
0
votes
1 answer

Getting Data Lineage Out from Spark Logs

I am exploring options to get data lineage information out of Spark logs for Spark programs. I am looking for information like which Kafka topics or tables a Spark program reads from or writes to, so that we can get that information at run time and to…
0
votes
1 answer

Task lineage between dependent DAGs in Airflow

We have many DAGs scheduled to run daily using Airflow. Dependencies have been enabled using ExternalTaskSensor, TriggerDagRunOperator and custom operators. Sample: Task 1 in DAG A is dependent on task 2 in DAG B. Task 3 in DAG A is dependent on task…