Questions tagged [data-lineage]
62 questions
0
votes
1 answer
How can you create lineage between Power BI datasets and a Databricks SQL warehouse?
My organization wants to have a complete overview of the lineage from Power BI reports to data warehouse (lakehouse architecture). The goal for now is to create a PoC using Purview.
My question is, how can I link Power BI assets to the specific…

Bart
- 43
- 1
- 11
0
votes
1 answer
How to generate DBT data lineage graphs in client's production environment?
Our project runs on the client's infrastructure, which is managed via Kubernetes and Terraform. We automate our jobs using Airflow.
Any Airflow job with DBT runs using the KubernetesPodOperator provided in Airflow. We plan to create data lineage graphs…

azaveri7
- 793
- 3
- 18
- 48
0
votes
0 answers
How to get metadata from Talend Data Management Platform?
I'm building a data lineage solution, and for that I need metadata about the Talend jobs and the schemas and tables associated with each job.
I went through the documentation, but the only way I found to get that data was through the GUI.
I want to automate…

Manan Jain
- 1
- 1
- 2
0
votes
1 answer
Purview - capturing data lineage of data import process
We have Microsoft Purview configured in our Azure Cloud network. When doing Purview scanning of data assets in our Azure subscription, Purview successfully classifies data assets (based on the scan rule sets). But we noticed that when we import data…

nam
- 21,967
- 37
- 158
- 332
0
votes
0 answers
Any pointers on how to build a data lineage solution across multiple applications?
We have a requirement to capture data lineage across multiple applications. These applications span a mixed tech stack ranging from PL/SQL to Java to Spark.
Any hints on how to proceed would be a great help.

Anuj Mehra
- 320
- 3
- 19
0
votes
0 answers
Duplication of process for different inputs and outputs
Hi, I am new to Apache Atlas and I am facing the following issue. I know it's possible to link entities through a process, but I'm having difficulty in cases where different sets are created by the same process. That is, conditionally, I have a set…
0
votes
1 answer
Iterate over columns and rows to identify what changed for data analysis
I have a historical table that keeps track of the status of a task over time.
The table looks similar to the below, where the 'ID' is unique to the task, 'Date' changes whenever an action is taken on the task, 'Factor1, Factor2, etc' are columns…

JJH
- 9
- 2
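For the question above, one possible approach is to compare each row with the previous row of the same task. The following is only a minimal sketch in plain Python: the column names 'ID', 'Date', 'Factor1' and 'Factor2' come from the excerpt, while the function name `changed_factors`, the sample rows and the ISO date format are assumptions made for illustration.

```python
from collections import defaultdict


def changed_factors(rows, id_key="ID", date_key="Date"):
    """Return (id, date, column, old, new) for every value that differs
    from the previous row of the same task."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_key]].append(row)

    changes = []
    for task_id, history in by_id.items():
        # Oldest first; assumes dates sort correctly as given (e.g. ISO strings).
        history.sort(key=lambda r: r[date_key])
        for prev, curr in zip(history, history[1:]):
            for column, new in curr.items():
                if column in (id_key, date_key):
                    continue
                old = prev.get(column)
                if old != new:
                    changes.append((task_id, curr[date_key], column, old, new))
    return changes


# Hypothetical sample data; the values are invented for illustration.
rows = [
    {"ID": 1, "Date": "2023-01-01", "Factor1": "A", "Factor2": "X"},
    {"ID": 1, "Date": "2023-01-05", "Factor1": "A", "Factor2": "Y"},
    {"ID": 2, "Date": "2023-01-02", "Factor1": "B", "Factor2": "X"},
    {"ID": 1, "Date": "2023-01-09", "Factor1": "C", "Factor2": "Y"},
]
changes = changed_factors(rows)
for change in changes:
    print(change)
```

Each tuple names the task, the date of the action, the column that changed, and the old and new values, so the result can be filtered or grouped by column afterwards.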
0
votes
0 answers
ZetaSQL - catalog registration for nested complex types (Array of Record/Struct)
I am using ZetaSQL to analyze statements (Analyzer.analyzeStatements) coming from the GCP audit log, in particular the queries executed from BigQuery.
Usually, for simple queries, the way I register it into a catalog is by creating a SimpleTable with…

Jay-r Bangit
- 59
- 1
- 9
0
votes
0 answers
Integration of Alation, Manta, and BigID over cloud
I need to integrate and deploy Alation, Manta, and BigID in the cloud (AWS/Azure) for data governance. Can anyone please suggest source material for the following:
How to deploy these 3 over the cloud
How these 2 work integrated, I could not find…

Abhi Soni
- 445
- 6
- 14
0
votes
0 answers
Column level lineage without access to code
I am trying to determine column-level lineage between a target table and a number of source tables. The columns that end up in the target table come from one or more of the source tables and may have been transformed by one or more intermediate…

nuges01
- 43
- 3
0
votes
0 answers
How to get the name of a variable in a Groovy script?
In a Groovy script, I'd like to log important assignments in the code, and I'd like to do so with a simple trait that my classes can implement to attach the nuanced logging I need to figure out what's happening in the code.…

Sina K. Heshmati
- 1
- 2
0
votes
0 answers
could not translate host name to address (Data lineage- tokern)
version: '3.6'
services:
  tokern-demo-catalog:
    image: tokern/demo-catalog:latest
    container_name: tokern-demo-catalog
    restart: unless-stopped
    networks:
      - tokern-internal
    volumes:
      -…

develop
- 55
- 10
0
votes
0 answers
EMR pyspark driver hangs after processing a large dataset
I'm running a large, long-running job on a medium-sized dataset (~100GB of input data). Below are the AWS EMR settings:
EMR version: emr-6.6.0
PySpark version: 3.2.0
EMR cluster:
master - c5.4xlarge (16 vCore, 32 GiB memory, EBS only storage EBS…

Wenzhong Zhao
- 1
- 1
0
votes
1 answer
ZetaSQL - Parsing Capabilities and Functionalities
I am currently working on a lineage system that will be deployed in our Google Cloud space. The goal is to extract and parse SQL queries executed from BigQuery using audit logs and create lineage from them. I explored a couple of existing…

Jay-r Bangit
- 59
- 1
- 9
0
votes
1 answer
How to get insert fields from sql?
I am using Flink SQL to extract lineage from SQL statements.
I use the Flink planner to parse a statement such as
insert into target_table(dest_f1, dest_f2) select source_f1, source_f2 from source_table
Obviously, source_f1 is the source of dest_f1.
When I get a…

slo
- 23
- 5
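The positional mapping described in the question above (source_f1 feeds dest_f1, source_f2 feeds dest_f2) can be illustrated outside Flink. The sketch below does not use the Flink planner or any Flink API; it is a plain regex over the statement text, and the function name `insert_field_lineage` is hypothetical. It assumes the simple single-table form shown in the question and will not handle expressions that contain commas, joins, or subqueries.

```python
import re

# Matches only the simple form: INSERT INTO t(c1, c2) SELECT e1, e2 FROM s
_INSERT_SELECT = re.compile(
    r"insert\s+into\s+(?P<target>\w+)\s*\((?P<cols>[^)]*)\)\s*"
    r"select\s+(?P<exprs>.+?)\s+from\s+(?P<source>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def insert_field_lineage(sql):
    """Pair each INSERT target column with the SELECT item at the same position."""
    match = _INSERT_SELECT.search(sql)
    if match is None:
        raise ValueError("statement is not a simple INSERT ... SELECT")
    targets = [c.strip() for c in match.group("cols").split(",")]
    # Naive split: breaks on expressions containing commas, e.g. function calls.
    sources = [e.strip() for e in match.group("exprs").split(",")]
    if len(targets) != len(sources):
        raise ValueError("column count does not match select list")
    return {
        f"{match.group('target')}.{t}": f"{match.group('source')}.{s}"
        for t, s in zip(targets, sources)
    }


sql = (
    "insert into target_table(dest_f1, dest_f2) "
    "select source_f1, source_f2 from source_table"
)
lineage = insert_field_lineage(sql)
for dest, src in lineage.items():
    print(f"{src} -> {dest}")
```

A real lineage tool would work on the parsed plan instead of the raw text, but the pairing rule is the same: the n-th column of the insert list receives the n-th item of the select list.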