Questions tagged [data-engineering]
69 questions
0
votes
0 answers
Manage account in Azure Databricks
Hi everyone,
When I try to open Manage Account in Azure Databricks and select my workspace, I am returned to onboarding and can't access account management.
I have been trying for more than 3 days with no result. I created another workspace and still…
0
votes
1 answer
Spark continuous structured streaming not showing input rate or process rate metrics
I'm running my Spark continuous structured streaming application on a standalone cluster. However, I noticed that metrics such as average input/sec and average process/sec are not shown (they appear as NaN) on the Structured Streaming UI. I have…

XIAOAGE
- 37
- 4
0
votes
0 answers
Integrating Airbyte as a multi-container app in the main docker-compose.yml
I am building a data pipeline with Airbyte, PostgreSQL and dbt. I can easily set up PostgreSQL and dbt via my main docker-compose.yml, but I am not sure about Airbyte. Airbyte itself is a multi-container app, so it has its own docker-compose.yml.
To…

user22316802
- 1
- 2
0
votes
1 answer
How to loop over almost 1 million rows from BigQuery with Python
I am a newbie. I just started with a query that returns ~1 million rows from BigQuery, with 25 columns. The rows are returned as a RowIterator.
I wrote a script in Python to loop over them and process the data. I used:
client = bigquery.Client()
query_job =…

D9SeveN
- 1
- 2
0
votes
0 answers
Why do I get a KeyError in this Mage Data Pipeline?
I am attempting to enrich a dataset with zip codes from the Chicago Data Portal. The Chicago crimes dataset can be found at https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2 and the geographic data for zip codes can be…

Cole
- 95
- 1
- 2
- 5
0
votes
2 answers
Using Python to Remove a row in Excel based on Cell Value
I'm attempting to clean an Excel file before uploading it to the database for calculations.
By default, when the Excel report is exported from our system (NextGen), it appends a row that calculates a sum total of data throughout the report based…

jarodmwk
- 1
- 4
0
votes
0 answers
OOM error while reading .parquet file. How do I solve this?
I am working on an ETL project. For that, I am trying to read a .parquet file in order to inspect and transform the data and upload it.
I've been failing at that, as I always get an "OOM error" while reading it.
Is there some way I could read this…

mdein
- 1
0
votes
1 answer
How to Calculate GPAs in MATLAB
I've just begun learning MATLAB.
I am in engineering school and we were given a problem to solve in MATLAB.
The problem is as follows (also attached):
The text file called Transcript.txt lists the courses, grades and credits for a student transcript…

Abdullah Laher
- 11
- 3
0
votes
1 answer
ADF Data flow expression
I am trying to build an ADF data flow select operation that dynamically selects column names.
I receive the required column names in an array parameter named 'colNames', and I am trying to use that in a data flow expression to check if column name in…

KBR
- 464
- 1
- 7
- 24
0
votes
0 answers
Constraint constraints/compute.requireOsLogin violated for project (project id)
While creating a Data Quality task in Dataplex, I get the error: Constraint constraints/compute.requireOsLogin violated for project.
I have checked all the task configuration, but I am not able to find anything related to this error.
0
votes
1 answer
Is it possible to build a seed dataset/table from multiple files in dbt?
Is it possible to build a seed dataset/table from multiple files in dbt?
I have two data files like the ones below in my dbt project.
Building a seed dataset/table from an individual file works perfectly fine.
However, what I am looking for is to create one seed…

Pravin Singh
- 79
- 7
0
votes
2 answers
Handle a dynamic number of columns (CSV file) in PySpark
I am getting the CSV file below (without a header):
D,neel,32,1,pin1,state1,male
D,sani,31,2,pin1,state1,pin2,state2,female
D,raja,33,3,pin1,state1,pin2,state2,pin3,state3,male
I want to create the CSV file below using a PySpark DataFrame…
0
votes
1 answer
Open-source data stack - Airbyte, Airflow, ?,?
I am building an open-source data stack for a large-scale batch pipeline. The data will later be used in an ML model that is updated quarterly.
I want to use Airbyte for ingestion and Airflow for general orchestration.
In general, I want to use…

user22316802
- 1
- 2
0
votes
0 answers
Importing data after DACPAC
I am in the process of transferring a database from one Azure environment to another Azure environment. The old database must remain intact, and I would like to do the deployment via a release pipeline in DevOps.
I created a new database project with…

Pie
- 1
0
votes
1 answer
Kafka stream in Databricks causes a large increase in data size
When I perform a Kafka write stream to a table in Databricks, the incoming data doesn't increase the table size significantly, but it results in a much larger increase in the data size on Blob storage.
val kafkaBrokers=""
val…

Berkay Babataş
- 1
- 2