Questions tagged [data-engineering]

69 questions
0
votes
0 answers

Manage account in Azure Databricks

Hi everyone. When I try to go to Manage Account in Azure Databricks and select my workspace, it returns me to onboarding and I can't access Manage Account. I have tried for more than 3 days with no result, and I created another workspace and still…
0
votes
1 answer

Spark continuous structured streaming not showing input rate or process rate metrics

I'm running my Spark continuous structured streaming application on a standalone cluster. However, I noticed that metrics like avg input/sec and avg process/sec are not showing (they appear as NaN) on the Structured Streaming UI. I have…
0
votes
0 answers

Integrating Airbyte as a multi-container app in the main docker-compose.yml

I am building a data pipeline with Airbyte, PostgreSQL and dbt. PostgreSQL and dbt I can easily set up via my main docker-compose.yml, but with Airbyte I am not sure. Airbyte itself is a multi-container app, so it has its own docker-compose.yml. To…
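Since Airbyte ships its own docker-compose.yml, a common pattern is to keep the two compose files separate and connect them through a shared external Docker network instead of merging them. A minimal sketch, assuming a hypothetical network name `data-stack` and illustrative image tags:

```yaml
# Main project's docker-compose.yml (sketch). Airbyte keeps running from its
# own docker-compose.yml; both stacks attach to the same external network,
# created once with: docker network create data-stack
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    networks:
      - data-stack
  dbt:
    image: ghcr.io/dbt-labs/dbt-postgres:1.7.0
    networks:
      - data-stack

networks:
  data-stack:
    external: true   # not created by this file; shared with the Airbyte stack
```

The Airbyte side would need the same `networks:` entries added to its services so containers can resolve each other by service name.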
0
votes
1 answer

How to loop through almost 1 million rows from BigQuery with Python

I am a newbie. I just ran a query on BigQuery that returns ~1 million rows with 25 columns. The rows have the type RowIterator. I wrote a script in Python to loop over them and process the data. I used: client = bigquery.Client() query_job =…
D9SeveN
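A RowIterator streams results page by page, so the usual pattern is to process rows in batches rather than materializing a million-row list. A minimal sketch; the BigQuery lines are commented out because they need credentials, and `process_row` would be whatever per-row work the pipeline does:

```python
# Sketch: process a large result set in bounded batches instead of loading
# everything into memory at once.
from itertools import islice

def iter_batches(rows, batch_size=10_000):
    """Yield lists of up to batch_size items from any iterator."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# With BigQuery (commented out; RowIterator already fetches pages lazily):
# from google.cloud import bigquery
# client = bigquery.Client()
# rows = client.query("SELECT * FROM `project.dataset.table`").result(page_size=10_000)
# for batch in iter_batches(rows, 10_000):
#     for row in batch:
#         process_row(row)   # hypothetical per-row work

# Standalone demonstration with a plain range standing in for RowIterator:
processed = sum(len(b) for b in iter_batches(range(25_000), 10_000))
print(processed)  # 25000
```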
0
votes
0 answers

Why do I get a KeyError in this Mage Data Pipeline?

I am attempting to enrich a dataset with zip codes from the Chicago Data Portal. The Chicago crimes dataset can be found at https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2 and the geographic data for zip codes can be…
Cole
0
votes
2 answers

Using Python to Remove a row in Excel based on Cell Value

I'm attempting to clean an Excel file prior to sending it up to the database for calculations. By default, when the Excel report is exported out of our system (NextGen), it attaches a row that calculates a sum total of data throughout the report based…
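A common way to drop such a summary row before loading is a pandas filter on the cell value. A sketch under assumptions: the marker text is "Sum Total", it appears in a column here called `Provider`, and the file name is illustrative; the `read_excel` call is commented out so the example stays self-contained:

```python
# Sketch: drop the auto-generated "Sum Total" row before database upload.
import pandas as pd

# df = pd.read_excel("nextgen_report.xlsx")   # requires openpyxl
df = pd.DataFrame({
    "Provider": ["Smith", "Jones", "Sum Total"],
    "Amount": [100, 200, 300],
})

# Keep every row whose Provider cell does not contain the summary marker.
clean = df[~df["Provider"].astype(str).str.contains("Sum Total", na=False)]
print(len(clean))  # 2
```

The same filter works regardless of where the summary row appears, which is safer than deleting a fixed row index.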
0
votes
0 answers

OOM error while reading .parquet file. How do I solve this?

I am working on an ETL project. For that, I am trying to read a .parquet file in order to inspect and transform the data and upload it. I've been failing at that, as I always get an "OOM error" while reading it. Is there some way I could read this…
mdein
0
votes
1 answer

How to Calculate GPAs in MATLAB

I've just begun learning MATLAB. I am in engineering school and we were given a problem to solve in MATLAB. The problem is as follows (also attached): the text file called Transcript.txt lists the courses, grades and credits for a student transcript…
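The computation itself is a credit-weighted average of grade points. The original asks for MATLAB; here is the same logic sketched in Python, with the grade-point mapping and sample transcript rows as assumptions:

```python
# Sketch: GPA = sum(points * credits) / sum(credits), credit-weighted.
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(records):
    """records: iterable of (course, grade, credits) tuples."""
    total_points = sum(POINTS[grade] * credits for _, grade, credits in records)
    total_credits = sum(credits for _, _, credits in records)
    return total_points / total_credits

# Hypothetical rows parsed from something like Transcript.txt:
transcript = [("MATH101", "A", 4), ("PHYS101", "B", 3), ("ENGL101", "C", 3)]
print(round(gpa(transcript), 2))  # 3.1
```

In MATLAB the same idea would be a dot product of the points and credits vectors divided by `sum(credits)`.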
0
votes
1 answer

ADF Data flow expression

I am trying to build an ADF data flow Select operation to dynamically select column names. I am receiving the required column names in an array parameter named 'colNames' and then I am trying to use that in a data flow expression to check if the column name in…
0
votes
0 answers

Constraint constraints/compute.requireOsLogin violated for project (project id)

While creating a Data Quality task in Dataplex, I am facing the issue "Constraint constraints/compute.requireOsLogin violated for project". I have checked all the task configuration but I am not able to find anything related to this error.
0
votes
1 answer

Is it possible to build seed dataset/table over multiple files in DBT?

Is it possible to build a seed dataset/table over multiple files in dbt? I have two data files like below in my dbt project. Building a seed dataset/table on an individual file works perfectly fine. However, what I am looking for is to create one seed…
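dbt seeds map one CSV file to one table, so a common workaround is to concatenate the source files into a single CSV before running `dbt seed`. A sketch with in-memory stand-ins for the two data files; the file names, columns, and `seeds/` path are assumptions:

```python
# Sketch: combine several CSVs into one file that dbt can seed as one table.
from io import StringIO

import pandas as pd

# Stand-ins for the two files in the project (hypothetical contents):
file_a = StringIO("id,name\n1,alpha\n2,beta\n")
file_b = StringIO("id,name\n3,gamma\n")

# In a real project: frames = [pd.read_csv(p) for p in glob.glob("data/raw/*.csv")]
frames = [pd.read_csv(f) for f in (file_a, file_b)]
combined = pd.concat(frames, ignore_index=True)

# combined.to_csv("seeds/my_seed.csv", index=False)   # then: dbt seed
print(len(combined))  # 3
```

This assumes the files share a schema; mismatched columns would surface as NaNs in the concatenated frame.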
0
votes
2 answers

handle dynamic number of columns (csv file) in pyspark

I am getting the CSV file below (without a header): D,neel,32,1,pin1,state1,male D,sani,31,2,pin1,state1,pin2,state2,female D,raja,33,3,pin1,state1,pin2,state2,pin3,state3,male I want to create the CSV file below using a PySpark dataframe…
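One way to handle the variable width is to normalize each line to a fixed schema before Spark sees it: the first four fields are constant, the trailing field is gender, and the pin/state pairs in between are padded with empty strings up to the maximum. A plain-Python sketch of that padding logic (the column layout is inferred from the sample; in PySpark the same function would run in an RDD `map` before `createDataFrame`):

```python
# Sketch: pad variable pin/state pairs so every row has the same width.
rows = [
    "D,neel,32,1,pin1,state1,male",
    "D,sani,31,2,pin1,state1,pin2,state2,female",
    "D,raja,33,3,pin1,state1,pin2,state2,pin3,state3,male",
]

def normalize(line, max_pairs=3):
    parts = line.split(",")
    head, pairs, gender = parts[:4], parts[4:-1], parts[-1]
    pairs += [""] * (2 * max_pairs - len(pairs))   # fill missing pin/state slots
    return head + pairs + [gender]

fixed = [normalize(r) for r in rows]
print({len(r) for r in fixed})  # {11}
```

With a uniform width, a fixed column list (`name`, `age`, …, `pin3`, `state3`, `gender`) can be applied directly.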
0
votes
1 answer

open source data stack - Airbyte, Airflow, ?,?

I am building an open source data stack for a large-scale batch pipeline. The data is later to be used in an ML model that is updated quarterly. I want to use Airbyte for ingestion and Airflow for general orchestration. In general, I want to use…
0
votes
0 answers

Importing data after DACPAC

I am in the process of transferring a database from one Azure environment to another Azure environment. The old database must remain intact, and I would like to do the deployment via a release pipeline in Azure DevOps. I created a new database project with…
0
votes
1 answer

Kafka stream in Databricks increases data size a lot

When I perform a Kafka write stream to a table in Databricks, the incoming data doesn't increase the table size significantly, but it results in a much larger increase in the data size on Blob storage. val kafkaBrokers="" val…