Questions tagged [data-ingestion]
248 questions
1
vote
0 answers
Databricks Delta Live Tables stuck when ingesting files from S3
I'm new to Databricks and just created a Delta Live Tables pipeline to ingest 60 million JSON files from S3. However, the input rate (the number of files it reads from S3) is stuck at around 8 records/s, which is very low IMO. I have increased the number…

Thanh Nguyen
- 11
- 1
1
vote
1 answer
How to write an ingest pipeline for Elasticsearch to load a CSV file as nested JSON?
I have a CSV file in the following format:
company_id,year,sales,buys,location
3,2020,230,112,europe
3,2019,234,231,europe
2,2020,443,351,usa
2,2019,224,256,usa
and when I import it into Elasticsearch I end up having one entry…
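To get one nested document per company instead of one flat entry per row, the grouping can be done client-side before indexing. A minimal pure-Python sketch — the `records` field name and document layout are illustrative assumptions, not an Elasticsearch API:

```python
import csv
import io
from collections import defaultdict

def rows_to_nested(csv_text):
    """Group flat CSV rows by company_id into one nested document each."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["company_id"]].append({
            "year": int(row["year"]),
            "sales": int(row["sales"]),
            "buys": int(row["buys"]),
            "location": row["location"],
        })
    # One document per company, with its yearly figures nested under "records".
    return [{"company_id": cid, "records": recs} for cid, recs in grouped.items()]

sample = """company_id,year,sales,buys,location
3,2020,230,112,europe
3,2019,234,231,europe
2,2020,443,351,usa
2,2019,224,256,usa
"""
docs = rows_to_nested(sample)
```

Each resulting document can then be indexed into a field mapped as `nested`; alternatively, the grouping can be done server-side in an Elasticsearch ingest pipeline with a script processor.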

StefSco
- 23
- 3
1
vote
3 answers
Best way to ingest data into BigQuery
I have heterogeneous sources: flat files residing on-prem, JSON on SharePoint, an API which serves data, and so on. Which is the best ETL tool to bring this data into the BigQuery environment?
I'm a kindergarten student in GCP :)
Thanks in advance

vignesh
- 1,414
- 5
- 19
- 38
1
vote
1 answer
Write only when all tables are valid with Databricks and Delta tables
I'm looping through some CSV files in a folder. I want to write these CSV files as Delta tables only if they are all valid. Each CSV file in the folder has a different name and schema. I want to reject the entire folder and all the files it contains…
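The all-or-nothing requirement can be sketched in two phases, independent of Databricks: validate every file first, and write only if the whole folder passes. A minimal sketch, where `is_valid` is a stand-in for whatever per-file schema check actually applies:

```python
import csv
from pathlib import Path

def is_valid(path):
    """Placeholder check: file is non-empty and every row matches the header width."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return bool(rows) and all(len(r) == len(rows[0]) for r in rows)

def ingest_folder(folder):
    files = sorted(Path(folder).glob("*.csv"))
    # Phase 1: validate everything before touching any output.
    if not files or not all(is_valid(p) for p in files):
        return []  # reject the entire folder
    # Phase 2: only now "write" each file (stand-in for the real Delta write).
    return [p.name for p in files]
```

On Databricks, phase 2 would be a per-file `df.write.format("delta")` call; the key point is that no write happens until every file in the folder has validated.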

Simon Breton
- 2,638
- 7
- 50
- 105
1
vote
1 answer
Azure Data Explorer Stream Ingest formatted JSON Documents
We ingest JSON messages from Event Hub into Azure Data Explorer via stream ingestion.
I created a table with this statement:
.create table messages(SerialNumber: string, ReceivedUtcTime: datetime, IngestEventEnqueuedUtcTime: datetime,…

Markus S.
- 2,602
- 13
- 44
1
vote
1 answer
Data ingestion in Azure Data Lake
I have a requirement where I need to ingest continuous/streaming data (JSON format) from Event Hub into Azure Data Lake.
I want to follow the layered approach (raw, clean, prepared) to finally store the data in a Delta table.
My doubt is around the raw…
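The layered flow in the question can be sketched independently of Azure: raw keeps the payload verbatim, clean parses and filters, prepared projects only the columns the final Delta table expects. A pure-Python sketch with illustrative field names:

```python
import json

def to_raw(event: str) -> dict:
    """Raw layer: keep the payload exactly as received, plus lineage metadata."""
    return {"payload": event, "source": "eventhub"}

def to_clean(raw: dict) -> dict:
    """Clean layer: parse the JSON; unparseable records become empty."""
    try:
        return json.loads(raw["payload"])
    except json.JSONDecodeError:
        return {}

def to_prepared(clean: dict) -> dict:
    """Prepared layer: keep only the columns the Delta table expects."""
    return {k: clean[k] for k in ("id", "value") if k in clean}
```

The point of keeping the raw layer untouched is replayability: if the clean/prepared logic changes, the pipeline can be re-run from raw without re-ingesting from Event Hub.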

Deepak
- 31
- 3
1
vote
0 answers
MarkLogic Splitting Large XML Files Into Multiple Documents
If we have such input file:
$ cat > example.xml
George
Washington
Betsy
…

Den_Alex
- 51
- 2
1
vote
1 answer
Does the WHEN clause in an INSERT ALL query, when loading into multiple tables in Snowflake, add a virtual field over each row and then load in bulk?
How does the WHEN clause evaluate column values so that only new values are inserted and existing ones are skipped when using the following query:
INSERT ALL
WHEN (SELECT COUNT(*) FROM DEST WHERE DEST.ID = NEW_ID) = 0 THEN
INSERT INTO DEST (ID) VALUES…

alim1990
- 4,656
- 12
- 67
- 130
1
vote
1 answer
How can we use parallel loading in data warehouse ingest scripts to load into multiple tables at the same time without duplicates?
Is it possible to load data into multiple tables using INSERT ALL without introducing duplicates, and without using overwrite to accomplish it?
As the WHEN clause doesn't support subqueries unless it returns a value to compare against something else, I am…

alim1990
- 4,656
- 12
- 67
- 130
1
vote
1 answer
Azure Cosmos DB CSV upload
I am opening a CSV file in Python in PyCharm, then I want to upload it to my container in Cosmos DB. It's not working.
import os, codecs
from csv import DictReader

if os.path.exists(csv_file):
    with codecs.open(csv_file, 'rb', encoding="utf-8") as csv:
        csv_reader = DictReader(csv)
        …

Maria Evans
- 25
- 3
1
vote
1 answer
Loading plain text dates in Spark v3 from CSV
I am trying to ingest a very basic CSV file with dates in Apache Spark. The complexity resides in the months being spelled out. For analytics purposes, I'd like to have those months as proper dates. Here is my CSV file:
Period,Total
"January…
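Spelled-out months are a format-pattern question. In pure Python the pattern is `%B %Y`; the corresponding Spark 3 pattern for `to_date` would be `MMMM yyyy` (assuming English month names and that each `Period` value looks like "January 2020"):

```python
from datetime import datetime, date

def parse_period(s: str) -> date:
    """Parse a spelled-out month such as 'January 2020' into a date.

    %B matches the full month name; the day defaults to 1.
    """
    return datetime.strptime(s, "%B %Y").date()
```

In Spark the equivalent would be `to_date(col("Period"), "MMMM yyyy")` applied after the CSV read.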

jgp
- 2,069
- 1
- 21
- 40
1
vote
1 answer
Skip header row when loading data from CSV using the ingest utility in Db2
I am trying to load data into a Db2 target table from a CSV file using the ingest utility.
I see the header row getting rejected with an error message.
Is there an option (similar to skipcount in the import utility) to skip the header row so as to avoid…

vineeth
- 641
- 4
- 11
- 25
1
vote
1 answer
How to insert/ingest the current timestamp into a Kusto table
I am trying to insert the current datetime into a table which has datetime as its datatype, using the following query:
.ingest inline into table NoARR_Rollout_Status_Dummie <| @'datetime(2021-06-11)',Sam,Chay,Yes
Table was created using the following…
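Since `.ingest inline` takes literal text, one workaround is to render the current timestamp client-side into the ingest line before sending the command (the other column values below mirror the question's example; within a KQL query itself, `now()` returns the current datetime):

```python
from datetime import datetime, timezone

def build_ingest_line(name: str, team: str, flag: str) -> str:
    """Prefix an inline-ingest CSV line with the current UTC timestamp,
    formatted so Kusto can parse it as a datetime literal."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{ts},{name},{team},{flag}"
```

The resulting line would follow `<|` in the `.ingest inline` command in place of the hand-written `datetime(...)` text.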

A D
- 51
- 2
- 12
1
vote
1 answer
Single data ingestion service vs multiple individual microservices?
I am trying to understand the pros and cons of having a single data ingestion microservice versus multiple individual microservices, one per source of data.
The context:
There are multiple sources of data from which I need to retrieve customer data…

Mahir Hiro
- 135
- 1
- 7
1
vote
1 answer
How to parse data in a variety of data formats/structures?
I'm terribly unfamiliar with the data engineering space, but here goes:
I have users who upload data in a variety of formats that I want to convert to a single standard format. For example:
Source Format #1
{
"firstName": "Bob",
"lastName":…
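Normalising heterogeneous uploads usually means writing one small adapter per source format, each mapping into a single canonical schema. A minimal sketch — the canonical snake_case field names and the second source format are assumptions; only `firstName`/`lastName` come from the excerpt:

```python
def from_format1(doc: dict) -> dict:
    """Adapter for Source Format #1 (camelCase keys, as in the question)."""
    return {"first_name": doc["firstName"], "last_name": doc["lastName"]}

def from_format2(doc: dict) -> dict:
    """Adapter for a hypothetical second format with a single 'name' field."""
    first, _, last = doc["name"].partition(" ")
    return {"first_name": first, "last_name": last}

# Registry mapping a declared source format to its adapter.
ADAPTERS = {"format1": from_format1, "format2": from_format2}

def normalize(source: str, doc: dict) -> dict:
    """Dispatch on the declared source format to produce the canonical record."""
    return ADAPTERS[source](doc)
```

New formats are then handled by registering one new adapter, without touching the downstream code that consumes the canonical records.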

diplosaurus
- 2,538
- 5
- 25
- 53