Questions tagged [data-ingestion]
248 questions
1
vote
1 answer
Updating of Fact tables
I have flat-file resources that were extracted into facts and dimensions.
Some dimensions also come from database resources.
The transformation process runs on an as-needed basis (when there are new/updated records from the flat files).
The problem is this: some data…

zysirhc
- 37
- 7
1
vote
1 answer
Azure Data Factory v2: ingest files from data lake with different filenames and structure
I've been tasked with ingesting flat files from data lake storage.
There are multiple files, and they will be stored in the same logical folder.
The contents and structure of these files are different.
Each time a new file is added with the same structure as a…

Geezer
- 513
- 5
- 17
1
vote
1 answer
Ingest processor foreach or script to replace all items in array
I am trying to run an ingest pipeline to replace instances of "on" and "off" with true and false in an array.
This works perfectly with normal strings, e.g. with data like this
[{onoffboolean: "on"}]
I am able to process this with the…

AndyJamesN
- 468
- 4
- 14
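
For the ingest-pipeline question above, here is a minimal sketch of the common foreach-plus-script approach, registered through the REST API with Python's requests library. The cluster URL, pipeline id and the array field name "switches" are assumptions for illustration, not taken from the question.

import requests

ES_URL = "http://localhost:9200"  # assumed local cluster

# Walk an array of objects and map "on"/"off" strings onto booleans.
pipeline = {
    "description": "Convert on/off strings to booleans inside an array of objects",
    "processors": [
        {
            "foreach": {
                "field": "switches",  # hypothetical array field
                "processor": {
                    "script": {
                        "lang": "painless",
                        "source": (
                            "if (ctx._ingest._value.onoffboolean == 'on') {"
                            " ctx._ingest._value.onoffboolean = true; "
                            "} else if (ctx._ingest._value.onoffboolean == 'off') {"
                            " ctx._ingest._value.onoffboolean = false; "
                            "}"
                        ),
                    }
                },
            }
        }
    ],
}

resp = requests.put(f"{ES_URL}/_ingest/pipeline/onoff-to-bool", json=pipeline)
resp.raise_for_status()

Documents indexed with ?pipeline=onoff-to-bool would then carry booleans instead of the "on"/"off" strings.
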
1
vote
0 answers
Managing Deltas in BigQuery
I am looking for guidance on how to manage incremental loads into BigQuery. Here is our process:
We receive CSV files in GCS. As soon as a file arrives in GCS, we load it into the corresponding table in the staging area with an ingestion timestamp.
From the…

Dinesh
- 309
- 3
- 14
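
For the BigQuery delta question above, a minimal sketch of one common pattern: append the newly arrived GCS files to a staging table, then MERGE the latest staged row per key into the final table. The project/dataset/table names, the order_id key and the ingestion_ts column are assumptions for illustration.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names.
staging_table = "myproject.mydataset.orders_staging"
final_table = "myproject.mydataset.orders"
gcs_uri = "gs://my-bucket/incoming/orders_*.csv"

# 1) Append the new CSV files to the staging table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
client.load_table_from_uri(gcs_uri, staging_table, job_config=job_config).result()

# 2) Upsert the most recent staged version of each key into the final table.
merge_sql = f"""
MERGE `{final_table}` T
USING (
  SELECT * EXCEPT(rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY ingestion_ts DESC) AS rn
    FROM `{staging_table}`
  )
  WHERE rn = 1
) S
ON T.order_id = S.order_id
WHEN MATCHED THEN
  UPDATE SET status = S.status, ingestion_ts = S.ingestion_ts
WHEN NOT MATCHED THEN
  INSERT ROW
"""
client.query(merge_sql).result()
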
1
vote
1 answer
Data ingestion issue with KQL update policy; query schema does not match table schema
I'm writing a function which takes in a raw data table (containing multi-JSON telemetry data) and reformats it into multiple columns. I use .set MyTable <| myfunction | limit 0 to create my target table based on the function, and use an update policy to alert…

user15186335
- 23
- 3
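
A sketch of the usual fix for the schema mismatch described above: rebuild the target table straight from the function's output, then re-attach the update policy so both always share one definition. It assumes the azure-kusto-data Python client; the cluster URL, database, table and source-table names are placeholders.

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://mycluster.westeurope.kusto.windows.net"  # placeholder cluster
database = "mydb"                                           # placeholder database

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Recreate the target table so its schema exactly matches the function's output
# (the same `.set ... | limit 0` trick as in the question, but with set-or-replace).
client.execute_mgmt(database, ".set-or-replace MyTable <| myfunction() | limit 0")

# Re-attach the update policy that pipes rows from the raw table through the function.
policy = (
    '[{"IsEnabled": true, "Source": "RawTable", "Query": "myfunction()", '
    '"IsTransactional": false, "PropagateIngestionProperties": false}]'
)
client.execute_mgmt(database, f".alter table MyTable policy update @'{policy}'")
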
1
vote
1 answer
Error in Data Ingestion part (CSV File) using CsvExampleGen in TensorFlow
I'm reading the textbook "Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow", and one example shows how to read a CSV file and convert it to the tf.Example data structure. However, I'm really confused as to what…

cooldecola
- 118
- 7
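
For the CsvExampleGen question above, a minimal sketch of the component as the book drives it from an interactive context; the data directory is hypothetical, and older TFX releases (as used in the book) pass the input via external_input rather than the input_base argument shown here.

import os

from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Hypothetical folder containing CSV files with a header row.
data_root = os.path.join(os.getcwd(), "data", "csv")

# CsvExampleGen reads every CSV under data_root and writes the rows out as
# serialized tf.train.Example records (TFRecord files) for downstream components.
context = InteractiveContext()  # intended to be run in a notebook, as in the book
example_gen = CsvExampleGen(input_base=data_root)
context.run(example_gen)
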
1
vote
1 answer
How best to batch insert queries in Grakn?
What is best practice for batching Grakn insert queries?
from the docs:
"Keep the number of operations per transaction minimal. Although it is technically possible to commit a write transaction once after many operations, it is not recommended. To…

Jon T
- 108
- 6
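
A sketch of the chunked-commit pattern the quoted docs point towards: keep each write transaction small by committing every N insert queries. It assumes the 1.8-era Grakn Python client; the keyspace, the person/name schema and the batch size are illustrative only.

from grakn.client import GraknClient

BATCH_SIZE = 500  # tune: large enough to amortise commit cost, small enough per the docs

# Hypothetical insert queries against a schema that defines `person` with a `name`.
queries = [f'insert $p isa person, has name "person-{i}";' for i in range(10_000)]

with GraknClient(uri="localhost:48555") as client:
    with client.session(keyspace="my_keyspace") as session:
        tx = session.transaction().write()
        pending = 0
        for query in queries:
            tx.query(query)
            pending += 1
            if pending == BATCH_SIZE:
                tx.commit()                         # commit closes the transaction
                tx = session.transaction().write()  # open a fresh one for the next chunk
                pending = 0
        if pending:
            tx.commit()  # flush the final partial batch
        else:
            tx.close()
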
1
vote
1 answer
MongoDB aggregation - operator to read in documents
Since Mongo only supports one $text expression per aggregation pipeline (inside the first $match stage), you can't perform a logical AND: you can't $and the results of multiple $text searches.
// Fails due to "too many text…

yev
- 65
- 1
- 6
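
For the $text limitation described above, a hedged pymongo sketch of one common workaround: keep the single permitted $text stage for one term and filter the remaining term with $regex in a follow-up $match (a text index on the collection is still required). The database, collection and field names are made up.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
coll = client["mydb"]["articles"]                  # hypothetical collection with a text index

# Only the first $match may contain $text, so the second term is matched with $regex.
pipeline = [
    {"$match": {"$text": {"$search": "ingestion"}}},
    {"$match": {"body": {"$regex": "pipeline", "$options": "i"}}},
]
results = list(coll.aggregate(pipeline))

Quoting both terms as phrases inside a single $search string is another way to get AND semantics, at the cost of exact-phrase matching.
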
1
vote
0 answers
What is the performance of the Weaviate Automatic Classification process?
I would like to investigate the possibility of enriching Splunk-ingested data by using Weaviate Automatic Classification in the streaming ingestion pipeline.
This can only work if the Automatic Classification process has only a minor…

Aniel Parbhoe
- 11
- 1
1
vote
0 answers
Ingesting parquet files to landing zone
We are working with the Azure cloud and have some pipelines which ingest daily data from SAP into Azure Data Lake Gen2. We have been working with Azure Data Factory to ingest JSON and CSV files, but maybe it is better to change our approach and ingest Parquet…

criabd
- 11
- 2
1
vote
1 answer
Gobblin job metrics not publishing data to InfluxDB
I have configured a .pull file to produce and send metrics to InfluxDB for the source, extractor and converter jobs. I tried with the example wikipedia…

Rahul Kalita
- 21
- 4
1
vote
1 answer
Ingest data from multiple databases into a single solr collection
In order to ingest data from a single database, I usually implement a process to load it through the DataImportHandler. It is pretty easy to set up, appears to be very efficient in terms of load time, and it works really well for me. It is easy to load,…

user1778669
- 13
- 3
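
For the multi-database question above, one alternative to stitching several sources into a DataImportHandler config is to pull the rows yourself and push them into the single collection with pysolr. The connection strings, query and field mapping below are purely illustrative.

import pysolr
from sqlalchemy import create_engine, text

# Hypothetical Solr collection and source databases.
solr = pysolr.Solr("http://localhost:8983/solr/my_collection", always_commit=True)

sources = {
    "db_a": "postgresql+psycopg2://user:pass@host-a/db_a",
    "db_b": "mysql+pymysql://user:pass@host-b/db_b",
}

for source_name, dsn in sources.items():
    engine = create_engine(dsn)
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT id, title, body FROM documents"))  # assumed table
        docs = [
            {
                "id": f"{source_name}-{row.id}",  # prefix keeps ids unique across databases
                "title": row.title,
                "body": row.body,
                "source_s": source_name,
            }
            for row in rows
        ]
    if docs:
        solr.add(docs)
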
1
vote
1 answer
Azure Stream Analytics: How to ingest images to Azure hub in real time from my client system?
I want to send images from my system continuously to the Azure cloud and process the images in the cloud using Azure Stream Analytics.
My requirements are as follows:
Send images from a client (my desktop) continuously to Azure.
Run my ML algorithm on the…

msusmitha
- 11
- 1
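
For the image-streaming question above, a sketch of the client side under the assumption that the "Azure hub" is an Event Hub which Stream Analytics (or another consumer) then reads as an input; the connection string, hub name and image file are placeholders.

from azure.eventhub import EventData, EventHubProducerClient

# Placeholders; an IoT Hub device client would be the other common choice.
CONN_STR = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=<key>"
EVENT_HUB = "images"

producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=EVENT_HUB)

with open("frame.jpg", "rb") as f:  # hypothetical image captured on the client
    payload = f.read()

# Each image goes out as one event; downstream services can pick it up
# from the hub in near real time.
with producer:
    batch = producer.create_batch()
    batch.add(EventData(payload))
    producer.send_batch(batch)
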
1
vote
0 answers
Postgres (through sqlalchemy) not letting me replace table if it exists
File "/home/user/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 2712, in to_sql
method=method,
File "/home/user/.local/lib/python3.6/site-packages/pandas/io/sql.py", line 498, in to_sql
raise ValueError("'{0}' is not valid…

R. Gao
- 23
- 4
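
The truncated traceback above matches pandas' validation of the if_exists argument, which only accepts 'fail', 'replace' or 'append'; any other value raises exactly this ValueError. A minimal sketch with a hypothetical connection string and table name:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical Postgres connection string.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# 'replace' drops and recreates the table if it already exists;
# 'fail' and 'append' are the only other accepted values.
df.to_sql("my_table", engine, if_exists="replace", index=False)
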
1
vote
1 answer
Apache Druid: Issue while updating the data in a datasource
I am currently using the druid-incubating-0.16.0 version. As mentioned in the tutorial at https://druid.apache.org/docs/latest/tutorials/tutorial-update-data.html, we can use the combining firehose to update and merge the data for a data source.
Step 1:
I…

theNextBigThing
- 131
- 3
- 14