Questions tagged [data-ingestion]

248 questions
3
votes
1 answer

What is intermediate persist in Apache Druid?

How does Druid persist real-time ingested data before it hands off to deep storage? In the documentation, Druid has the configuration options intermediatePersistPeriod and maxPendingPersists, but it doesn't say much about what an intermediate persist is, how it…
Happy
  • 121
  • 1
  • 8
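The two properties the question asks about live in the ingestion spec's tuningConfig. A minimal sketch (values are illustrative, not recommendations) of where they sit in, e.g., a Kafka ingestion tuningConfig:

```python
# Sketch of a Druid tuningConfig fragment showing the intermediate-persist
# knobs; the property names come from Druid's ingestion docs, the values
# here are only illustrative defaults.
tuning_config = {
    "type": "kafka",
    "maxRowsInMemory": 150000,             # rows buffered in heap before a persist
    "intermediatePersistPeriod": "PT10M",  # max time between persists to local disk
    "maxPendingPersists": 0,               # persists allowed to queue before ingestion blocks
}
```

An intermediate persist flushes the in-memory row buffer to local disk; those spills are later merged into the segment that is handed off to deep storage.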
3
votes
0 answers

What's the difference between Apache Gobblin and spring-cloud-dataflow, and how to choose?

As the official documentation states, Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources, e.g., databases, REST APIs, FTP/SFTP servers, filers, etc., onto…
user3172755
  • 137
  • 1
  • 10
3
votes
1 answer

How to disable base64 storage for the ingest-attachment Elasticsearch plugin?

The documentation shows an example of how to store base64-encoded documents in Elasticsearch via the ingest-attachment plugin. But after this I found that the Elasticsearch index contains both the parsed text and the base64 source field. Why is it needed? Is there a way to…
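The usual answer to this question is to chain a remove processor after the attachment processor so the base64 source field is dropped once the text has been extracted. A sketch of such a pipeline definition, expressed as the dict you would PUT to the ingest-pipeline API (field names assume the docs' example, where the base64 content is in "data"):

```python
# Sketch of an ingest pipeline: parse the attachment, then drop the
# base64 "data" field so only the extracted text is indexed.
pipeline = {
    "description": "Extract attachment text, drop the base64 source",
    "processors": [
        {"attachment": {"field": "data"}},  # parses base64 into attachment.content
        {"remove": {"field": "data"}},      # discard the raw base64 afterwards
    ],
}
```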
3
votes
1 answer

Suggested Hadoop-based Design / Component for Ingestion of Periodic REST API Calls

We are planning to use REST API calls to ingest data from an endpoint and store it in HDFS. The REST calls are made periodically (daily, or maybe hourly). I've already done Twitter ingestion using Flume, but I don't think using Flume…
oikonomiyaki
  • 7,691
  • 15
  • 62
  • 101
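Whatever scheduler ends up driving it (cron, Oozie, Airflow), the per-run logic is small enough to sketch. This is a storage-agnostic illustration, not a specific Hadoop component: the fetch and write callables are injected, so the same function works whether `write` wraps a local file or an HDFS client library.

```python
import json
import time

def land_snapshot(fetch, write, interval_s=86400, runs=1):
    """Poll a REST endpoint periodically and land each response as a
    timestamped JSON file. `fetch` returns a JSON-serializable payload and
    `write(path, text)` persists it (e.g. via an HDFS client); both are
    injected so the sketch stays storage-agnostic. The landing path below
    is hypothetical."""
    path = None
    for i in range(runs):
        payload = fetch()
        path = f"/data/landing/api/{int(time.time())}.json"
        write(path, json.dumps(payload))
        if i < runs - 1:
            time.sleep(interval_s)
    return path
```

For a strictly periodic daily pull, running this once per scheduler tick (runs=1) is simpler and more robust than keeping a long-lived loop alive.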
2
votes
1 answer

Can users upload files into an S3 bucket without a frontend or access to an AWS account?

I am looking to create an AWS solution where a Lambda function will transform some Excel data from an S3 bucket. When thinking about the architecture, I need a way for non-technical users,…
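One common way to let users upload without AWS credentials is a presigned POST: the backend generates a short-lived URL plus form fields, and the user's browser (or a plain HTML form) posts the file straight to S3. A sketch using boto3's generate_presigned_post, with the client passed in (bucket and key names here are hypothetical):

```python
def make_upload_form(s3_client, bucket, key, expires_in=3600):
    """Return presigned POST data a user can use to upload a file directly
    to S3 from a plain HTML form, with no AWS credentials of their own.
    `s3_client` is assumed to be a boto3 S3 client."""
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        ExpiresIn=expires_in,  # how long the upload link stays valid, in seconds
    )
```

The returned dict contains a "url" and the "fields" that must accompany the multipart form upload; the S3 event from the upload can then trigger the transforming Lambda.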
2
votes
1 answer

Query Last Inserted or Last updated rows from Snowflake Table

I would like to know how I can query for rows which were created or updated on a given date without using any specific column in the table to look up. Is there a way information_schema can provide row-level insert/update datetimes?
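Snowflake does not expose per-row insert/update timestamps through information_schema; the usual workaround is a stream on the table, which records change rows from the moment it is created. A sketch of the DDL (table and stream names are hypothetical), held here as a SQL string:

```python
# Snowflake streams capture changes going forward; each change row carries
# METADATA$ACTION and METADATA$ISUPDATE columns describing what happened.
ddl = """
CREATE OR REPLACE STREAM orders_changes ON TABLE orders;
-- later: rows inserted/updated since the stream's offset
SELECT * FROM orders_changes;
"""
```

This only helps going forward; changes that happened before the stream existed cannot be recovered this way.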
2
votes
3 answers

How can I ingest data from Apache Avro into Azure Data Explorer?

For several days I've been trying to ingest Apache Avro formatted data from blob storage into Azure Data Explorer. I'm able to reference top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it goes to the…
allrik
  • 43
  • 8
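For Event Hub captures, the interesting payload usually sits under the Body key, and nested properties are reached by extending the JSONPath in the ingestion mapping. A sketch of an ADX JSON-style column mapping (table, column, and field names are hypothetical), expressed as the list of mapping entries:

```python
# Sketch of an ADX ingestion mapping reaching past the top-level Body key
# into nested properties; names here are illustrative only.
mapping = [
    {"column": "Timestamp", "path": "$.Body.timestamp",     "datatype": "datetime"},
    {"column": "Value",     "path": "$.Body.payload.value", "datatype": "real"},
]
```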
2
votes
2 answers

Using Airbyte to get data from websites/dataset platforms like Kaggle

I am new to Airbyte. Our team is looking to use Airbyte for different sources, ranging from HTTP APIs (web-scraped websites) to sites hosting datasets, like Kaggle. We are looking to create custom connectors for these sources. I am looking…
2
votes
1 answer

Snowflake ingestion: Snowpipe/Stream/Tasks or External Tables/Stream/Tasks

For ingesting data from an external storage location into Snowflake when de-duping is necessary, I came across two ways: Option 1: Create a Snowpipe for the storage location (Azure container or S3 bucket) which is automatically triggered by event…
2
votes
0 answers

How to create an incremental connector on Airbyte?

I am evaluating Airbyte to ingest data from multiple sources, one of them a ServiceNow API, and I developed a connector using the Airbyte CDK. I am trying to implement incremental streams or slices to improve data-recovery performance. Since pulling…
Danieledu
  • 391
  • 1
  • 4
  • 19
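Independent of the CDK specifics, an incremental stream boils down to one pattern: keep a cursor in state, emit only records newer than it, then advance it. A generic sketch of that pattern (this is an illustration, not the actual Airbyte CDK API):

```python
def read_incremental(records, state, cursor_field="updated_at"):
    """Generic incremental-sync pattern: emit only records whose cursor
    field is newer than the saved cursor, then advance the cursor so the
    next sync starts where this one left off. `state` is mutated in place,
    mirroring how a connector persists state between runs."""
    cursor = state.get(cursor_field, "")
    fresh = [r for r in records if r[cursor_field] > cursor]
    if fresh:
        state[cursor_field] = max(r[cursor_field] for r in fresh)
    return fresh
```

In the Airbyte CDK the same idea is expressed by declaring a cursor field on the stream and returning updated state alongside records; the comparison-and-advance logic above is what that machinery performs.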
2
votes
1 answer

How to get an AWS Feature Store feature group into the ACTIVE state?

I am trying to ingest some rows into a Feature Store on AWS using feature_group.ingest(data_frame=df, max_workers=8, wait=True), but I am getting the following error: Failed to ingest row 1: An error occurred (ValidationError) when calling the…
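A feature group is created asynchronously, and ingest() fails while creation is still in progress, so the usual fix is to poll its status first. A sketch of that wait loop with the describe call injected (e.g. feature_group.describe from the SageMaker SDK); the status names assumed here are the documented Creating / Created / CreateFailed values:

```python
import time

def wait_for_status(describe, target="Created", timeout_s=300, poll_s=5):
    """Poll a feature group's status until it reaches `target` before
    calling ingest(). `describe` is injected and assumed to return a dict
    containing 'FeatureGroupStatus'."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = describe()["FeatureGroupStatus"]
        if status == target:
            return status
        if status.endswith("Failed"):
            raise RuntimeError(f"feature group entered {status}")
        time.sleep(poll_s)
    raise TimeoutError(f"feature group never reached {target}")
```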
2
votes
1 answer

Elasticsearch _id as MD5 hash or document fields

There are some examples available on the internet for customizing the _id field of an Elasticsearch document, but is there a way to generate a composite _id from multiple fields? Sample data: { "first_name": "john", "last_name": "doe", "dob":…
Jugraj Singh
  • 529
  • 1
  • 6
  • 22
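One straightforward approach is to compute the composite _id on the ingestion side: hash a delimited concatenation of the chosen fields and use the digest as the document _id. A small sketch (field names taken from the sample data above; the "|" delimiter is an arbitrary choice):

```python
import hashlib

def composite_id(doc, fields=("first_name", "last_name", "dob")):
    """Build a deterministic _id by MD5-hashing a delimited concatenation
    of selected fields. Identical field values always produce the same
    _id, so re-ingesting a record updates it instead of duplicating it."""
    raw = "|".join(str(doc[f]) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()
```

The same effect can be achieved inside Elasticsearch with an ingest pipeline (a set processor writing to _id plus a fingerprint-style script), but client-side hashing keeps the logic visible and testable.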
2
votes
1 answer

Azure Data Explorer High Ingestion Latency with Streaming

We are using streaming ingestion from Event Hubs to Azure Data Explorer. The documentation states the following: The streaming ingestion operation completes in under 10 seconds, and your data is immediately available for query after completion. I am…
Markus S.
  • 2,602
  • 13
  • 44
2
votes
0 answers

How to retrieve the cash dividends from the quantopian-quandl data bundle with Zipline?

https://www.zipline.io/bundles.html By default zipline comes with the quantopian-quandl data bundle which uses quandl’s WIKI dataset. The quandl data bundle includes daily pricing data, splits, cash dividends, and asset metadata. Quantopian has…
Vincent Roye
  • 2,751
  • 7
  • 33
  • 53
2
votes
1 answer

Refresh data in Druid

I am using the index_parallel native batch method to ingest data into Druid from S3. I did the initial ingestion using the Tasks tab in the Druid UI. I want to schedule another task to do delta ingestion daily. I have gone through a lot of…
unknown
  • 53
  • 2
  • 9
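For daily delta loads with index_parallel, the relevant switch is appendToExisting in the ioConfig: when set, a scheduled task adds new data as extra segments instead of replacing the interval. A sketch of that fragment of the spec (bucket and prefix are hypothetical):

```python
# Sketch of the ioConfig portion of an index_parallel ingestion spec:
# appendToExisting makes the daily task append new segments rather than
# overwrite what the initial ingestion loaded.
io_config = {
    "type": "index_parallel",
    "inputSource": {"type": "s3", "prefixes": ["s3://my-bucket/daily/"]},
    "inputFormat": {"type": "json"},
    "appendToExisting": True,
}
```

The daily trigger itself lives outside Druid, e.g. a cron job or Airflow task POSTing the spec to the overlord's task endpoint.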