Questions tagged [data-ingestion]
248 questions
2
votes
2 answers
Druid with Kafka Ingestion: filtering data
Is it possible to filter data by dimension value during ingestion from Kafka to Druid?
e.g., considering a dimension version, which might have the values v1, v2, and v3, I would like only v2 to be loaded.
I realize it can be done using Spark/Flink/Kafka…
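
For reference, Druid's Kafka ingestion spec accepts a transformSpec with a filter, which drops non-matching rows at ingestion time, so no separate Spark/Flink job is needed. A minimal sketch submitting such a supervisor spec to the Overlord API; the datasource, topic, broker, and timestamp column names are placeholders, and the spec is abbreviated:

import requests

# Abbreviated Kafka supervisor spec: the selector filter keeps only rows
# where version == v2. Names below are placeholders.
spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "events_v2",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["version"]},
            "transformSpec": {
                "filter": {"type": "selector", "dimension": "version", "value": "v2"}
            },
        },
        "ioConfig": {
            "topic": "events",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
        },
    },
}

# Submit the supervisor spec to the Overlord.
requests.post("http://localhost:8081/druid/indexer/v1/supervisor", json=spec)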

pcejrowski
- 603
- 5
- 15
2
votes
2 answers
Not able to load files larger than 100 MB into HDFS
I'm facing a really strange issue with my cluster.
Whenever I try to load a file larger than 100 MB (104857600 bytes) into HDFS, it fails with the following error:
All datanodes are bad... Aborting.
This is really strange as 100 MB…

Megh Vidani
- 635
- 1
- 7
- 22
2
votes
3 answers
If we use 6 mappers in Sqoop to import data from Oracle, how many connections will be established between Sqoop and the source?
If we use 6 mappers in Sqoop to import data from Oracle, how many connections will be established between Sqoop and the source?
Will it be a single connection, or one connection per mapper, i.e. 6?
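
Broadly: Sqoop opens one connection from the client to run the metadata and boundary-value queries, and then each mapper opens its own JDBC connection, so with 6 mappers you should expect six parallel connections during the transfer. A sketch of such an import driven from Python; the JDBC URL, credentials, and table names are placeholders:

import subprocess

# Each mapper opens its own JDBC connection, so --num-mappers 6 means six
# parallel connections during the transfer, in addition to the initial
# client connection used for the metadata/boundary query.
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@db-host:1521:ORCL",  # placeholder JDBC URL
    "--username", "scott",
    "--password-file", "/user/scott/.password",
    "--table", "EMPLOYEES",
    "--split-by", "EMPLOYEE_ID",   # column used to partition work across mappers
    "--num-mappers", "6",
    "--target-dir", "/data/employees",
], check=True)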

smisra3
- 107
- 1
- 12
2
votes
4 answers
Sqoop import multiple tables but not all
All the searches I've found show how to import one table or recommend import-all-tables. What if I want 35 of the 440 tables in my db? Can I just write one command and separate the tables by commas, or do I have to put it in a script and copy and…
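
Sqoop's import tool takes a single --table, so the usual options are import-all-tables with --exclude-tables (listing the tables you don't want) or a small script that loops over the tables you do want. A sketch of the loop approach; connection details and table names are placeholders:

import subprocess

tables = ["CUSTOMERS", "ORDERS", "INVOICES"]  # the subset of tables you want

for table in tables:
    # One sqoop import per table; each lands in its own subdirectory
    # under --warehouse-dir.
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host/mydb",   # placeholder JDBC URL
        "--username", "etl_user",
        "--password-file", "/user/etl/.password",
        "--table", table,
        "--warehouse-dir", "/data/warehouse",
        "-m", "4",
    ], check=True)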

AM_Hawk
- 661
- 1
- 15
- 33
1
vote
0 answers
API doesn't support batch/bulk operations
I have a CSV file with 1.5 million records, and I need to call an API to get each user's email_address. Unfortunately, the API documentation shows it doesn't support batch operations. Currently, processing 1.5 million records takes about 3-4 hours. Is there…
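
Without a batch endpoint, the usual speedup is client-side concurrency: issue many single-record requests in parallel while staying inside the provider's rate limits. A sketch assuming a hypothetical GET endpoint, file name, and CSV column name:

import csv
from concurrent.futures import ThreadPoolExecutor

import requests

session = requests.Session()  # reuse TCP connections across requests

def fetch_email(user_id: str) -> tuple[str, str]:
    # Hypothetical endpoint and response shape; adjust to the real API.
    resp = session.get(f"https://api.example.com/users/{user_id}")
    resp.raise_for_status()
    return user_id, resp.json()["email_address"]

with open("users.csv", newline="") as f:
    user_ids = [row["user_id"] for row in csv.DictReader(f)]  # hypothetical column

# Keep max_workers within the provider's rate limits.
with ThreadPoolExecutor(max_workers=20) as pool:
    for user_id, email in pool.map(fetch_email, user_ids):
        print(user_id, email)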

Olivia Xu
- 11
- 1
1
vote
1 answer
MongoDB to Databricks Data Ingestion
I am working on creating a pipeline from MongoDB to Databricks.
Based on my research there are two ways of doing it:
MongoDB Change Streams
MongoDB-Databricks Connector for Structured Streaming.
I am using Pyspark.
I am doing this to get all the…
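
With the MongoDB Spark Connector v10+, the Structured Streaming source is exposed as format("mongodb"). A PySpark sketch, assuming the connector is installed on the cluster and spark is the Databricks session; the URI, database, collection, target table, and checkpoint path are placeholders:

# Read a change-stream-backed streaming DataFrame from MongoDB.
stream = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb+srv://user:pass@cluster0.example.net")
    .option("spark.mongodb.database", "mydb")
    .option("spark.mongodb.collection", "events")
    .load()
)

# Land the stream in a Delta table.
query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/mongo_events")
    .toTable("bronze.mongo_events")
)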

Mayank Jain
- 11
- 2
1
vote
1 answer
octavia apply on Airbyte gives a JSON schema validation error
I'm trying to create a new BigQuery destination on Airbyte with the Octavia CLI.
When launching:
octavia apply
I receive:
Error: {"message":"The provided configuration does not fulfill the specification. Errors: json schema validation failed when…

tdebroc
- 1,436
- 13
- 28
1
vote
0 answers
TimescaleDB: how to ingest files from s3?
In Postgres, one way to ingest files directly from S3 is through the aws_s3 extension, using the table_import_from_s3 function, for example.
However, this is not directly supported by TimescaleDB as of now.
=> CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;…
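
One workaround, since a hypertable is still a regular Postgres table: stream the object from S3 in the client and feed it to COPY. A sketch with boto3 and psycopg2; bucket, key, connection string, and table names are placeholders:

import boto3
import psycopg2

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="metrics/2023-01-01.csv")

conn = psycopg2.connect("dbname=tsdb user=postgres host=localhost")
with conn, conn.cursor() as cur:
    # obj["Body"] is a streaming file-like object, so the file is never
    # fully buffered in memory; COPY reads it chunk by chunk.
    cur.copy_expert(
        "COPY metrics FROM STDIN WITH (FORMAT csv, HEADER true)",
        obj["Body"],
    )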

xmar
- 1,729
- 20
- 48
1
vote
0 answers
Embed data from an additional second dataframe into a plot
I want my plot to retrieve data from one dataframe, but when hovering over the data I want it to incorporate data from both dataframes.
Example (plot image not shown), which results from:
fig = px.scatter(X_reduced_df, x='EXTRACTION_DATE_SAMPLE', y='score_IF', color=…
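
One way, assuming the second dataframe can be aligned with the plotted one (here by row index): pass its columns through custom_data and reference them in the hovertemplate. A sketch with stand-in dataframes; the second frame and its columns are hypothetical:

import pandas as pd
import plotly.express as px

# Stand-ins for the two dataframes in the question (contents are hypothetical).
X_reduced_df = pd.DataFrame({
    "EXTRACTION_DATE_SAMPLE": ["2023-01-01", "2023-01-02"],
    "score_IF": [0.1, 0.7],
})
other_df = pd.DataFrame({"SAMPLE_ID": ["A1", "B2"], "LAB": ["north", "south"]})

# Align on the row index and expose the extra columns via custom_data.
plot_df = X_reduced_df.join(other_df)
fig = px.scatter(
    plot_df,
    x="EXTRACTION_DATE_SAMPLE",
    y="score_IF",
    custom_data=["SAMPLE_ID", "LAB"],
)
# Reference the extra columns in the hover text.
fig.update_traces(
    hovertemplate="date=%{x}<br>score=%{y}<br>sample=%{customdata[0]}<br>lab=%{customdata[1]}"
)
fig.show()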

Danny
- 41
- 5
1
vote
1 answer
Delta live tables data quality checks - Retain failed records
There are 3 types of quality checks in Delta live tables:
expect (retain invalid records)
expect_or_drop (drop invalid records)
expect_or_fail (fail on invalid records)
I want to retain invalid records, but I also want to keep track of them. So,…
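
A common pattern here is a quarantine table: keep expect on the main table (so nothing is dropped and the metrics are still recorded) and materialize the failing rows separately by inverting the condition. A sketch that only runs inside a DLT pipeline; the source, table, and rule names are illustrative:

import dlt

RULE = "valid_id"
COND = "id IS NOT NULL"

@dlt.table
@dlt.expect(RULE, COND)   # records pass/fail metrics but retains all rows
def events_clean():
    return dlt.read("events_raw")   # placeholder upstream dataset

@dlt.table
def events_quarantine():
    # Same source, inverted condition: only the records that failed the rule.
    return dlt.read("events_raw").where(f"NOT ({COND})")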

Ender
- 71
- 9
1
vote
0 answers
Create a data sync using two tables
I want to create a data sync in Palantir using an update (update + insert) transaction on three fields from two different tables. There is an option in Palantir syncs to use a twin table, but I can't see how to add three fields in the incremental field…

f.ivy
- 65
- 5
1
vote
1 answer
How can I do an incremental load based on record ID in Dagster
I am trying to consume an HTTP API in my Dagster code. The API provides a log of "changes" which contain an incrementing ID. It supports an optional parameter fromUpdateId, which lets you only fetch updates that have a higher ID than some…
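
One fit for this is a Dagster sensor, whose persistent cursor can store the last-seen update ID between evaluations. A sketch; the endpoint, response shape, and job name are hypothetical:

import requests
from dagster import RunRequest, sensor

@sensor(job_name="process_updates")   # hypothetical job name
def updates_sensor(context):
    # The cursor persists across evaluations; default to 0 on first run.
    last_id = int(context.cursor) if context.cursor else 0
    updates = requests.get(
        "https://api.example.com/changes",      # hypothetical endpoint
        params={"fromUpdateId": last_id},
    ).json()
    if updates:
        newest = str(updates[-1]["updateId"])   # hypothetical field name
        yield RunRequest(run_key=newest)
        context.update_cursor(newest)           # remember the high-water mark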

Imre Kerr
- 2,388
- 14
- 34
1
vote
1 answer
Extracting data from Multiple Excel files with multiple tabs and multiple columns using Python
I'm trying to create a data ingestion routine to load data from multiple Excel files, each with multiple tabs and columns, into a data structure using Python. The structure of the tabs in each of the Excel files is the same. Can someone please help me with…
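
A common starting point: glob the workbooks, read every tab with pandas (sheet_name=None returns all sheets as a dict), and concatenate, tagging each row with its source. A sketch, assuming .xlsx files in a data/ directory and openpyxl installed:

from pathlib import Path

import pandas as pd

frames = []
for path in Path("data/").glob("*.xlsx"):   # placeholder directory
    # sheet_name=None returns {sheet_name: DataFrame} for every tab.
    for sheet, df in pd.read_excel(path, sheet_name=None).items():
        df["source_file"] = path.name       # keep provenance for each row
        df["source_sheet"] = sheet
        frames.append(df)

combined = pd.concat(frames, ignore_index=True)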

Harsh780
- 13
- 5
1
vote
2 answers
The document creation or update failed because of invalid reference
I am having trouble completing an exercise on the Microsoft Learn platform.
https://learn.microsoft.com/en-us/learn/modules/examine-components-of-modern-data-warehouse/5-exercise-azure-synapse
I have followed the instructions, but get the following…

BareAnders
- 27
- 4
1
vote
1 answer
Snowflake - Best practices to keep tables up to date with s3 external stage
We want to ingest our source tables from an s3 external stage into Snowflake.
For this ingestion we have to consider new files arriving in the S3 bucket, updates to existing files, and in some cases row deletions.
We are evaluating 3 approaches so…
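
For comparison, the pattern that handles updates (and, with an extra flag or anti-join, deletions) is COPY from the stage into a staging table, then MERGE into the target; Snowpipe alone only covers appends of new files. A sketch via the Snowflake Python connector; stage, table, column, and credential values are placeholders:

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl", password="...",   # placeholders
    warehouse="ETL_WH", database="MYDB", schema="PUBLIC",
)
with conn.cursor() as cur:
    # Load new/changed files from the external stage into a staging table.
    cur.execute("""
        COPY INTO staging_orders
        FROM @s3_stage/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # Upsert into the target on the business key.
    cur.execute("""
        MERGE INTO orders t
        USING staging_orders s ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.updated_at = s.updated_at
        WHEN NOT MATCHED THEN INSERT (order_id, amount, updated_at)
             VALUES (s.order_id, s.amount, s.updated_at)
    """)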

Ioannis Agathangelos
- 11
- 1