Questions tagged [data-ingestion]
248 questions
0
votes
1 answer
Updating Hive table with Sqoop from MySQL table
I already have a Hive table called roles. I need to update this table with info coming from MySQL, so I used this script, thinking it would add new data and update existing data in my Hive table:
sqoop import --connect…
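A minimal sketch of the append-only variant of such a script, assuming a numeric id check column (connection details and credentials are placeholders):

sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table roles \
  --hive-import --hive-table roles \
  --incremental append \
  --check-column id \
  --last-value 0

Note that append mode only adds new rows; rewriting changed rows typically needs a separate lastmodified import plus a merge step, since Sqoop 1 does not accept --incremental lastmodified together with --hive-import.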

Andres Urrego Angel
- 1,842
- 7
- 29
- 55
0
votes
1 answer
sqoop export update table record in RDBMS MySQL
So I'm trying to perform an update on an RDBMS table in MySQL. The thing is that this update is coming from a file in my HDFS, and although the MySQL table has a primary key, when I update the records the result set comes up with duplicated…
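A hedged sketch of the upsert-style export that usually avoids the duplicates (connection string, export directory, and key column are placeholders): --update-key tells Sqoop which column(s) to match existing rows on, and --update-mode allowinsert turns unmatched rows into inserts.

sqoop export \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table roles \
  --export-dir /user/hive/warehouse/roles \
  --update-key id \
  --update-mode allowinsert

Without --update-key, sqoop export generates plain INSERT statements, so re-exporting the same file duplicates every record.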

Andres Urrego Angel
- 1,842
- 7
- 29
- 55
0
votes
1 answer
Access array element after split processor in ingest node
I'm trying to access an array element after splitting a string into an array using a 'split' processor in an ingest node pipeline.
I have a long string separated by slashes ('/'). I only want to pass one substring to the index, and dump the rest.
For example, I…
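A hypothetical pipeline in that spirit, for ES 5.x: split on '/' in a split processor, then keep a single element via a script processor (the field name path and the index 2 are placeholders).

curl -XPUT 'http://localhost:9200/_ingest/pipeline/keep-one-part' \
  -H 'Content-Type: application/json' -d '
{
  "processors": [
    { "split":  { "field": "path", "separator": "/" } },
    { "script": { "inline": "ctx.path = ctx.path[2]" } }
  ]
}'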

user8205208
- 1
- 1
0
votes
1 answer
Elasticsearch ingest pipeline - epoch_millis to date format
I am using the reindex API in ES 5.4.1, and I need to convert a long field (which represents a date) to a date field. So the source index looks like
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index":…

user2689782
- 747
- 14
- 31
0
votes
1 answer
Data ingestion with Kafka and Hadoop - how to avoid data duplication that can result from quality check failure?
Here is a simplified scenario:
N business flows that need the same raw data from the same source.
The data is ingested using Kafka (normal Kafka pipelines) and landed on HDFS, where an automatic quality-checking flow is triggered on the raw data…
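One common pattern for this (a sketch, not the asker's pipeline): land each batch in a staging directory, run the quality checks there, and promote it with a rename only on success, so downstream flows never see, or double-count, an unchecked batch. run_quality_checks and all paths are hypothetical.

BATCH=batch_0042.avro
hdfs dfs -put "$BATCH" /data/raw/_staging/"$BATCH"
if run_quality_checks /data/raw/_staging/"$BATCH"; then
  # consumers only ever read from the promoted directory
  hdfs dfs -mv /data/raw/_staging/"$BATCH" /data/raw/"$BATCH"
else
  # a failed batch never reaches the N business flows
  hdfs dfs -rm /data/raw/_staging/"$BATCH"
fi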

aviad
- 8,229
- 9
- 50
- 98
0
votes
1 answer
Kafka Connect job that was working in version 0.9 is not working in 0.10.2
When I run my Kafka Connect job, I get the error below:
[2017-04-25 14:56:22,806] ERROR Failed to create job for ./etc/kafka-connect-jdbc/sqlserver.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2017-04-25 14:56:22,808] ERROR…
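That error usually surfaces when the worker cannot parse or validate the connector properties file. For comparison, a minimal JDBC source config of the shape the standalone runner expects; every value below is a placeholder rather than the asker's actual file.

cat > ./etc/kafka-connect-jdbc/sqlserver.properties <<'EOF'
name=sqlserver-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:sqlserver://localhost:1433;databaseName=demo
mode=incrementing
incrementing.column.name=id
topic.prefix=sqlserver-
EOF

./bin/connect-standalone ./etc/kafka/connect-standalone.properties \
    ./etc/kafka-connect-jdbc/sqlserver.properties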

Zigmaphi
- 15
- 5
0
votes
1 answer
Hadoop Ingestion automation techniques
My context is:
10 CSV files are uploaded to my server during the night.
My process is:
Ingestion:
Put the files on HDFS.
Create ORC Hive tables and load the data into them.
Processing:
Spark processing: transformation, cleaning, join…
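A minimal cron-style sketch of the ingestion half, assuming a two-column schema and placeholder paths (real columns, databases, and file locations would differ):

DT=$(date +%F)
hdfs dfs -mkdir -p /landing/csv/dt="$DT"
hdfs dfs -put /srv/incoming/*.csv /landing/csv/dt="$DT"/

hive -e "
  DROP TABLE IF EXISTS staging_csv;
  CREATE EXTERNAL TABLE staging_csv (id INT, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/landing/csv/dt=$DT';
  CREATE TABLE IF NOT EXISTS events_orc (id INT, payload STRING) STORED AS ORC;
  INSERT INTO TABLE events_orc SELECT * FROM staging_csv;
"

Schedulers such as cron, Oozie, or Airflow are the usual ways to trigger this nightly and chain the Spark processing behind it.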

Nabil
- 1,771
- 4
- 21
- 33
0
votes
0 answers
CSV data ingestion in Solr issue
I am new to Solr and am trying to ingest a CSV file into a demo collection. Below is the command I am trying to execute.
[solr@ambari solr]$ curl http://localhost:8983/solr/fbdemo_shard1_replica1/update/csv
--data-binary…
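For comparison, a shape of the same call that typically works (collection name kept from the question; the file name is a placeholder): the CSV handler wants an explicit content type, and without commit=true the documents will not become visible in the collection.

curl 'http://localhost:8983/solr/fbdemo_shard1_replica1/update/csv?commit=true' \
     --data-binary @data.csv \
     -H 'Content-Type: application/csv'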

omer
- 187
- 6
- 16
0
votes
1 answer
How should I evaluate the insert benchmark from CrateDB?
I am trying to understand and interpret the benchmark which is provided from CrateDB. (https://staging.crate.io/benchmark/)
I am interested in how many elements can be inserted in one second.
I know that this may vary with the size of the tuples.…

duichwer
- 157
- 1
- 14
0
votes
0 answers
Druid / Tranquility (server) / Ingestion / Indexing has not finished
I use Druid 0.9.1.1 & Tranquility 0.8.0, and I followed the quickstart steps here: http://druid.io/docs/0.9.1.1/tutorials/quickstart.html
The following command succeeds:
bin/generate-example-metrics | curl -XPOST -H'Content-Type: application/json'…
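Two hedged ways to see why indexing has not finished, using the overlord API on its default quickstart port (the task ID is a placeholder to fill in from the first call):

curl http://localhost:8090/druid/indexer/v1/runningTasks
curl http://localhost:8090/druid/indexer/v1/task/<taskId>/status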

Cokorda Raka
- 4,375
- 6
- 36
- 54
0
votes
1 answer
Ingesting particular sources into a particular rack
I have a cluster with three racks. I want a particular set of sources to be dumped into only one rack, so that I can monitor the traffic from that particular source to the other destinations. My question is simple. Is it possible to…

Moe
- 171
- 2
- 9
0
votes
2 answers
Spark UDF optimization for Graph Database (Neo4j) inserts
This is the first issue I am posting, so apologies if I miss some info or have mediocre formatting. I can update if required.
I will try to add as many details as possible. I have a not-so-optimized Spark job which converts RDBMS data to graph nodes and…

Nik
- 431
- 1
- 6
- 10
0
votes
1 answer
Ingest data once in python
I have a dataframe in Python which contains all of my data for binary classification. I ingest the data in two iterations: first all of the data of one class, and then all of the data of the other class. I then randomise the rows.
The…

OAK
- 2,994
- 9
- 36
- 49
-1
votes
0 answers
How can I test the Data Ingestion from D365 CRM to Data Lake?
I'm looking for ideas/existing solutions to effectively test the Data Ingestion from D365 CRM to Data Lake.
I wanted to know if this is possible/a good idea to do.
I have researched Fluid Test, but it doesn't suit my requirements.

Sugnick Sen
- 1
- 2
-1
votes
2 answers
What is the most efficient way to ingest data from Azure to Bigquery?
I need to do a one-time load (batch) from Azure to BigQuery and I am new to the Google Cloud environment. I noticed there are numerous ways to do this, but it still isn't clear which option is the most efficient one.
Any thoughts on this? Thank…
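One hedged route for a one-time batch load: stage the files in a GCS bucket first (for example via the GCP Storage Transfer Service, which can read from Azure Blob Storage), then load them with the bq CLI. Bucket, dataset, and table names below are placeholders.

bq load --autodetect --source_format=CSV \
    my_dataset.my_table gs://my-bucket/azure-export/*.csv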

Jo Olive
- 57
- 6