Questions tagged [data-ingestion]
248 questions
0
votes
1 answer
Updating Hive table with Sqoop from MySQL table
I already have a Hive table called roles. I need to update this table with info coming from MySQL, so I used this script, thinking it would add new data and update existing data in my Hive table:
sqoop import --connect…
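A minimal sketch of the append-only variant of such a script, assuming a numeric id check column (connection details and credentials are placeholders):

sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table roles \
  --hive-import --hive-table roles \
  --incremental append \
  --check-column id \
  --last-value 0

Note that append mode only adds new rows; rewriting changed rows typically needs a separate lastmodified import plus a merge step, since Sqoop 1 does not accept --incremental lastmodified together with --hive-import.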

Andres Urrego Angel
- 1,842
- 7
- 29
- 55
0
votes
1 answer
sqoop export update table record in RDBMS MySQL
So I'm trying to perform an update on an RDBMS table in MySQL. The thing is that this update is coming from a file in my HDFS, and although the MySQL table has a primary key, when I update the records the result set comes up with duplicated…
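A hedged sketch of the upsert-style export that usually avoids the duplicates (connection string, export directory, and key column are placeholders): --update-key tells Sqoop which column(s) to match existing rows on, and --update-mode allowinsert turns unmatched rows into inserts.

sqoop export \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table roles \
  --export-dir /user/hive/warehouse/roles \
  --update-key id \
  --update-mode allowinsert

Without --update-key, sqoop export generates plain INSERT statements, so re-exporting the same file duplicates every record.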

Andres Urrego Angel
- 1,842
- 7
- 29
- 55
0
votes
1 answer
Access array element after split processor in ingest node
I'm trying to access an array element after splitting a string into an array using a 'split' processor in an ingest node pipeline.
I have a long string separated by slashes ('/'). I only want to pass one substring to the index, and dump the rest.
For example, I…
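A hypothetical pipeline in that spirit, for ES 5.x: split on '/' in a split processor, then keep a single element via a script processor (the field name path and the index 2 are placeholders).

curl -XPUT 'http://localhost:9200/_ingest/pipeline/keep-one-part' \
  -H 'Content-Type: application/json' -d '
{
  "processors": [
    { "split":  { "field": "path", "separator": "/" } },
    { "script": { "inline": "ctx.path = ctx.path[2]" } }
  ]
}'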

user8205208
- 1
- 1
0
votes
1 answer
Elasticsearch ingest pipeline - epoch_millis to date format
I am using the reindex API in ES 5.4.1, and I need to convert a long field (which represents a date) to a date field. So the source index looks like
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index":…

user2689782
- 747
- 14
- 31
0
votes
1 answer
Data ingestion with Kafka and Hadoop - how to avoid data duplication that can result from quality check failure?
Here is a simplified scenario:
N business flows that need the same raw data from the same source.
The data is ingested using Kafka (normal Kafka pipelines) and landed on HDFS, where an automatic quality-checking flow is triggered on the raw data…
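One common pattern for this (a sketch, not the asker's pipeline): land each batch in a staging directory, run the quality checks there, and promote it with a rename only on success, so downstream flows never see, or double-count, an unchecked batch. run_quality_checks and all paths are hypothetical.

BATCH=batch_0042.avro
hdfs dfs -put "$BATCH" /data/raw/_staging/"$BATCH"
if run_quality_checks /data/raw/_staging/"$BATCH"; then
  # consumers only ever read from the promoted directory
  hdfs dfs -mv /data/raw/_staging/"$BATCH" /data/raw/"$BATCH"
else
  # a failed batch never reaches the N business flows
  hdfs dfs -rm /data/raw/_staging/"$BATCH"
fi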

aviad
- 8,229
- 9
- 50
- 98
0
votes
1 answer
Kafka Connect job that was working in version 0.9 is not working in 0.10.2
When I run my Kafka Connect job, I get the error below:
[2017-04-25 14:56:22,806] ERROR Failed to create job for ./etc/kafka-connect-jdbc/sqlserver.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2017-04-25 14:56:22,808] ERROR…
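That error usually surfaces when the worker cannot parse or validate the connector properties file. For comparison, a minimal JDBC source config of the shape the standalone runner expects; every value below is a placeholder rather than the asker's actual file.

cat > ./etc/kafka-connect-jdbc/sqlserver.properties <<'EOF'
name=sqlserver-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:sqlserver://localhost:1433;databaseName=demo
mode=incrementing
incrementing.column.name=id
topic.prefix=sqlserver-
EOF

./bin/connect-standalone ./etc/kafka/connect-standalone.properties \
    ./etc/kafka-connect-jdbc/sqlserver.properties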

Zigmaphi
- 15
- 5
0
votes
1 answer
Hadoop Ingestion automation techniques
My context is:
10 CSV files are uploaded to my server during the night.
My process is:
Ingestion:
Put the files on HDFS.
Create ORC Hive tables and load the data into them.
Processing:
Spark processing: transformation, cleaning, join…
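A minimal cron-style sketch of the ingestion half, assuming a two-column schema and placeholder paths (real columns, databases, and file locations would differ):

DT=$(date +%F)
hdfs dfs -mkdir -p /landing/csv/dt="$DT"
hdfs dfs -put /srv/incoming/*.csv /landing/csv/dt="$DT"/

hive -e "
  DROP TABLE IF EXISTS staging_csv;
  CREATE EXTERNAL TABLE staging_csv (id INT, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/landing/csv/dt=$DT';
  CREATE TABLE IF NOT EXISTS events_orc (id INT, payload STRING) STORED AS ORC;
  INSERT INTO TABLE events_orc SELECT * FROM staging_csv;
"

Schedulers such as cron, Oozie, or Airflow are the usual ways to trigger this nightly and chain the Spark processing behind it.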

Nabil
- 1,771
- 4
- 21
- 33
0
votes
0 answers
CSV data ingestion in Solr issue
I am new to Solr and am trying to ingest a CSV file into a demo collection. Below is the command I am trying to execute.
[solr@ambari solr]$ curl http://localhost:8983/solr/fbdemo_shard1_replica1/update/csv
--data-binary…
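For comparison, a shape of the same call that typically works (collection name kept from the question; the file name is a placeholder): the CSV handler wants an explicit content type, and without commit=true the documents will not become visible in the collection.

curl 'http://localhost:8983/solr/fbdemo_shard1_replica1/update/csv?commit=true' \
     --data-binary @data.csv \
     -H 'Content-Type: application/csv'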

omer
- 187
- 6
- 16
0
votes
1 answer
How should I evaluate the insert benchmark from CrateDB?
I am trying to understand and interpret the benchmark which is provided from CrateDB. (https://staging.crate.io/benchmark/)
I am interested in how many elements can be inserted in one second.
I know that this may vary with the size of the tuples.…

duichwer
- 157
- 1
- 14
0
votes
0 answers
Druid / Tranquility (server) / Ingestion / Indexing has not finished
I use Druid 0.9.1.1 & Tranquility 0.8.0, and I followed the quickstart steps here: http://druid.io/docs/0.9.1.1/tutorials/quickstart.html
The following command succeeds:
bin/generate-example-metrics | curl -XPOST -H'Content-Type: application/json'…
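Two hedged ways to see why indexing has not finished, using the overlord API on its default quickstart port (the task ID is a placeholder to fill in from the first call):

curl http://localhost:8090/druid/indexer/v1/runningTasks
curl http://localhost:8090/druid/indexer/v1/task/<taskId>/status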

Cokorda Raka
- 4,375
- 6
- 36
- 54
0
votes
1 answer
Ingesting particular sources into a particular rack
I have a cluster with three racks. I want a particular set of sources to be dumped into only one rack, so that I can monitor the traffic from that particular source to the other destinations. My question is simple. Is it possible to…

Moe
- 171
- 2
- 9
0
votes
2 answers
Spark UDF optimization for Graph Database (Neo4j) inserts
This is the first issue I am posting, so apologies if I miss some info or have mediocre formatting. I can update if required.
I will try to add as many details as possible. I have a not-so-optimized Spark job which converts RDBMS data to graph nodes and…

Nik
- 431
- 1
- 6
- 10
0
votes
1 answer
Ingest data once in python
I have a dataframe in Python which contains all of my data for binary classification. I ingest the data in two iterations: first all of the data of one class, and then all of the data of the other class. I then randomise the rows.
The…

OAK
- 2,994
- 9
- 36
- 49
-1
votes
0 answers
How can I test the Data Ingestion from D365 CRM to Data Lake?
I'm looking for ideas/existing solutions to effectively test the Data Ingestion from D365 CRM to Data Lake.
I wanted to know if this is possible/a good idea to do.
I have researched Fluid Test, but it doesn't suit my requirements.

Sugnick Sen
- 1
- 2
-1
votes
2 answers
What is the most efficient way to ingest data from Azure to Bigquery?
I need to do a one-time load (batch) from Azure to BigQuery and I am new to the Google Cloud environment. I noticed there are numerous ways to do this, but it still isn't clear which option is the most efficient one.
Any thoughts on this? Thank…
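One hedged route for a one-time batch load: stage the files in a GCS bucket first (for example via the GCP Storage Transfer Service, which can read from Azure Blob Storage), then load them with the bq CLI. Bucket, dataset, and table names below are placeholders.

bq load --autodetect --source_format=CSV \
    my_dataset.my_table gs://my-bucket/azure-export/*.csv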

Jo Olive
- 57
- 6