Questions tagged [data-ingestion]

248 questions
0
votes
1 answer

MarkLogic Cluster - Adding data on 1st host & updating on 2nd host throws error

MarkLogic setup is as follows: 3 hosts. Data configuration - 1 master forest on each host, 1 replica for each host on a different host. We have a MarkLogic cluster (3 hosts, with failover) deployed on Azure VMs. We are using MarkLogic ContentPump…
Manish Joisar
  • 1,256
  • 3
  • 23
  • 47
0
votes
1 answer

MarkLogic Cluster - Configure Forest with all documents

We are working on MarkLogic 9.0.8.2. We are setting up a MarkLogic cluster (3 VMs) on Azure and, as per the failover design, want to have 3 forests (one for each node) in Azure Blob. I am done with the setup, and when I started ingestion, I found that documents are…
0
votes
1 answer

How to process multiple different files in different ways using Spring Batch

Background/Context: I see almost countless examples of how to process multiple files using Spring Batch, but every single one of them has a single object that all the files are being processed into. So, many files containing compatible data, that are…
Code Jockey
  • 6,611
  • 6
  • 33
  • 45
0
votes
1 answer

How to append to a zipline bundle

I have a trading algorithm that I am backtesting on zipline. I have successfully ingested a US common stocks bundle from a CSV file. Moving forward, I'd like to backtest it continuously at the end of each trading day. So I'd like to append to my…
Bai hui
  • 21
  • 3
0
votes
3 answers

How to trim fields when loading into dataframe in spark?

We recently received a file to be ingested. The file is in PSV format; however, all the fields are padded with extra characters $~$ on the left and right, so the entire PSV is like…
mdivk
  • 3,545
  • 8
  • 53
  • 91
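Whatever Spark API is used to load the PSV, removing the padding comes down to one regular expression anchored at both ends of each field. A minimal sketch in plain Python, assuming the wrapper is the literal string `$~$` (the same pattern can be passed to Spark's `regexp_replace`):

```python
import re

# The padding is the literal "$~$"; "$" is a regex metacharacter, so escape it.
PAD = re.compile(r"^\$~\$|\$~\$$")

def strip_pad(field: str) -> str:
    """Remove the $~$ wrapper from both ends of a single field value."""
    return PAD.sub("", field).strip()

# A padded PSV line split on the pipe delimiter, then cleaned per field:
line = "$~$Alice$~$|$~$42$~$|$~$NY$~$"
print([strip_pad(f) for f in line.split("|")])  # → ['Alice', '42', 'NY']
```

In PySpark the equivalent would be applying `regexp_replace(col(c), r'^\$~\$|\$~\$$', '')` to each column after reading the file with `|` as the delimiter.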
0
votes
1 answer

Druid storing 0 or 0.0 as null values

Versions: Druid 0.10.1 from HDP-2.6.5.0. We are using the Druid-Kafka indexing service to load data into Druid from Kafka topics, and during this we have found that metric values of 0 or 0.0 are being stored as…
Imran
  • 429
  • 9
  • 23
0
votes
1 answer

How to add a column in MarkLogic during ingestion?

I have a CSV which I'm loading through mlcp. How do I add a column with one string value of my choice during ingestion? Which transform functions should I use, and how? EDIT: I will be using JS to write transformations. The basic workflow is: Write and load…
Mehul
  • 148
  • 9
0
votes
0 answers

Best way to validate ingested data

I am ingesting data daily from various external sources like GA, scrapers, Google BQ, etc. I store the created CSV file in HDFS, create a stage table from it, and then append it to a historical table in Hadoop. Can you share some best practices on how to…
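A common pattern for this workflow is to run sanity checks on the staged batch before appending it to the historical table. A minimal sketch in plain Python; the function and field names (`validate_batch`, `expected_columns`, `min_rows`) are illustrative, not from any framework:

```python
def validate_batch(rows, expected_columns, min_rows=1):
    """Return a list of human-readable problems found in one daily batch.

    Checks sketched here: a minimum row count (catches empty/failed
    extracts) and required fields being non-empty in every row.
    """
    errors = []
    if len(rows) < min_rows:
        errors.append(f"too few rows: {len(rows)} < {min_rows}")
    for i, row in enumerate(rows):
        missing = [c for c in expected_columns if row.get(c) in (None, "")]
        if missing:
            errors.append(f"row {i}: empty required fields {missing}")
    return errors

batch = [{"date": "2019-01-01", "clicks": "120"},
         {"date": "2019-01-02", "clicks": ""}]
print(validate_batch(batch, ["date", "clicks"]))
```

The same checks (row-count deltas against the source, null-ratio thresholds, schema match) translate directly into Hive queries against the stage table, so a bad batch can be rejected before the append.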
0
votes
1 answer

Hadoop Integration with Document Capture Software

We have a requirement to send documents to Hadoop (Hortonworks) from our Image Capture Software: the Image Capture Software releases PDF documents with metadata. I don't have much idea about HDP. Is there any REST service or tool that is able to add…
0
votes
1 answer

How to configure Apache Flume to delete files that are ignored by the ignorePattern property

I have data coming into a spooldir and I am picking it up using Flume and forwarding it on for some processing. There are some files which are not required, so I am using the ignorePattern property in Flume to keep them from being picked up. But the…
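For context, the relevant spooling-directory source properties look roughly like the fragment below (the agent and source names `agent`/`spool` and the paths are placeholders; only the property keys are Flume's own). Note that `deletePolicy` applies only to files Flume has fully consumed - files matched by `ignorePattern` are skipped entirely, so Flume itself never deletes them:

```
# Hypothetical agent/source names; property keys are from the Flume spooldir source.
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /data/incoming
agent.sources.spool.ignorePattern = ^.*\.tmp$
# deletePolicy only affects files the source has finished ingesting;
# ignored files are left in place and need external cleanup.
agent.sources.spool.deletePolicy = immediate
```

The usual workaround is a separate cleanup step (e.g. a cron job removing files matching the ignore pattern from the spool directory) rather than expecting Flume to delete them.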
0
votes
1 answer

How to use the ingest plugin to import text data?

I am able to import structured data into Elasticsearch using Logstash and derive reports in Kibana. I went through a few articles, but I am not getting a clear understanding of how to import text (unstructured data) into Elasticsearch using ingest…
Prajna
  • 129
  • 1
  • 8
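The ingest-node route is to define a pipeline that parses the raw text at index time, then reference it when indexing. A hedged sketch, assuming a log-like line in a `message` field; the pipeline name `raw_text` and the grok pattern are illustrative:

```
PUT _ingest/pipeline/raw_text
{
  "description": "Sketch: split a raw log line into fields before indexing",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:body}"]
      }
    }
  ]
}
```

Documents are then indexed with `?pipeline=raw_text` appended to the index request, and the extracted `ts`/`level`/`body` fields become queryable in Kibana like any structured data.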
0
votes
0 answers

SAP HANA Sqoop Import

I am trying to do a Sqoop import from a HANA view. I have tried many ways, and the issue still persists. Has anyone had a similar experience? Please also help me figure out if I'm missing something. Sqoop job: sqoop import --driver com.sap.db.jdbc.Driver…
Harsha TJ
  • 264
  • 1
  • 8
0
votes
0 answers

Tranquility Kafka data to Druid server

I set up a Druid cluster; the Overlord and Coordinator UIs both work, every node can be launched, and there are no errors. I tested it with the quickstart batch, and it worked. Now I would like to use Tranquility to ingest Kafka data, but when I ran "bin/tranquility…
Frank
  • 977
  • 3
  • 14
  • 35
0
votes
1 answer

How to ingest data into BigQuery from a Java application

I want to ingest data into BigQuery from my Java application. Is there any performance issue if we use the BigQuery API directly? The application is running in AWS.
Faisal P P
  • 111
  • 1
  • 9
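Whatever client language is used, the streaming path behind the client libraries is the `tabledata.insertAll` REST method; the Java client's `InsertAllRequest` is a wrapper around the same request body. A sketch of building that body (shown in Python for brevity; the row contents are illustrative):

```python
import json
import uuid

def insert_all_body(rows):
    """Build a tabledata.insertAll request body.

    A per-row insertId lets BigQuery perform best-effort de-duplication
    if the same batch is retried after a network failure.
    """
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [{"insertId": str(uuid.uuid4()), "json": r} for r in rows],
    }

body = insert_all_body([{"event": "click", "ts": "2019-01-01T00:00:00Z"}])
print(json.dumps(body)[:40])  # this body is POSTed to the insertAll endpoint
```

Running outside Google Cloud (e.g. on AWS) mainly adds network latency per request, so batching multiple rows per `insertAll` call matters more than which client library issues it.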
0
votes
3 answers

Stream data into Google BigQuery using GET Method?

I need a good solution, preferably an existing one such as a Google REST API, for streaming/inserting data into BigQuery. I don't want to use the POST method to send data - for many design reasons. I am expecting 1000s of writes per second. The data will be…
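BigQuery's streaming endpoint itself is POST-only, so GET-based producers are usually pointed at a thin relay that turns query parameters into an `insertAll` POST. A sketch of just the translation step, assuming a hypothetical collector URL whose query parameters map one-to-one onto row fields:

```python
from urllib.parse import parse_qsl, urlparse

def row_from_get(url: str) -> dict:
    """Turn the query string of a GET request into one BigQuery row dict."""
    return dict(parse_qsl(urlparse(url).query))

row = row_from_get("https://collector.example.com/track?event=click&user=42")
print(row)  # → {'event': 'click', 'user': '42'}
# A relay would buffer such rows and flush them in batched insertAll POSTs.
```

Batching rows in the relay is what makes thousands of writes per second feasible, since each `insertAll` call can carry many rows.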