Questions tagged [flume]

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

1136 questions
6
votes
3 answers

How to insert JSON in HDFS using Flume correctly

I am using the HTTPSource in Flume for receiving POST events in JSON format as follows: {"username":"xyz","password":"123"} My question is: do I have to modify the source of the events (I mean the one that is sending the JSON to Flume) so the…
nanounanue
  • 7,942
  • 7
  • 41
  • 73
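For the question above, a minimal agent sketch using Flume's HTTPSource, assuming a hypothetical agent name (a1), a hypothetical port, and the default JSONHandler. Note that the default handler expects a JSON array of events of the form [{"headers": {...}, "body": "..."}], so a bare object like the one shown would need to be wrapped either by the sender or by a custom handler.

a1.sources = http-src
a1.channels = mem-ch
a1.sinks = hdfs-snk

# HTTP source with the default JSON handler
a1.sources.http-src.type = http
a1.sources.http-src.port = 5140
a1.sources.http-src.handler = org.apache.flume.source.http.JSONHandler
a1.sources.http-src.channels = mem-ch

a1.channels.mem-ch.type = memory

# write the event bodies to HDFS as a plain data stream
a1.sinks.hdfs-snk.type = hdfs
a1.sinks.hdfs-snk.hdfs.path = /flume/json-events
a1.sinks.hdfs-snk.hdfs.fileType = DataStream
a1.sinks.hdfs-snk.channel = mem-ch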
6
votes
1 answer

Zookeeper keeps getting the WARN: "caught end of stream exception"

I am now using a CDH-5.3.1 cluster with three ZooKeeper instances located at three IPs: 133.0.127.40 n1 133.0.127.42 n2 133.0.127.44 n3 Everything works fine when it starts, but these days I notice that the node n2 keeps getting the WARN: caught…
Baskwind
  • 61
  • 1
  • 1
  • 2
6
votes
2 answers

Using Kafka to import data to Hadoop

Firstly I was thinking about what to use to get events into Hadoop, where they will be stored and periodic analysis would be performed on them (possibly using Oozie to schedule the analysis): Kafka or Flume. I decided that Kafka is probably a…
Kobe-Wan Kenobi
  • 3,694
  • 2
  • 40
  • 67
6
votes
2 answers

Apache flume twitter agent not streaming data

I am trying to stream Twitter feeds to HDFS and then use Hive. But the first part, streaming the data and loading it to HDFS, is not working and gives a NullPointerException. This is what I have tried. 1. Downloaded apache-flume-1.4.0-bin.tar. Extracted…
user2094311
6
votes
1 answer

Flume: Directory to Avro -> Avro to HDFS - Not valid avro after transfer

I have users writing Avro files and I want to use Flume to move all those files into HDFS, so I can later use Hive or Pig to query/analyse the data. On the client I installed Flume and have a SpoolDir source and Avro sink like…
danielfrg
  • 2,597
  • 2
  • 22
  • 23
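One commonly suggested arrangement for the question above, sketched with hypothetical agent/component names and with channels omitted for brevity (the schema-aware Avro serializer is available in recent Flume versions): the spooling directory source parses the files as Avro rather than as text lines, the schema travels with each event, and the HDFS sink serializes the events back into valid Avro container files.

# client agent: read Avro files and forward them over Avro RPC
client.sources.spool.type = spooldir
client.sources.spool.spoolDir = /var/flume/avro-in
client.sources.spool.deserializer = AVRO
# LITERAL puts the schema into an event header so the HDFS-side serializer can use it
client.sources.spool.deserializer.schemaType = LITERAL
client.sinks.fwd.type = avro
client.sinks.fwd.hostname = collector.example.com
client.sinks.fwd.port = 4545

# collector agent: receive the events and write proper Avro container files to HDFS
collector.sources.in.type = avro
collector.sources.in.bind = 0.0.0.0
collector.sources.in.port = 4545
collector.sinks.out.type = hdfs
collector.sinks.out.hdfs.path = /flume/avro
collector.sinks.out.hdfs.fileType = DataStream
collector.sinks.out.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder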
6
votes
1 answer

How can I get logs collected on console using Flume NG?

I'm testing Flume NG (1.2.0) for collecting logs. It's a simple test in which Flume collects a log file, flume_test.log, and prints the collected logs to the console as sysout. conf/flume.conf is: agent.sources = tail agent.channels = memoryChannel agent.sinks =…
philipjkim
  • 3,999
  • 7
  • 35
  • 48
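A sketch of a complete configuration for the setup described above, assuming a hypothetical log path. The logger sink writes events through Flume's own log, so they only appear on the console when the agent is started with -Dflume.root.logger=INFO,console (or DEBUG,console).

agent.sources = tail
agent.channels = memoryChannel
agent.sinks = consoleSink

# tail the test log file
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /home/user/flume_test.log
agent.sources.tail.channels = memoryChannel

agent.channels.memoryChannel.type = memory

# log each event via the Flume log (visible on the console with the flag below)
agent.sinks.consoleSink.type = logger
agent.sinks.consoleSink.channel = memoryChannel

flume-ng agent -n agent -c conf -f conf/flume.conf -Dflume.root.logger=INFO,console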
5
votes
1 answer

Flume automatic scalability and failover

My company is considering using Flume for some fairly high-volume log processing. We believe that the log processing needs to be distributed, both for volume (scalability) and failover (reliability) reasons, and Flume seems the obvious…
Marc Harris
  • 133
  • 1
  • 9
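For reference on the failover side of the question above, Flume NG handles sink-level failover with sink groups. A minimal sketch, assuming an agent named agent with two Avro sinks k1 and k2 already defined and pointing at two collectors:

agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
# failover processor: k1 is preferred, k2 takes over if k1 fails
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 5
agent.sinkgroups.g1.processor.maxpenalty = 10000

Swapping processor.type to load_balance spreads events across the sinks instead of treating them as primary and standby.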
5
votes
1 answer

Flume agent - can I specify compression like gzip or bz2?

Is it possible to specify a compression option on a Flume agent so that the data is transferred to the collector in a compressed format? I know there are compression options at the collector level, but it would also be extremely useful to be able…
Aleksei T
  • 51
  • 1
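In Flume NG, the Avro sink/source pair does support deflate compression on the wire. A sketch with hypothetical agent and component names; note that both ends must declare the same compression type.

# sending agent: compress data before it leaves for the collector
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.compression-type = deflate
a1.sinks.k1.compression-level = 6

# receiving agent: the Avro source must expect compressed data
c1.sources.r1.type = avro
c1.sources.r1.bind = 0.0.0.0
c1.sources.r1.port = 4545
c1.sources.r1.compression-type = deflate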
5
votes
1 answer

Ingest log files from edge nodes to Hadoop

I am looking for a way to stream entire log files from edge nodes to Hadoop. To sum up the use case: We have applications that produce log files ranging from a few MB to hundreds of MB per file. We do not want to stream all the log events as they…
j9dy
  • 2,029
  • 3
  • 25
  • 39
5
votes
2 answers

Usable space exhausted in flume using file channel

I'm working on Flume with Spool Directory as the source, HDFS as the sink and File as the channel. When executing the Flume job I'm getting the issue below. The memory channel works fine, but we need to implement the same using the file channel. Using file…
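The "Usable space exhausted" error from the file channel is typically raised when free disk space under its data and checkpoint directories drops below the channel's minimumRequiredSpace threshold (500 MB by default). A sketch of the relevant file-channel settings, with hypothetical paths:

agent.channels.fileChannel.type = file
# put these directories on a disk with enough free space
agent.channels.fileChannel.checkpointDir = /data/flume/checkpoint
agent.channels.fileChannel.dataDirs = /data/flume/data
agent.channels.fileChannel.capacity = 1000000
agent.channels.fileChannel.transactionCapacity = 10000
# refuse writes only when less than ~500 MB is free (this is the default value)
agent.channels.fileChannel.minimumRequiredSpace = 524288000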
5
votes
2 answers

log4j2- ERROR Appenders contains an invalid element or attribute "Flume"

I am trying to use the Flume Appender properties of log4j2, but the following errors are obtained when running the program: 2016-01-20 16:36:42,436 main ERROR Appenders contains an invalid element or attribute "Flume" 2016-01-20 16:36:42,436 main…
Rabindra Nath Nandi
  • 1,433
  • 1
  • 15
  • 28
5
votes
1 answer

Write CSV files to HDFS using Flume

I'm writing a number of CSV files from my local file system to HDFS using Flume. I want to know what would be the best configuration for the Flume HDFS sink such that each file on the local system is copied exactly into HDFS as CSV. I want each CSV file…
oikonomiyaki
  • 7,691
  • 15
  • 62
  • 101
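Flume moves individual events rather than whole files, so an exact 1:1 copy is not guaranteed, but the HDFS sink can at least be told to write plain text and not to split output on size or event count. A sketch with hypothetical sink and channel names and a hypothetical path:

agent.sinks.hdfs-snk.type = hdfs
agent.sinks.hdfs-snk.channel = fileChannel
agent.sinks.hdfs-snk.hdfs.path = /user/flume/csv
agent.sinks.hdfs-snk.hdfs.fileType = DataStream
agent.sinks.hdfs-snk.hdfs.writeFormat = Text
agent.sinks.hdfs-snk.hdfs.fileSuffix = .csv
# disable size- and count-based rolling; roll each file 120 seconds after it is opened
agent.sinks.hdfs-snk.hdfs.rollSize = 0
agent.sinks.hdfs-snk.hdfs.rollCount = 0
agent.sinks.hdfs-snk.hdfs.rollInterval = 120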
5
votes
1 answer

Using local file system as Flume source

I've just started learning Big Data, and at this time I'm working on Flume. The common example I've encountered is processing tweets (the example from Cloudera) using some Java. Just for testing and simulation purposes, can I use my local…
oikonomiyaki
  • 7,691
  • 15
  • 62
  • 101
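For local testing, a spooling directory source reading files dropped into a local directory is a common substitute for the Twitter example (an exec source running tail -F on a single file is another option). A minimal sketch with hypothetical names and a hypothetical path:

agent.sources = local-files
agent.channels = memoryChannel
agent.sinks = logSink

# watch a local directory; files placed here are ingested and then renamed with a .COMPLETED suffix
agent.sources.local-files.type = spooldir
agent.sources.local-files.spoolDir = /home/user/flume-spool
agent.sources.local-files.fileHeader = true
agent.sources.local-files.channels = memoryChannel

agent.channels.memoryChannel.type = memory

agent.sinks.logSink.type = logger
agent.sinks.logSink.channel = memoryChannel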
5
votes
2 answers

Impala - file not found error

I'm using Impala with Flume as a filestream. The problem is that Flume adds temporary files with the extension .tmp, and when they are deleted Impala queries fail with the following message: Backend 0: Failed to open HDFS file …
griffon vulture
  • 6,594
  • 6
  • 36
  • 57
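A commonly suggested workaround for the problem above is to make the in-progress files invisible to query engines by prefixing them with a dot, since files whose names start with . or _ are ignored by most HDFS-based readers. A sketch, assuming a hypothetical sink name:

# still-open files become .foo.tmp instead of foo.tmp,
# so Impala/Hive ignore them until they are renamed on close
agent.sinks.hdfs-snk.hdfs.inUsePrefix = .
agent.sinks.hdfs-snk.hdfs.inUseSuffix = .tmp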
5
votes
4 answers

flume - flume.root.logger=DEBUG,console only logs INFO level log statements

I installed Flume 1.4.0-cdh4.7.0 on CentOS (Cloudera VM). I ran the following command to start Flume: flume-ng agent -n agent-name -c conf -f conf/flume.conf -Dflume.root.looger=DEBUG,console but it is only writing the default (INFO) level to the…
scott
  • 235
  • 4
  • 12
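For comparison, the system property that Flume's bundled log4j.properties reads is flume.root.logger (the command quoted above spells it looger), so a DEBUG-to-console run would look like:

flume-ng agent -n agent-name -c conf -f conf/flume.conf -Dflume.root.logger=DEBUG,console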