Questions tagged [flume]

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
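As a rough illustration of the "streaming data flow" architecture described above, a minimal single-agent configuration might look like the sketch below. It wires one source to one sink through one channel. The agent name `a1` and the component names `r1`, `c1`, and `k1` are arbitrary placeholders chosen for this example, and the netcat source, memory channel, and logger sink are picked only because they need no external systems; a real deployment would likely use different components and tuning values.

```properties
# Hypothetical agent "a1": name its source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-terminated text on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes events to the agent's log (useful for testing only)
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels, a sink reads from exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Assuming the file is saved as `conf/example.conf` (a hypothetical path), an agent like this would typically be started with `bin/flume-ng agent -n a1 -c conf -f conf/example.conf`.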

1136 questions
8
votes
1 answer

Simple Flume agent has some lag when logging to the console

I have a simple Flume agent with the following configuration: agent.sources = http-source agent.sinks = logger-sink agent.channels = logger-channel # HTTP Source ############################### agent.sources.http-source.type = …
Pasmod Turing
  • 153
  • 1
  • 6
8
votes
2 answers

Save flume output to hive table with Hive Sink

I am trying to configure Flume with Hive to save Flume output to a Hive table using the Hive sink type. I have a single-node cluster and use the MapR Hadoop distribution. Here is my flume.conf: agent1.sources = source1 agent1.channels = channel1 agent1.sinks =…
Andrey Braslavskiy
  • 211
  • 1
  • 3
  • 10
7
votes
5 answers

Flume sink to HDFS error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

With: Java 1.8.0_231, Hadoop 3.2.1, Flume 1.8.0. I have created an HDFS service on port 9000. jps: 11688 DataNode, 10120 Jps, 11465 NameNode, 11964 SecondaryNameNode, 12621 NodeManager, 12239 ResourceManager. Flume…
pingze
  • 973
  • 2
  • 9
  • 18
7
votes
2 answers

Distributed Logging with flume

I have a mobile service distributed over 7 servers, each of them doing a specific task. I want to log information from them and later derive business intelligence from it. I have settled on Flume. How can I use it to gather information? My…
Ron Davis
  • 346
  • 8
  • 28
7
votes
1 answer

Avro Text file generated by Flume Twitter Agent not being read in Java

I am not able to read and parse the file created by streaming Twitter data using the Flume Twitter agent, either with Java or with Avro Tools. My requirement is to convert the Avro format into JSON format. When using either method, I get the exception:…
Ashu
  • 367
  • 5
  • 14
7
votes
3 answers

Configure sink elasticsearch apache-flume

This is my first time here, so sorry if I don't post correctly, and sorry for my bad English. I'm trying to configure Apache Flume with an Elasticsearch sink. Everything seems to run fine, but there are 2 warnings when I start an agent; the…
Lifestorm
  • 91
  • 1
  • 6
7
votes
2 answers

Use flume to stream data to S3

I am trying flume for something very simple, where I would like to push content from my log files to S3. I was able to create a flume agent that would read the content from an apache access log file and use a logger sink. Now I am trying to find a…
user3277217
  • 247
  • 1
  • 3
  • 7
7
votes
1 answer

Generating an avro schema with optional values

I am trying to write a very easy avro schema (easy because I am just pointing out my current issue) to write an avro data file based on data stored in json format. The trick is that one field is optional, and one of avrotools or me is not doing it…
Guillaume
  • 2,325
  • 2
  • 22
  • 40
7
votes
2 answers

Flume not writing to HDFS unless killed

I followed the link for setting TwitterSource and HDFS sink. Command used for starting the agent: bin/flume-ng agent -n TwitterAgent -c conf -f conf/flume-conf.properties -Dflume.root.logger=DEBUG,console I was successful in doing that, but there…
vishnu viswanath
  • 3,794
  • 2
  • 36
  • 47
7
votes
1 answer

How can I force Flume-NG to process the backlog of events after a sink failed?

I'm trying to set up Flume-NG to collect various kinds of logs from a bunch of servers (mostly running Tomcat instances and Apache Httpd) and dump them into HDFS on a 5-node Hadoop cluster. The setup looks like this: Each application server tails…
DandyDev
  • 319
  • 3
  • 13
6
votes
0 answers

Using Flume to ingest real-time log data from remote server (which does not have Flume) on same network

I have server X that has Hadoop and Flume installed, and I have server Y that has neither but is on the same network. Server Y currently stores data into a log file that is continuously written to until a date stamp is appended at the end of the…
ewong18
  • 144
  • 1
  • 2
  • 10
6
votes
1 answer

What is the difference between Apache Flume and Apache Storm?

What is the difference between Apache Flume and Apache Storm? Is it possible to ingest log data into a Hadoop cluster using Storm? Both are used for streaming data, so can Storm be used as an alternative to Flume?
Hassam
  • 103
  • 1
  • 11
6
votes
3 answers

Which is the easiest way to combine small HDFS blocks?

I'm collecting logs with Flume to the HDFS. For the test case I have small files (~300kB) because the log collecting process was scaled for the real usage. Is there any easy way to combine these small files into larger ones which are closer to the…
KARASZI István
  • 30,900
  • 8
  • 101
  • 128
6
votes
1 answer

Flume HDFS sink: Remove timestamp from filename

I have configured a Flume agent for my application, where the source is Spooldir and the sink is HDFS. I am able to collect files in HDFS. The agent configuration is: agent.sources = src-1 agent.channels = c1 agent.sinks = k1 agent.sources.src-1.type =…
6
votes
2 answers

Best practice for integrating Kafka and HBase

What are the best practices for "importing" streamed data from Kafka into HBase? The use case is as follows: Vehicle sensor data are streamed to Kafka. Afterwards, these sensor data must be transformed (i.e., deserialized from protobuf in human-readable…
Thomas Beer
  • 230
  • 3
  • 9