Questions tagged [flume-ng]

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Flume-NG is a refactoring of the first-generation Flume that addresses known issues and limitations of the original design.

Use this tag for questions about the Flume-NG API and features specific to new-generation versions (e.g. the Flume HDFS Sink was introduced only in the NG version and cannot be used in previous releases).

397 questions
3
votes
1 answer

Getting variables in flume.conf

I have a Flume agent declared in a flume.conf file. The source is RabbitMQ, although this is not so relevant. I need to move the credentials out of that file into another one. I saw that the way to do that is in flume-env.sh, where I put…
josele
  • 79
  • 1
  • 9
3
votes
1 answer

Duplicate channel before being intercepted by interceptor

I'm using Flume to do something like this: Source --> interceptor --> Channel --> multiplexing --> HDFS Sink |-----------> Null Sink. I would like to add a channel just after the source, but I don't want event…
guillaume
  • 1,638
  • 5
  • 24
  • 43
3
votes
1 answer

Unable to correctly load twitter avro data into hive table

Need your help! I am trying a trivial exercise: getting data from Twitter and then loading it into Hive for analysis. I am able to get data into HDFS using Flume (using the Twitter 1% firehose Source) and am also able to load the data into…
Rakesh Gupta
  • 31
  • 1
  • 3
3
votes
1 answer

Flume - Solr Integration

This is my scenario. Input JSON data flows to Flume and it needs to be indexed and stored into Solr in near real time. I am using the latest CDH release. I did not find the documentation complete. It is disconnected at places. Can you please point…
Manoj S
  • 66
  • 4
3
votes
1 answer

Flume - Can an entire file be considered an event in Flume?

I have a use case where I need to ingest files from a directory into HDFS. As a POC, I used simple Directory Spooling in Flume where I specified the source, sink and channel and it works fine. The disadvantage is that I would have to maintain…
CodingInCircles
  • 2,565
  • 11
  • 59
  • 84
3
votes
2 answers

Flume not writing logs to Hdfs

I configured Flume to write my apache2 access logs to HDFS. Judging by the Flume logs, the configuration is correct, but I don't know why it is still not writing to HDFS. Here is my Flume config…
gstackk
  • 43
  • 5
3
votes
1 answer

Reading Flume spoolDir in parallel

Since I'm not allowed to set up Flume on prod servers, I have to download the logs, put them in a Flume spoolDir and have a sink to consume from the channel and write to Cassandra. Everything is working fine. However, as I have a lot of log files…
2
votes
0 answers

Memory usage of Flume MemoryChannel

I'm troubleshooting some memory issues I'm encountering while sending messages to Flume from some Java code. The code runs two EmbeddedAgents, each with a memory channel and some sinks pointing to a remote server. I read in the Flume documentation about…
Gaël J
  • 11,274
  • 4
  • 17
  • 32
2
votes
1 answer

How can I calculate the appropriate amount of channel capacity?

I am looking for a solution because the sth-channel is full. I am having trouble calculating the appropriate channel capacity. This document has the following description. In order to calculate the appropriate capacity, just have in…
hiro
  • 59
  • 3
2
votes
1 answer

Using Flume to Ingest data from kafka to HDFS:: ConfigurationException: Bootstrap Servers must be specified

I am trying to ingest data with Flume from a Kafka source to HDFS. Below is my Flume conf file. flume1.sources = kafka-source-1 flume1.channels = hdfs-channel-1 flume1.sinks = hdfs-sink-1 flume1.sources.kafka-source-1.type =…
Jahar tyagi
  • 91
  • 13
2
votes
0 answers

Exception while streaming tweets Received fatal alert: access_denied in Flume

I currently have this configuration in Flume: TwitterAgent.sources = Twitter TwitterAgent.channels = MemChannel TwitterAgent.sinks = HDFS TwitterAgent.sources.Twitter.type=…
Mohit.kc
  • 73
  • 1
  • 1
  • 7
2
votes
0 answers

What Hadoop jar dependencies do I need to setup a HDFS sink in Flume?

I'm using a Docker image of Flume from probablyfine/flume. I'm trying to configure an HDFS sink and I'm getting this error about dependencies. Google search results show I need to include Hadoop libs, but many of the results are old from when Hadoop…
user99999991
  • 1,351
  • 3
  • 19
  • 43
2
votes
0 answers

HDFS ingestion rate frequently drops drastically from all Flume agents. How to investigate/rectify?

I have a good-sized Hadoop cluster, with multiple Flume agents (1 agent per machine, not part of the cluster) writing to it using HDFSSink. Almost 95% of the time, the Sink batch completion rate is in line with the source event rate, thus showing minimal…
Viren
  • 170
  • 1
  • 10
2
votes
1 answer

Flume creating small files

I am trying to move my files from the local system into HDFS using Flume, but when I run Flume it creates many small files. My original files are 154-500 KB, but in HDFS it is creating many files of 4-5 KB. I searched and got…
Ironman
  • 1,330
  • 2
  • 19
  • 40
2
votes
2 answers

How to extract all the collected tweets in a single file

I'm using Flume to collect tweets and store them on HDFS. The collecting part is working fine, and I can find all my tweets in my file system. Now I would like to extract all these tweets into a single file. The problem is that the different tweets…
Omegaspard
  • 1,828
  • 2
  • 24
  • 52