Flume not accepting keywords for Twitter stream

Question

I'm a Hadoop neophyte, using this tutorial to capture tweets: https://acadgild.com/blog/streaming-twitter-data-using-flume/. Here is my flume.conf file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey=xxxx
TwitterAgent.sources.Twitter.consumerSecret=xxxx
TwitterAgent.sources.Twitter.accessToken=xxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxx

TwitterAgent.sources.Twitter.keywords= #canpoli

TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs:/xxxx/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=1000

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel

It streams tweets fine and saves them into my desired directory correctly, but it seems to be streaming everything without filtering for my keyword. I get tweets from all over the world, but none with that hashtag.

What might be the issue?

OneCricketeer · Answer 1 · 2017-09-28T12:35:26.500


First, it's accepting all hashtags because you gave it an empty list.

The # character starts a comment, so everything after the equals sign is ignored. At least, I think that's how it is parsed.

You linked to a site that doesn't use the #, so I would follow that tutorial until it worked.

Secondly, that source is considered experimental, and its documentation doesn't seem to mention a keywords property.

https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental

Your config looks almost exactly like this example from Cloudera, which includes comments in the config and has keywords. If you check it, you'll see it uses a different source class:

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

https://github.com/cloudera/cdh-twitter-example/blob/master/flume-sources/flume.conf

You need to download the Java code from that repository, package it into a JAR, and place it in the Flume lib directory.
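
Putting the two points together, the source section might look something like the sketch below. It assumes the JAR built from the Cloudera example is already in the Flume lib directory, and that a keyword written without the leading # is enough for the filter to pick up the hashtag. Neither assumption has been verified on this setup. Only the type and keywords lines differ from the config in the question.

```
# Sketch only; the credential, channel, and sink lines stay as in the question.
# Assumes the JAR built from the Cloudera example is in the Flume lib directory.
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
# Assumption: the keyword is written without the leading '#'.
TwitterAgent.sources.Twitter.keywords = canpoli
```

Flume needs to be restarted after swapping the source class so that the new JAR is loaded.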

edited Sep 28 '17 at 12:35

answered Sep 26 '17 at 23:53

OneCricketeer


I removed the hashtag and I get the same result. None of the tweets being ingested contain any of the keywords. – JLA Sep 27 '17 at 04:13
What if you run the example provided? – OneCricketeer Sep 27 '17 at 04:29
The exact same thing happens. – JLA Sep 27 '17 at 16:30
Are you sure you are running the correct config? What command are you using? – OneCricketeer Sep 27 '17 at 16:32
flume-ng agent -n TwitterAgent -f /etc/flume/conf/flume.conf (where the file is located). And I did double check: I am running the same config that I am editing. – JLA Sep 27 '17 at 17:42
Hi, I've been playing with this for a few days. No luck and no change. Using 'com.cloudera.flume.source.TwitterSource' will not stream any tweets. I moved all the JARs to the Flume lib directory and went back to the original TwitterSource. It still streams away without filtering for keywords. – JLA Oct 03 '17 at 19:19
Hmm. I'd have to re-download the Cloudera VM to test it out myself. – OneCricketeer Oct 03 '17 at 19:49
I'm not using Cloudera - I'm using Google Cloud VMs with Hortonworks installed. – JLA Oct 03 '17 at 19:51
Same idea. I don't have a working Flume installation at the moment. – OneCricketeer Oct 03 '17 at 19:57
