I'm trying to collect tweets using Flume on Cloudera.
I use the Twitter source provided here.
Below is my configuration file:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <>
TwitterAgent.sources.Twitter.consumerSecret = <>
TwitterAgent.sources.Twitter.accessToken = <>
TwitterAgent.sources.Twitter.accessTokenSecret = <>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/root/flume/tweet/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
This is the command I run to start the agent:
sudo /usr/bin/flume-ng agent -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/flume-conf.properties -Dflume.root.logger=INFO,console -n TwitterAgent
It seems to work fine: events are processed, and I found several files in HDFS with names like FlumeData.1523723075629
However, the format of these files is unclear; I think they should be in JSON format. I opened one of them in Notepad++ and found tweets, but the structure is not readable. Also, the tweets do not match the keywords specified in the config file. How can I solve these issues? How do I get the files in the correct format, and how do I get tweets that match the keywords?
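To narrow down what format the files are actually in, one quick check is to look at the first bytes of a file. Avro object container files start with the four bytes `Obj` followed by 0x01, while a JSON-per-line file should have a first line that parses as a JSON object. Below is a minimal sketch in Python, assuming one of the files has been copied out of HDFS to the local disk first (for example with `hdfs dfs -get`); the function name `guess_format` is hypothetical and only used for illustration:

```python
import json

# Magic bytes at the start of every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def guess_format(data: bytes) -> str:
    """Return 'avro', 'json', or 'unknown' for the raw contents of a file."""
    if data.startswith(AVRO_MAGIC):
        return "avro"
    # Otherwise, try to parse the first line as a JSON object.
    first_line = data.split(b"\n", 1)[0].strip()
    try:
        parsed = json.loads(first_line.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "unknown"
    return "json" if isinstance(parsed, dict) else "unknown"
```

For a local copy of a file, this could be called as `guess_format(open("FlumeData.1523723075629", "rb").read())`. If it reports `avro`, that would explain why the tweets are visible in a text editor but the surrounding structure is not readable.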
Thanks in advance