
I'm trying to collect tweets using Flume. I am working with Cloudera.

I'm using the Twitter source provided here.

Below is my configuration file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <>
TwitterAgent.sources.Twitter.consumerSecret = <>
TwitterAgent.sources.Twitter.accessToken = <>
TwitterAgent.sources.Twitter.accessTokenSecret = <>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/root/flume/tweet/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000


TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
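Before chasing the output format, it can help to confirm the agent is wired the way the file intends. The sketch below loads a Flume properties file with java.util.Properties and, assuming an agent with exactly one source, one channel and one sink, checks that the source and sink point at the declared channel and that the HDFS sink's hdfs.batchSize does not exceed the memory channel's transactionCapacity. A memory-channel transaction holds at most transactionCapacity events, so a larger sink batch can fail with a ChannelException. The class name FlumeConfigCheck is hypothetical; the fallback of 100 for missing values matches the defaults in the Flume User Guide.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: sanity-checks a single-source, single-channel, single-sink Flume agent.
public class FlumeConfigCheck {

    // Reads an integer property, falling back to Flume's documented default of 100.
    private static int intProp(Properties p, String key) {
        return Integer.parseInt(p.getProperty(key, "100").trim());
    }

    static List<String> check(Properties p, String agent) {
        List<String> problems = new ArrayList<>();
        String source = p.getProperty(agent + ".sources", "").trim();
        String channel = p.getProperty(agent + ".channels", "").trim();
        String sink = p.getProperty(agent + ".sinks", "").trim();

        // A source lists its channel(s) under ".channels"; a sink uses the singular ".channel".
        String sourceChannel = p.getProperty(agent + ".sources." + source + ".channels", "").trim();
        if (!sourceChannel.equals(channel)) {
            problems.add("source " + source + " is not wired to channel " + channel);
        }
        String sinkChannel = p.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
        if (!sinkChannel.equals(channel)) {
            problems.add("sink " + sink + " is not wired to channel " + channel);
        }

        String ch = agent + ".channels." + channel + ".";
        int capacity = intProp(p, ch + "capacity");
        int txCapacity = intProp(p, ch + "transactionCapacity");
        int batchSize = intProp(p, agent + ".sinks." + sink + ".hdfs.batchSize");
        if (txCapacity > capacity) {
            problems.add("transactionCapacity " + txCapacity + " exceeds capacity " + capacity);
        }
        // One memory-channel transaction holds at most transactionCapacity events,
        // so a sink batch larger than that cannot be taken in a single transaction.
        if (batchSize > txCapacity) {
            problems.add("hdfs.batchSize " + batchSize + " exceeds transactionCapacity " + txCapacity);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FlumeConfigCheck <flume-conf.properties> [agentName]");
            return;
        }
        Properties p = new Properties();
        try (Reader r = new FileReader(args[0])) {
            p.load(r);
        }
        String agent = args.length > 1 ? args[1] : "TwitterAgent";
        List<String> problems = check(p, agent);
        if (problems.isEmpty()) {
            System.out.println("no problems found");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Against the configuration above it would report a single problem: hdfs.batchSize 1000 exceeds transactionCapacity 100. Raising transactionCapacity to at least the batch size, or lowering the batch size, removes the mismatch.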

This is the command:

sudo /usr/bin/flume-ng agent -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/flume-conf.properties -Dflume.root.logger=INFO,console -n TwitterAgent

It seems to work fine: events are processed, and I find several files in HDFS, such as FlumeData.1523723075629.

However, the format of these files is unknown; I expected them to be JSON. When I open one in Notepad++, I can see tweets, but the structure is not clear. Also, the tweets do not match the keywords specified in the config file. How can I solve these issues? How do I get files in the correct format, and how do I get only the tweets that match my keywords?
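One way to pin down the "unknown" format is to look at the first bytes of a file. The Flume User Guide describes the Apache TwitterSource as converting tweets to Avro, and every Avro object container begins with the magic bytes 'O', 'b', 'j', 1; Avro stores strings as plain UTF-8, which would explain tweet text being visible among unreadable bytes. The sketch below assumes a FlumeData file has first been copied out of HDFS (for example with hdfs dfs -get) and only guesses "avro", "json" or "unknown"; it does not decode anything. FlumeDataSniffer is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses whether a FlumeData file holds Avro or JSON text.
public class FlumeDataSniffer {

    // Every Avro object container file starts with these four magic bytes.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    static String classify(byte[] head) {
        if (head.length >= AVRO_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, AVRO_MAGIC.length), AVRO_MAGIC)) {
            return "avro";
        }
        // Otherwise inspect the first non-whitespace byte: a JSON object starts with '{'.
        for (byte b : head) {
            if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                continue;
            }
            return b == '{' ? "json" : "unknown";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FlumeDataSniffer <local copy of a FlumeData file>");
            return;
        }
        byte[] head = new byte[64];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.readNBytes(head, 0, head.length); // readNBytes requires Java 9+
        }
        System.out.println(classify(Arrays.copyOf(head, n)));
    }
}
```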

Thanks in advance

user971961

1 Answer


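Thanks, it's resolved. I changed TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource to TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource. The Apache source reads Twitter's sample stream, ignores the keywords property, and converts tweets to Avro, which is why the files were unreadable and unrelated to the keywords. The Cloudera source filters on the keywords and writes each tweet as raw JSON.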
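To spot-check that the collected tweets now follow the keywords, recall how Twitter's streaming filter reads the keywords value as track phrases: commas separate alternatives (OR) and spaces inside a phrase require all of its words (AND). The sketch below counts how many lines of a downloaded FlumeData file satisfy that rule. It assumes one JSON tweet per line (the HDFS sink's text serializer appends a newline after each event) and, to stay dependency-free, matches case-insensitive substrings against the whole raw JSON line instead of parsing the tweet text, so it approximates rather than reproduces Twitter's matching and may overcount. KeywordSpotCheck is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: rough check that JSON-lines tweets match a track-style keyword list.
public class KeywordSpotCheck {

    // Splits "a, b c" into phrases [[a], [b, c]]: commas = OR, spaces = AND.
    static List<List<String>> parseKeywords(String keywords) {
        List<List<String>> phrases = new ArrayList<>();
        for (String phrase : keywords.split(",")) {
            String trimmed = phrase.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                phrases.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return phrases;
    }

    // True if, for some phrase, every word occurs somewhere in the raw line (substring match).
    static boolean matches(String line, List<List<String>> phrases) {
        String lower = line.toLowerCase(Locale.ROOT);
        for (List<String> words : phrases) {
            if (words.stream().allMatch(lower::contains)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java KeywordSpotCheck <local FlumeData file> <keywords>");
            return;
        }
        List<List<String>> phrases = parseKeywords(args[1]);
        int total = 0;
        int matched = 0;
        try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines between events
                }
                total++;
                if (matches(line, phrases)) {
                    matched++;
                }
            }
        }
        System.out.println(matched + " of " + total + " lines match the keywords");
    }
}
```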

user971961