0

I want to do sentiment analysis of Twitter data using Apache Hive and Flume. I have a Twitter account and have set up the Flume conf file, but the problem is the format of the data: it does not load into Hive. Kindly help me; I have been working on this for a month.

rahul
  • So that users can provide you with assistance, try to provide as much detail as possible, as well as examples of previous attempts and why they failed. – rwking Sep 02 '15 at 17:11

2 Answers

0

I assume you were able to configure the Flume agent to fetch data from Twitter, and that your problem is the format of the data.

Apache Flume offers several sink types. Two of them are useful for your requirement:

  1. HDFS Sink
  2. Hive Sink

Using HDFS Sink:

  1. Configure the Flume agent with a TwitterSource and an HDFS Sink.
  2. Provide your Twitter OAuth details (the keys and tokens) to the Flume agent.
  3. Once the agent configuration is done, start the agent.
  4. The agent fetches the tweets from Twitter and stores them in the HDFS path as JSON documents.
  5. Once the data is available in HDFS, create a Hive external table that uses a JSON SerDe, with a LOCATION clause pointing at that HDFS path.

JSON SerDe Code link: https://github.com/cloudera/cdh-twitter-example/blob/master/hive-serdes/src/main/java/com/cloudera/hive/serde/JSONSerDe.java
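
If Hive rejects the files, it may be worth confirming what Flume actually wrote before blaming the SerDe. As far as I know, the HDFS Sink writes Hadoop SequenceFiles unless hdfs.fileType is set to DataStream, and the TwitterSource bundled with Apache Flume emits Avro rather than raw JSON, whereas a JSON SerDe over a text table expects one JSON object per line. Here is a minimal Python sketch that guesses the format from the first line of a file copied out of HDFS. The function name sniff_format and the labels it returns are my own, not part of Flume or Hive.

```python
import json


def sniff_format(head):
    """Guess a file's format from its leading bytes.

    `head` should contain at least the complete first line of the file.
    The returned labels are hypothetical names chosen for this sketch.
    """
    if head.startswith(b"SEQ"):
        return "sequencefile"  # magic bytes of a Hadoop SequenceFile
    if head.startswith(b"Obj\x01"):
        return "avro"  # magic bytes of an Avro object container file
    first_line = head.split(b"\n", 1)[0].strip()
    try:
        # A JSON SerDe on a text table expects one JSON object per line.
        if isinstance(json.loads(first_line.decode("utf-8")), dict):
            return "json-lines"
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        pass
    return "unknown"
```

To use it, copy one of the files out of HDFS with hdfs dfs -get and call sniff_format(open(path, "rb").readline()). Anything other than json-lines suggests fixing the agent configuration rather than the table definition.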

Using Hive Sink:

Flume can also write the data directly into a Hive table through the Hive Sink. Configure the Flume agent as follows:

TwitterSource --> Channel --> Hive Sink
  1. Configure the Flume agent with a TwitterSource and a Hive Sink.
  2. Provide your Twitter OAuth details (the keys and tokens) to the Flume agent.
  3. Once the agent configuration is done, start the agent.
  4. The agent fetches the tweets from Twitter and stores them in the Hive table. This uses a JSON SerDe.

The Hive Sink has a parameter called serializer that selects the type of SerDe.

Supported serializers: DELIMITED and JSON
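
With the JSON serializer, the Flume User Guide says that object names in the JSON event are mapped directly to Hive columns with the same name. A quick way to see what that means for a tweet is to compare its top-level keys against the table's columns. The following is only a rough Python sketch: compare_event_to_table is a hypothetical helper, the column list is invented for the example, and the case-insensitive comparison is my assumption based on Hive lowercasing column names.

```python
import json


def compare_event_to_table(event, table_columns):
    """Compare the top-level keys of one JSON event with Hive column names.

    Returns (extra, missing): keys with no matching column, and columns
    with no matching key. Names are compared case-insensitively, which is
    an assumption of this sketch.
    """
    keys = {k.lower() for k in json.loads(event)}
    columns = {c.lower() for c in table_columns}
    extra = sorted(keys - columns)
    missing = sorted(columns - keys)
    return extra, missing


# Hypothetical event and column list, for illustration only.
extra, missing = compare_event_to_table(
    '{"id": 1, "text": "hello", "lang": "en"}',
    ["id", "text", "created_at"],
)
print("no matching column:", extra)
print("no matching key:", missing)
```

Running this on a sample tweet before starting the agent gives an idea of which fields the table would actually receive.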

So please configure your Flume agent using either of the approaches above.

Please see the documentation at this link for more details about the sink parameters (HDFS and Hive):

https://flume.apache.org/FlumeUserGuide.html

Naga
  • Thanks for your reply. The tweets are being stored in HDFS, but when I create the external table and load the data, I get the error "Check the file format". I am using the Hive Snapshot 1.6 Serde jar. – rahul Sep 04 '15 at 11:58
0

You can try adding this jar file:

hive-serdes-1.0-SNAPSHOT.jar

You can follow the blog post below for a complete reference on performing sentiment analysis with Hive.

https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/
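
Going by its URL, the post uses the AFINN dictionary, which is a word list where each word carries an integer sentiment score. The general idea of dictionary-based scoring can be sketched in a few lines of Python. This is not the blog's Hive queries: score_tweet is a hypothetical helper, and the lexicon below is a toy with made-up scores, not the real AFINN values.

```python
import re

# Toy lexicon with made-up scores, for illustration only.
# The real AFINN list has its own words and values.
TOY_LEXICON = {"good": 2, "great": 3, "bad": -2, "terrible": -3}


def score_tweet(text, lexicon):
    """Sum the lexicon scores of the words in a tweet; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(score_tweet("Great phone, terrible battery, good price", TOY_LEXICON))
```

A positive total suggests a positive tweet and a negative total a negative one. In Hive, the same thing is usually done by splitting the tweet text into words and joining them against a dictionary table.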