
I am trying to analyze Twitter data using Flume. I collected the files from Twitter with Flume in BigInsights, but the data I received is in a compressed Avro format, which is not human-readable. Can anyone tell me how to convert these files to readable JSON so that I can do some analysis on them?

Alternatively, is there a way to receive the data in readable JSON format in the first place?
Thanks in advance.

This is the data I received (screenshot not included).

  • Hi and welcome to Stack Overflow! What have you tried so far to read / parse the JSON? Please consider adding an example of your code, highlighting the parts where it goes wrong. – Philipp Mar 31 '17 at 07:29
  • This data is already in a readable format. It contains languages other than English, and in this kind of data you will usually get junk characters, which you need to either handle or replace before using the data for processing. – Rajen Raiyarela Mar 31 '17 at 07:48

1 Answer

The Avro format is not designed to be human-readable; it is designed to be consumed by programs. But you have a few options for viewing this data or, even better, analyzing it.

Create a Hive table: This option lets you analyze the data using SQL queries, Spark SQL, Spark notebooks, and visualization tools such as Tableau and Excel. Your table creation script will look like this:

CREATE TABLE twitter_data
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{...

In the schema literal, you can also define your own schema.

Write a program: If you are a developer and prefer to wrangle data programmatically, you have many languages to choose from to read, parse, and convert Avro files to JSON.
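As one possible illustration in Python, here is a minimal sketch that converts an Avro file to newline-delimited JSON. It assumes the third-party `fastavro` package is installed; the function names and file paths are placeholders of my own. `fastavro` reads the schema embedded in the file and handles the `deflate` codec itself, while other codecs such as snappy may need an extra package.

```python
import json


def to_json_line(record):
    """Serialize one decoded Avro record (a dict) as a single JSON line."""
    def fallback(value):
        # Avro bytes/fixed fields decode to Python bytes, which json cannot
        # serialize; decode them leniently. Anything else becomes a string.
        if isinstance(value, (bytes, bytearray)):
            return bytes(value).decode("utf-8", errors="replace")
        return str(value)

    # ensure_ascii=False keeps non-English tweet text readable in the output.
    return json.dumps(record, ensure_ascii=False, default=fallback)


def avro_to_json_lines(avro_path, json_path):
    """Convert an Avro container file to newline-delimited JSON."""
    # Assumption: fastavro is installed (pip install fastavro). It is imported
    # here so that to_json_line can be used without it.
    from fastavro import reader

    count = 0
    with open(avro_path, "rb") as src, \
            open(json_path, "w", encoding="utf-8") as dst:
        for record in reader(src):  # uses the schema embedded in the file
            dst.write(to_json_line(record) + "\n")
            count += 1
    return count
```

Calling `avro_to_json_lines("FlumeData.avro", "tweets.json")` (both file names are hypothetical) writes one JSON object per line, a layout that most analysis tools can load directly.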

alpeshpandya