
I am new to Hadoop and Mahout. I want to know how to convert a simple text file containing a set of vectors into a sequence file. I tried the MapReduce framework and changed the output format to SequenceFileOutputFormat, and I get the following output:

SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text��.�U_v�;�Vs�'�sample0 1 2 3 4 5sample1 6 7 8 9 10sample211 12 13 14 15sample316 17 18 19 20

The garbled characters are binary, so they can't be read, but my issue is how to get sample0 1 2 3 4, and similarly the other lines, into SequenceFile format (binary format).

I believe it can be done by changing the output of the mapper function, but I have been unable to figure out how.
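To make the mapper idea concrete, here is a minimal sketch of the parsing step only, in plain Java with no Hadoop or Mahout dependencies. It assumes each input line is a name followed by whitespace-separated numbers, as in sample0 1 2 3 4 5; the class and method names (VectorLineParser, ParsedLine, parse) are hypothetical. In an actual job, the mapper would presumably emit the name as a Text key and wrap the values in Mahout's VectorWritable (for example around a DenseVector) instead of a Text value, with the job's output value class set to match. That part is not shown and is an assumption about what is wanted here.

```java
import java.util.Arrays;

// Hypothetical helper: splits one text line into a name and numeric values.
// Assumed line format: "<name> <v1> <v2> ... <vn>", separated by whitespace.
final class VectorLineParser {

    // Simple holder for the parsed result.
    static final class ParsedLine {
        final String name;
        final double[] values;

        ParsedLine(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    static ParsedLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected a name and at least one value: " + line);
        }
        double[] values = new double[tokens.length - 1];
        for (int i = 1; i < tokens.length; i++) {
            values[i - 1] = Double.parseDouble(tokens[i]);
        }
        // In a mapper, tokens[0] would become the key and the values would be
        // wrapped in a vector Writable (assumption: Mahout's VectorWritable).
        return new ParsedLine(tokens[0], values);
    }

    public static void main(String[] args) {
        ParsedLine p = parse("sample0 1 2 3 4 5");
        System.out.println(p.name + " -> " + Arrays.toString(p.values));
    }
}
```

With values stored as a binary vector type rather than Text, the numbers would no longer appear as readable characters in the raw file, although the key would still be visible if it stays a Text.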

-Thanks for your time.

Jayant
  • A sequence file is not human-readable. It's encrypted. If you want to read its contents, refer to http://stackoverflow.com/questions/8265256/how-to-read-hadoop-sequential-file – Pavan Jan 29 '14 at 08:09
  • Correct me if I am wrong, but in my case I think the sequence file is incorrect, since sample0 1 2 3 4, which is the actual data of the file, should also be encrypted. What you are describing applies after the sequence file has been created successfully. – Jayant Jan 29 '14 at 08:31

0 Answers