I have a sequence file that is the output of a Hadoop MapReduce job. The data in this file is written as key/value pairs, and each value is itself a map. I want to read the value as a Map object so that I can process it further.

    Configuration config = new Configuration();
    Path path = new Path("D:\\OSP\\sample_data\\data\\part-00000");
    SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
    WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
    Writable value = (Writable) reader.getValueClass().newInstance();
    long position = reader.getPosition();

    while(reader.next(key,value))
    {
           System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

Output of the program: Key is: [this is key] value is: {abc=839177, xyz=548498, lmn=2, pqr=1}

Here I am getting the value as a string, but I want it as a Map object.

samarth
  • Where does `val` come from? And a Map is not `Writable`; what classes are you using in your M/R job? – Thomas Jungblut Nov 25 '11 at 06:44
  • I just have the sequence file and am not aware of what the MapReduce job does. I was given the following information: "Each such file needs to be opened as a sequence file. A decompression codec needs to be used - the sequence file class seems to be able to tell you what compression codec to use, and then I think each key and each value is encoded using TypedBytes." – samarth Nov 25 '11 at 08:58
  • Then you have to get the classes of the key and values, otherwise you won't deserialize them properly. – Thomas Jungblut Nov 25 '11 at 09:17
  • The value class is "TypedBytesWritable". How can I get the Map object out of it? – samarth Nov 25 '11 at 09:48
  • @samarth How do I read a compressed (gz / bz2 / snappy) sequence file? – ParagFlume Feb 03 '16 at 04:44

1 Answer

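Check the API documentation for SequenceFile.Reader#next(Writable, Writable). The call fills the `key` and `value` objects passed to it, so those are the variables to print.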

    while(reader.next(key,value))
    {
           System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

should be replaced with

    while(reader.next(key,value))
    {
           System.out.println("Key is: "+key +" value is: "+value+"\n");
    }

Use SequenceFile.Reader#getValueClassName to get the value type stored in the SequenceFile. A SequenceFile records its key and value class names in the file header.

Praveen Sripati
  • Thanks, man. The value class is "TypedBytesWritable". Can I get the Map object from this class? – samarth Nov 25 '11 at 09:47
  • [TypedBytesWritable#getValue](http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/typedbytes/TypedBytesWritable.html#getValue%28%29) should get the Object. – Praveen Sripati Nov 25 '11 at 10:35
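
Since `TypedBytesWritable#getValue` returns a plain `Object`, one more step is needed to end up with a typed Map. The following is a minimal sketch of that step only, in plain Java so it runs without Hadoop on the classpath. The class `TypedValueMaps` and the method `toCounts` are hypothetical names introduced here, not part of Hadoop. It assumes the decoded object is a `java.util.Map` whose values are numeric, which matches the sample output in the question but is not confirmed for the actual file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): narrows the Object returned by
// TypedBytesWritable#getValue to a Map<String, Long>.
final class TypedValueMaps {
    private TypedValueMaps() {}

    // Assumption: the decoded value is a Map with numeric values.
    // Anything else is rejected with an IllegalArgumentException.
    static Map<String, Long> toCounts(Object decoded) {
        if (!(decoded instanceof Map)) {
            String type = (decoded == null) ? "null" : decoded.getClass().getName();
            throw new IllegalArgumentException("Expected a Map but got: " + type);
        }
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) decoded).entrySet()) {
            Object raw = entry.getValue();
            if (!(raw instanceof Number)) {
                throw new IllegalArgumentException(
                        "Non-numeric value for key " + entry.getKey() + ": " + raw);
            }
            // Keys are converted with String.valueOf; values are widened to long.
            counts.put(String.valueOf(entry.getKey()), ((Number) raw).longValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the real call inside the reader loop, which would be:
        //   Object decoded = ((TypedBytesWritable) value).getValue();
        Map<Object, Object> decoded = new LinkedHashMap<>();
        decoded.put("abc", 839177);
        decoded.put("xyz", 548498);
        decoded.put("lmn", 2);
        decoded.put("pqr", 1);

        Map<String, Long> counts = toCounts(decoded);
        System.out.println(counts); // prints {abc=839177, xyz=548498, lmn=2, pqr=1}
    }
}
```

Checking the type before casting means a file whose values are not maps fails with a clear message instead of a `ClassCastException` somewhere later in the processing.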