
I want to read a specific SequenceFile from HDFS from a client application. I can do this with SequenceFile.Reader, and it works fine. But is it also possible to check whether a file is a SequenceFile other than by analyzing the thrown IOExceptions?

rabejens

1 Answer


I dug through the Hadoop documentation, the source code, and the web, and found a solution: SequenceFiles start with a four-byte header reading SEQn, where SEQ is three ASCII characters and n is the version of the file format (a positive one-byte number, stored as a raw byte value and never greater than 6). So for the check, one can do the following:

  1. Open the file as a normal FSDataInputStream with FileSystem.open
  2. Read the first three bytes as an ASCII string
  3. Check whether they read SEQ; if not, the file is not a SequenceFile
  4. Check whether the next byte is greater than 0 and less than or equal to 6; if so, the file is a SequenceFile
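
The steps above can be sketched roughly as follows. This is only a minimal illustration, not part of Hadoop: the class name SequenceFileCheck and the method hasSequenceFileHeader are made up for this example, and the upper bound of 6 is taken from the description above. The method accepts any java.io.InputStream; since the FSDataInputStream returned by FileSystem.open is itself an InputStream, it can be passed in directly.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a Hadoop class.
final class SequenceFileCheck {
    // Highest version byte per the description above (assumption: later formats may differ).
    private static final int MAX_VERSION = 6;

    private SequenceFileCheck() {}

    /**
     * Returns true if the stream starts with the bytes 'S', 'E', 'Q'
     * followed by a version byte in the range 1..MAX_VERSION.
     * Consumes up to four bytes from the stream and does not close it.
     */
    static boolean hasSequenceFileHeader(InputStream in) throws IOException {
        byte[] header = new byte[4];
        try {
            // readFully throws EOFException if fewer than four bytes are available
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // too short to contain the header
        }
        boolean magic = header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
        int version = header[3]; // raw byte value, not an ASCII digit
        return magic && version >= 1 && version <= MAX_VERSION;
    }
}
```

With Hadoop on the classpath, the stream would come from something like fs.open(path) inside a try-with-resources block, so that it is closed after the check.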

This should be a utility method in SequenceFile, e.g. SequenceFile.isSequenceFile.

EDIT: I posted a JIRA about this: HDFS-7378

rabejens