I want to read a specific `SequenceFile` from HDFS from a client application. I can do this with `SequenceFile.Reader`, and it works fine. But is it also possible to check whether a file is a `SequenceFile`, other than by analyzing the thrown `IOException`s?

rabejens
1 Answer
I dug around the Hadoop documentation, source code, and the web and found a solution: `SequenceFile`s start with a four-byte header reading `SEQn`, where `n` is the version of the file (a positive, one-byte number, but never greater than 6). So for the check, one can do the following:
- Open the file as a normal `FSDataInputStream` with `FileSystem.open`
- Read the first three bytes as an ASCII string
- Check if they say `SEQ`; if not, it is not a `SequenceFile`
- Check if the next byte is less than or equal to 6 and greater than 0; if yes, it is a `SequenceFile`
This should be a utility method in `SequenceFile`, e.g. `SequenceFile.isSequenceFile`.
EDIT: I posted a JIRA about this: HDFS-7378
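
The steps above could be sketched roughly as follows. This is only an illustration, not Hadoop code: the class `SequenceFileCheck` and the methods `hasSequenceFileHeader` and `looksLikeSequenceFile` are hypothetical names, and the upper bound of 6 is taken from the description above. The sketch works on a plain `java.io.InputStream` so that it needs nothing but the JDK; since `FSDataInputStream` is an `InputStream`, the stream returned by `FileSystem.open` should be usable as the argument.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; not part of the Hadoop API.
final class SequenceFileCheck {

    // Highest version byte, as stated in the answer above (assumption).
    private static final int MAX_VERSION = 6;

    private SequenceFileCheck() {}

    // Checks a header: the bytes 'S', 'E', 'Q' followed by a version in 1..6.
    static boolean hasSequenceFileHeader(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return false;
        }
        int version = header[3]; // signed byte, so values above 127 become negative
        return version > 0 && version <= MAX_VERSION;
    }

    // Reads the first four bytes of the stream and checks them.
    // The caller keeps ownership of the stream and is responsible for closing it.
    static boolean looksLikeSequenceFile(InputStream in) throws IOException {
        byte[] header = new byte[4];
        try {
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // shorter than four bytes, so it cannot carry the header
        }
        return hasSequenceFileHeader(header);
    }
}
```

With Hadoop on the classpath, one would presumably call `SequenceFileCheck.looksLikeSequenceFile(in)` inside a `try (FSDataInputStream in = fs.open(path)) { ... }` block. Note that a positive result only says the header matches; the rest of the file may still be corrupt.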

rabejens
- It would be great if you could also post the link to the document where you found it. – Harinder May 12 '15 at 06:09
- I found this out by single-stepping the opening of a `SequenceFile` in the debugger. – rabejens May 12 '15 at 07:31