I've been trying to create and maintain a SequenceFile on HDFS through the Java API, without running a MapReduce job, so that it can serve as input for a MapReduce job later. I want to keep all of that job's input data in a single SequenceFile, but the data arrives throughout the day and has to be appended over time. The problem is that if the SequenceFile already exists, the following call overwrites it instead of appending to it:
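```java
// fs and conf are set up for HDFS, not as a LocalFileSystem.
// If hdfsPath already exists, createWriter replaces it with a new, empty file.
SequenceFile.Writer seqWriter = SequenceFile.createWriter(fs, conf, new Path(hdfsPath),
        keyClass, valueClass, SequenceFile.CompressionType.NONE);
try {
    seqWriter.append(new Text(key), new BytesWritable(value));
} finally {
    seqWriter.close();
}
```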
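If the cluster runs a Hadoop release whose `SequenceFile.Writer` offers the `appendIfExists` option (it was added in the 2.x line; check your release's javadoc), the option-based `createWriter` can reopen an existing file instead of truncating it. This is a minimal sketch, assuming `Text` keys and `byte[]` values wrapped in `BytesWritable` as in the snippet above; the class name `SeqFileAppender` and method `appendRecord` are hypothetical. The file system is resolved from the path and `conf` rather than passed in explicitly.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper class for illustration.
public class SeqFileAppender {

    /** Appends one record to hdfsPath, creating the file first if it does not exist. */
    public static void appendRecord(Configuration conf, String hdfsPath,
                                    String key, byte[] value) throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path(hdfsPath)),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE),
                // Reopen an existing file instead of truncating it.
                // Assumes a Hadoop release that provides this option.
                SequenceFile.Writer.appendIfExists(true))) {
            writer.append(new Text(key), new BytesWritable(value));
        }
    }
}
```

When appending, the key class, value class and compression settings are expected to match the ones the file was created with, since the writer reads the existing header rather than writing a new one. The underlying file system also has to support append; HDFS does, but don't count on the local file system doing so. The sketch opens and closes a writer per record for simplicity; if records arrive in batches, keeping one writer open per batch is cheaper.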
Another concern is that I can't keep the data in a file of my own format and convert it to a SequenceFile at the end of the day, because a MapReduce job could be launched on that data at any point.
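I can't find any other API call that appends to a SequenceFile while preserving its format. Nor can I simply concatenate two SequenceFiles, because each one starts with its own header (version, key and value class names, compression settings, metadata and a sync marker), so the concatenated result would not be a valid SequenceFile.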
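A commonly used workaround avoids appending altogether: write each increment as its own small SequenceFile in one HDFS directory, and give that directory to the MapReduce job as its input path, since `FileInputFormat` (and so `SequenceFileInputFormat`) reads every file in an input directory. To keep a job that starts mid-write from picking up a half-written file, each batch is first written under a name beginning with `_`, which `FileInputFormat` skips by default, and then renamed. This is a sketch under those assumptions; the class `BatchSeqFileWriter`, the method `writeBatch` and the `batch-…` naming scheme are hypothetical.

```java
import java.io.IOException;
import java.util.Map;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper class for illustration.
public class BatchSeqFileWriter {

    /**
     * Writes one batch of records to a new SequenceFile under dir and returns
     * its final path. Files from earlier batches are never modified.
     */
    public static Path writeBatch(Configuration conf, Path dir,
                                  Map<String, byte[]> batch) throws IOException {
        FileSystem fs = dir.getFileSystem(conf);
        String name = "batch-" + System.currentTimeMillis() + "-" + UUID.randomUUID() + ".seq";
        // A leading "_" hides the file from FileInputFormat while it is being written.
        Path tmp = new Path(dir, "_" + name);
        Path done = new Path(dir, name);

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(tmp),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE))) {
            for (Map.Entry<String, byte[]> e : batch.entrySet()) {
                writer.append(new Text(e.getKey()), new BytesWritable(e.getValue()));
            }
        }

        // Publish the finished file under its visible name.
        if (!fs.rename(tmp, done)) {
            throw new IOException("Could not rename " + tmp + " to " + done);
        }
        return done;
    }
}
```

Many tiny files put pressure on the NameNode, so with frequent small batches it may be worth merging the directory into fewer files now and then, reusing the same write-then-rename pattern for the merged output.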
I'd also like to avoid running a MapReduce job just for this, since its overhead is high compared with the small amount of data I add to the SequenceFile each time.
Any thoughts or work-arounds? Thanks.