I am writing to the Hadoop file system (HDFS), but every time I append something, it overwrites the existing data instead of adding to the file. The code that does this is shown below; it is called repeatedly with different data. Is opening a new SequenceFile.Writer every time the problem?

Each time, I pass the path as new Path("someDir").

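  public void writeToHDFS(Path path, long uniqueId, String data) throws IOException {
      FileSystem fs = path.getFileSystem(conf);
      // A new writer is created on every call
      SequenceFile.Writer inputWriter = new SequenceFile.Writer(fs, conf,
          path, LongWritable.class, MyWritable.class);
      try {
          // uniqueId is a parameter, so the original uniqueId++ had no effect on the caller
          inputWriter.append(new LongWritable(uniqueId), new MyWritable(data));
      } finally {
          inputWriter.close();
      }
  }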
TheHat
  • I don't see the use of a SequenceFile when you just put one record into it and close it immediately. Keep the file open and keep appending. – Thomas Jungblut Nov 01 '11 at 10:04

1 Answer

There is currently no way to append to an existing SequenceFile through the API. When you create a new SequenceFile.Writer object, it does not append to an existing file at that Path; it overwrites it. See my earlier question.

As Thomas points out, if you keep the same SequenceFile.Writer object, you will be able to append to the file until you call close().
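
The difference between reopening per record and opening once can be sketched without a cluster. This is only an analogy, not Hadoop code: java.io.FileOutputStream stands in for SequenceFile.Writer on the assumption that both replace an existing file when opened fresh, and the class and method names (KeepWriterOpen, reopenPerRecord, openOnce) are made up for illustration.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class KeepWriterOpen {

    // Opens a fresh stream for every record, like the method in the question.
    // A new FileOutputStream truncates the file, so only the last record survives.
    static long reopenPerRecord(File file, String[] records) throws IOException {
        for (String record : records) {
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeUTF(record);
            }
        }
        return file.length();
    }

    // Opens once, appends every record, and closes only when all writing is done.
    static long openOnce(File file, String[] records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            for (String record : records) {
                out.writeUTF(record);
            }
        }
        return file.length();
    }

    public static void main(String[] args) throws IOException {
        String[] records = {"a", "bb", "ccc"};
        File reopened = File.createTempFile("reopen", ".bin");
        File keptOpen = File.createTempFile("keepopen", ".bin");
        reopened.deleteOnExit();
        keptOpen.deleteOnExit();
        System.out.println("reopen per record: " + reopenPerRecord(reopened, records) + " bytes");
        System.out.println("open once:         " + openOnce(keptOpen, records) + " bytes");
    }
}
```

With a real SequenceFile.Writer the shape would be the same: create the writer once outside the loop, call append() for each record, and call close() at the end.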

Matt D
  • What if I have too many paths to write to randomly? Can I keep lots of SequenceFile.Writer open? – TheHat Nov 01 '11 at 15:14
  • Since `SequenceFile.Writer` does not have a flush method, all of its contents will be in memory until you close them. So, keeping a lot of Writers open will not scale. It may make sense to create a MapReduce job using the `SequenceFileOutputFormat` to construct your SequenceFiles if the problem lends itself to MapReduce. – Matt D Nov 01 '11 at 15:43
  • Can FSDataOutputStream be used for writing key value? Will writing key.getBytes{space}value.getBytes{newline} be similar to SequenceFile.Writer's append? – TheHat Nov 01 '11 at 17:06
  • I'm not sure, but it seems possible. You can look at the source for SequenceFile.java to see how it uses `FSDataOutputStream`. – Matt D Nov 01 '11 at 17:31
  • I did it by not closing the writer until the writing was done. Thanks. – TheHat Nov 22 '11 at 06:14
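
On the question raised in the comments about writing key bytes, a space, value bytes and a newline straight to an FSDataOutputStream: one caveat is that a value containing a space or newline would make such records ambiguous to read back. A length prefix avoids that. The sketch below is an assumption-laden illustration, not the SequenceFile on-disk format (which also carries a header and sync markers). It uses a plain java.io.DataOutputStream over a byte array in place of an HDFS stream; FSDataOutputStream extends DataOutputStream, so the same write calls should be available there. The names LengthPrefixedRecords, writeRecord and readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixedRecords {

    // Writes one record as: 8-byte key, 4-byte value length, then the value bytes.
    static void writeRecord(DataOutputStream out, long key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeLong(key);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads records back until the buffer is exhausted, returning "key=value" strings.
    static List<String> readAll(byte[] data) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                long key = in.readLong();
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                result.add(key + "=" + new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeRecord(out, 1L, "plain value");
            writeRecord(out, 2L, "value with spaces\nand a newline");
        }
        for (String record : readAll(buffer.toByteArray())) {
            System.out.println(record);
        }
    }
}
```

Unlike a SequenceFile, a file written this way is not splittable and carries no key or value class information, so it is only a fit when those features are not needed.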