
MultipleOutputs suddenly stopped writing any output to the destination

I use a custom implementation of MultipleOutputs in which I changed only one thing. I added this check:

      if ((ch == '/') || (ch == ':') || (ch == '-') || (ch == '.')) {
        continue;
      }

to the checkTokenName method, as shown below. This had been working all along, but it has suddenly stopped: nothing is written to the output directory:

/home/users/mlakshm/

Please help!

  private static void checkTokenName(String namedOutput) {
    if (namedOutput == null || namedOutput.length() == 0) {
      throw new IllegalArgumentException(
        "Name cannot be NULL or empty");
    }
    for (char ch : namedOutput.toCharArray()) {
      if ((ch >= 'A') && (ch <= 'Z')) {
        continue;
      }
      if ((ch >= 'a') && (ch <= 'z')) {
        continue;
      }
      if ((ch >= '0') && (ch <= '9')) {
        continue;
      }
      // Custom change: also accept '/', ':', '-' and '.'
      if ((ch == '/') || (ch == ':') || (ch == '-') || (ch == '.')) {
        continue;
      }
      throw new IllegalArgumentException(
        "Name cannot have a '" + ch + "' char");
    }
  }
Jeffrey

1 Answer


As you may notice, the purpose of checkTokenName() is to make sure that the output name is valid. You are modifying an integral part of MultipleOutputs, which you shouldn't do. There are a number of reasons why characters like /, :, . and - aren't allowed in the first place:

  1. Many file systems don't allow some of these characters in a file name.
  2. With MultipleOutputs, one can write to multiple files, but only within the designated output directory, not at an arbitrary location.
  3. As you may have noticed, for a named output defined as follows:

    // Defines additional single text based output 'text' for the job
    MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class, LongWritable.class, Text.class);

     the output file names would be of the form text-m-00000, text-r-00000, etc., so '-' already acts as a separator in the generated names.
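To make the rule concrete, here is a minimal standalone sketch of the unmodified check, assuming it accepts only ASCII letters and digits as in the method quoted in the question. The class name NamedOutputCheck and the boolean-returning helper are hypothetical and are not part of Hadoop; the real method throws IllegalArgumentException instead of returning false.

```java
public class NamedOutputCheck {
    // Hypothetical helper mirroring the stock rule: only A-Z, a-z and 0-9 pass.
    public static boolean isValidNamedOutput(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (!letter && !digit) {
                return false; // '/', ':', '-', '.' and anything else is rejected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"text", "seq2", "out/text", "a-b", "a.b", "host:9000"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidNamedOutput(candidate));
        }
    }
}
```

Only "text" and "seq2" pass here; every name containing one of the four extra characters is rejected, which is the behaviour the custom change in the question switches off.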

So if you want to write to different directories, consider overriding MultipleTextOutputFormat; and if you are not at liberty to use the old API, you are better off writing to HDFS/S3 yourself rather than relying on Hadoop to do it.

Amar
  • Thanks for the answer. The guidelines were useful. MultipleOutputs was actually working fine. I made a mistake in the configure method, which reads the files written by MultipleOutputs via the distributed cache. I had declared configure(conf, reporter) instead of configure(conf), so the program never entered the configure method and no files were read. I mistook that for MultipleOutputs not writing any files. – Mahalakshmi Lakshminarayanan Mar 20 '13 at 19:43
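
For anyone hitting the same thing, the mistake described in the comment above can be reproduced without Hadoop. The sketch below is only an illustration: Conf, Reporter, BaseTask and runFramework are hypothetical stand-ins, not Hadoop classes. It assumes the framework only ever calls the one-argument configure, so a two-argument version is a separate overload that never runs.

```java
public class ConfigureOverloadDemo {
    // Hypothetical stand-ins for the framework types; these are not Hadoop classes.
    static class Conf {}
    static class Reporter {}

    // Stand-in for a base class whose configure(Conf) is the framework hook.
    static class BaseTask {
        public void configure(Conf conf) {
            // default: nothing to set up
        }
    }

    // The mistake: a different parameter list creates an overload, not an override.
    static class BrokenTask extends BaseTask {
        boolean configured = false;

        public void configure(Conf conf, Reporter reporter) {
            configured = true; // never reached through runFramework
        }
    }

    // The fix: same signature as the hook; @Override makes the compiler verify it.
    static class FixedTask extends BaseTask {
        boolean configured = false;

        @Override
        public void configure(Conf conf) {
            configured = true;
        }
    }

    // Stand-in for the framework, which knows only the one-argument signature.
    static void runFramework(BaseTask task) {
        task.configure(new Conf());
    }

    public static void main(String[] args) {
        BrokenTask broken = new BrokenTask();
        FixedTask fixed = new FixedTask();
        runFramework(broken);
        runFramework(fixed);
        System.out.println("broken configured: " + broken.configured); // false
        System.out.println("fixed configured: " + fixed.configured);   // true
    }
}
```

Putting @Override on the two-argument method in BrokenTask would turn the silent failure into a compile error, which is probably the cheapest guard against this kind of slip.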