
I use MultipleOutputs to output data to some absolute paths, instead of a path relative to OutputPath.
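A simplified sketch of what the map task does (the class name, key/value types, and the literal path below are placeholders, not my real code):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class ConvertMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Absolute base path, NOT relative to the job's output directory.
            mos.write(NullWritable.get(), value, "/test/convert.bak/326/201505110030/326");
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }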

Then, I get the error:

Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/test/convert.bak/326/201505110030/326-m-00035] for [DFSClient_attempt_1425611626220_29142_m_000035_1001_-370311306_1] on client [192.168.7.146], because this file is already being created by [DFSClient_attempt_1425611626220_29142_m_000035_1000_-53988495_1] on [192.168.7.149]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2320)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2083)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2012)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1963)
    at ...

cola
  • I have set a different basePath. The problem is that Hadoop starts multiple attempts for a task (map or reduce), every attempt tries to create that file, and so the file gets created more than once. If the paths I set are relative to OutputPath, it is OK (because the files are created in OutputPath/_temporary and then moved to OutputPath). But if the paths are absolute, the files are created directly at those paths. Why are they not created in OutputPath/_temporary first? – cola May 13 '15 at 07:18
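A hedged sketch of the resolution difference described in the comment above, assuming the standard FileOutputCommitter; the helper class below is purely illustrative, not actual MultipleOutputs source:

    import org.apache.hadoop.fs.Path;

    // Hypothetical helper, only to illustrate the behaviour described above.
    class BasePathIllustration {
        // A relative base path resolves under the attempt's work directory (inside
        // OutputPath/_temporary) and is renamed under OutputPath on commit; an absolute
        // base path is used as-is, so a retried attempt collides with the file that the
        // previous attempt already created.
        static Path resolveBasePath(Path attemptWorkDir, String basePath) {
            Path p = new Path(basePath);
            return p.isAbsolute() ? p : new Path(attemptWorkDir, basePath);
        }
    }

    // resolveBasePath(attemptWorkDir, "326/201505110030/326")
    //   -> <attemptWorkDir>/326/201505110030/326      (moved under OutputPath on commit)
    // resolveBasePath(attemptWorkDir, "/test/convert.bak/326/201505110030/326")
    //   -> /test/convert.bak/326/201505110030/326     (created in place; retries collide)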

3 Answers


https://issues.apache.org/jira/browse/MAPREDUCE-6357

Output files must be under ${mapred.output.dir}.

The design and implementation do not support outputting data to files outside of ${mapred.output.dir}.
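A minimal sketch of the supported pattern, assuming the new mapreduce API; the job name, named output, and paths below are only examples. Every base path passed to MultipleOutputs stays relative, so all files end up under ${mapred.output.dir}:

    // Driver: the job output directory is ${mapred.output.dir}.
    Job job = Job.getInstance(new Configuration(), "convert");
    FileOutputFormat.setOutputPath(job, new Path("/test/convert.out"));
    MultipleOutputs.addNamedOutput(job, "convert", TextOutputFormat.class,
            NullWritable.class, Text.class);

    // Mapper/Reducer: a relative base path, so the file is created under
    // ${mapred.output.dir}/_temporary/... and committed into ${mapred.output.dir}.
    mos.write("convert", NullWritable.get(), value, "326/201505110030/326");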

cola
  • Seriously resurrecting this... Can you elaborate on this answer? I've called `FileOutputFormat.setOutputPath(hadoopJob, new Path(args[1]));` and then added named outputs. The names of the outputs are just things like `call` or `segment`. They show up in the output path that I specify for `FileOutputFormat`. I'm suspicious your answer applies to my situation (exact same error as OP on retry), but I'm not sure I'm following your meaning. – John Chrysostom Nov 21 '16 at 20:04
    @JohnChrysostom This error occurs when the task retries. – cola Dec 01 '16 at 07:33
  • Yes, I'm experiencing that error on retries. I've now fixed the underlying problem which was causing the retries in the first place, but I'm interested in knowing if there's a way to have the retries use a new file name rather than trying to write over an existing file? – John Chrysostom Dec 01 '16 at 16:27
  • @JohnChrysostom The OutputCommitter gives each task attempt its own path. This works well when the file is in the output path or its subdirectories. But if the file is not in the output path or its subdirectories, it will throw an exception. You may need to understand how MultipleOutputs sets the task attempt's file path. – cola Dec 13 '16 at 10:28
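A hedged illustration of the per-attempt path mentioned in the comment above, assuming FileOutputCommitter; the example directory layout is indicative only:

    // Inside a task, FileOutputFormat exposes the attempt's private work directory,
    // e.g. ${outputdir}/_temporary/1/_temporary/attempt_..._m_000035_0.
    // Files created under it are renamed into ${outputdir} only when the attempt commits,
    // so a retried attempt never collides with a half-written file from an earlier attempt.
    Path attemptWorkDir = FileOutputFormat.getWorkOutputPath(context);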

Looking at the stack trace, it seems that the output file has already been created.

If you want to write your data into multiple files, try generating those file names dynamically and using them as shown in this code taken from the Hadoop Definitive Guide:

// Builds a base path like "<station>/<year>/part", relative to the job's output directory
String basePath = String.format("%s/%s/part", parser.getStationId(), parser.getYear());
multipleOutputs.write(NullWritable.get(), value, basePath);

I hope this will help.

Farooque
  • I do that. But my basePath is an absolute path; if basePath is a relative path, it is OK. – cola May 13 '15 at 07:25

As the error clearly suggests, the path you are trying to create already exists. So check whether that path exists before creating it, and if it does, delete it:

    FileSystem hdfs = FileSystem.get(conf);   // conf: your job's Configuration
    Path path = new Path(YourHadoopPath);
    if (hdfs.exists(path)) {
        hdfs.delete(path, true);              // true = delete recursively
    }
salmanbw