
Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the `_SUCCESS` and `part-r-00000` files. The output always resides in the `part-r-00000` file, but what is the use of the `_SUCCESS` file? Why does the output file have the name `part-r-00000`? Is there any significance or nomenclature behind it, or is it just randomly defined?

HTNW
ravi

1 Answer


See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/

On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS. (MAPREDUCE-947)

This would typically be used by job scheduling systems (such as OOZIE), to denote that follow-on processing on the contents of this directory can commence as all the data has been output.

Update (in response to comment)

The output files are by default named part-x-yyyyy where:

  • x is either 'm' or 'r', depending on whether the file was written by a map task (in a map-only job) or a reduce task
  • yyyyy is the mapper or reducer task number (zero-based)

So a job which has 32 reducers will have files named part-r-00000 to part-r-00031, one for each reducer task.
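The naming scheme above can be sketched in a few lines; the helper `part_file_name` is a hypothetical name for illustration, but the format it produces matches the default described here:

```python
def part_file_name(task_type: str, task_id: int) -> str:
    """Build the default output file name for a task.

    task_type is "m" (map-only job) or "r" (job with reducers);
    task_id is the zero-based task number, zero-padded to five digits.
    """
    return f"part-{task_type}-{task_id:05d}"

# A job with 32 reducers writes part-r-00000 through part-r-00031:
names = [part_file_name("r", i) for i in range(32)]
```

The five-digit zero padding keeps the files in numeric order under a plain lexicographic listing, which is why `part-r-00002` sorts before `part-r-00010`.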

Chris White
  • That doesn't explain why the output file is called `part-r-00000`, though, or whether this is even necessarily always the case. – Kyle Strand Apr 29 '13 at 02:28
  • Updated to specifically address @KyleStrand comment – Chris White Apr 29 '13 at 02:50
  • 3
    Note that: currently (`hadoop-streaming-2.4.0.2.1.1.0`) there's no `x` if you happen to use hadoop-streaming. So it's gonna be like `part-00000`. – masu Sep 15 '14 at 14:41
  • Some result sets have _SUCCESS files, and some don't. For instance, when using SAVE from Pig, is there an option that one needs to set for _SUCCESS file to be created when done? – Igor Yagolnitser Jan 12 '17 at 23:37