Not all filesystems provide atomic rename: some Hadoop-compatible filesystems implement the rename operation as a non-atomic cp + rm and are only eventually consistent, which creates complications when working with such filesystems.
GCS rename is not atomic:
Unlike the case with many file systems, the gsutil mv command does not perform a single atomic operation. Rather, it performs a copy from source to destination followed by removing the source for each object.
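To make that quote concrete, here is a minimal sketch, in Scala against the Hadoop FileSystem API, of what a copy-then-delete "rename" amounts to. This is an illustration, not the actual GCS or S3 connector code.

```scala
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Illustrative only: a "rename" implemented as copy + delete.
// A failure between the two steps leaves a partial copy or two full copies;
// there is no single atomic transition that readers can rely on.
def nonAtomicRename(fs: FileSystem, src: Path, dst: Path): Boolean = {
  // Step 1: copy every object under src to dst -- proportional to data size.
  val copied = FileUtil.copy(fs, src, fs, dst, /* deleteSource = */ false, fs.getConf)
  // Step 2: delete the source. A crash before this line leaves both paths visible.
  copied && fs.delete(src, /* recursive = */ true)
}
```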
Rename in S3 is neither atomic nor immediately consistent. Read the Introduction to S3Guard:
When renaming directories, the listing may be incomplete or out of date, so the rename operation loses files. This is very dangerous as MapReduce, Hive, Spark and Tez all rely on rename to commit the output of workers to the final output of the job.
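For context, here is a hedged sketch of the rename-to-commit pattern the quote refers to, similar in spirit to Hadoop's FileOutputCommitter but not its actual code: a worker writes into a temporary attempt directory and the job "commits" it with a single rename.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// On HDFS this rename is one atomic metadata operation; on S3 it degrades into
// a listing plus per-object copy + delete, and an incomplete or stale listing
// silently drops files from the committed output.
def commitTaskOutput(fs: FileSystem, attemptDir: Path, finalDir: Path): Unit = {
  if (!fs.rename(attemptDir, finalDir)) {
    throw new java.io.IOException(s"Could not commit $attemptDir to $finalDir")
  }
}
```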
HDFS provides atomic and consistent delete and rename, but other Hadoop-compatible filesystems may not fully support them.
Read the Apache Hadoop requirements for a Hadoop-compatible filesystem.
The Atomicity section states that renaming a file or a directory MUST be atomic, yet at the very beginning, in the Introduction, you can read this:
The behaviour of other Hadoop filesystems are not as rigorously tested. The bundled S3 FileSystem makes Amazon’s S3 Object Store (“blobstore”) accessible through the FileSystem API. The Swift FileSystem driver provides similar functionality for the OpenStack Swift blobstore. The Azure object storage FileSystem in branch-1-win talks to Microsoft’s Azure equivalent. All of these bind to object stores, which do have different behaviors, especially regarding consistency guarantees, and atomicity of operations.
GCS, S3, and some other Hadoop-compatible filesystems do not provide atomic renames, and this causes issues with Hive and Spark. These issues can be worked around more or less successfully with other tools and techniques: using S3Guard, or writing each rewrite of a partition to a new location (derived from a timestamp/runId) and relying on Hive's atomic partition mount, and so on (see the sketch below).
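A minimal sketch of the "new location per run" workaround, assuming a Spark session and a Hive table named events partitioned by dt; the bucket path and runId scheme are illustrative assumptions, not part of any fixed recipe.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Each rewrite goes to a fresh directory and the Hive partition is then
// repointed at it in the metastore -- no files are renamed or overwritten
// in place on S3/GCS.
def rewritePartition(spark: SparkSession, df: DataFrame, dt: String): Unit = {
  val runId    = System.currentTimeMillis()                              // or a job/run UUID
  val location = s"s3a://my-bucket/warehouse/events/dt=$dt/run=$runId"   // hypothetical path

  // 1. Write the new data into a brand-new directory.
  df.write.mode("overwrite").parquet(location)

  // 2. Make sure the partition exists, then switch it to the new location in
  //    the metastore; readers see either the old or the new directory, never a mix.
  spark.sql(s"ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='$dt')")
  spark.sql(s"ALTER TABLE events PARTITION (dt='$dt') SET LOCATION '$location'")

  // Old run=... directories can be garbage-collected later, once no reader uses them.
}
```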
The real world is not ideal.
Mappers in Hadoop MapReduce were originally meant to run, when possible, on the data nodes where the data resides, to speed up processing, but companies like Amazon sell compute clusters and storage separately. You can shut down or resize one cluster, start another one, and access the same data in S3; data and computation are completely separated.