I want HDFS commands to fail if a parent directory doesn't exist when creating subdirectories. When I use any of the FileSystem#mkdirs overloads, no exception is thrown; instead, the non-existent parent directories are created:

import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// `host` and `port` are assumed to be defined elsewhere (the NameNode address).
val conf = new Configuration()
conf.set("fs.defaultFS", s"hdfs://$host:$port")

val fileSystem = FileSystem.get(conf)
val cwd = fileSystem.getWorkingDirectory

// Guarantee non-existence by appending two UUIDs.
val dirToCreate = new Path(cwd, new Path(UUID.randomUUID.toString, UUID.randomUUID.toString))

// Succeeds and silently creates the missing parent directory as well.
fileSystem.mkdirs(dirToCreate)

Without explicitly checking for the parent's existence myself, how can I force HDFS to throw an exception if a parent directory doesn't exist?

erip
  • Not sure you can. Docs say it does `mkdir -p` equivalent https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#mkdirs-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.permission.FsPermission- – OneCricketeer Dec 17 '17 at 19:07
  • @cricket_007 Didn't know if there was another part of the API that does something nearly equivalent. – erip Dec 17 '17 at 19:08
  • Well, the CLI for `hdfs dfs -mkdir` doesn't make parent directories on its own... I would have to look at that code – OneCricketeer Dec 17 '17 at 19:13
  • Looks like it checks for existence. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Mkdir.java – OneCricketeer Dec 17 '17 at 19:17
  • @cricket_007 That's inconvenient. I've also seen [`FileContext`](http://hadoop.apache.org/docs/r2.8.2/api/org/apache/hadoop/fs/FileContext.html) which looks promising -- maybe this is what I should be using (as it has the `createParents` boolean flag). – erip Dec 17 '17 at 19:29
  • I knew I'd seen a boolean parameter somewhere for directory creation – OneCricketeer Dec 17 '17 at 19:31
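
The existence check mentioned in the comments can be sketched against the plain `FileSystem` API. This is only a minimal sketch, not the shell's actual implementation: `mkdirStrict` is a hypothetical helper name, and because the check and the creation are two separate calls, another client could remove the parent in between.

```scala
import org.apache.hadoop.fs.{FileSystem, ParentNotDirectoryException, Path}

// Hypothetical helper: create `dir` only if its parent already exists.
def mkdirStrict(fs: FileSystem, dir: Path): Boolean = {
  val parent = dir.getParent // null when `dir` is the root
  if (parent != null) {
    // getFileStatus throws java.io.FileNotFoundException if the parent is missing.
    val status = fs.getFileStatus(parent)
    if (!status.isDirectory)
      throw new ParentNotDirectoryException(s"Parent is not a directory: $parent")
  }
  // Not atomic with the check above; mkdirs would still create parents if asked to.
  fs.mkdirs(dir)
}
```

With the `fileSystem` and `dirToCreate` values from the question, `mkdirStrict(fileSystem, dirToCreate)` would be expected to fail on the missing parent rather than create it.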

1 Answer

The FileSystem API does not support this behavior: FileSystem#mkdirs always creates missing parent directories, like `mkdir -p`. Instead, use FileContext#mkdir, which takes a createParent flag; for example:

import java.util.UUID
import org.apache.hadoop.fs.{FileContext, Path}
import org.apache.hadoop.fs.permission.FsPermission

val files = FileContext.getFileContext()
val cwd = files.getWorkingDirectory
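// Directories need the execute bit to be traversable, so 755 rather than 644.
val permissions = new FsPermission("755")
// false: fail instead of creating missing parent directories.
val createParent = false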

// Guarantee non-existence by appending two UUIDs.
val dirToCreate = new Path(cwd, new Path(UUID.randomUUID.toString, UUID.randomUUID.toString))

files.mkdir(dirToCreate, permissions, createParent)

The above example will throw:

java.io.FileNotFoundException: Parent directory doesn't exist: /user/erip/f425a2c9-1007-487b-8488-d73d447c6f79
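
If the caller should recover from the failure rather than crash, the call can be wrapped in `Try`. The following is a minimal sketch under some assumptions: it uses the local file system (`FileContext.getLocalFSFileContext`) as a stand-in for HDFS so that it can be tried without a cluster, the temporary directory and the `child` name are made up for illustration, and the local implementation is assumed to report a missing parent the same way HDFS does.

```scala
import java.io.FileNotFoundException
import java.nio.file.Files
import java.util.UUID
import scala.util.{Failure, Success, Try}
import org.apache.hadoop.fs.{FileContext, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Assumption: the local file system stands in for HDFS here.
val files = FileContext.getLocalFSFileContext()
val base = new Path(Files.createTempDirectory("mkdir-demo").toUri)

// The random UUID directory is never created, so the parent is missing.
val missingParent = new Path(base, UUID.randomUUID.toString)
val dirToCreate = new Path(missingParent, "child")

// createParent = false: a missing parent should surface as an exception.
Try(files.mkdir(dirToCreate, FsPermission.getDirDefault, false)) match {
  case Success(_) =>
    println(s"Created $dirToCreate")
  case Failure(e: FileNotFoundException) =>
    println(s"Parent is missing: ${e.getMessage}")
  case Failure(e) =>
    throw e // anything else (permissions, I/O) is not handled in this sketch
}
```

Passing `true` as the last argument restores the `mkdir -p` behavior of FileSystem#mkdirs.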
erip