How to print the creation date of a file in Scala is already explained in another question (linked in the code comment below).
My question is how to get this information into a value that can be returned as a column of a DataFrame. None of the conversions I expected to work are actually allowed.
My code (using Scala 2.11):
import org.apache.spark.sql.functions._
import java.nio.file.{Files, Paths} // Needed for file time
import java.nio.file.attribute.BasicFileAttributes
import java.util.Date
def GetFileTimeFunc(pathStr: String): String = {
  // From: https://stackoverflow.com/questions/47453193/how-to-get-creation-date-of-a-file-using-scala
  val FileTime = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime
  val JavaDate = Date.from(FileTime.toInstant)
  JavaDate.toString
}
@transient val GetFileTime = udf(GetFileTimeFunc _)
val filePath = "dbfs:/mnt/myData/" // location of data
val file_df = dbutils.fs.ls(filePath).toDF // Output columns are $"path", $"name", and $"size"
.withColumn("FileTimeCreated", GetFileTime($"path"))
display(file_df)//.select("name", "size"))
Output:
SparkException: Failed to execute user defined function($anonfun$2: (string) => string)
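The message above only says that the UDF threw; the underlying exception is not shown. A minimal sketch for surfacing it outside Spark follows. `describeCreationTime` is a hypothetical helper that is not part of my original code, and it assumes the path is meant to be read with `java.nio` on the local filesystem.

```scala
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
import scala.util.{Failure, Success, Try}

// Hypothetical helper: returns the creation time as text, or the name and
// message of whatever exception java.nio raised while reading the attributes.
def describeCreationTime(pathStr: String): String =
  Try(Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime) match {
    case Success(fileTime) => fileTime.toString
    case Failure(e)        => s"${e.getClass.getName}: ${e.getMessage}"
  }
```

Calling `describeCreationTime("dbfs:/mnt/myData/")` directly in a notebook cell should show whether the failure comes from reading the file attributes rather than from the return type of the UDF.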
For some reason, Instant is not allowed as a column type, so I cannot use it as a return type. The same goes for FileTime, JavaDate, etc.
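One return type that Spark does map to a column type is `java.sql.Timestamp`. Here is a minimal sketch of the conversion; `fileCreationTimestamp` is a hypothetical name, and the sketch assumes the path can be read with `java.nio` wherever the function runs.

```scala
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
import java.sql.Timestamp

// Hypothetical variant of GetFileTimeFunc that returns java.sql.Timestamp
// instead of String.
def fileCreationTimestamp(pathStr: String): Timestamp = {
  val attrs = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes])
  // FileTime -> epoch milliseconds -> java.sql.Timestamp
  new Timestamp(attrs.creationTime.toMillis)
}
```

Wrapped as `udf(fileCreationTimestamp _)`, this should produce a timestamp column rather than a string, provided the file itself is readable from the executors.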