
I'm trying to get file creation metadata.

File is in: Azure Storage
Accessing data through: Databricks

Right now I'm using:

   file_path = my_storage_path
   dbutils.fs.ls(file_path)

but it returns:

[FileInfo(path='path_myFile.csv', name='fileName.csv', size=437940)]

I do not have any information about creation time. Is there a way to get that information?

Other solutions on Stack Overflow (e.g. "Does databricks dbfs support file metadata such as file/folder create date or modified date") refer to files that are already in Databricks. In my case we access the data from Databricks, but the data are in Azure Storage.

Alex Ott
Enrique Benito Casado

1 Answer


It really depends on the version of Databricks Runtime (DBR) that you're using. For example, modification timestamp is available if you use DBR 10.2 (didn't test with 10.0/10.1, but definitely not available on 9.1):

(screenshot: dbutils.fs.ls output showing a modificationTime field)
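On runtimes where the listing does carry a modification timestamp, it can be read straight off each entry. A minimal sketch, assuming a `FileInfo`-like record with a `modificationTime` field in epoch milliseconds (a `namedtuple` stands in here, since `dbutils` only exists inside a Databricks notebook):

```python
from collections import namedtuple

# Stand-in for Databricks' FileInfo on newer runtimes, where a
# modificationTime field (epoch milliseconds) is assumed to be present.
FileInfo = namedtuple("FileInfo", ["path", "name", "size", "modificationTime"])

def newest_first(file_infos):
    """Sort a dbutils.fs.ls-style listing by modification time, newest first."""
    return sorted(file_infos, key=lambda f: f.modificationTime, reverse=True)

listing = [
    FileInfo("path/a.csv", "a.csv", 100, 1640995200000),
    FileInfo("path/b.csv", "b.csv", 200, 1672531200000),
]
print([f.name for f in newest_first(listing)])  # ['b.csv', 'a.csv']
```

Inside Databricks you would pass `dbutils.fs.ls(file_path)` directly to a helper like `newest_first`.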

If you need to get that information you can use Hadoop FileSystem API via Py4j gateway, like this:

# Obtain Hadoop classes through the Py4j gateway exposed by SparkContext
URI           = sc._gateway.jvm.java.net.URI
Path          = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem    = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration

# Get a FileSystem instance for the given URI and Hadoop configuration
fs = FileSystem.get(URI("/tmp"), Configuration())

# List the directory and print each file's path, size (bytes), and
# modification time (milliseconds since the Unix epoch)
status = fs.listStatus(Path('/tmp/'))
for fileStatus in status:
    print(f"path={fileStatus.getPath()}, size={fileStatus.getLen()}, mod_time={fileStatus.getModificationTime()}")
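Note that Hadoop's `getModificationTime()` returns milliseconds since the Unix epoch, not a readable date. A small self-contained helper for the conversion (the function name is just illustrative):

```python
from datetime import datetime, timezone

def to_datetime(epoch_ms: int) -> datetime:
    """Convert a Hadoop modification time (milliseconds since the Unix
    epoch, as returned by FileStatus.getModificationTime()) to a
    timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# Example: 1 Jan 2022 00:00:00 UTC expressed in epoch milliseconds
print(to_datetime(1640995200000))  # 2022-01-01 00:00:00+00:00
```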
Alex Ott