I am using Azure Databricks, which I have hooked up to a data lake, and I want to get metadata such as the modified date for the files in the lake. I am able to do this within Databricks itself using os.stat()
as detailed in this answer, but I am developing locally using Databricks Connect and can't figure out how to test it locally, given that it only has the context of my local file system.
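For illustration, a minimal, self-contained sketch of how `os.stat()` exposes a file's modified date (it uses a throwaway local temp file, since a `/dbfs/...` path only resolves on the cluster; the `/dbfs/mnt/...` path in the comment is illustrative):

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway local file to stat; on a Databricks cluster the path
# would instead be something like "/dbfs/mnt/lake/raw/api/some_file.json".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example")
    path = f.name

stat = os.stat(path)
# st_mtime is seconds since the epoch; convert it to an aware UTC datetime.
modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
print(f"{path}: {stat.st_size} bytes, modified {modified}")

os.unlink(path)
```

Run from a Databricks notebook, the same `os.stat()` call works against `/dbfs/...` because the driver mounts DBFS under `/dbfs`; run locally via Databricks Connect, it hits the local file system instead, which is the problem described below.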
Tim Hoare
- You might want to use the Databricks CLI to test your code locally. See the official documentation: https://docs.databricks.com/dev-tools/cli/index.html. Hopefully that solves your use case. – Dipanjan Mallick Mar 18 '22 at 18:33
- What is the issue? You're still connected to the cluster, and the data lake is still accessible through the /dbfs/... path – David דודו Markovitz Mar 18 '22 at 18:45
- @DavidדודוMarkovitz I can run `os.stat("/dbfs...")` in Databricks and it gives me what I want, but locally the exact same call gives me FileNotFoundError: [WinError 2] The system cannot find the file specified: '/dbfs...'. Changing the path in stat to a file that is on my local machine returns without error, too – Tim Hoare Mar 18 '22 at 19:20
- Can you share a full path (you can mask some of the fields, I just want to see the pattern)? – David דודו Markovitz Mar 18 '22 at 19:39
- Sure. Something like this: `os.stat("/dbfs/mnt/lake/raw/api/")` – Tim Hoare Mar 18 '22 at 19:46
- Got it. I'll have to test it tomorrow – David דודו Markovitz Mar 18 '22 at 19:51
- `os.stat` won't work with db-connect because it's executed on the driver, and with db-connect the driver is the local machine, not Databricks – Alex Ott Apr 03 '22 at 17:12
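One possible workaround (a sketch, not a confirmed fix for this question): instead of `os.stat`, list the files through `dbutils.fs.ls`, which executes against the remote DBFS even under Databricks Connect. On recent runtimes each returned `FileInfo` carries a `modificationTime` field in epoch milliseconds (older runtimes may not expose it). Since the `dbutils` call needs a live cluster, it is shown as a comment here, and only the timestamp conversion is runnable:

```python
from datetime import datetime, timezone

def to_datetime(modification_time_ms: int) -> datetime:
    """Convert a DBFS FileInfo.modificationTime (epoch milliseconds) to a UTC datetime."""
    return datetime.fromtimestamp(modification_time_ms / 1000, tz=timezone.utc)

# With a Databricks Connect session and dbutils available, the listing would
# look like this ("/mnt/lake/raw/api/" is the illustrative path from the comments):
#
# for info in dbutils.fs.ls("/mnt/lake/raw/api/"):
#     print(info.path, info.size, to_datetime(info.modificationTime))

print(to_datetime(0))  # epoch start: 1970-01-01 00:00:00+00:00
```

The key difference from `os.stat` is where the call runs: `dbutils.fs` commands are dispatched to the workspace's file system API, so they resolve DBFS paths regardless of whether the driver is local or remote.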