How can I get the last-modified datetime of a file in the data lake from a Synapse notebook (PySpark)? Are there any out-of-the-box options?
I am using mssparkutils.fs.ls to list the files in a given location. Any leads on getting this list sorted by file modified date would be helpful. – Sethuramalingam Sundaram Aug 02 '21 at 21:15
Are you restricted to using a Synapse notebook? The same can be done with Azure Data Factory, if that works for you. – Utkarsh Pal Aug 12 '21 at 06:44
1 Answer
-1
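Can't you just assign the output to a variable and sort that? mssparkutils.fs.ls returns a Python list of file entries rather than a Spark DataFrame, so the built-in sorted() works on it directly:
files = mssparkutils.fs.ls(path)
# Assumption: each entry exposes a modifyTime attribute; whether it does may depend on the runtime version.
files_by_date = sorted(files, key=lambda f: f.modifyTime)
If you convert the listing to a DataFrame first, DataFrame.sort does the same job: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.sort.html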
Maybe something like this related question helps: "How do you get a directory listing sorted by creation date in python?"
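To make the sorting idea concrete, here is a minimal sketch that runs outside Synapse. FileEntry, sort_by_modified and to_utc are hypothetical names used only for illustration, and the sample values are made up. The sketch assumes each entry returned by mssparkutils.fs.ls carries a modifyTime attribute holding epoch milliseconds; check that against your runtime before relying on it.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in for the entries returned by mssparkutils.fs.ls(path).
# Assumption: real entries expose `modifyTime` as epoch milliseconds.
FileEntry = namedtuple("FileEntry", ["name", "path", "modifyTime"])


def sort_by_modified(entries, newest_first=True):
    """Return a new list of entries ordered by their modifyTime attribute."""
    return sorted(entries, key=lambda e: e.modifyTime, reverse=newest_first)


def to_utc(epoch_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# Made-up sample data standing in for a real directory listing.
sample = [
    FileEntry("a.csv", "/data/a.csv", 1627900000000),
    FileEntry("b.csv", "/data/b.csv", 1627950000000),
    FileEntry("c.csv", "/data/c.csv", 1627800000000),
]

ordered = sort_by_modified(sample)
latest = ordered[0]
print(latest.name, to_utc(latest.modifyTime).isoformat())
```

In a notebook you would pass the result of mssparkutils.fs.ls(path) instead of sample. If the entries in your runtime do not carry a modification time, this approach will not work as written and the timestamp has to come from another source.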

Le Poissons