I have multiple files in my folder. I want to pattern-match to check whether a file is present, and if it is, store its whole file path in a variable.
How can I achieve this in PySpark?
Since you want to store the whole path in a variable, you can achieve this with a combination of dbutils (the Databricks utilities available in Databricks notebooks) and regular-expression pattern matching.
Use dbutils.fs.ls(path) to return the list of files present in a folder (storage account or DBFS). Each item is a FileInfo object with attributes including path (the full path) and name (the file name). Assign the return value to a variable called files.
# my sample path: a mounted storage account folder
files = dbutils.fs.ls("/mnt/repro")
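dbutils only exists inside a Databricks notebook, so the matching code cannot be tried elsewhere as-is. As a minimal sketch, assuming you want to test the logic locally first, the hypothetical helper `ls_local` below mimics the shape of `dbutils.fs.ls` results using only the standard library. It is not part of dbutils, and the `FileInfo` namedtuple is a stand-in for the real Databricks class that models only the `path`, `name` and `size` fields.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the FileInfo objects returned by dbutils.fs.ls;
# only the path, name and size fields are modelled here.
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])


def ls_local(path):
    """Hypothetical local substitute for dbutils.fs.ls(path).

    Lists the immediate children of a local folder. As with dbutils,
    directory names get a trailing "/" so they can be told apart from files.
    """
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            is_dir = entry.is_dir()
            name = entry.name + "/" if is_dir else entry.name
            full_path = os.path.join(os.path.abspath(path), name)
            # This stand-in reports size 0 for directories.
            size = 0 if is_dir else entry.stat().st_size
            entries.append(FileInfo(path=full_path, name=name, size=size))
    # Sort by name so the listing order is deterministic.
    return sorted(entries, key=lambda f: f.name)


# Usage, in place of files = dbutils.fs.ls("/mnt/repro"):
# files = ls_local("./repro")
```

Because the stand-in exposes the same `name` and `path` attributes, the matching loop works unchanged on either listing.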
Using re.match() you can check whether the current item's file name matches your pattern. If it matches, append its path to your result variable (a list).
from re import match

matched_files = []
for file in files:
    # "sample.*\.csv$" is the pattern to be matched; "\.csv$" requires the name to end in ".csv"
    if match(r"sample.*\.csv$", file.name):
        matched_files.append(file.path)
print("Matched files:", matched_files)
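The loop above collects every match into a list. The question asks to store the path in a variable when the file is present, so here is a minimal sketch that wraps the same idea in two hypothetical helpers, `find_matching_paths` and `find_first_path` (names chosen for illustration). It swaps in `re.fullmatch`, which requires the whole name to match, instead of `re.match`, which anchors only at the start, and it skips directories, assuming their names end with "/" as in `dbutils.fs.ls` listings.

```python
import re


def find_matching_paths(files, pattern):
    """Return the full paths of all files whose whole name matches `pattern`.

    `files` is any iterable of objects with `name` and `path` attributes,
    such as the list returned by dbutils.fs.ls. Entries whose names end
    with "/" (directories) are skipped.
    """
    regex = re.compile(pattern)
    return [
        f.path
        for f in files
        if not f.name.endswith("/") and regex.fullmatch(f.name)
    ]


def find_first_path(files, pattern):
    """Return the full path of the first matching file, or None if absent."""
    matches = find_matching_paths(files, pattern)
    return matches[0] if matches else None


# Usage inside Databricks (pattern taken from the example above):
# files = dbutils.fs.ls("/mnt/repro")
# file_path = find_first_path(files, r"sample.*\.csv")
# if file_path is not None:
#     print("Found:", file_path)
```

Returning `None` when nothing matches makes the "is the file present?" check a simple `if file_path is not None`.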