
I have a pipeline running in Azure Synapse and I need to execute PySpark code that creates a folder for the current date. The structure must be "2021/12/10" (the latest date on which my pipeline was executed), with one folder each for year, month and day.

The path is 'dataupdated/yyyy/MM/dd'. I just need to automate the creation of these folders.

I think I have to use something like "get datetime".
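A minimal sketch of building that path with Python's standard `datetime` module: a single `strftime` call formats year, month and day together. The helper name `dated_path` and its `base` argument are hypothetical, and the date is fixed here only so the result is predictable; in practice you would pass `datetime.now()`.

```python
from datetime import datetime


def dated_path(base: str, when: datetime) -> str:
    """Return '<base>/yyyy/MM/dd' for the given datetime (hypothetical helper)."""
    # %Y = 4-digit year, %m = zero-padded month, %d = zero-padded day
    return f"{base}/{when.strftime('%Y/%m/%d')}"


print(dated_path("dataupdated", datetime(2021, 12, 10)))  # dataupdated/2021/12/10
```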

2 Answers


You can use the os library to do that.

import os
from datetime import datetime as dt

# Call now() once so year, month and day all come from the same instant
now = dt.now()
filename = f"{now.strftime('%Y/%m/%d')}/file.extension"
# makedirs creates any missing parent directories; exist_ok=True avoids an error if they already exist
os.makedirs(os.path.dirname(filename), exist_ok=True)
with open(filename, "w") as f:
    f.write("test")
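The same approach with `pathlib`, as a sketch: `Path.mkdir(parents=True, exist_ok=True)` plays the role of `os.makedirs(..., exist_ok=True)`. Like `os`, `pathlib` works on the filesystem local to the Python process, so this assumes a locally accessible base directory. `make_date_dir` is a hypothetical name, and the temporary directory only stands in for a real base folder.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_date_dir(base: Path, when: datetime) -> Path:
    """Create base/yyyy/MM/dd (and any missing parents); return the day folder."""
    target = base / when.strftime("%Y") / when.strftime("%m") / when.strftime("%d")
    # parents=True creates the year and month folders as needed;
    # exist_ok=True makes reruns on the same day safe
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    leaf = make_date_dir(Path(tmp) / "dataupdated", datetime(2021, 12, 10))
    print(leaf.relative_to(tmp).as_posix())  # dataupdated/2021/12/10
    # A second call for the same date does not raise
    make_date_dir(Path(tmp) / "dataupdated", datetime(2021, 12, 10))
```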

Creating a single folder whose name follows the structure yyyy/MM/dd is not possible in ADLS, because a forward slash is not accepted in a folder name. Please check the image below.

[image not available]

Instead, you can use the code below to create a folder in the format yyyy-MM-dd:

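from datetime import datetime, timezone
# mssparkutils is available in Synapse notebooks; this explicit import also works there
from notebookutils import mssparkutils

var = datetime.now(timezone.utc)
# Creates one folder named like "2021-12-10" under /dataupdated
mssparkutils.fs.mkdirs("/dataupdated/" + var.strftime('%Y-%m-%d'))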

If we try to create the folder using

mssparkutils.fs.mkdirs("/dataupdated/"+var.strftime('%Y/%m/%d'))

It will treat each string after a forward slash as a subfolder, so it creates nested folders (for example /dataupdated/2021/12/10) instead of a single folder.
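To see why the two format strings behave differently, here is a small sketch using `PurePosixPath` from the standard library. It only splits path strings and touches no storage, so it illustrates how a slash-separated path is read rather than calling `mssparkutils` itself: with `%Y-%m-%d` the date is one path segment, while `%Y/%m/%d` produces three nested segments.

```python
from datetime import datetime
from pathlib import PurePosixPath

when = datetime(2021, 12, 10)

flat = PurePosixPath("/dataupdated/" + when.strftime("%Y-%m-%d"))
nested = PurePosixPath("/dataupdated/" + when.strftime("%Y/%m/%d"))

# Each element of .parts after the root is one folder level
print(flat.parts)    # ('/', 'dataupdated', '2021-12-10')
print(nested.parts)  # ('/', 'dataupdated', '2021', '12', '10')
```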

AnnuKumari