I have a Databricks streaming job that uses Auto Loader for file discovery, but it is unable to list files according to the glob pattern I provide.
Right now the raw zone contains data from 24th March 2023 until today, but my intention is to pick up files for only the last week.
"Sources": {
"csv_source": {
"Path": "/mnt/raw-staging/n4-windows-eventlog/exp/day=20230410/materialnum=*/serialnum=*/",
"ReadOptions": {
"useStrictGlobber": "true",
"header": "true",
"sep": ";",
"cloudFiles.partitionColumns": "day,materialnum,serialnum"
}
}
}
I have tried the following glob patterns:
/mnt/raw-staging/n4-windows-eventlog/exp/day=202304{24..30}/materialnum=*/serialnum=*/
/mnt/raw-staging/n4-windows-eventlog/exp/day=202304{24,25,26}/materialnum=*/serialnum=*/
/mnt/raw-staging/n4-windows-eventlog/exp/day=[20230424][20230425]/materialnum=*/serialnum=*/
/mnt/raw-staging/n4-windows-eventlog/exp/day={[202306]*,[202305]*,[2023043]*,[2023042][4-9]}/materialnum=*/serialnum=*/
None of them works; listing only succeeds when I specify an exact date, which means Auto Loader has to be reconfigured manually for each separate date.
Is there a correct glob pattern that lets me fetch the list of files from 20230424 to 20230430?
These glob patterns work when run through a shell script, but not with Auto Loader.
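One likely reason the patterns behave differently in the shell: `{24..30}` is bash *brace expansion*, which the shell performs before any glob matching happens, so a glob engine never sees it. Glob matchers themselves only understand `*`, `?`, `[seq]`, and (in Hadoop-style globbing, which Auto Loader uses) `{a,b}` alternation with explicit comma-separated alternatives, not `{a..b}` ranges. A minimal demonstration, using Python's `fnmatch` as a stand-in for a plain glob matcher (an assumption for illustration; it is not Auto Loader's actual globber):

```python
# Illustrates why shell scripts accept {24..30} but a glob matcher does not:
# the shell expands the brace range into literal paths first, whereas a glob
# engine treats {24..30} as literal characters.
from fnmatch import fnmatch

day = "day=20230426"

# Character classes are plain glob syntax and match as expected:
print(fnmatch(day, "day=2023042[4-9]"))    # True

# A bash-style numeric range is just literal text to a glob matcher:
print(fnmatch(day, "day=202304{24..30}"))  # False
```

So a pattern built only from character classes and comma alternation, such as `day=2023042[4-9]` or `day={20230424,20230425,20230426}`, may be worth trying in place of the bash-style range.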