I am trying to read Parquet files using Spark. If I want to read the data for June, I do the following:
"gs://bucket/Data/year=2021/month=6/file.parquet"
If I want to read the data for all the months, I do the following:
"gs://bucket/Data/year=2021/month=*/file.parquet"
If I want to read the first two days of May:
"gs://bucket/Data/year=2021/month=5/day={1,2}/file.parquet"
If I want to read November and December:
"gs://bucket/Data/year=2021/month={11,12}/file.parquet"
You get the idea. But what if I have a dictionary of month-to-days key-value pairs, for example:
{1: [1,2,3], 4: [10,11,12,13]}
That means I need to read days [1,2,3] from January and days [10,11,12,13] from April. How would I express that as a wildcard in the path?
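The closest thing I can think of is building one path per month with the same {a,b} syntax as above and passing all of them to the reader at once. Here is a rough sketch of that idea. The helper name build_paths and its parameters are just made up for illustration, and I'm assuming a day= level sits under each month as in the May example:

```python
def build_paths(base, year, days_by_month, filename="file.parquet"):
    """Return one glob path per month, e.g. .../month=1/day={1,2,3}/file.parquet."""
    paths = []
    for month, days in sorted(days_by_month.items()):
        if len(days) == 1:
            # A single day needs no brace alternation.
            day_part = str(days[0])
        else:
            # Multiple days become a {d1,d2,...} alternation.
            day_part = "{" + ",".join(str(d) for d in days) + "}"
        paths.append(f"{base}/year={year}/month={month}/day={day_part}/{filename}")
    return paths


# Hypothetical base path and dictionary taken from the examples above.
paths = build_paths("gs://bucket/Data", 2021, {1: [1, 2, 3], 4: [10, 11, 12, 13]})

# Assuming an existing SparkSession named `spark`, the paths could be read together:
# df = spark.read.parquet(*paths)
```

This gives me a list of paths rather than a single wildcard, so I would still like to know whether one path expression can cover it.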
Thank you