I am trying to create a Data Asset with the Great Expectations library that points to all the files in subfolders under a parent folder. Here is a sample code snippet:
asset_name = "iceberg_asset"
s3_prefix = "folder_a/folder_b/folder_c/"
batching_regex = r"subfolder_a\/file\.parquet"
data_asset = datasource.add_parquet_asset(name=asset_name, batching_regex=batching_regex, s3_prefix=s3_prefix)
The batching_regex is supposed to match files by a specific path that includes both the parent folder and the file name. However, the code above does not work and returns the error message "file not found." I have confirmed that the regex itself works.
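A minimal sketch of how the regex can be checked on its own, using only Python's re module. The object keys below are hypothetical, and stripping s3_prefix before matching is an assumption made for this check; it does not reproduce how Great Expectations actually lists or matches S3 objects.

```python
import re

# Same pattern and prefix as in the asset definition above.
batching_regex = re.compile(r"subfolder_a\/file\.parquet")
s3_prefix = "folder_a/folder_b/folder_c/"

# Hypothetical object keys, for illustration only.
keys = [
    "folder_a/folder_b/folder_c/subfolder_a/file.parquet",
    "folder_a/folder_b/folder_c/file.parquet",
]


def matches_relative_to_prefix(key: str) -> bool:
    """Strip s3_prefix from the key (an assumption) and test the rest against the regex."""
    relative = key[len(s3_prefix):] if key.startswith(s3_prefix) else key
    return batching_regex.fullmatch(relative) is not None


# Map each key to whether the pattern matches its path below the prefix.
results = {key: matches_relative_to_prefix(key) for key in keys}
print(results)
```

In this isolated check the key inside subfolder_a matches and the key directly under the prefix does not, so the pattern itself behaves as intended.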
Currently, only a regex that matches files directly under the s3_prefix works. Does anyone have suggestions for getting this to work when the regex has to match both a folder and a file name?