I am trying to read a Parquet file from S3 with pyarrow, using s3fs as the file system, but I keep getting NoSuchKey or FileNotFoundError.
import logging
import pyarrow.parquet as pq
import s3fs
logger = logging.getLogger(__name__)

def read_parquet_pd(path):
    s3 = s3fs.S3FileSystem()
    path = path.rstrip('/')
    logger.info(f"Path is: {path}")
    df = pq.ParquetDataset(f"{path}/", filesystem=s3).read_pandas().to_pandas()
    return df
My S3 path looks like this: s3://bucket_name/folder/
If I remove path.rstrip('/') from my code, it gives me this error:
s3://bucket_name/finance_outbound/folder//xyz.parquet does not exist.
If I keep path.rstrip('/'), it gives me this error:
NoSuchKey: s3://hvcp-sit-opdata-finance-s3://bucket_name/finance_outbound/folder
I am not sure where the extra slash is being added or removed in the respective cases. Any help would be greatly appreciated.
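
To narrow this down, here is a minimal sketch of what the two variants of my function pass to ParquetDataset. It uses plain string operations only and never touches S3, so it says nothing about what pyarrow or s3fs do with the path afterwards. The helper name build_dataset_path is hypothetical and exists only for this illustration.

```python
def build_dataset_path(path: str, strip_trailing_slash: bool) -> str:
    """Mimic the string handling in read_parquet_pd without accessing S3."""
    if strip_trailing_slash:
        # Same call as in read_parquet_pd: drops every trailing '/'.
        path = path.rstrip('/')
    # The f-string in read_parquet_pd always appends exactly one '/'.
    return f"{path}/"


original = "s3://bucket_name/folder/"

with_rstrip = build_dataset_path(original, strip_trailing_slash=True)
without_rstrip = build_dataset_path(original, strip_trailing_slash=False)

print(with_rstrip)     # s3://bucket_name/folder/
print(without_rstrip)  # s3://bucket_name/folder//
```

So without rstrip the path handed to ParquetDataset already ends in a double slash, which would match the folder//xyz.parquet in the first error. With rstrip the path still ends in a single slash, because the f-string adds it back.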