AWS Glue: s3fs throws FileNotFoundError

Asked Dec 13 '22 at 16:25

Active Dec 13 '22 at 22:42

Viewed 135 times

I am trying to read a Parquet file from S3 with pyarrow, using s3fs as the filesystem, but I get either a NoSuchKey error or a FileNotFoundError.

import logging
import pyarrow.parquet as pq
import s3fs

logger = logging.getLogger(__name__)

def read_parquet_pd(path):
    s3 = s3fs.S3FileSystem()
    path = path.rstrip('/')
    logger.info(f"Path is: {path}")
    df = pq.ParquetDataset(f"{path}/", filesystem=s3).read_pandas().to_pandas()
    return df

My S3 path looks like this: s3://bucket_name/folder/. If I remove path.rstrip('/') from my code, I get the error: s3://bucket_name/finance_outbound/folder//xyz.parquet does not exist. If I keep path.rstrip('/'), I get the error: NoSuchKey: s3://hvcp-sit-opdata-finance-s3://bucket_name/finance_outbound/folder. I am not sure where the extra slash is being added or removed in each case. Any help would be greatly appreciated.
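For reference, the slash handling can be looked at in isolation, without touching S3. The following is a minimal sketch, assuming the path arrives as a plain string. normalize_s3_path is a hypothetical helper written for illustration, not part of s3fs or pyarrow. It only shows one way to get a consistent bucket/key string with no scheme, no doubled slashes, and no trailing slash.

```python
import re


def normalize_s3_path(path: str) -> str:
    """Hypothetical helper: return 'bucket/key' without scheme or stray slashes."""
    # Drop a leading s3:// or s3a:// scheme if one is present.
    without_scheme = re.sub(r"^s3a?://", "", path.strip())
    # Split on '/' and discard empty segments, which removes doubled,
    # leading, and trailing slashes. Assumption: no object key in the
    # bucket relies on an empty segment (S3 itself permits '//' in keys).
    parts = [segment for segment in without_scheme.split("/") if segment]
    return "/".join(parts)


print(normalize_s3_path("s3://bucket_name/folder/"))
print(normalize_s3_path("s3://bucket_name/finance_outbound/folder//xyz.parquet"))
```

The normalized string could then be tried in place of f"{path}/" in the pq.ParquetDataset call, since s3fs accepts bucket/key paths without the s3:// prefix. Whether that resolves the errors above is untested.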

edited Dec 13 '22 at 22:42

Andrew Gaul

asked Dec 13 '22 at 16:25

Mradul Yd

0 Answers