
I am trying to read a Parquet file from S3 with pyarrow, using s3fs as the filesystem, but I get either a NoSuchKey error or a FileNotFoundError.

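import logging
import pyarrow.parquet as pq
import s3fs

logger = logging.getLogger(__name__)

def read_parquet_pd(path):
    s3 = s3fs.S3FileSystem()
    path = path.rstrip('/')  # drop any trailing slash from the input
    logger.info(f"Path is: {path}")
    df = pq.ParquetDataset(f"{path}/", filesystem=s3).read_pandas().to_pandas()  # slash is re-added here
    return df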

My S3 path looks like this: s3://bucket_name/folder/. If I remove path.rstrip('/') from my code, I get the error: s3://bucket_name/finance_outbound/folder//xyz.parquet does not exist. If I keep path.rstrip('/'), I get the error: NoSuchKey: s3://hvcp-sit-opdata-finance-s3://bucket_name/finance_outbound/folder. I am not sure where the extra slash is being added or removed in each case. Any help would be greatly appreciated.
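One way to rule out path formatting as the cause is to normalize the path once and pass it on without appending anything. The sketch below is only an illustration under assumptions: `normalize_s3_path` is a hypothetical helper, not part of s3fs or pyarrow, and collapsing repeated slashes is only safe if the objects were not deliberately written under keys containing `//`. It may not resolve the error on its own.

```python
import re


def normalize_s3_path(path: str) -> str:
    """Hypothetical helper: return 'bucket/key' with no scheme,
    no leading/trailing slash, and no repeated slashes."""
    # Drop a leading 's3://' or 's3a://' scheme if present.
    path = re.sub(r"^s3a?://", "", path.strip())
    # Collapse runs of slashes into one (assumes keys contain no real '//').
    path = re.sub(r"/{2,}", "/", path)
    # Trim any slash left at either end.
    return path.strip("/")


def read_parquet_pd(path: str):
    # Imported here so the helper above can be used without these packages.
    import pyarrow.parquet as pq
    import s3fs

    s3 = s3fs.S3FileSystem()
    clean = normalize_s3_path(path)
    # Pass the normalized path as-is; no trailing slash is re-added.
    return pq.ParquetDataset(clean, filesystem=s3).read_pandas().to_pandas()
```

Calling `s3fs.S3FileSystem().ls(normalize_s3_path(path))` before reading lists the keys as s3fs sees them, which can help show whether the prefix itself or the slash handling is at fault.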

Andrew Gaul
Mradul Yd
