Questions tagged [python-s3fs]

Use this tag for questions related to the Python s3fs library.

Not to be confused with the [s3fs] tag, which is for mounting an S3 bucket on a local mount point with s3fs and has nothing to do with Python.

85 questions
2 votes, 0 answers

How to solve "No module named _bz2" error?

I'm trying to use s3fs in Python 3.6 on Debian 3.16.51-3. When I import s3fs: `import s3fs` ... `from _bz2 import BZ2Compressor, BZ2Decompressor` `ModuleNotFoundError: No module named '_bz2'` Alright, I tried to update/install libbz2-dev, as…
2 votes, 0 answers

Can I use the python s3fs library over aiobotocore?

s3fs is a convenient Python filesystem-like interface for S3, built on top of botocore. To access S3 using asyncio, aiobotocore is an alternative to botocore. Is it possible to use s3fs with asyncio/aiobotocore rather than vanilla botocore? My use…
gerrit
1 vote, 0 answers

S3FS framework with SSO credentials

I'm exploring the S3FS framework, which I need for reading from and writing to the S3 file system. From what I can see in the docs, we can pass the AWS credentials explicitly, but I don't see any information about SSO credentials. I also tested this snippet…
Monica
1 vote, 0 answers

Using the `s3fs` python library with Task IAM role credentials on AWS Batch

I'm trying to get an ML job to run on AWS Batch. The job runs in a docker container, using credentials generated for a Task IAM Role. I use DVC to manage the large data files needed for the task, which are hosted in an S3 repository. However, when…
1 vote, 1 answer

How to connect python s3fs client to a running Minio docker container?

For test purposes, I'm trying to connect a module that introduces an abstraction layer over s3fs with custom business logic. It seems like I have trouble connecting the s3fs client to the Minio container. Here's how I created the container and…
KaizenCat
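
For the Minio question above, a minimal sketch of how an s3fs client is commonly pointed at a non-AWS endpoint. The helper names, the `http://localhost:9000` address, and the `minioadmin` credentials are illustrative assumptions, not taken from the question; `key`, `secret`, and `client_kwargs` are existing `s3fs.S3FileSystem` parameters, and `client_kwargs` is forwarded to the underlying botocore client.

```python
def minio_storage_options(endpoint_url, access_key, secret_key):
    """Build keyword arguments for s3fs.S3FileSystem that target a Minio server.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    return {
        "key": access_key,
        "secret": secret_key,
        # Passed through to the botocore client, which then talks to the
        # given endpoint instead of the default AWS one.
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


def open_minio_filesystem(options):
    """Create the filesystem object; s3fs is imported lazily on purpose."""
    import s3fs  # requires the s3fs package to be installed

    return s3fs.S3FileSystem(**options)


# Assumed values for a local container; replace them with your own.
options = minio_storage_options("http://localhost:9000", "minioadmin", "minioadmin")
```

Calling `open_minio_filesystem(options)` would then return a filesystem whose `ls`, `open`, and similar methods go to the container, provided the mapped port and credentials match how the container was started.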
1 vote, 1 answer

xarray I/O operation on closed file

I am opening and using a netCDF file that is located on S3. I have the following code, but it raises an exception: import s3fs import xarray as xr filepath = "s3://mybucket/myfile.nc" fs = s3fs.S3FileSystem() with fs.open(filepath) as…
Scott
1 vote, 1 answer

Pandas 1.4.2 gives errors for installing s3fs while reading csv from S3 bucket

I am experiencing an issue with the latest pandas release, 1.4.2, while reading a CSV file from S3. I am using the AWS Lambda Python runtime environment with Python 3.8, which comes with the following boto3 and botocore versions: Boto3 - 1.20.32, Botocore - 1.23.32. And,…
1 vote, 1 answer

S3 to Pandas with local variable authentication

I'm downloading a file (a set of Parquet files, to be precise) from S3 and converting it to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here: df =…
gsmafra
1 vote, 2 answers

What is the working combination of the s3fs and fsspec version? ImportError: cannot import name 'maybe_sync' from 'fsspec.asyn'

I am using the latest version of s3fs (0.5.2) with fsspec 0.9.0. When I import s3fs, I encounter the following error: File "/User/.conda/envs/py376/lib/python3.7/site-packages/s3fs/__init__.py", line 1, in from .core import S3FileSystem, S3File …
xsqian
1 vote, 2 answers

Snowflake is not able to download file from S3 without access key, while s3fs is able to download that file from S3

I have an S3 URL to a public file similar to the following example: s3://test-public/new/solution/file.csv (this is not the actual link, just a close example of the one I'm using). I am able to read the file using the s3fs module in a python…
1 vote, 1 answer

How to write a numpy array as a csv to S3

I have a numpy ndarray with 2 columns that looks like this: [[1.8238497e+03 5.2642276e-06] [2.7092224e+03 6.7980350e-06] [2.3406370e+03 6.6842499e-06] ... [1.7234612e+03 6.6842499e-06] [2.1071147e+03 2.1332115e-05] [2.6937273e+03…
nad
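
For the numpy-to-CSV question above, one possible approach is to serialize the rows to CSV text in memory and write that text through a file-like handle. This is a sketch under stated assumptions, not the accepted answer: `rows_to_csv_text` and `write_csv` are hypothetical helper names, the sample values are copied from the excerpt, and plain lists stand in for the ndarray (iterating a 2-D ndarray also yields one row at a time).

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize an iterable of rows (lists, tuples, or ndarray rows) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def write_csv(rows, path, opener):
    """Write rows as CSV through opener(path, "w").

    `opener` can be the built-in open for a local file or, as an assumption
    about your setup, the `open` method of an s3fs.S3FileSystem instance
    together with a path such as "my-bucket/data.csv".
    """
    with opener(path, "w") as handle:
        handle.write(rows_to_csv_text(rows))


# The first rows from the excerpt, written as nested lists.
sample = [
    [1.8238497e03, 5.2642276e-06],
    [2.7092224e03, 6.7980350e-06],
]
csv_text = rows_to_csv_text(sample)
```

Keeping the serialization separate from the write makes the CSV output easy to check locally before any bucket is involved.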
1 vote, 0 answers

h5py slow when reading through an s3fs file object

I am using the following combination of h5py and s3fs to read a couple of small datasets from larger HDF5 files on Amazon S3. s3 = s3fs.S3FileSystem() h5_file = h5py.File(s3.open(s3_path,'rb'), 'r') data = h5_file.get(dataset) These reads are…
1 vote, 1 answer

Zarr: improve xarray writing performance to S3

Writing xarray datasets to AWS S3 takes a surprisingly long time, even when no data is actually written with compute=False. Here's an example: import fsspec import xarray as xr x = xr.tutorial.open_dataset("rasm") target =…
Val
1 vote, 1 answer

s3fs/botocore import error: InvalidIMDSEndpointError

I was trying to run some python code in docker and export a .csv file to S3, but got the same error as in aiobotocore - ImportError: cannot import name 'InvalidIMDSEndpointError' (asking here because I don't have enough reputation to comment under…
Joe
1 vote, 0 answers

programmatically deleting parquet partitions from S3 bucket using pyspark

I have a parquet file partitioned in the S3 file system (s3fs) like so: STATE='DORMANT' -----> DATE=2020-01-01 -----> DATE=2020-01-02 .... -----> DATE=2020-11-01 STATE='ACTIVE' -----> DATE=2020-01-01 -----> DATE=2020-01-02 …
thentangler