Questions tagged [python-s3fs]

For questions related to the Python s3fs library.

Not to be confused with the tag for the s3fs FUSE tool, which covers mounting an S3 bucket at a local mount point and has nothing to do with Python.

85 questions
4 votes, 0 answers

How to pass arguments to ls command over an s3fs connection?

I have a file transfer utility set up in Python using s3fs, where I populate a list of files to download from AWS using the ls command. I'm interested in trying to create another list containing only directories in that…
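A sketch of one way to keep only directories, assuming the fsspec-style listing dicts that s3fs returns from `ls(..., detail=True)` (each entry carries a `type` field); `directories_only` and `list_s3_directories` are illustrative names, not s3fs API:

```python
def directories_only(entries):
    """Filter fs.ls(..., detail=True) entries down to directory names.

    Each entry is a dict with at least 'name' and 'type'
    ('directory' or 'file') in s3fs/fsspec detailed listings.
    """
    return [e["name"] for e in entries if e.get("type") == "directory"]

def list_s3_directories(bucket_path):
    """List only the directory prefixes under an S3 path (needs AWS credentials)."""
    import s3fs  # imported here so the pure helper above stays usable offline
    fs = s3fs.S3FileSystem(anon=False)
    return directories_only(fs.ls(bucket_path, detail=True))
```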
4 votes, 3 answers

How to mount S3 bucket as local FileSystem?

I have a Python app running in a Jupyter notebook on AWS. I loaded a C library into my Python code which expects a path to a file. I would like to access this file from the S3 bucket. I tried to use s3fs: s3 = s3fs.S3FileSystem(anon=False) using…
Khan • 1,418
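Short of an actual mount, one workaround is to copy the object to a local temp file and hand that path to the C library. A minimal sketch using s3fs's `get` download (`fetch_to_local` is a hypothetical helper, not part of s3fs):

```python
import os
import tempfile

def fetch_to_local(fs, remote_path, suffix=""):
    """Copy an S3 object to a local temp file and return its path,
    so APIs that insist on a real filesystem path can read it."""
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    fs.get(remote_path, local_path)  # s3fs/fsspec download to a local path
    return local_path
```

The caller is responsible for deleting the temp file when the C library is done with it.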
3 votes, 1 answer

Resolving dependencies fails on boto3 and s3fs using poetry

I can install boto3, s3fs and pandas using: pip install boto3 pandas s3fs But it fails with poetry: poetry add boto3 pandas s3fs Here is the error: Because no versions of s3fs match >2023.3.0,<2024.0.0 and s3fs (2023.3.0) depends on…
jtobelem • 749
3 votes, 0 answers

s3fs timeout on big S3 files

This is similar to dask read_csv timeout on Amazon s3 with big files, but that didn't actually resolve my question. import s3fs fs = s3fs.S3FileSystem() fs.connect_timeout = 18000 fs.read_timeout = 18000 # five…
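Mutating `fs.connect_timeout` after construction may never reach the botocore client s3fs has already created; a common alternative is to pass the timeouts at construction via `config_kwargs`, which s3fs forwards to the botocore client `Config`. A hedged sketch (the 18000-second values mirror the question; the retry count is an illustrative choice):

```python
def timeout_kwargs(connect=18000, read=18000, max_attempts=10):
    """Build the config_kwargs dict that s3fs forwards to botocore's Config."""
    return {
        "connect_timeout": connect,
        "read_timeout": read,
        "retries": {"max_attempts": max_attempts},
    }

def make_patient_fs():
    import s3fs
    # Passing timeouts at construction ensures they reach the underlying
    # client; setting fs.connect_timeout afterwards may be ignored.
    return s3fs.S3FileSystem(config_kwargs=timeout_kwargs())
```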
3 votes, 1 answer

Downloading S3 files in Google Colab

I am working on a project where some data is provided in the form of an S3FileSystem. I can read that data using S3FileSystem.open(path). But there are more than 360 files and it takes at least 3 minutes to read a single file. I was wondering,…
MK_07 • 35
3 votes, 2 answers

NotImplementedError: Text mode not supported, use mode='wb' and manage bytes in s3fs

I know that there is a similar question, but it is more general and not specific to this package. I am saving a pandas dataframe from a Sagemaker Jupyter notebook to a CSV in S3 as follows: df.to_csv('s3://bucket/key/file.csv',…
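One way around the error, assuming an s3fs version that only supports binary handles, is to serialize the CSV to bytes yourself and write with mode='wb'. A sketch (`frame_to_csv_bytes` and `save_csv` are illustrative names):

```python
import io

def frame_to_csv_bytes(df):
    """Serialize a DataFrame to UTF-8 CSV bytes so it can be written in binary mode."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return buf.getvalue().encode("utf-8")

def save_csv(fs, df, path):
    """Write df as CSV to an s3fs path using a binary handle,
    sidestepping the 'Text mode not supported' NotImplementedError."""
    with fs.open(path, "wb") as f:
        f.write(frame_to_csv_bytes(df))
```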
3 votes, 3 answers

Log parquet filenames created by pyarrow on S3

We are appending data to an existing parquet dataset stored in S3 (partitioned) by using pyarrow. This runs on AWS lambda several times per hour. A minimal example would be: import pyarrow as pa import pyarrow.parquet as pq import s3fs df = ... #…
jarias • 150
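If pyarrow >= 4.0 is available, `write_to_dataset` accepts a `file_visitor` callback invoked once per written file, which can be used to log the generated filenames from the lambda. A sketch (`make_visitor` and `append_dataset` are illustrative names):

```python
def make_visitor(log):
    """Return a file_visitor callback that records each written file's path."""
    def visit(written_file):
        # pyarrow calls this once per output file; written_file.path is the key
        log.append(written_file.path)
    return visit

def append_dataset(table, root, fs):
    """Append a table to a partitioned dataset and return the files written."""
    import pyarrow.parquet as pq
    written = []
    pq.write_to_dataset(table, root_path=root, filesystem=fs,
                        file_visitor=make_visitor(written))
    return written
```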
2 votes, 2 answers

How to access my own fake bucket with S3FileSystem, Pytest and Moto

I'm trying to implement Unit Tests using Pytest, Moto (4.1.6) and s3fs (0.4.2) for my functions that interact with S3. So far I am able to create a bucket and populate it with all the files that live in the data folder. Unfortunately one of my…
A Campos • 753
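A frequent culprit here is fsspec's instance caching: an S3FileSystem created before the moto mock starts keeps pointing at real AWS, because repeated constructor calls with the same arguments return the cached instance. A sketch of the reset, assuming moto >= 5 (`mock_aws`; older versions use `mock_s3`):

```python
def reset_s3fs_cache():
    """Clear fsspec's per-argument instance cache so a filesystem created
    before the moto mock started does not leak a real client into tests."""
    import s3fs
    s3fs.S3FileSystem.clear_instance_cache()

# In a test module this might look like (names are illustrative):
#
# import boto3, s3fs
# from moto import mock_aws
#
# @mock_aws
# def test_reads_fake_bucket():
#     reset_s3fs_cache()
#     boto3.client("s3", region_name="us-east-1").create_bucket(Bucket="fake")
#     fs = s3fs.S3FileSystem()
#     fs.pipe("fake/hello.txt", b"hi")
#     assert fs.cat("fake/hello.txt") == b"hi"
```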
2 votes, 1 answer

Read timeout in pd.read_parquet from S3, and understanding configs

I'm trying to simplify access to datasets in various file formats (csv, pickle, feather, partitioned parquet, ...) stored as S3 objects. Since some users I support have different environments with limited options for upgrading (big company, don't…
Wassadamo • 1,176
2 votes, 1 answer

s3fs local filecache of versioned files

I want to use s3fs, which is based on fsspec, to access files on S3, mainly because of two neat features: local caching of files to disk, with checking whether files change (i.e. a file gets redownloaded if the local and remote file differ), and file version id support for…
2 votes, 1 answer

Glue Python shell job fails to import s3fs

I am new to Glue jobs and followed the instructions for configuring a whl file per the link below: Import failure of s3fs library in AWS Glue. I am getting the following error for the AWS Glue Python 3 job: WARNING: Retrying (Retry(total=0, connect=None,…
Phani • 803
2 votes, 1 answer

Does s3fs.S3FileSystem() always need a specific region setting?

What I'm trying to do is connect to an S3 bucket from my EC2 machine. This error comes up if I don't set the endpoint_url in s3fs.S3FileSystem(). Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line…
Trey Yi • 69
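Rather than hand-building an endpoint_url, the region can usually be pinned via `client_kwargs`, which s3fs forwards to the underlying boto client. A minimal sketch (`region_kwargs` and `make_regional_fs` are illustrative helpers, and the region name is an example value):

```python
def region_kwargs(region="us-east-1"):
    """Build the client_kwargs dict s3fs forwards to the boto client,
    pinning the region without constructing an endpoint URL by hand."""
    return {"client_kwargs": {"region_name": region}}

def make_regional_fs(region):
    import s3fs
    return s3fs.S3FileSystem(anon=False, **region_kwargs(region))
```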
2 votes, 0 answers

AWS Sagemaker notebook intermittent 'Unable to locate credentials'

I'm trying to use Dask to get multiple files (JSON) from AWS S3 into memory in a Sagemaker Jupyter Notebook. When I submit 10 or 20 workers, everything runs smoothly. However, when I submit 100 workers, between 30% and 50% of them encounter the…
2 votes, 1 answer

Why do I get ConnectionResetError when reading and writing from and to s3 using smart_open?

The following code can read and write back to s3 on the fly, following the discussion here: from smart_open import open import os bucket_dir = "s3://my-bucket/annotations/" with open(os.path.join(bucket_dir, "in.tsv.gz"), "rb") as fin: …
0x90 • 39,472
2 votes, 4 answers

How to get a list of all distinct prefixes in S3 bucket?

If I have a directory structure as below and the prefix is /folder1:
/folder1/folder11/folder12/folder13/*.files
/folder21/folder22/folder23/*.files
/folder31/folder32/*.files
I want to loop through these directories…
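Two sketches for collecting distinct prefixes: a pure helper over an already-fetched key list, and a server-side variant that lets S3 itself group keys with Delimiter='/' (both function names are illustrative):

```python
def distinct_prefixes(keys, depth=1):
    """Return the distinct leading prefixes (first `depth` path segments)
    of a list of S3 keys, each with a trailing slash."""
    out = set()
    for key in keys:
        parts = key.strip("/").split("/")
        if len(parts) > depth:
            out.add("/".join(parts[:depth]) + "/")
    return sorted(out)

def list_top_level_prefixes(bucket):
    """Server-side variant: S3 groups keys by delimiter (needs credentials)."""
    import boto3
    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Delimiter="/")
    return [p["Prefix"] for page in pages for p in page.get("CommonPrefixes", [])]
```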