Questions tagged [python-s3fs]

For questions related to the Python s3fs library

Not to be confused with the [s3fs] tag, which is for mounting an S3 bucket on a local mount point and has nothing to do with Python.

85 questions

1 vote, 2 answers

How to stream a large gzipped .tsv file from s3, process it, and write back to a new file on s3?

I have a large file, s3://my-bucket/in.tsv.gz, that I would like to load, process, and write back in processed form to an S3 output file, s3://my-bucket/out.tsv.gz. How do I stream in.tsv.gz directly from S3 without loading the whole file to…
0x90
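
A minimal sketch of the streaming approach this question asks about, assuming the input is UTF-8 and that the source and destination are binary file-like objects. The function name `transform_tsv_gz` and the `transform` callback are hypothetical, introduced only for illustration. Rows are decompressed, processed, and recompressed one at a time, so the whole file is never held in memory.

```python
import csv
import gzip
import io


def transform_tsv_gz(src, dst, transform):
    """Stream rows from a gzipped TSV in `src` to a gzipped TSV in `dst`.

    `src` and `dst` are binary file-like objects (assumption); `transform`
    maps one row (a list of strings) to a new row. Returns the row count.
    """
    rows = 0
    # gzip.open accepts an existing file object and wraps it in text mode.
    with gzip.open(src, "rt", newline="", encoding="utf-8") as reader_fh, \
            gzip.open(dst, "wt", newline="", encoding="utf-8") as writer_fh:
        reader = csv.reader(reader_fh, delimiter="\t")
        writer = csv.writer(writer_fh, delimiter="\t", lineterminator="\n")
        for row in reader:
            writer.writerow(transform(row))
            rows += 1
    return rows


if __name__ == "__main__":
    # In-memory stand-ins for the S3 objects, so the sketch runs offline.
    source = io.BytesIO(gzip.compress(b"a\t1\nb\t2\n"))
    target = io.BytesIO()
    transform_tsv_gz(source, target, lambda r: [r[0].upper(), r[1]])
    print(gzip.decompress(target.getvalue()).decode("utf-8"))
```

With s3fs, the two handles would presumably come from `fs.open("s3://my-bucket/in.tsv.gz", "rb")` and `fs.open("s3://my-bucket/out.tsv.gz", "wb")` on an `s3fs.S3FileSystem` instance, each used as a context manager so the upload is finalized on close.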

1 vote, 1 answer

Can you use xr.open_mfdataset when reading files from S3 via s3fs?

I'm trying to read multiple netCDF files at once using xr.open_mfdataset from an S3 bucket, using s3fs. Is this possible? I tried the code below, which works with xr.open_dataset for a single file but doesn't work for multiple files: import s3fs import…
zbruick

1 vote, 1 answer

Writing Images to s3fs.S3FileSystem after preprocessing image

I am currently accessing an S3 bucket from my school system. To connect, I used the following: import s3fs from skimage import exposure from PIL import Image, ImageStat s3 = s3fs.S3FileSystem(client_kwargs={'endpoint_url': 'XXX'}, …
Andy

1 vote, 1 answer

Cannot read parquet files in s3 bucket with Pyspark 2.4.4

I am using Pyspark 2.4.4. I want to load some parquet files from an S3 bucket into a Spark dataframe, and I want to read all of these files at once. I have been looking at how to do it in these links: How to read parquet data from S3 to spark…
J.C Guzman

1 vote, 1 answer

Do I need to check integrity when using pandas to upload and download files from S3?

I use pandas to upload and download files from S3 in the following style (pandas uses s3fs in the background): import pandas as pd pd.read_csv("s3://bucket/path/to/file.csv") If the file is large, it is usually a concern that download (or upload) is…
Hello lad
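
The excerpt above is cut off, so the following is only a sketch of one possible integrity check, not necessarily what the asker had in mind. It assumes a trusted digest was recorded when the file was uploaded, and that the downloaded object is available as a binary file-like object. The helper names `stream_digest` and `matches_digest` are hypothetical.

```python
import hashlib
import io


def stream_digest(fh, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary file-like object in fixed-size chunks.

    Reading in chunks keeps memory use flat even for very large files.
    """
    digest = hashlib.new(algorithm)
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def matches_digest(fh, expected_hex, algorithm="sha256"):
    """Return True when the stream's digest equals the recorded digest."""
    return stream_digest(fh, algorithm) == expected_hex.lower()


if __name__ == "__main__":
    payload = b"col_a,col_b\n1,2\n"
    recorded = hashlib.sha256(payload).hexdigest()  # stored at upload time
    print(matches_digest(io.BytesIO(payload), recorded))
```

The same functions should work on a handle opened through s3fs, since they only rely on `read(size)`; whether such a check is needed on top of what the transport already guarantees is the open question here.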

1 vote, 1 answer

Pandas pd.read_csv(s3_path) fails with "TypeError: 'coroutine' object is not subscriptable"

I am running a Spark application in an Amazon EMR cluster, and for the past few days I have been getting the following error whenever I try to read a file from S3 using pandas. I have added bootstrap actions to install pandas, fsspec and s3fs. Code: import…
Inderpartap Cheema

1 vote, 1 answer

Transfer the data from S3 to FTP server via stream using Python

Using Python, I want to copy files that match a pattern sample1 from AWS S3 to an FTP server directly, without downloading them to a local temporary location. I attempted the following: import s3fs from ftplib import FTP_TLS s3 =…
BU123

1 vote, 2 answers

Attempting to Cache s3 files

I have two pipelines that I run. The first pipeline reads files from S3, does some processing, and updates the files. The second pipeline runs multiple jobs, and for each job I download files from S3 and produce some output. I feel I am wasting a lot…
Dinero

1 vote, 2 answers

s3fs timeout issue on an AWS Lambda function within a VPN

s3fs seems to fail from time to time when reading from an S3 bucket using an AWS Lambda function within a VPN. I am using s3fs==0.4.0 and pandas==1.0.1. import s3fs import pandas as pd def lambda_handler(event, context): bucket =…
Ander

1 vote, 1 answer

Python AWS S3FS API: Manually set proxy server

I can't set a proxy server for the S3FS Python API. Because S3FS's Config class is imported from botocore, there is no S3FS documentation about it, so I have read this question and the botocore documentation. However, I couldn't manage to get botocore Config…
MarlosB

0 votes, 0 answers

Disable ssl validation while connecting to s3 using pyarrow fs library/ s3fs library

I am using the pyarrow fs.S3FileSystem library to write a CSV to an S3 bucket. Although this code runs fine locally, when I deploy to a VM (Linux) it throws an error: OSError: When listing objects under key xx in bucket xx: AWS Error NETWORK_CONNECTION…

0 votes, 1 answer

how to copy s3 object from one bucket to another using python s3fs

Using python s3fs, how do you copy an object from one s3 bucket to another? I have found answers using boto3, but could not find anything when looking through the s3fs docs.
jjbskir
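
A hedged sketch of one way this could be done, relying on the `copy(path1, path2)` method that s3fs filesystems inherit from fsspec. The helper `copy_between_buckets` and the bucket and key names are hypothetical; the function takes the filesystem as a parameter so it can be exercised without network access.

```python
def copy_between_buckets(fs, src_bucket, dst_bucket, key):
    """Copy one object to the same key in another bucket.

    `fs` is assumed to be an fsspec-style filesystem exposing
    `copy(path1, path2)`, such as an s3fs.S3FileSystem instance.
    Returns the destination path.
    """
    src = f"{src_bucket}/{key}"
    dst = f"{dst_bucket}/{key}"
    fs.copy(src, dst)
    return dst


# Possible usage, assuming credentials are configured and both
# (hypothetical) buckets exist:
#
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   copy_between_buckets(fs, "source-bucket", "target-bucket", "data/file.csv")
```

Passing the filesystem in, rather than creating it inside the helper, also makes it easy to reuse one authenticated `S3FileSystem` across many copies.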

0 votes, 1 answer

ImportError: Install s3fs to access S3 on amazon EMR 6.3.0

I get the following error in my notebook after setting up an EMR 6.3.0 cluster: An error was encountered: Install s3fs to access S3 Traceback (most recent call last): File "/usr/local/lib64/python3.7/site-packages/pandas/io/parquet.py", line 460, in…

0 votes, 1 answer

How to read a file from s3 using s3fs

I have the following method in Python: def read_file(self, bucket, table_name, file_name, format="csv"): data = None read_from_path = f"s3://{bucket}/{table_name}/{file_name}" try: fs = s3fs.S3FileSystem( …
HuLu ViCa

0 votes, 0 answers

s3fs.put into empty and non-empty S3 folder

I am copying a folder to S3 with s3fs.put(..., recursive=True) and I am seeing weird behavior. The code is: import s3fs source_path = 'foo/bar' # there are some inside target_path = 'S3://my_bucket/baz' s3…
Pepacz