
This is similar to *dask read_csv timeout on Amazon s3 with big files*, but that question didn't actually resolve mine.

import s3fs

fs = s3fs.S3FileSystem()

# Raise the timeouts on the instance (both in seconds; 18000 = five hours)
fs.connect_timeout = 18000
fs.read_timeout = 18000

fs.download('s3://bucket/big_file', 'local_path_to_file')

The error I then get is

Traceback (most recent call last):
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/aiobotocore/response.py", line 50, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/aiohttp/streams.py", line 380, in read
    await self._wait("read")
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/aiohttp/streams.py", line 306, in _wait
    await waiter
aiohttp.client_exceptions.ServerTimeoutError: Timeout on reading data from socket

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/fsspec/spec.py", line 1113, in download
    return self.get(rpath, lpath, recursive=recursive, **kwargs)
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/fsspec/asyn.py", line 281, in get
    return sync(self.loop, self._get, rpaths, lpaths)
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/fsspec/asyn.py", line 266, in _get
    return await asyncio.gather(
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/s3fs/core.py", line 701, in _get_file
    chunk = await body.read(2**16)
  File "/Users/christopherturnbull/PointTopic/PointTopic/lib/python3.9/site-packages/aiobotocore/response.py", line 52, in read
    raise AioReadTimeoutError(endpoint_url=self.__wrapped__.url,
aiobotocore.response.AioReadTimeoutError: Read timeout on endpoint URL: "https://ptpiskiss.s3.eu-west-1.amazonaws.com/REBUILD%20FOR%20TIME%20SERIES/v30a%20sept%202019.accdb"

This is strange, because I thought I was setting the appropriate timeouts on the worker copy of the class. The stalls are solely down to my bad internet connection, but is there something I need to do on the S3 end to help here?
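
In case it matters, I understand the timeouts can also be passed at construction time rather than set on the instance afterwards. This is only a sketch of what I mean, based on my reading of the s3fs docs: `config_kwargs` is forwarded to botocore's client `Config`, and `skip_instance_cache` avoids reusing a cached instance created with the default timeouts.

import s3fs

# Sketch: pass the timeouts when the filesystem is created instead of
# mutating the instance afterwards. config_kwargs goes to botocore's
# client Config; skip_instance_cache prevents fsspec from handing back
# a previously cached instance with the old (default) timeouts.
fs = s3fs.S3FileSystem(
    config_kwargs={'connect_timeout': 18000, 'read_timeout': 18000},
    skip_instance_cache=True,
)
fs.download('s3://bucket/big_file', 'local_path_to_file')

Is one of these the right place to set them, or are both being ignored?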

Pika Supports Ukraine
  • **Side-note:** Amazon S3 is an object storage system, not a filesystem. `s3fs` simulates a filesystem but simply makes normal S3 API calls in the back-end. It would be much more reliable if your program made S3 API calls directly. You can do this in Python by using the [boto3 library](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html) (see the sketch after these comments). – John Rotenstein Dec 15 '20 at 10:47
  • Just to sort things out: How big is the file when it stalls? (Maybe it's an MTU problem.) Does it stop growing after a specific time? (Firewall session timeout.) Have you tried disabling the Nagle algorithm, and tuning the TCP stack (receive window size, etc.)? – Iñigo González Dec 15 '20 at 11:40
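
Following John Rotenstein's suggestion, this is roughly what I understand the direct boto3 call to look like. It's only a sketch: the bucket and key names are placeholders, and the timeout and retry values mirror the ones above rather than anything I've verified helps on my connection.

import boto3
from botocore.config import Config

# Client with long timeouts and a few retries; boto3's download_file
# also splits large objects into ranged parts under the hood, so one
# stalled read doesn't abort the whole transfer.
s3 = boto3.client(
    's3',
    config=Config(connect_timeout=18000, read_timeout=18000,
                  retries={'max_attempts': 5}),
)
s3.download_file('bucket', 'big_file', 'local_path_to_file')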

0 Answers