This issue is tricky and hard to explain clearly, because it is not quite deterministic and I can't post the code in full. There are a couple of variations: some work and others don't, and I can't see what difference in the code has any bearing on the issue.
I have 1000 Parquet files in Google Cloud Storage, each about 17 MB in size. I loop through the blobs, create a ParquetFile object for each, and print out a little info. After a few hundred files, it fails with this:
File "/usr/local/lib/python3.10/dist-packages/pyarrow/parquet/core.py", line 319, in __init__
source = filesystem.open_input_file(source)
File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: google::cloud::Status(UNKNOWN: Permanent error GetObjectMetadata: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
My code creating each ParquetFile is roughly this:
from pyarrow.fs import GcsFileSystem
from pyarrow.parquet import ParquetFile

path = "gs://..."
fs = GcsFileSystem(access_token='...', credential_token_expiration=...)
file = ParquetFile(path, filesystem=fs)
Any pointers are welcome! Thanks!!