I was running a parallel groupby computation with dask, using pyarrow to load Parquet files from S3. However, the same piece of code sometimes runs and sometimes fails, seemingly at random, with different error messages. The same issue occurs when using fastparquet:
File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2309). Detail: Python exception: ssl.SSLError
or it fails with a different error:
File "pyarrow/_parquet.pyx", line 1036, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Arrow error: IOError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2309). Detail: Python exception: ssl.SSLError
The dask scheduler I was using is processes. Everything works fine with the threads scheduler, but it is extremely slow. Is this behavior expected for dask?
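Roughly the kind of code I am running, as a minimal sketch; the bucket path and column names below are placeholders, not my actual data:

import dask.dataframe as dd

# Read Parquet files from S3 with the pyarrow engine
# (the same issue appears with engine="fastparquet").
df = dd.read_parquet(
    "s3://my-bucket/data/*.parquet",  # hypothetical path
    engine="pyarrow",
)

# Groupby aggregation computed with the multiprocessing scheduler --
# this is where the intermittent SSL errors show up.
result = (
    df.groupby("key_col")["value_col"]
    .mean()
    .compute(scheduler="processes")
)

# Switching to the threaded scheduler avoids the errors but is much slower:
# result = df.groupby("key_col")["value_col"].mean().compute(scheduler="threads")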