I get the following error in my notebook after setting up an EMR 6.3.0 cluster:

An error was encountered:
Install s3fs to access S3
Traceback (most recent call last):
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parquet.py", line 460, in read_parquet
    path, columns=columns, use_nullable_dtypes=use_nullable_dtypes, **kwargs
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parquet.py", line 218, in read
    mode="rb",
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parquet.py", line 67, in _get_path_or_handle
    path_or_handle, **(storage_options or {})
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 353, in url_to_fs
    chain = _un_chain(url, kwargs)
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 315, in _un_chain
    cls = get_filesystem_class(protocol)
  File "/usr/local/lib/python3.7/site-packages/fsspec/registry.py", line 213, in get_filesystem_class
    raise ImportError(bit["err"]) from e
ImportError: Install s3fs to access S3

The EMR cluster is set up with JupyterHub 1.2.0, TensorFlow 2.4.1 and Spark 3.1.1, and I ran the following bootstrap script:

#!/bin/bash
sudo python3 -m pip install -U setuptools
sudo python3 -m pip install -U pip
sudo python3 -m pip install wheel
sudo python3 -m pip install pillow
sudo python3 -m pip install pandas==1.2.5
sudo python3 -m pip install pyarrow
sudo python3 -m pip install boto3
sudo python3 -m pip install s3fs
sudo python3 -m pip install fsspec

The notebook is stored in an S3 bucket: https://p8-data-001.s3.eu-west-3.amazonaws.com/jupyter/jovyan/P8_Notebook_Linux_EMR_PySpark_V1.0.ipynb

The error happens after #4.10.6. From what I have read, it seems I need to downgrade boto3, but if I do so, I get a botocore version compatibility error. Does anyone know how I should set up my bootstrap script?

I would expect read_parquet to work, since s3fs is installed according to my log file.

Andrew Gaul
Airone

1 Answer

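So I replaced these two lines in the bootstrap script:

sudo python3 -m pip install boto3
sudo python3 -m pip install s3fs

with a single command:

sudo python3 -m pip install boto3 s3fs

and it worked. Presumably this is because pip then resolves boto3, s3fs and their shared botocore dependency together in one invocation, instead of the second install changing what the first one had set up.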
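To check from the notebook whether the packages really import in the interpreter the kernel uses, something like the following sketch may help. The traceback above shows fsspec re-raising an underlying ImportError, so the "Install s3fs to access S3" message can appear even when s3fs is installed but fails to import. The helper name report_versions and the list of package names are my own choices for illustration (aiobotocore is included on the assumption that the installed s3fs version depends on it).

```python
import importlib
import sys


def report_versions(names=("pandas", "pyarrow", "fsspec", "s3fs",
                           "boto3", "botocore", "aiobotocore")):
    """Try to import each package and return {name: version or error text}.

    The package list is an assumption; adjust it to your own setup.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # a broken dependency may not raise ImportError
            report[name] = "NOT IMPORTABLE: {!r}".format(exc)
    return report


# Show which Python the kernel is running, then the state of each package.
print(sys.executable)
for name, version in report_versions().items():
    print("{}: {}".format(name, version))
```

If s3fs shows up as not importable here, the attached error text should point at the actual cause (for example a botocore version conflict) rather than the generic message from fsspec.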

Airone