I am only able to get limited, top-level access to my AWS S3. I can see the buckets, but not their contents: neither subfolders nor files. I'm running everything from inside a conda environment. I've tried accessing files in both private and public buckets without success. What am I doing wrong?

This block of code works as expected:

    >>> import s3fs
    >>> AKEY = 'XXXX'
    >>> SKEY = 'XXXX'
    >>> fs = s3fs.S3FileSystem(key=AKEY, secret=SKEY)
    >>> fs.ls('s3://')
    ['my-bucket-1',
     'my-bucket-2',
     'my-bucket-3']

This block doesn't:

    >>> fs.ls('s3://my-bucket-1')
    []

What I expect:

    >>> fs.ls('s3://my-bucket-1')
    ['my-bucket-1/test.txt',
     'my-bucket-1/test.csv']

When I try to open a file, I get a FileNotFoundError:

    import pandas as pd
    pd.read_csv(
        's3://my-bucket-1/test.csv',
        storage_options={'key':AKEY,'secret':SKEY}
    )
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[8], line 2
      1 import pandas as pd
----> 2 pd.read_csv(
      3     's3://my-bucket-1/test.csv',
      4     storage_options={'key':AKEY,'secret':SKEY}
      5 )

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\util\_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\util\_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\parsers\readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\parsers\readers.py:605, in _read(filepath_or_buffer, kwds)
    602 _validate_names(kwds.get("names", None))
    604 # Create the parser.
--> 605 parser = TextFileReader(filepath_or_buffer, **kwds)
    607 if chunksize or iterator:
    608     return parser

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\parsers\readers.py:1442, in TextFileReader.__init__(self, f, engine, **kwds)
   1439     self.options["has_index_names"] = kwds["has_index_names"]
   1441 self.handles: IOHandles | None = None
-> 1442 self._engine = self._make_engine(f, self.engine)

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\parsers\readers.py:1735, in TextFileReader._make_engine(self, f, engine)
   1733     if "b" not in mode:
   1734         mode += "b"
-> 1735 self.handles = get_handle(
   1736     f,
   1737     mode,
   1738     encoding=self.options.get("encoding", None),
   1739     compression=self.options.get("compression", None),
   1740     memory_map=self.options.get("memory_map", False),
   1741     is_text=is_text,
   1742     errors=self.options.get("encoding_errors", "strict"),
   1743     storage_options=self.options.get("storage_options", None),
   1744 )
   1745 assert self.handles is not None
   1746 f = self.handles.handle

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\common.py:713, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    710     codecs.lookup_error(errors)
    712 # open URLs
--> 713 ioargs = _get_filepath_or_buffer(
    714     path_or_buf,
    715     encoding=encoding,
    716     compression=compression,
    717     mode=mode,
    718     storage_options=storage_options,
    719 )
    721 handle = ioargs.filepath_or_buffer
    722 handles: list[BaseBuffer]

File ~\anaconda3\envs\env-2\lib\site-packages\pandas\io\common.py:409, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    406     pass
    408 try:
--> 409     file_obj = fsspec.open(
    410         filepath_or_buffer, mode=fsspec_mode, **(storage_options or {})
    411     ).open()
    412 # GH 34626 Reads from Public Buckets without Credentials needs anon=True
    413 except tuple(err_types_to_retry_with_anon):

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\core.py:135, in OpenFile.open(self)
    128 def open(self):
    129     """Materialise this as a real open file without context
    130 
    131     The OpenFile object should be explicitly closed to avoid enclosed file
    132     instances persisting. You must, therefore, keep a reference to the OpenFile
    133     during the life of the file-like it generates.
    134     """
--> 135     return self.__enter__()

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\core.py:103, in OpenFile.__enter__(self)
    100 def __enter__(self):
    101     mode = self.mode.replace("t", "").replace("b", "") + "b"
--> 103     f = self.fs.open(self.path, mode=mode)
    105     self.fobjects = [f]
    107     if self.compression is not None:

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\spec.py:1106, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1104 else:
   1105     ac = kwargs.pop("autocommit", not self._intrans)
-> 1106     f = self._open(
   1107         path,
   1108         mode=mode,
   1109         block_size=block_size,
   1110         autocommit=ac,
   1111         cache_options=cache_options,
   1112         **kwargs,
   1113     )
   1114     if compression is not None:
   1115         from fsspec.compression import compr

File ~\anaconda3\envs\env-2\lib\site-packages\s3fs\core.py:640, in S3FileSystem._open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, requester_pays, cache_options, **kwargs)
    637 if cache_type is None:
    638     cache_type = self.default_cache_type
--> 640 return S3File(
    641     self,
    642     path,
    643     mode,
    644     block_size=block_size,
    645     acl=acl,
    646     version_id=version_id,
    647     fill_cache=fill_cache,
    648     s3_additional_kwargs=kw,
    649     cache_type=cache_type,
    650     autocommit=autocommit,
    651     requester_pays=requester_pays,
    652     cache_options=cache_options,
    653 )

File ~\anaconda3\envs\env-2\lib\site-packages\s3fs\core.py:1989, in S3File.__init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays, cache_options)
   1987         self.details = s3.info(path)
   1988         self.version_id = self.details.get("VersionId")
-> 1989 super().__init__(
   1990     s3,
   1991     path,
   1992     mode,
   1993     block_size,
   1994     autocommit=autocommit,
   1995     cache_type=cache_type,
   1996     cache_options=cache_options,
   1997 )
   1998 self.s3 = self.fs  # compatibility
   2000 # when not using autocommit we want to have transactional state to manage

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\spec.py:1462, in AbstractBufferedFile.__init__(self, fs, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
   1460         self.size = size
   1461     else:
-> 1462         self.size = self.details["size"]
   1463     self.cache = caches[cache_type](
   1464         self.blocksize, self._fetch_range, self.size, **cache_options
   1465     )
   1466 else:

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\spec.py:1475, in AbstractBufferedFile.details(self)
   1472 @property
   1473 def details(self):
   1474     if self._details is None:
-> 1475         self._details = self.fs.info(self.path)
   1476     return self._details

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\asyn.py:113, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    110 @functools.wraps(func)
    111 def wrapper(*args, **kwargs):
    112     self = obj or args[0]
--> 113     return sync(self.loop, func, *args, **kwargs)

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\asyn.py:98, in sync(loop, func, timeout, *args, **kwargs)
     96     raise FSTimeoutError from return_result
     97 elif isinstance(return_result, BaseException):
---> 98     raise return_result
     99 else:
    100     return return_result

File ~\anaconda3\envs\env-2\lib\site-packages\fsspec\asyn.py:53, in _runner(event, coro, result, timeout)
     51     coro = asyncio.wait_for(coro, timeout=timeout)
     52 try:
---> 53     result[0] = await coro
     54 except Exception as ex:
     55     result[0] = ex

File ~\anaconda3\envs\env-2\lib\site-packages\s3fs\core.py:1257, in S3FileSystem._info(self, path, bucket, key, refresh, version_id)
   1245     if (
   1246         out.get("KeyCount", 0) > 0
   1247         or out.get("Contents", [])
   1248         or out.get("CommonPrefixes", [])
   1249     ):
   1250         return {
   1251             "name": "/".join([bucket, key]),
   1252             "type": "directory",
   1253             "size": 0,
   1254             "StorageClass": "DIRECTORY",
   1255         }
-> 1257     raise FileNotFoundError(path)
   1258 except ClientError as e:
   1259     raise translate_boto_error(e, set_cause=False)

FileNotFoundError: my-bucket-1/test.csv

Versions: s3fs 2022.11.0, aiobotocore 2.4.0, botocore 1.27.59

Trying anonymous access to a public bucket also fails:

    >>> fs = s3fs.S3FileSystem(anon=True)
    >>> fs.ls('s3://dask-data/nyc-taxi/2015')
    ParseError

2 Answers


Check the bucket policy / IAM role that grants you permission to access the bucket. It should have /* after the name of the resource:

     "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::my-bucket-1/*"

to allow you to access the objects in the bucket, not just the bucket itself. Note that listing a bucket's contents additionally requires s3:ListBucket on the bucket ARN itself (without the /*).
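For illustration, here is a minimal sketch of a complete bucket policy covering both operations, applied with boto3. The account ID, user name, and bucket contents are placeholders, not values from the question:

    import json
    import boto3

    # Hypothetical principal and bucket; replace with your own.
    # s3:ListBucket must target the bucket ARN, while s3:GetObject
    # must target the objects inside it (note the trailing /*).
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::123456789012:user/my-user"},
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::my-bucket-1",
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::123456789012:user/my-user"},
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::my-bucket-1/*",
            },
        ],
    }

    s3 = boto3.client("s3")
    s3.put_bucket_policy(Bucket="my-bucket-1", Policy=json.dumps(policy))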

  • I've got the same code snippets to work in both a Google Colab and an EC2 instance, and I can see all my files there. That tells me my problem is not on the AWS side, but on my computer. I'd like to check what you suggested, but I don't think I have a role associated with my S3. On my IAM page I can't find any roles to do with S3, and I don't recall setting one up. All of my bucket policies are blank at the moment as well. – Dylan Bodkin Feb 17 '23 at 07:49
  • I have reproduced the behaviour you expected (i.e. fs.ls('s3://my-bucket-1') returning ['my-bucket-1/test.txt', 'my-bucket-1/test.csv']) with s3fs version 0.4 on my machine. Perhaps there is some conflict in your conda env. – Alex Chadyuk Feb 17 '23 at 14:44
  • I downgraded to 0.4 and got the expected behavior. – Dylan Bodkin Feb 18 '23 at 03:38
  • 0.4 is over three years old; we would prefer to know what was different between the two configurations. – mdurant Feb 19 '23 at 20:50
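To check for the package conflict suggested in these comments, a quick diagnostic is to print the installed versions inside the conda environment (each of these packages exposes __version__):

    # Print the versions of the packages in the s3fs stack. Mismatched
    # s3fs / fsspec / aiobotocore / botocore releases can cause odd
    # failures, since s3fs pins specific aiobotocore versions.
    import s3fs, fsspec, aiobotocore, botocore

    for pkg in (s3fs, fsspec, aiobotocore, botocore):
        print(pkg.__name__, pkg.__version__)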

Have you tried boto3 as a possible alternative to s3fs?
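For what it's worth, here is a minimal sketch of the same listing and read done with boto3 directly, assuming the AKEY/SKEY placeholders and bucket name from the question:

    import io
    import boto3
    import pandas as pd

    # Build a client from the same key pair used with s3fs.
    s3 = boto3.client(
        's3',
        aws_access_key_id=AKEY,
        aws_secret_access_key=SKEY,
    )

    # List the objects in the bucket (the equivalent of fs.ls).
    response = s3.list_objects_v2(Bucket='my-bucket-1')
    for obj in response.get('Contents', []):
        print(obj['Key'])

    # Read a single object into pandas without going through fsspec.
    body = s3.get_object(Bucket='my-bucket-1', Key='test.csv')['Body']
    df = pd.read_csv(io.BytesIO(body.read()))

If this lists the keys and reads the file while fs.ls() still returns [], that would point at the s3fs/fsspec stack rather than at permissions.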
