
I have successfully read a zipped shapefile from my S3 bucket with geopandas, but I get an error when I try to write the same GeoDataFrame back to the same bucket as a shapefile.

The code below is how I read the zip file, and it works nicely:

import boto3
import geopandas

## session for connecting to S3
session = boto3.session.Session(aws_access_key_id='MY-KEY-ID',
                                aws_secret_access_key='MY-KEY')
s3 = session.resource('s3') 
bucket = s3.Bucket('my_bucket')

## read shapefile
TPG = bucket.Object(key='/shapefiles/grid.zip') 
TPGrid = geopandas.read_file(TPG.get()['Body'])

But when I try to write the same GeoDataFrame back out like this:

TPGrid.to_file(filename='s3://my_bucket/output/TPGrid.zip', driver='ESRI Shapefile')

I get the following error:

ERROR:fiona._env:Only read-only mode is supported for /vsicurl
ERROR:fiona._env:Only read-only mode is supported for /vsicurl
ERROR:fiona._env:Only read-only mode is supported for /vsicurl
ERROR:fiona._env:Unable to open /vsis3/my_bucket/output/TPGrid.zip/TPGrid.shp or /vsis3/my_bucket/output/TPGrid.zip/TPGrid.SHP.
Traceback (most recent call last):
  File "fiona/ogrext.pyx", line 1133, in fiona.ogrext.WritingSession.start
  File "fiona/_err.pyx", line 291, in fiona._err.exc_wrap_pointer
fiona._err.CPLE_AppDefinedError: Unable to open /vsis3/my_bucket/output/TPGrid.zip/TPGrid.shp or /vsis3/my_bucket/output/TPGrid.zip/TPGrid.SHP.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 211, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-c8krhm5u/test_to_file_geo.py", line 40, in <module>
  File "/glue/lib/installation/geopandas/geodataframe.py", line 1086, in to_file
    _to_file(self, filename, driver, schema, index, **kwargs)
  File "/glue/lib/installation/geopandas/io/file.py", line 328, in _to_file
    filename, mode=mode, driver=driver, crs_wkt=crs_wkt, schema=schema, **kwargs
  File "/glue/lib/installation/fiona/env.py", line 408, in wrapper
    return f(*args, **kwargs)
  File "/glue/lib/installation/fiona/__init__.py", line 274, in open
    **kwargs)
  File "/glue/lib/installation/fiona/collection.py", line 165, in __init__
    self.session.start(self, **kwargs)
  File "fiona/ogrext.pyx", line 1141, in fiona.ogrext.WritingSession.start
fiona.errors.DriverIOError: Unable to open /vsis3/my_bucket/output/TPGrid.zip/TPGrid.shp or /vsis3/my_bucket/output/TPGrid.zip/TPGrid.SHP.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 230, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-c8krhm5u/test_to_file_geo.py", line 40, in <module>
  File "/glue/lib/installation/geopandas/geodataframe.py", line 1086, in to_file
    _to_file(self, filename, driver, schema, index, **kwargs)
  File "/glue/lib/installation/geopandas/io/file.py", line 328, in _to_file
    filename, mode=mode, driver=driver, crs_wkt=crs_wkt, schema=schema, **kwargs
  File "/glue/lib/installation/fiona/env.py", line 408, in wrapper
    return f(*args, **kwargs)
  File "/glue/lib/installation/fiona/__init__.py", line 274, in open
    **kwargs)
  File "/glue/lib/installation/fiona/collection.py", line 165, in __init__
    self.session.start(self, **kwargs)
  File "fiona/ogrext.pyx", line 1141, in fiona.ogrext.WritingSession.start
fiona.errors.DriverIOError: Unable to open /vsis3/my_bucket/output/TPGrid.zip/TPGrid.shp or /vsis3/my_bucket/output/TPGrid.zip/TPGrid.SHP.

I have tried several variations, such as using '.csv' or '.shp' instead of '.zip', but none of them worked; one of those attempts is sketched after the package list below. I am using Python 3.6 with the following packages, in case this information helps:

  • geopandas-0.9.0
  • shapely-1.7.1
  • fiona-1.8.20
  • GDAL-3.2.3
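
For reference, the '.shp' variation I mentioned above looked roughly like this, and it fails as well:

## variation with a plain .shp key instead of a zipped output -- this does not work either
TPGrid.to_file(filename='s3://my_bucket/output/TPGrid.shp', driver='ESRI Shapefile')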

I have been fighting with this problem all day... any help would be highly appreciated.

Yan-Ru
  • Have you tried converting your pandas dataframe to a pyspark dataframe and then to an AWS dynamic frame, and using the built-in AWS write capabilities? Alternatively, if you perform a to_string() on the pandas dataframe and write that as content to S3 via boto3, you may get what you are seeking. – jonlegend Jun 21 '21 at 16:33
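
A minimal sketch of the second suggestion in that comment, dumping the attribute table as text and uploading it with boto3 (reusing the bucket object from above, with a hypothetical output key, and using to_csv instead of to_string so the result is at least parseable), might look like this; note it produces a plain text dump rather than an actual shapefile:

## sketch of the boto3 upload idea from the comment: dump the attribute table
## as CSV text (the geometry column becomes WKT strings) and put it into the bucket directly
body = TPGrid.to_csv(index=False)
bucket.put_object(Key='output/TPGrid.csv', Body=body.encode('utf-8'))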
