8

I am trying to read a shapefile into a GeoDataFrame.

Normally I just do this and it works:

import pandas as pd

import geopandas as gpd
from shapely.geometry import Point

df = gpd.read_file("wild_fires/nbac_2016_r2_20170707_1114.shp")

But this time it gives me the error: b'Recode from ANSI 1252 to UTF-8 failed with the error: "Invalid argument".'

Full error:

---------------------------------------------------------------------------
CPLE_AppDefinedError                      Traceback (most recent call last)
<ipython-input-14-adcad0275d30> in <module>()
----> 1 df_wildfires_2016 = gpd.read_file("wild_fires/nbac_2016_r2_20170707_1114.shp")

/usr/local/lib/python3.6/site-packages/geopandas/io/file.py in read_file(filename, **kwargs)
     19     """
     20     bbox = kwargs.pop('bbox', None)
---> 21     with fiona.open(filename, **kwargs) as f:
     22         crs = f.crs
     23         if bbox is not None:

/usr/local/lib/python3.6/site-packages/fiona/__init__.py in open(path, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt)
    163         c = Collection(path, mode, driver=driver, encoding=encoding,
    164                        layer=layer, vsi=vsi, archive=archive,
--> 165                        enabled_drivers=enabled_drivers)
    166     elif mode == 'w':
    167         if schema:

/usr/local/lib/python3.6/site-packages/fiona/collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, **kwargs)
    151             if self.mode == 'r':
    152                 self.session = Session()
--> 153                 self.session.start(self)
    154             elif self.mode in ('a', 'w'):
    155                 self.session = WritingSession()

fiona/ogrext.pyx in fiona.ogrext.Session.start (fiona/ogrext2.c:8432)()

fiona/_err.pyx in fiona._err.GDALErrCtxManager.__exit__ (fiona/_err.c:1861)()

CPLE_AppDefinedError: b'Recode from ANSI 1252 to UTF-8 failed with the error: "Invalid argument".'

I've been trying to figure out why I am getting the error for a while but can't seem to find the answer.

The data was obtained from this webpage (I downloaded only the 2016 link): http://cwfis.cfs.nrcan.gc.ca/datamart/download/nbac?token=78e9bd6af67f71204e18cb6fa4e47515

Would anybody be able to help me? Thank you.

Julien
  • Can you give the full error trace? – Mark Ransom Nov 13 '17 at 23:47
  • @MarkRansom just added the full error – Julien Nov 13 '17 at 23:55
  • @Julien so you have done that same process with other data and it works ok? Seems to suggest that the problem is with this dataset (and based on the error, probably had some unrecognized character that failed to convert to UTF-8) – DarkCygnus Nov 14 '17 at 00:33
  • @DarkCygnus Yes, normally it just works. Is there a way to ignore or bypass this error? – Julien Nov 14 '17 at 01:39
  • @Julien added an answer, with 2 options you got, that I tested and were able to open without errors :) – DarkCygnus Nov 14 '17 at 01:42
  • @DarkCygnus it should be possible to do this without converting the file externally. If I had any familiarity with these packages I'd be investigating it myself, starting with adding `encoding='utf-8'` to the `read_file` call. – Mark Ransom Nov 14 '17 at 19:02
  • @MarkRansom I did consider those approaches. I tried passing the `encoding` parameter to Fiona's open call, along with other tests, but could not get it to work. It seems that *this* specific dataset has those encoding problems, and those were the 2 solutions I managed to find; both work. I suspect there is a more direct way to bypass this. Thanks for your comment, I'll try to give this more love when I can. Cheers. – DarkCygnus Nov 14 '17 at 19:05

4 Answers

7

It seems that your shapefile contains characters that cannot be converted to UTF-8, which causes the fiona.open() call to fail (geopandas uses Fiona to open files).
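To make the failure more concrete, here is a minimal standard-library sketch of two general ways a "recode to UTF-8" step can fail. It is only an analogy built on Python's own codecs, not GDAL's recoder, and the helper names `label_is_known` and `can_recode` are made up for illustration. Nothing in this thread establishes which of the two (if either) applies to this particular file.

```python
import codecs


def label_is_known(label: str) -> bool:
    """Return True if Python's codec registry recognizes the encoding label."""
    try:
        codecs.lookup(label)
    except LookupError:
        return False
    return True


def can_recode(raw: bytes, source_encoding: str = "cp1252") -> bool:
    """Return True if raw decodes from source_encoding and re-encodes as UTF-8."""
    try:
        raw.decode(source_encoding).encode("utf-8")
    except UnicodeError:
        return False
    return True


# Failure mode 1: the label itself is not a name the recoder understands.
print(label_is_known("cp1252"))
print(label_is_known("ANSI 1252"))

# Failure mode 2: the label is fine, but a byte has no meaning in that code page.
# 0x81 is one of the few byte values that Windows-1252 leaves undefined.
print(can_recode("Québec".encode("cp1252")))
print(can_recode(b"Qu\x81bec"))
```

Either way, the workarounds in this thread sidestep the problem by rewriting the file as UTF-8 or by naming the encoding explicitly when opening it.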

What solved this error for me was opening the shapefile in a GIS application (QGIS, for example), choosing "Save As", and setting the Encoding option to "UTF-8".

After doing this, I got no error when calling df = gpd.read_file("convertedShape.shp").


Another way to do this, without QGIS or a similar tool, is to read the shapefile and save it again (effectively converting it to the desired encoding). With OGR you can do something like this:

from osgeo import ogr

driver = ogr.GetDriverByName("ESRI Shapefile")
# open the original shapefile read-only (0)
ds = driver.Open("nbac_2016_r2_20170707_1114.shp", 0)
layer = ds.GetLayer()

# create the new shapefile that will hold the converted copy
ds2 = driver.CreateDataSource("convertedShape.shp")
# create a Polygon layer, matching the geometry type of the original
# (no attribute fields are defined on it, so attribute values are likely not carried over)
layer2 = ds2.CreateLayer("", None, ogr.wkbPolygon)
# copy every feature of the original shapefile into the new layer
for feature in layer:
    layer2.CreateFeature(feature)

# dereference everything so GDAL flushes and closes both files
ds = layer = ds2 = layer2 = None

This also let me open the file successfully with df = gpd.read_file("convertedShape.shp") after conversion. Hope this helps.

DarkCygnus
  • Thank you! I don't have QGIS. I tried pip installing osgeo but it doesn't seem to be working. Any idea how I can download the library? – Julien Nov 14 '17 at 02:04
  • @Julien several options are provided [here](https://gis.stackexchange.com/questions/9553/installing-gdal-and-ogr-for-python). I *think* I installed it with `apt-get install python-gdal`, or with `pip install GDAL`... most likely the first one, but the question I linked has several alternatives (easiest if you have conda). Hope my answer was useful :) – DarkCygnus Nov 14 '17 at 02:50
  • @Julien as a side comment, I suggest you give QGIS a try (it is open source), as it often comes in handy for checking that shapefiles and rasters look the way we expect (encoding, attributes, etc.) before processing or reading them. – DarkCygnus Nov 14 '17 at 03:01
  • Thanks for the advice! I think I will install QGis. – Julien Nov 14 '17 at 03:23
4
with fiona.open(file, encoding="UTF-8") as f:
    features = list(f)  # read every record using the explicit encoding

worked for me.

Vlad
2

Since you have GDAL installed, I recommend converting the file to UTF-8 using the CLI:

ogr2ogr output.shp input.shp -lco ENCODING=UTF-8

Worked like a charm for me. It's much faster than QGIS or Python and can be applied in a cluster environment.
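For the cluster or batch case, the same command can be driven from a script. This is a rough sketch, assuming `ogr2ogr` is on the PATH; the function names `build_ogr2ogr_command` and `convert_folder` and the `dry_run` switch are hypothetical helpers for illustration, and only the `-lco ENCODING=UTF-8` option comes from the command above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ogr2ogr_command(src: Path, dst: Path, encoding: str = "UTF-8") -> List[str]:
    """Build the call shown above: ogr2ogr <dst> <src> -lco ENCODING=<encoding>."""
    return ["ogr2ogr", str(dst), str(src), "-lco", f"ENCODING={encoding}"]


def convert_folder(src_dir: Path, dst_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Convert every .shp in src_dir into dst_dir.

    With dry_run=True (the default) the commands are only returned, not executed.
    """
    commands = []
    for src in sorted(src_dir.glob("*.shp")):
        cmd = build_ogr2ogr_command(src, dst_dir / src.name)
        commands.append(cmd)
        if not dry_run:
            dst_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if ogr2ogr exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_folder(Path("wild_fires"), Path("wild_fires_utf8"))` first lets you inspect the commands; pass `dry_run=False` to actually run them.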

Adam Erickson
1

As an extension to this answer, you can pass fiona arguments through geopandas read_file:

df = gpd.read_file("filename", encoding="utf-8")
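If you are not sure which encoding a file needs, one option is to try a few in order. The sketch below is an assumption-laden helper, not part of geopandas: `read_with_fallback`, its default encoding list, and the injectable `reader` argument are all made up for illustration. It relies only on `read_file` accepting an `encoding` keyword, as shown above.

```python
def read_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1"), reader=None):
    """Try each encoding in turn; return (result, encoding) for the first that works."""
    if reader is None:
        # Imported lazily so the helper can be exercised without geopandas installed.
        import geopandas as gpd
        reader = gpd.read_file

    errors = {}
    for encoding in encodings:
        try:
            return reader(path, encoding=encoding), encoding
        except Exception as exc:
            # Broad on purpose (an assumption): the exact exception type
            # depends on the I/O backend that geopandas uses.
            errors[encoding] = exc
    raise ValueError(f"could not read {path!r} with any of {list(encodings)}: {errors}")
```

Usage would look like `df, used = read_with_fallback("filename")`. Keep in mind that a read can succeed with the wrong encoding and silently produce garbled text (latin-1 in particular accepts any byte), so it is worth checking a few attribute values afterwards.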
w-m