
I'm trying to read a shapefile

import shapefile
r = shapefile.Reader(filepath, encoding="utf-8")

but when I try to get a value from the .records() object like:

r.records()[0]

it raises the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 4: invalid continuation byte
Paulo Calado

2 Answers


That means your file is not encoded in UTF-8. Try ISO8859-1 (Latin-1) instead, for example by passing encoding="ISO8859-1" to shapefile.Reader.

If you are on Linux (or have Git Bash on Windows), you can use the file command to guess the encoding.
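
To see why the error appears and why an 8-bit encoding makes it go away, here is a minimal sketch using only the standard library. The byte string is a made-up attribute value (an assumption, not taken from the asker's file) chosen so that the 0xe9 byte sits at position 4, as in the error message.

```python
# Hypothetical attribute value: "André Silva" stored as Latin-1 bytes.
raw = b"Andr\xe9 Silva"

# Decoding as UTF-8 fails, because 0xe9 starts a multi-byte sequence
# and the next byte (a space) is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err

print(utf8_error)

# Latin-1 and CP1252 both map 0xe9 to "é", so either one decodes this value.
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1252")}
for codec, text in decoded.items():
    print(codec, "->", text)
```

An 8-bit codec such as Latin-1 accepts every byte, so a decode that succeeds does not prove the encoding is right. It is still worth checking that the decoded text looks sensible.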

JoelFan
  • Switching to an 8-bit encoding might merely remove the symptom, and just produce junk output. You should review the problematic inputs and establish which precise encoding they use. If the `\xE9` byte represents an `é` character, Latin-1 or CP1252 are good guesses; if not, maybe look at https://tripleee.github.io/8bit/#e9 for other interpretations. – tripleee Jan 21 '20 at 17:59
  • @tripleee or use `file` as I suggested to detect the encoding – JoelFan Jan 21 '20 at 18:01
  • It's not exactly high precision. If you have just a few bytes and know or can guess what characters they represent, that's probably going to be more accurate. `file` will guess wildly in this scenario. – tripleee Jan 21 '20 at 18:31
  • hahaha @JoelFan a shapefile describes a geographic polygon, it's broadly used with census data :p – Paulo Calado Jan 21 '20 at 18:46

You can use this piece of code to try different encodings when reading the shapefile. The code also looks for a .cpg file, which holds the encoding of a shapefile if one exists.

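import os
import shapefile

shp_path = 'path/to/file.shp'  # placeholder: set this to your shapefile

# Candidate encodings, tried in order
encodings = ['utf-8', 'ISO8859-1']

# If a .cpg file sits next to the shapefile, try its encoding first
cpg_path = os.path.splitext(shp_path)[0] + '.cpg'
if os.path.exists(cpg_path):
    with open(cpg_path) as cpg_file:
        cpg_encoding = cpg_file.read().strip()
    if cpg_encoding:
        encodings.insert(0, cpg_encoding)

# Read the records with each encoding, because the error in the
# question is raised by .records(), not by opening the file
for e in encodings:
    try:
        with shapefile.Reader(shp_path, encoding=e) as shp:
            records = shp.records()
        print(f'Successfully read the shapefile with encoding: {e}')
        break
    except (UnicodeDecodeError, LookupError):
        # LookupError: the name (e.g. from the .cpg file) is not a Python codec
        print(f'Error when reading the shapefile with encoding: {e}')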
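
The label in a .cpg file is not always a name Python recognizes as is. Below is a rough sketch of a helper that normalizes such a label before it is passed to shapefile.Reader. The function name cpg_to_codec is hypothetical, and it assumes the .cpg file holds a single label such as UTF-8 or 1252; real files may use other conventions this does not cover.

```python
import codecs


def cpg_to_codec(label):
    """Return a Python codec name for a .cpg label, or None if unknown."""
    name = label.strip()
    if not name:
        return None

    # Try the label as written, then (for bare numbers) as a code page.
    candidates = [name]
    if name.isdigit():
        candidates.append("cp" + name)

    for candidate in candidates:
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue
    return None


print(cpg_to_codec("UTF-8\n"))
print(cpg_to_codec("1252"))
print(cpg_to_codec("not-a-codec"))
```

If the helper returns None, falling back to the remaining encodings in the list is probably the safest option.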
Helge Schneider