0

Please, I need help with this:

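import requests

url = 'https://www.sec.gov/Archives/edgar/data/1437750/0001477932-13-004416.txt'
# Save the raw response bytes to disk
with open('file', 'wb') as f:
    f.write(requests.get(url).content)
# Read the file back in text mode (no encoding given, so the platform default is used)
with open('file', 'r') as t:
    words = t.read()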

The above gives me the following error:

return codecs.charmap_decode(input,self.errors,decoding_table)[0]  
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1010494: character maps to <undefined>

Thank you!

zwer
user5282933

2 Answers

1

I just ran into the same problem. When I tried to read the file, one of my strings contained a double space: '  '. Removing that double space fixed the 0x9d problem.

Russell
0

Why are you writing the file as binary and then reading it back as text? Python cannot decode the bytes from the original stream until it knows which codec to use; with no `encoding` argument, `open()` falls back to the platform default, which is where the 'charmap' codec in your traceback comes from. Since the file you downloaded in your first command is not UTF-8 encoded, try decoding it as latin-1 when reading it:

with open('file', 'r', encoding='latin-1') as t:
    words = t.read()
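
A minimal sketch of the same idea, assuming the goal is only to load the text without crashing. The helper name `read_text` and the sample bytes are hypothetical and stand in for the downloaded file; `errors='replace'` is a standard `open()` option that substitutes U+FFFD for undecodable bytes instead of raising.

```python
import os
import tempfile


def read_text(path, encoding="utf-8", errors="replace"):
    """Read a file as text with an explicit codec instead of the platform default."""
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        return fh.read()


# Hypothetical sample: ASCII text containing two stray 0x9d bytes
sample = b"He said \x9dhello\x9d"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

# latin-1 never fails: each byte becomes the code point of the same value (U+009D here)
as_latin1 = read_text(tmp_path, encoding="latin-1")
# utf-8 with errors="replace": each invalid byte becomes U+FFFD
as_utf8 = read_text(tmp_path, encoding="utf-8")

os.remove(tmp_path)
```

Either call returns a string, but neither recovers what the 0x9d bytes were meant to be; both only avoid the exception.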
zwer
  • In what ASCII-like character set is 0x9d meaningful? It's not valid Windows-1252. The Python "latin-1" codec translates it to Unicode 0x9D, which is "Operating System Command".[1] That makes little sense. Converting such text with the "latin-1" codec won't crash the Python program, but what you get in Unicode is a box with [009d]. Converting with "latin-1" just papers over the problem. It seems to be some kind of quote mark when it appears in English text. But it's not one of the special quote marks from Windows-1252. [1] http://www.fileformat.info/info/unicode/char/009d/index.htm – John Nagle Aug 18 '17 at 00:58
  • 1
    Not only is it *not* UTF-8 encoded, there appears to be binary data embedded in the page. You should not be trying to read binary data as text! Using `latin-1` encoding is a hack that should be avoided unless you are using it specifically to clean someone else's mess and you really know what you're doing. – Mark Ransom Aug 18 '17 at 22:56
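
The codec behaviour described in these comments can be checked directly. This is a small sketch, assuming the 'charmap' codec in the traceback is Windows-1252 (`cp1252`), which the traceback itself does not state. The last line shows one possible origin of a 0x9d byte next to English quotes: U+201D (right double quotation mark) ends in 0x9d when encoded as UTF-8. Whether that applies to this particular file is not confirmed.

```python
raw = b"\x9d"

# cp1252 has no mapping for byte 0x9d, so decoding raises UnicodeDecodeError
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# latin-1 maps every byte to the code point of the same value, so it never fails;
# the result here is the control character U+009D, not a printable quote mark
latin1_char = raw.decode("latin-1")

# U+201D encodes to three bytes in UTF-8, the last of which is 0x9d
quote_bytes = "\u201d".encode("utf-8")
```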