
I am trying to send a JSON object from my Node.js server to a Python script. However, when I try to convert the JSON to a dictionary using json.loads, many inputs raise a UnicodeDecodeError. What do I need to do to decode the JSON object correctly?

Error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2062: character maps to <undefined>
    at PythonShell.parseError (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:183:17)
    at terminateIfNeeded (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:98:28)
    at ChildProcess.<anonymous> (D:\Users\Temp\Desktop\empman\node_modules\python-shell\index.js:88:9)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:219:12)
    at Process.onexit (D:\Users\Temp\Desktop\empman\node_modules\async-listener\glue.js:188:31)
    ----- Python Traceback -----
    File "word.py", line 38, in <module>
      json_data=open('data.txt').read()
    File "D:\Users\Temp\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
      return codecs.charmap_decode(input,self.errors,decoding_table)[0]

The corresponding Python code:

from docx import Document
from docx.shared import Inches
import sys
import io
import json
document = Document('template.docx')
# newdocument = Document('resume.docx')
# print(sys.argv)  # Note the first argument is always the script filename.
resumearray = []
for x in range(0, 21):
    resumearray.append(input())
#json_data=open('data.txt').read()
f = io.open('data','r', encoding='utf-16-le')
# #datastore = json.loads(f.read)
print(f.read())
# text = f.read()
# json_data = text

# document.add_paragraph('_______________________________________________________________________')
#document.add_paragraph(resumearray[1])
k=resumearray[1]
#document.add_paragraph(k)
jsobject = json.loads(k)
document.add_paragraph('_______________________________________________')
#document.add_paragraph(jsobject.values())
for x in range(0, 9):
    if resumearray[x]=='[]':
        document.add_paragraph('nothing was found')
    else:
        document.add_paragraph(resumearray[x])
Abhishek Anand

1 Answer


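You are running Python on Windows, where open() falls back to the system's default encoding when none is given; on your machine that is cp1252, as the traceback shows. The JSON is encoded as UTF-8, hence the error. Opening the file as cp1252 reproduces it: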

>>> with open('blob.json', encoding='cp1252') as f:
...     j = json.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/local/lib/python3.6/encodings/cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2795: character maps to <undefined>

Use utf-8 instead:

>>> with open('blob.json', encoding='utf-8') as f:
...     j = json.load(f)
... 
>>> print(len(j))
29
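
To see why byte 0x9d in particular can trigger this, here is a minimal, self-contained sketch. The payload, the helper names and the temporary file are made up for illustration; the assumption is that the real data contains something like a curly closing quote (U+201D), whose UTF-8 encoding ends in the byte 0x9D, which cp1252 does not define.

```python
import json
import os
import tempfile

# Hypothetical sample payload. U+201D (right double quotation mark) is
# encoded in UTF-8 as the bytes E2 80 9D, and 0x9D has no mapping in cp1252.
payload = {"summary": "He said \u201chello\u201d"}


def write_utf8_json(path, obj):
    """Write obj as JSON, keeping non-ASCII characters as raw UTF-8 bytes."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


def read_json(path, encoding):
    """Read a JSON file using an explicitly chosen text encoding."""
    with open(path, encoding=encoding) as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "blob.json")
write_utf8_json(path, payload)

# Decoding the UTF-8 bytes as cp1252 fails on the 0x9D byte.
try:
    read_json(path, "cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Decoding with the encoding the file was written in round-trips cleanly.
roundtrip = read_json(path, "utf-8")
```

Passing encoding explicitly on every open() call avoids depending on the platform default, which differs between Windows and most Linux setups.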
snakecharmerb
  • This works, thank you. Although I was hoping not to have to write to a file, and to pass the input directly to Python instead. – Abhishek Anand Dec 18 '17 at 05:42
  • @AbhishekAnand maybe have a look at the answers to this question: https://stackoverflow.com/q/23450534/5320906 – snakecharmerb Dec 18 '17 at 09:47
  • I have seen it. In my original code I was doing that, but the decoding error still came up. I guess Python can only take raw inputs in arguments. I have switched to the method you suggested, so it's all good :) – Abhishek Anand Dec 18 '17 at 09:51
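
On passing the input directly rather than through a file: a possible sketch, not tested against python-shell itself, is to bypass the default text decoding of standard input and decode the raw bytes as UTF-8. The function name read_json_lines and the sample data are hypothetical, and the approach assumes the Node.js side writes UTF-8, one JSON document per line.

```python
import io
import json


def read_json_lines(binary_stream, count):
    """Decode `count` lines from a binary stream as UTF-8 and parse each as JSON."""
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
    return [json.loads(text_stream.readline()) for _ in range(count)]


# Stand-in for the bytes the Node.js process would write to the script's stdin.
sample = io.BytesIO(
    '{"name": "Zo\u00eb"}\n["\u201cquoted\u201d"]\n'.encode("utf-8")
)
parsed = read_json_lines(sample, 2)
```

In a script like the one in the question, sys.stdin.buffer (the byte stream underneath sys.stdin) could be passed in place of the BytesIO object, replacing the input() calls, which decode with the console's default encoding.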