I am trying to read the JSON data below with the attached Python code (Python 3.5.1), but the character ç is displayed as Ã§ and £ as Â£. Please help me with code that correctly reads and writes data to and from the file without changing the format or character set.

Json Data:

{
    "config":[{
            "filetype": ".csv",
            "coldelimiter":"ç",
            "rowdelimiter":"£"
    }]
}

Python code:

import json
import os

fileLoc=os.path.join(os.getcwd(),"appconfig.json")
json_data=open(fileLoc).read()
print(json_data)

Output:

{
    "config":[{
            "filetype": ".csv",
            "coldelimiter":"Ã§",
            "rowdelimiter":"Â£"
    }]
}
fredtantini
RintG

1 Answer

Try to avoid implicit encoding and decoding.

When you use open() to read or write a text file (such as JSON, but unlike XML) without specifying an encoding, the file contents are decoded or encoded with a default encoding. Which default is used depends on your environment; you can check it with locale.getpreferredencoding().
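As a quick check, the following sketch prints the encoding that open() falls back to on the current machine. The helper name default_text_encoding is made up for illustration; only locale.getpreferredencoding() comes from the standard library.

```python
import locale

def default_text_encoding():
    """Return the encoding open() uses when no encoding= argument is given.

    Hypothetical helper; it only wraps the standard-library call.
    """
    # Passing False avoids changing the process locale as a side effect.
    return locale.getpreferredencoding(False)

# The result depends on the environment, e.g. 'UTF-8' or 'cp1252'.
print(default_text_encoding())
```

If this prints anything other than a UTF-8 name while the file is stored as UTF-8, the first explanation below is the likely one.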

So let's assume that appconfig.json is stored on disk as UTF-8, but your locale is configured to use Latin-1. Then the letter ç will be misinterpreted as the two-character sequence Ã§. Confirm:

>>> 'ç'.encode('utf8').decode('latin1')
'Ã§'

If this is the case, then it's easy to fix: specify the encoding on open():

with open(fileLoc, 'r', encoding='utf8') as f:
    json_data = f.read()
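Since the question also asks about writing, here is a minimal round-trip sketch, assuming the file is meant to be UTF-8. The function names load_config and save_config are hypothetical, and the demo writes to a temporary directory rather than the real appconfig.json. Passing ensure_ascii=False to json.dump keeps ç and £ as literal characters in the file instead of \uXXXX escapes.

```python
import json
import os
import tempfile

def load_config(path):
    """Read a JSON file, decoding it explicitly as UTF-8."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)

def save_config(path, config):
    """Write a JSON file as UTF-8, keeping non-ASCII characters literal."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(config, f, ensure_ascii=False, indent=4)

# Sample data taken from the question.
config = {"config": [{"filetype": ".csv",
                      "coldelimiter": "ç",
                      "rowdelimiter": "£"}]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "appconfig.json")
    save_config(path, config)
    loaded = load_config(path)
    # Read the raw bytes to inspect what was actually stored on disk.
    with open(path, 'rb') as f:
        raw = f.read()

# True if the data survived the round trip unchanged.
print(loaded == config)
```

With an explicit encoding on both open() calls, the result no longer depends on the locale of the machine that runs the script.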

There's another possible (but less likely) explanation: maybe the default encoding is already UTF-8, so the data is decoded correctly when read from the file. The print() call then encodes the data, again using UTF-8, and sends a sequence of bytes to STDOUT that is exactly the same as the file content. But your terminal (or whatever you use to execute the script) misinterprets that output as Latin-1, so the characters are displayed garbled.

If the latter is the case, then you need to fix the terminal configuration (to accept UTF-8), or re-encode sys.stdout (with sys.stdout = codecs.getwriter('latin-1')(sys.stdout.buffer), but I don't recommend that).
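To tell the two explanations apart, one option is to print the data with ascii(), which shows code points as escapes and so does not depend on how the terminal renders non-ASCII characters. This is only a diagnostic sketch; the variable names good and bad are illustrative.

```python
import sys

# What a correctly decoded string looks like: a single code point U+00E7.
good = '\u00e7'

# What the string looks like after UTF-8 bytes were decoded as Latin-1.
bad = good.encode('utf-8').decode('latin-1')

print(ascii(good))  # '\xe7'      -> data is fine, suspect the terminal
print(ascii(bad))   # '\xc3\xa7'  -> data was decoded with the wrong encoding

# The encoding Python uses when writing to STDOUT (environment-dependent).
print(sys.stdout.encoding)
```

If ascii(json_data) contains \xe7, the file was read correctly and the terminal is at fault; if it contains \xc3\xa7, the file was decoded with the wrong encoding.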

lenz
  • Thanks @lenz, yes, that's exactly what I did and now it's working. On a different note, I am new to the Unix environment. I think source code built and tested on Windows should still work on Unix, provided the same Python version is used. For file paths or locations I am using the os package, like join(sourceLoc, file); I hope my understanding is correct. On Windows a path could be c:\filelocation and on Unix /var/sp/filelocation/. Any documentation or material about Python on Unix would be helpful. – RintG Apr 17 '17 at 06:49
  • @RintG I'm not sure I understand. There are a few things that need care if you are trying to write code that is portable across OSes, and using `os.path.join` to write paths is certainly a good choice. If you have a more concrete question about Python on Unix, post a separate question. – lenz Apr 17 '17 at 18:51