Python: UnicodeEncodeError with codecs.open

Question

I'm trying to use orgnode.py (from here) to parse org files. The files contain English and Persian text, and file -i reports them as utf-8 encoded. But I receive this error when I use the makelist function (which itself uses codecs.open with utf-8):

>>> Orgnode.makelist("toread.org")
[**  [[http://www.apa.org/helpcenter/sexual-orientation.aspx][Sexual orientation, homosexuality and bisexuality]]            :ToRead:



Added:[2013-11-06 Wed]
, **  [[http://stackoverflow.com/questions/11384516/how-to-make-all-org-files-under-a-folder-added-in-agenda-list-automatically][emacs - How to make all org-files under a folder added in agenda-list automatically? - Stack Overflow]] 

(setq org-agenda-text-search-extra-files '(agenda-archives "~/org/subdir/textfile1.txt" "~/org/subdir/textfile1.txt"))
Added:[2013-07-23 Tue] 
, Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 63-66: ordinal not in range(128)

The function returns a list of org headings, but instead of the last item (which is written in Persian) it shows the error. Any suggestions on how I can deal with this error?

The error is almost certainly *not* thrown by `codecs.open()` as that would be a **decode** exception. You have an **encoding** exception instead. Are you printing the unicode values perhaps? Show us your code and the full traceback. — Martijn Pieters, Mar 24 '14 at 18:37
I added the traceback but didn't understand what you mean by `printing the unicode values` (sorry for my noobness). Is there any other info I should add to my question? — sineau, Mar 24 '14 at 20:42
That doesn't look like a complete traceback; can you show us the code for `Orgnode.makelist()` at all? **Something else** causes Python to encode already-read Unicode objects back to ASCII. The usual suspects are mixing Unicode and byte string objects, printing, or writing to a regular file. — Martijn Pieters, Mar 24 '14 at 20:45
Here's the link: https://github.com/albins/orgnode/blob/master/Orgnode.py#L200. This is all the traceback I got. Interestingly it is an item in the list returned by makelist function. — sineau, Mar 25 '14 at 05:33

score 0 · Accepted Answer · answered Mar 25 '14 at 13:15

As the traceback tells you, the exception is raised by the statement you input on the Python console itself (Orgnode.makelist("toread.org")), and not in one of the functions called during the evaluation of the statement.

This is typical of encoding errors when the interpreter automatically converts the return value of the statement to display it back on the console. The text displayed is the result of applying the repr() builtin to the return value.

Here the repr() of the result of makelist is a unicode object, which the interpreter tries to convert to str using the "ascii" codec by default.

The culprit is the Orgnode.__repr__ method (https://github.com/albins/orgnode/blob/master/Orgnode.py#L592), which returns a unicode object (because the node content has automatically been decoded by codecs.open), although __repr__ methods are usually expected to return strings with only safe (ASCII) characters.
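To make the encode/decode distinction concrete, here is a minimal sketch. The heading text and the error_name helper are made up for illustration; the real text comes from toread.org. It performs explicitly the ASCII conversion that Python 2 applies implicitly to a unicode __repr__ result, and contrasts it with the decode failure you would see if reading the file were the problem:

```python
# Hypothetical heading text; the real one comes from toread.org.
# It is already-decoded text, which is what codecs.open hands back.
heading = u"** \u0633\u0644\u0627\u0645"


def error_name(func):
    """Return the name of the UnicodeError subclass func raises, or None."""
    try:
        func()
    except UnicodeError as exc:
        return type(exc).__name__
    return None


# text -> bytes with the ASCII codec: the conversion Python 2 performs
# implicitly when __repr__ returns a unicode object.
encode_failure = error_name(lambda: heading.encode("ascii"))

# bytes -> text with the ASCII codec: the kind of failure that reading
# the file with the wrong codec would produce instead.
decode_failure = error_name(lambda: heading.encode("utf-8").decode("ascii"))

print(encode_failure)  # UnicodeEncodeError
print(decode_failure)  # UnicodeDecodeError
```

The traceback in the question names UnicodeEncodeError, which matches the first case, not the second.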

Here is the smallest change you can make to Orgnode as a workaround for your problem:

--- a/Orgnode.py
+++ b/Orgnode.py
@@ -612,4 +612,4 @@ class Orgnode(object):
 # following will output the text used to construct the object
         n = n + "\n" + self.body

-        return n
+        return n.encode('utf-8')

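If you want a version which only returns ASCII characters, you can use 'unicode-escape' as the codec instead of 'utf-8' (the 'string-escape' codec works on byte strings, so it would fail on a unicode object containing Persian text).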
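As a rough comparison of the two codec choices, assuming a made-up Persian heading in place of the real node text: for a unicode object, the escape codec that applies is 'unicode-escape', and its output stays within ASCII, while 'utf-8' produces raw bytes that only display correctly on a UTF-8 terminal.

```python
# Hypothetical Persian heading standing in for the node text.
heading = u"\u0633\u0644\u0627\u0645"

# Raw UTF-8 bytes: compact, readable on a UTF-8 terminal.
as_utf8 = heading.encode("utf-8")

# Backslash escapes: longer, but every byte is plain ASCII.
as_escaped = heading.encode("unicode-escape")

# Both encodings are lossless and can be decoded back to the same text.
assert as_utf8.decode("utf-8") == heading
assert as_escaped.decode("unicode-escape") == heading
```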

This is only a quick and dirty fix. The right solution would be to rewrite a proper __repr__ method, and also add the __str__ and __unicode__ methods that this class lacks. (I might even fix this myself if I find the time, as I am quite interested in using Python code to manipulate my Org-mode files)
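For what it's worth, here is a sketch of what such a set of methods could look like. Node is a hypothetical stand-in, not the actual Orgnode class, and the "<Node ...>" format is an arbitrary choice; the PY2 switch is only there so the sketch should run under both Python 2 and Python 3, where __str__ already returns text and __unicode__ is not used.

```python
import sys

PY2 = sys.version_info[0] == 2


class Node(object):
    """Hypothetical stand-in for Orgnode; only text conversion is shown."""

    def __init__(self, heading, body=u""):
        self.heading = heading  # decoded text, as codecs.open provides it
        self.body = body

    def text(self):
        # Full node text as a unicode object (illustrative helper).
        return self.heading + u"\n" + self.body

    def __repr__(self):
        # ASCII only: non-ASCII characters become backslash escapes, so the
        # console can always display the result, even inside a list.
        escaped = self.heading.encode("unicode-escape").decode("ascii")
        return "<Node %s>" % str(escaped)

    if PY2:
        def __unicode__(self):
            return self.text()

        def __str__(self):
            # Python 2 expects bytes here; UTF-8 is an assumed choice.
            return self.text().encode("utf-8")
    else:
        def __str__(self):
            return self.text()


node = Node(u"\u0633\u0644\u0627\u0645", u"Added:[2013-11-06 Wed]")
print(repr([node]))
```

With this split, displaying a list of nodes on the console only ever goes through __repr__, so it cannot hit the ASCII encoding error, while print and unicode() still give the readable text.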

Thanks this did the trick. I would be very pleased to see a better version of Orgnode.py. Actually there are some other python parsers for org files, but this one was the easiest for me to understand. — sineau, Mar 25 '14 at 18:19
