I am experiencing the weirdest bug of my life.

I am fixing my Hacker News API and this small piece of code is giving me headaches:

from hn import HN

hn = HN()


# print top stories from homepage
for story in hn.get_stories():
    print story.title
    print story

The Story class has a __str__ method as follows:

def __str__(self):
    """
    Return string representation of a story
    """
    return self.title

(This is a little different from the code in the repo. I had to debug a lot here.)

Anyways, the output is this:

Turn O(n^2) reverse into O(n)
Turn O(n^2) reverse into O(n)
My run-in with unauthorised Litecoin mining on AWS
My run-in with unauthorised Litecoin mining on AWS
Amazon takes away access to purchased Christmas movie during Christmas
Traceback (most recent call last):
  File "my_test_bot.py", line 11, in <module>
    print story
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 60: ordinal not in range(128)

I have no idea why this is failing. Both print story.title and print story should output the same unicode string, since __str__ just returns self.title. Then why is the latter not working?

Also, doing print unicode(story) works just fine (why??), but unfortunately I cannot use unicode() since it's not py3 compatible.

title is encoded as: title.encode('cp850', errors='replace').decode('cp850')

What the hell is happening here? How do I make sure that my API would work for any (meaning most) of the strings it can find and is both py2 and py3 compatible?

I have downloaded the page that is causing this error right now for offline debugging.

KGo
  • You should use the `str.encode` and `str.decode` methods. Either you work only with byte arrays (py2 style), which I don't recommend, or you work only with unicode strings (using either `str.encode` or `print(u"Hello")`). By the way, if you want to be compliant with both py2 and py3, use print as a function and not a keyword. Currently your problem is that you're trying to encode unicode chars with the ascii codec. Which will never work. Ever. :P – Depado Dec 16 '13 at 08:54

2 Answers

__str__ returns a byte array, without any info about encoding. Your console app is likely trying to encode whatever is returned by __str__ to ascii and failing at that. You can try using __unicode__, which returns characters. There's more info in this answer.

And yes, py3 only has the __str__ special method, so you'll have to keep __unicode__ for py2 compatibility.
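
One way to keep both methods in a single codebase, as a minimal sketch rather than the library's actual code: the Story class below is a hypothetical stand-in for the real one, and encoding to UTF-8 on py2 is an assumption (any explicit codec that covers the titles would do).

```python
import sys

PY2 = sys.version_info[0] == 2


class Story(object):
    """Hypothetical stand-in for the real Story class."""

    def __init__(self, title):
        self.title = title  # assumed to always be a text (unicode) string

    def __unicode__(self):
        # Text representation; py2 calls this for unicode(story).
        return self.title

    if PY2:
        def __str__(self):
            # py2: str() must return bytes, so encode explicitly
            # (UTF-8 is an assumption) instead of letting Python
            # fall back to the ascii codec.
            return self.__unicode__().encode('utf-8')
    else:
        def __str__(self):
            # py3: str is already text, so return it unchanged.
            return self.__unicode__()


story = Story(u'Christmas\xa0movie')
text = str(story)
```

With this layout, print(story) goes through __str__ on both versions, and py2 never has to guess a codec for the non-breaking space.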

Dmitry Shevchenko
  • This doesn't explain why `print story.title` apparently works but `print story` doesn't, since according to his code they should have the same result. – BrenBarn Dec 16 '13 at 05:24
  • @BrenBarn, not necessarily: `print story.title` implicitly computes `str(story.title)` and prints the result of that. But `print story` implicitly computes `str(story)` which is `story.__str__()` which returns `story.title`. The builtin `str()` is *not* applied to `story.title` in the latter case, only in the former case. – Tim Peters Dec 16 '13 at 05:34
  • Not really. This didn't really work. `return '[{0}]: "{1}" by {2}'.format(self.points, self.title, self.submitter)` in `__unicode__` still causes the error. – KGo Dec 16 '13 at 05:36
  • @KaranGoel, for your `__unicode__` method, you can use a `u''` literal if you only support Python 3 versions 3.3+. Otherwise you can decode the template before calling the `format` method. For a unified 2 & 3 codebase, consider using [Six](http://pythonhosted.org/six/#binary-and-text-data). – Eryk Sun Dec 16 '13 at 08:45
  • Adding a `u` in `__unicode__` did the trick in py2. Can you recommend some resources where I can learn more about py2 and py3 encodings and fix this bug? – KGo Dec 16 '13 at 17:17
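
The `u''` template fix from the comments above can be sketched as follows. The function name describe and the sample values are hypothetical; only the template string comes from the discussion.

```python
def describe(points, title, submitter):
    # The u'' prefix makes the template itself text, so on py2
    # format() does not push `title` through the ascii codec.
    # On py3 (3.3+) the prefix is accepted and has no effect.
    return u'[{0}]: "{1}" by {2}'.format(points, title, submitter)


# Hypothetical sample values, including a non-breaking space (u'\xa0').
line = describe(42, u'Christmas\xa0movie', u'someone')
```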

This nasty kind of problem can often be explained by saving the output to a file instead of printing it. Try:

for story in hn.get_stories():
    print type(story.title)
    print type(story)

    with open('content.txt', 'ab') as f:
        f.write(story.title)
        f.write('\n\n')
        f.write(story)
        f.write('\n-----------------------------------------------\n')

I expect this to be an iterative approach to the solution. More facts are needed. You may be misled by something.
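
A variant of the same idea that avoids implicit encoding while collecting those facts, as a rough sketch: dump_debug is a hypothetical helper, and the file is opened as UTF-8 text through io.open (available on py2.6+ and py3) so that only the type name and repr of each value are written.

```python
import io
import os
import tempfile


def dump_debug(path, label, value):
    # Hypothetical helper: append the type name and repr of `value`
    # to a UTF-8 text file, so nothing is implicitly encoded to ascii.
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(u'{0}: {1} {2}\n'.format(
            label, type(value).__name__, repr(value)))


# Usage with made-up values, written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), 'content.txt')
dump_debug(path, u'title', u'Christmas\xa0movie')
dump_debug(path, u'number', 3)

with io.open(path, encoding='utf-8') as f:
    lines = f.readlines()
```

In the loop above, this would be called once with story.title and once with story, which shows at a glance whether the two values really have the same type.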

pepr