1

I have a large suite of unit tests that runs on Travis CI, and it fails only on Python 3.2. How can I solve this without using six.u()?

def test_parse_utf8(self):
    s = String("foo", 12, encoding="utf8")
    self.assertEqual(s.parse(b"hello joh\xd4\x83n"), u"hello joh\u0503n")

======================================================================
ERROR: Failure: SyntaxError (invalid syntax (test_strings.py, line 37))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/travis/virtualenv/python3.2.5/lib/python3.2/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/travis/build/construct/construct/tests/test_strings.py", line 37
    self.assertEqual(s.build(u"hello joh\u0503n"), b"hello joh\xd4\x83n")
                                               ^
SyntaxError: invalid syntax

Trying to get this to work:

import sys
PY3 = sys.version_info[0] == 3
def u(s): return s if PY3 else s.decode("utf-8")

self.assertEqual(s.parse(b"hello joh\xd4\x83n"), u("hello joh\u0503n"))

Quote from https://pythonhosted.org/six/

On Python 2, u() doesn’t know what the encoding of the literal is. Each byte is converted directly to the unicode codepoint of the same value. Because of this, it’s only safe to use u() with strings of ASCII data.

But the whole point of using unicode is to not be restricted to ASCII.
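For what it's worth, the ASCII restriction in that quote seems to apply to the bytes of the literal in the source file, not to the text that comes out: a \u escape is itself written in plain ASCII. Below is a minimal sketch, assuming Python 3, where bytes.decode("unicode_escape") stands in for what u() does to a byte-string literal on Python 2. The variable names are only illustrative.

```python
# Assumption: run on Python 3; bytes objects play the role of Python 2 str literals.

# The literal is pure ASCII: backslash, 'u', '0', '5', '0', '3'.
ascii_with_escape = b"hello joh\\u0503n"
decoded = ascii_with_escape.decode("unicode_escape")
# The escape is interpreted, so the result contains the non-ASCII code point U+0503.
assert decoded == "hello joh\u0503n"

# Raw UTF-8 bytes are NOT decoded as UTF-8: each byte becomes the code point
# of the same value, which is the behaviour the six documentation warns about.
raw_utf8 = b"hello joh\xd4\x83n"
mangled = raw_utf8.decode("unicode_escape")
assert mangled == "hello joh\xd4\x83n"  # two code points, U+00D4 and U+0083
assert mangled != decoded
```

So a literal such as "hello joh\u0503n" should be safe to pass through u(), while a literal containing raw non-ASCII bytes would not be.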

ArekBulski
  • 2
    Yeah, 3.2 just doesn't have that syntax. Are you required to support Python 2 and Python 3.2 with the same codebase, without using `2to3`? – user2357112 Aug 25 '16 at 00:05
  • @ArekBulski: 2to3 should never tell you to use `six`. I don't think any code anywhere in 2to3 knows about `six`. When I run 2to3 on code with `u` literals, it just strips the `u`. – user2357112 Aug 25 '16 at 01:48

3 Answers

1

Could you instead do from __future__ import unicode_literals and not use the u syntax anywhere?

from __future__ import unicode_literals makes string literals without a preceding u behave in Python 2 (2.6 and later) as they do in Python 3, that is, they default to unicode. So if you add from __future__ import unicode_literals and change every u"string" to "string", your string literals will be unicode in all versions. This does not affect b literals.
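A minimal sketch of that approach, reusing the byte and text values from the question's test. The variable names are made up for illustration, and the __future__ import has to be the first statement in the file:

```python
from __future__ import unicode_literals  # must precede every other statement

# With the import above, a plain literal is text on Python 2 and Python 3,
# so the \u escape is interpreted everywhere and no u prefix is needed.
text = "hello joh\u0503n"

# b literals are unaffected and stay byte strings.
raw = b"hello joh\xd4\x83n"

# U+0503 encodes to the two UTF-8 bytes 0xD4 0x83.
assert raw.decode("utf-8") == text
assert text.encode("utf-8") == raw
```

Since the file contains no u prefix at all, it should also parse on Python 3.2, where the import is accepted but has no effect.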

spruceb
1

I think you're out of luck here.

Either use six.u() or drop support for Python 3.2.

Jace Browning
  • The source is here: https://github.com/spotify/luigi/blob/b5b578da87f8ed18bb2b8c077f2d24cc5d912c7c/luigi/six.py#L646-L647 Essentially, it lets Python 3 strings pass through and converts to unicode on Python 2. – Jace Browning Aug 25 '16 at 01:41
0

I took the implementation of six.u() and dropped the dependency on six.

import sys
PY3 = sys.version_info[0] == 3
def u(s):
    # Python 3: the literal is already text. Python 2: decode its \uXXXX escapes.
    return s if PY3 else unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
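A quick check of the helper, as a sketch rather than a full test: the helper is repeated so the snippet runs on its own, and the expected value is built by decoding the UTF-8 bytes from the question's test. Because the file contains no u literals, it should parse on Python 2 as well as on Python 3.2.

```python
import sys

PY3 = sys.version_info[0] == 3

def u(s):
    # On Python 3 the name `unicode` is never evaluated, so this is safe there.
    return s if PY3 else unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")

# Text obtained independently of u(), by decoding the raw UTF-8 bytes.
expected = b"hello joh\xd4\x83n".decode("utf-8")

# The literal passed to u() is pure ASCII; the \u escape yields U+0503.
assert u("hello joh\u0503n") == expected
```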
ArekBulski