I have a body of Python 2.7 code which I'm trying to make single-source compatible with Python 3, to ease a gradual migration away from 2.7. The most common issue I'm seeing involves simple writes of non-unicode in-memory content to disk. For example:

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(unicode(json.dumps(some_object, indent=2)))

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(unicode(yaml.dump(some_object, default_flow_style=False))) # From PyYAML

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(unicode(some_multiline_string)) # A simple string passed in, not explicitly marked up as unicode where it was declared

And of course the casts to `unicode` fail under Python 3, because that type doesn't exist there. If I change the casts to something like:

            the_file.write(str(json.dumps(some_object, indent=2)))

then it works in Python 3, but fails under Python 2 because `str` and `unicode` are distinct types there, and the `write` method of a file opened with `io.open` expects `unicode`. While the `json.dumps` calls could be adapted into `json.dump` calls that write to the file directly, as far as I can tell the `yaml.dump` calls can't be.
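
To make the two failure modes concrete, here is a side-by-side sketch (the comments summarise the errors as I understand them, not exact interpreter output):

        the_file.write(unicode(some_multiline_string))  # Python 3: NameError, the unicode builtin no longer exists
        the_file.write(str(some_multiline_string))      # Python 2: TypeError, a file opened with io.open wants unicode, not str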

Ideally there would be a way to coerce the type of everything being written to the type that `write` wants (a unicode string), but I can't find what that is. I had hoped that you would always be able to `decode` the various forms of non-unicode string into a unicode string, but that can't be applied uniformly, because `str` objects in Python 3 don't have a `decode` method.

All of the other questions (here on Stack Overflow and elsewhere) and documentation I've found either give conflicting advice, focus on buffer objects, or only cover one version of Python or the other. I need a solution which works equally well in Python 2.7 and 3.x, and I'm hoping for a graceful, Pythonic approach that doesn't involve branching on a runtime version check like the one sketched below.
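
For clarity, this is the kind of version-sniffing I'd rather not scatter through the codebase (a hypothetical sketch, not code we actually have):

        import sys

        if sys.version_info[0] < 3:
            text_to_write = unicode(some_multiline_string)  # Python 2 branch
        else:
            text_to_write = str(some_multiline_string)      # Python 3 branch

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(text_to_write)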

MrCranky
  • Have you considered using [`six`](https://pypi.org/project/six/)? – jonrsharpe Jun 29 '20 at 09:41
  • It would be an option, certainly, as I believe we're using other dependencies that already require it. – MrCranky Jun 29 '20 at 09:42
  • FWIW, the closest question (https://stackoverflow.com/questions/49702626/how-to-write-unicode-text-to-file-in-python-2-3-using-same-code) seems to ask the same question, but the answers only cover the case of in-code unicode literals. – MrCranky Jun 29 '20 at 09:44
  • Python 2 `str` does have a `.decode` method. It's the Python 3 `str` type that doesn't. – lenz Jun 29 '20 at 12:06
  • And like suggested already, `six` has exactly the functions and wrappers you need. You can of course try to reinvent the wheel yourself, but why not rely on a widely used and well-tested library? – lenz Jun 29 '20 at 12:20
  • As I said, happy to use it, hoping for an answer that makes it clear how. – MrCranky Jun 30 '20 at 10:38

1 Answer

So based on the advice in the comments I went with the six module. Version 1.12.0 and higher includes `six.ensure_text`, which is exactly the "way to coerce the types of all of the things being written to [unicode]" that I described in the question.

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(six.ensure_text(json.dumps(some_object, indent=2)))

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(six.ensure_text(yaml.dump(some_object, default_flow_style=False))) # From PyYAML

        with io.open(some_path, 'w', encoding='utf-8') as the_file:
            the_file.write(six.ensure_text(some_multiline_string)) # A simple string passed in, not explicitly marked up as unicode where it was declared

I encountered some version compatibility issues (the other pip modules I'm relying on seem to want six 1.11.0), but I've worked around those, and the functionality provided can be used cleanly in all our existing code.
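
If you're stuck on an older six (as the 1.11.0 pin initially left us), `six.text_type` and `six.binary_type` have been around much longer, and as far as I can tell `ensure_text` is little more than a wrapper over them. A minimal stand-in might look like this (a sketch of my understanding, not the actual six source):

        import six

        def ensure_text(s, encoding='utf-8', errors='strict'):
            # Byte strings (str on Python 2, bytes on Python 3) get decoded;
            # text strings (unicode on Python 2, str on Python 3) pass through.
            if isinstance(s, six.binary_type):
                return s.decode(encoding, errors)
            if isinstance(s, six.text_type):
                return s
            raise TypeError("not expecting type '%s'" % type(s))

That said, upgrading to six 1.12.0+ and using the real `six.ensure_text` is the cleaner option wherever the dependency pins allow it.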

MrCranky