I have a bunch of Python 2.7 code which I am trying to make single-source compatible with Python 3, to ease a gradual migration away from 2.7. The most common issue I'm seeing involves simple writes of non-unicode in-memory content to disk. For example:
with io.open(some_path, 'w', encoding='utf-8') as the_file:
    the_file.write(unicode(json.dumps(some_object, indent=2)))

with io.open(some_path, 'w', encoding='utf-8') as the_file:
    the_file.write(unicode(yaml.dump(some_object, default_flow_style=False)))  # From PyYAML

with io.open(some_path, 'w', encoding='utf-8') as the_file:
    the_file.write(unicode(some_multiline_string))  # A simple string passed in, not explicitly marked up as unicode where it was declared
And of course the casts to unicode fail under Python 3, because that type doesn't exist. If I change the casts to something like:

the_file.write(str(json.dumps(some_object, indent=2)))

then it works in Python 3, but fails under Python 2, because str and unicode are distinct types and file.write needs a unicode argument. While the json.dumps calls can be adapted to json.dump calls that use the file directly, as far as I can tell the yaml.dump calls can't.
Ideally there would be a way to coerce everything being written to the type that file.write wants (a unicode string), but I can't find what that is. I had hoped that you could always decode the various forms of non-unicode string into a unicode string, but str objects in Python 3 don't have a decode method.
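To make that concrete, this is a rough sketch of the kind of thing I was hoping would work (same illustrative names as above):

with io.open(some_path, 'w', encoding='utf-8') as the_file:
    data = json.dumps(some_object, indent=2)
    # Works on Python 2 (str -> unicode via decode), but raises
    # AttributeError on Python 3, where str has no decode method.
    the_file.write(data.decode('utf-8'))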
All of the other questions (here on Stack Overflow and elsewhere) and documentation I've found give conflicting advice, focus on buffer objects, or only cover one version of Python or the other. I need a solution that works equally well in Python 2.7 and 3.x, and I'm hoping there is a graceful, Pythonic solution that doesn't involve branching on a test that detects which version is in use.
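By "branching on a test that detects which version is in use" I mean something along these lines, which does appear to work but is exactly the kind of version check I'd like to avoid (illustrative sketch only):

import sys

if sys.version_info[0] >= 3:
    text_type = str      # Python 3: str is already unicode text
else:
    text_type = unicode  # Python 2: coerce to the unicode type

with io.open(some_path, 'w', encoding='utf-8') as the_file:
    the_file.write(text_type(json.dumps(some_object, indent=2)))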