How to convert the Python 2 unicode() function into correct Python 3.x syntax

Question

I enabled the compatibility check in my Python IDE and now I realize that the inherited Python 2.7 code has a lot of calls to unicode() which are not allowed in Python 3.x.

I looked at the docs of Python2 and found no hint how to upgrade:

I don't want to switch to Python3 now, but maybe in the future.

The code contains about 500 calls to unicode()

How to proceed?

Update

The comment of user vaultah to read the pyporting guide has received several upvotes.

My current solution is this (thanks to Peter Brittain):

from builtins import str

... I could not find this hint in the pyporting docs.....

@vaultah this is not a general question. It is only about `unicode()` calls. In the code base which I currently work on, there are about 700 calls to this method. What should I do? — guettli, Aug 01 '16 at 13:57
There is no good answer to this question. If you're lucky, you can just remove the calls to `unicode` and you're good to go. All strings are unicode in Python 3. If this does not work, then expect *lots* of work. The transition from str to unicode literals and bytes is by far the most incompatible change when switching from Python 2 to 3. — Phillip, Aug 03 '16 at 15:07
Couldn't you just define your own `unicode()` function that does nothing but `return str(arg)` in Python 3? — martineau, Aug 03 '16 at 16:40
you can assign `str` to `unicode` - `unicode = str` (without parenthesis). It should work. — furas, Aug 03 '16 at 16:57
@guettli: rewrite those calls or provide your own `unicode` function to replace it. You'll have *more* issues with upgrading to Python 3 however. — Martijn Pieters, Aug 03 '16 at 19:08
@furas: except that `unicode()` in Python 2 accepts `str` objects without giving an explicit encoding (decoding implicitly as ASCII). In Python 3 passing in a `bytes` object will raise an exception. — Martijn Pieters, Aug 03 '16 at 19:11
@guettli For what purpose are you using `unicode()`? Please provide an example of your code where you're using `unicode()`. — Alastair McCormack, Aug 04 '16 at 08:06
@AlastairMcCormack yes, I could check 500 times individually why it is used. But first I want to have a no-brainer solution. `from builtins import str` is such a no-brainer, and later somebody might look at each usage in detail. But this will be another question. — guettli, Aug 04 '16 at 10:24
@guettli You've used it 500 times!? That's exactly why I'm asking. The right answer is only useful if it addresses the actual problem. It sounds like you're using `unicode()` incorrectly and might be fixed with a simple solution that is safe for your data, safe for multiple languages and future proof. You should read [ask] and [mcve] and loose the attitude — Alastair McCormack, Aug 04 '16 at 17:21
@AlastairMcCormack Yes, the code contains it 500 times. But ... it's not "my code". It is the code laying before me today. — guettli, Aug 05 '16 at 11:01
@AlastairMcCormack what do you mean with "... and loose the attitude?" — guettli, Aug 05 '16 at 11:02
@guettli I meant "lose". I was trying to help but my request (and others) for a complete picture of the problem to help you better was met with a curt and dismissive attitude. Again, a full explanation of the problem, including the fact that you've inherited the code-base may yield better answers than just answering solution Y. — Alastair McCormack, Aug 05 '16 at 12:33
@AlastairMcCormack yes you are right. The code was developed by a team of ten people. I am one of them. It's not "my" code. I updated the question. — guettli, Aug 05 '16 at 13:00
@Qlstudio is that really a good idea? It seems like you're trying to hard-code it a little too hard. — Christian Dean, Aug 10 '16 at 06:18
Yes it is. Do you know about function definitions? Functions are like variables, and you can assign that function to another variable. — , Aug 10 '16 at 12:31

score 29 · Accepted Answer · answered Aug 03 '16 at 18:30

As has already been pointed out in the comments, there is already advice on porting from 2 to 3.

Having recently had to port some of my own code from 2 to 3 and maintain compatibility for each for now, I wholeheartedly recommend using python-future, which provides a great tool to help update your code (futurize) as well as clear guidance for how to write cross-compatible code.

In your specific case, I would simply convert all calls to unicode to use str and then import str from builtins. Any IDE worth its salt these days will do that global search and replace in one operation.

Of course, that's the sort of thing futurize should catch too, if you just want to use automatic conversion (and to look for other potential issues in your code).
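A minimal sketch of what a converted call site could look like after that search and replace. The helper name `to_text` is hypothetical and only stands in for a former `unicode(value)` call. On Python 3 the `builtins` module is part of the standard library; on Python 2 the import is assumed to be satisfied by the installed `future` package.

```python
# On Python 3, `builtins` ships with the interpreter. On Python 2 this import
# only works when the `future` package is installed (assumed dependency).
from builtins import str


def to_text(value):
    """Hypothetical helper: what used to be ``unicode(value)``."""
    return str(value)


print(to_text(42))
print(to_text(3.5))
```

With the import in place, `str` names a Unicode text type on both interpreter lines, so the call sites themselves no longer need a version check.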

Peter Brittain

Yes, `futurize` will help transform the codebase; `unicode()` calls will be transformed to `str()` calls with a `from builtins import str` import at the top. Do take into account that this generally adds an install-time requirement for the `future` library on Python 2 (to provide the backported `builtins` module). – Martijn Pieters Aug 03 '16 at 19:22
this will break sqlalchemy, among other libraries. – ben w Jun 20 '17 at 21:29
The "advice on porting from 2 to 3" mentions unicode a lot, but doesn't really mention the unicode function itself. – cowlinator Sep 20 '18 at 02:49
@cowlinator That's why I also referenced the python future docs. See http://python-future.org/compatible_idioms.html#unicode – Peter Brittain Sep 20 '18 at 06:12

score 16 · Answer 2 · answered Aug 03 '16 at 17:28

You can test whether there is such a function as unicode() in the version of Python that you're running. If not, you can create a unicode() alias for the str() function, which does in Python 3 what unicode() did in Python 2, as all strings are unicode in Python 3.

# Python 3 compatibility hack
try:
    unicode('')
except NameError:
    unicode = str

Note that a more complete port is probably a better idea; see the porting guide for details.
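One caveat with a plain alias is raised in the comments on the question: Python 2's `unicode()` implicitly decoded byte strings as ASCII, whereas Python 3's `str(b'abc')` by default returns the text `"b'abc'"` rather than `'abc'`. The following is only a sketch of a slightly more defensive shim, assuming ASCII as the default encoding to mirror the Python 2 behaviour; the function signature is an illustration, not an exact reimplementation of the Python 2 builtin.

```python
try:
    unicode  # Python 2: the builtin exists, nothing to do
except NameError:
    def unicode(obj, encoding="ascii", errors="strict"):
        """Hypothetical stand-in for the Python 2 builtin on Python 3."""
        if isinstance(obj, bytes):
            # Decode bytes explicitly instead of returning "b'...'".
            return obj.decode(encoding, errors)
        return str(obj)


print(unicode(b"abc"))
print(unicode(42))
```

Non-ASCII bytes passed without an encoding still raise `UnicodeDecodeError`, which is arguably useful during a port because it points at the call sites that need an explicit encoding.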

Quint

Yes, this handmade solution should work. But I guess I will use the future library as explained in the answer by Peter Brittain. – guettli Aug 04 '16 at 07:25
very simple and useful, perfect solution for the asked question. also, no additional dependencies. – benzkji Aug 31 '18 at 13:35

score 10 · Answer 3 · edited Sep 27 '17 at 04:53

Short answer: Replace all unicode calls with str calls.

Long answer: In Python 3, the separate unicode type no longer exists; str is the Unicode text type, and byte strings are a distinct bytes type. The following solution should work if you are only using Python 3:

unicode = str
# the rest of your code goes here

If the code has to run on both Python 2 and Python 3, use this instead:

import sys
if sys.version_info.major >= 3:  # Python 3 has no unicode builtin
    unicode = str
# the rest of your code goes here

Alternatively, run 2to3 on the package from the command line (-w writes the changes back to the source files):

$ 2to3 package -w

score 5 · Answer 4 · answered Aug 05 '16 at 06:01

First, as a strategy, I would take a small part of your program and try to port it. The number of unicode calls you are describing suggests to me that your application cares about string representations more than most, and each use case is often different.

The important consideration is that all strings are unicode in Python 3. If you are using the str type to store "bytes" (for example, if they are read from a file), then you should be aware that those will not be bytes in Python3 but will be unicode characters to begin with.

Let's look at a few cases.

First, if you do not have any non-ASCII characters at all and really are not using the Unicode character set, it is easy. Chances are you can simply change the unicode() function to str(). That will ensure that any object passed as an argument is properly converted. However, it is wishful thinking to assume it's that easy.

Most likely, you'll need to look at the argument to unicode() to see what it is, and determine how to treat it.

For example, if you are reading UTF-8 characters from a file in Python 2 and converting them to Unicode your code would look like this:

data = open('somefile', 'r').read()
udata = unicode(data, 'UTF-8')  # a bare unicode(data) would decode as ASCII

However, in Python3, read() on a file opened in text mode returns Unicode data to begin with, and the encoding should be specified when opening the file (otherwise a platform-dependent default is used):

udata = open('somefile', 'r', encoding='UTF-8').read()

As you can see, transforming unicode() simply when porting may depend heavily on how and why the application is doing Unicode conversions, where the data has come from, and where it is going to.
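For code that still has to run on Python 2.7 during the transition, `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3, so the decoding can be moved to the point where the file is opened. A minimal sketch, assuming a UTF-8 encoded file; the temporary file is created here only to make the example runnable.

```python
import io
import os
import tempfile

# Hypothetical sample file, written as raw UTF-8 bytes for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "somefile")
with io.open(path, "wb") as handle:
    handle.write(b"caf\xc3\xa9\n")

# io.open decodes while reading, so read() returns Unicode text
# on Python 2.7 and Python 3 alike; no unicode() call is needed.
with io.open(path, "r", encoding="UTF-8") as handle:
    udata = handle.read()

print(len(udata))
```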

Python3 brings greater clarity to string representations, which is welcome, but can make porting daunting. For example, Python3 has a proper bytes type, and you convert byte-data to unicode like this:

udata = bytedata.decode('UTF-8')

or convert Unicode data to byte form using the opposite transform:

bytedata = udata.encode('UTF-8')
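The two transforms are inverses of each other as long as the same codec is used in both directions. A small round-trip sketch with an arbitrary sample string; it also shows that decoding with the wrong codec fails loudly rather than silently.

```python
udata = u"na\u00efve"                 # Unicode text with one non-ASCII character
bytedata = udata.encode("UTF-8")      # text -> bytes
roundtrip = bytedata.decode("UTF-8")  # bytes -> text

# Decoding the same bytes as ASCII fails, because the encoded
# non-ASCII character occupies bytes outside the ASCII range.
try:
    bytedata.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(roundtrip == udata, ascii_ok)
```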

I hope this at least helps determine a strategy.

Great answer, which explains the importance of replacing `unicode()` properly — Alastair McCormack, Aug 06 '16 at 08:55

score 1 · Answer 5 · answered Jun 20 '22 at 16:40

You can use the six library, which provides text_type (unicode on Python 2, str on Python 3):

from six import text_type

Alexey Shrub
