
I am working on migrating an old Python code base to Python3.

Many strings have the "u" prefix, for example u'Umlaut üöö'.

Is there an automated way to remove the leading "u"?

A simple regex does not work:

In u'schibu', the u' at the end must not be removed.

Example2:

Multiline: '''foo

schibu'''

Is there a way that works without a regex, by parsing the Python syntax instead?

Update

My code needs to be compatible with Python2 and Python3 for some months.

The files already contain from __future__ import unicode_literals

guettli
  • Why bother? The `u` prefix doesn't break anything. – user2357112 Jan 31 '22 at 10:21
  • There is no reason to remove the `u` prefix. It was added back to Python in [version 3.3 for compatibility reasons](https://www.python.org/dev/peps/pep-0414/). Leave it as is, then once the code has been definitely migrated from 2 to 3, you can remove it later on (a formatter like Black even does that for you, for example). – 9769953 Jan 31 '22 at 10:54

1 Answer


The unicode fixer of the 2to3 tool should do that.

unicode

Renames unicode to str.

Dry run with a sample spam.py file:

eggs = u'foo'

in shell:

$ 2to3 --fix unicode spam.py

output

root: Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
RefactoringTool: Refactored spam.py
--- spam.py     (original)
+++ spam.py     (refactored)
@@ -1 +1 @@
-eggs = u'foo'
+eggs = 'foo'
RefactoringTool: Files that need to be modified:
RefactoringTool: spam.py

EDIT: Note that you can run just a single fixer, as shown above, and it will apply only that change. Without the -w option 2to3 only prints the diff (a dry run); add -w to write the changes back to the file.
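If 2to3 is not available, the same idea (working on the Python syntax rather than on a regex) can be sketched with the standard library's tokenize module. This is only a minimal sketch, not a tested migration tool: the function name strip_u_prefix is made up for illustration, the script itself is assumed to run under Python 3, and it relies on the files already containing from __future__ import unicode_literals so that the result means the same under Python 2. Python 2-only literals such as ur'...' are not handled.

```python
import io
import tokenize


def strip_u_prefix(source):
    """Return source with the leading u/U removed from string literals.

    Hypothetical helper: only tokens that the tokenizer classifies as
    STRING are touched, so a u' inside a string is left alone.
    """
    # Split on "\n" the same way the tokenizer's readline does.
    lines = io.StringIO(source).readlines()

    # Collect the (row, col) start of every string token with a u prefix.
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[0] in "uU":
            hits.append(tok.start)

    # Delete from the end so that earlier column offsets stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]  # token rows are 1-based
        lines[row - 1] = line[:col] + line[col + 1:]
    return "".join(lines)


if __name__ == "__main__":
    sample = "eggs = u'foo'\nname = u'schibu'\ntext = u'''foo\n\nschibu'''\n"
    print(strip_u_prefix(sample))
```

Because only the prefix character is deleted at the token's start position, the rest of the file (whitespace, comments, multiline strings) stays byte for byte the same.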

buran
  • My code needs to be compatible with Python2 and Python3 for some months. AFAIK this means I can't use 2to3. – guettli Jan 31 '22 at 10:39
  • You can run just a single fixer, `unicode`, and it will change only this. But if you need to support both 2 and 3 for whatever reason, are you sure that removing the `u` prefix wouldn't break the Python 2 code? I would expect it is there for a reason. – buran Jan 31 '22 at 10:41