
The way int() parses strings that contain whitespace changed between Python 2 and Python 3.

In Python 2:

>>> int('-11')
-11
>>> int('- 11')
-11

whereas in Python 3:

>>> int('-11')
-11
>>> int('- 11')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '- 11'

Once I figured this out, I tried to find some explanations for/elaborations on this change in the docs, but couldn't find anything.

So my questions are: How should I modify code when migrating from Python 2 to Python 3? Is i = int(s.replace(' ','')) the way to go, or is there better advice? And is there a description of this change that I just didn't find?

PVitt
  • Python 2.7 and 3.6 docs for int() both say "Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace." So I assume this only accidentally works in Python 2.7. – Bernhard Oct 17 '17 at 10:26
  • @Bernhard: the documentation was fixed *after* this bug was fixed (it was backported from 3 [to 2.7 only](https://github.com/python/cpython/commit/71d74b0c4ecc95aa7ba4d3255912d23142bcc4e3)). The bug fix was not backported to 2.6, however, as it could break existing code. – Martijn Pieters Oct 17 '17 at 10:42

3 Answers


This was changed explicitly in Python 3 in response to issue 1779:

I discovered that when converting a string to an int or float, the int conversion allows whitespace after the sign, while the float conversion doesn't. I think they should be consistent.

This was noted in the 3.0a3 changelog (with a typo in the issue number):

  • Issue #1769: Now int("- 1") is not allowed any more.

Allowing spaces there was inconsistent with other numeric conversions.

The fastest way to fix this is to use str.replace(), yes:

>>> import timeit
>>> timeit.timeit('int("- 1".replace(" ", ""))')
0.37510599600500427
>>> timeit.timeit('int("- 1".translate(map))', 'map = {32: None}')
0.45536769900354557
>>> timeit.timeit('literal_eval("- 1")', 'from ast import literal_eval')
6.255796805999125
>>> timeit.timeit('int(extract("- 1"))', 'import re; from functools import partial; extract = partial(re.compile(r"[^\d\.\-]").sub, "")')
0.7367695900029503

The Python 2.7 documentation was updated after the fact, by backporting the Python 3 documentation. It now states explicitly that there should be no whitespace between sign and digits. So officially, whitespace is no longer supported, but in the interest of not breaking backwards compatibility, the bug is left in.
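
One thing to watch with str.replace(" ", "") is that it removes every space, so a string such as "1 1" would be accepted as 11, which int() rejects in both Python versions. If that matters, a narrower sketch is to drop only the whitespace that directly follows a leading sign. The helper name to_int is hypothetical and not part of the standard library:

```python
def to_int(s):
    """Parse an integer, tolerating whitespace between a leading sign and the digits.

    Hypothetical helper: it only normalises the sign prefix and then
    delegates all other validation to int().
    """
    s = s.strip()
    if s[:1] in ('+', '-'):
        # Keep the sign, drop only the whitespace that follows it.
        s = s[0] + s[1:].lstrip()
    return int(s)
```

With this, to_int('- 11') returns -11, while to_int('1 1') still raises ValueError, as int('1 1') does.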

Martijn Pieters

It appears that whitespace inside the string is no longer discarded by int() in Python 3. However, the Python parser still ignores whitespace between a sign and a numeric literal in source code:

>>> e = -   11
>>> e
-11

As such, you can call ast.literal_eval directly on the input string in both Python 2 and 3, so that the whitespace is disregarded:

>>> import ast
>>> ast.literal_eval('-      11 ')
-11
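
Since ast.literal_eval accepts any Python literal (floats, lists, strings and so on), it may be worth checking the type of the result when only integers are expected. A minimal sketch, assuming Python 3; the name literal_int is hypothetical:

```python
import ast

def literal_int(text):
    """Evaluate text as a Python literal and accept it only if it is an int.

    Hypothetical helper, written for Python 3.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError):
        raise ValueError('not an integer literal: %r' % (text,))
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError('not an integer literal: %r' % (text,))
    return value
```

Because the string is parsed as Python source, some inputs that int() accepts behave differently here. For example, '011' is not a valid integer literal in Python 3, so this helper rejects it.
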
Moses Koledoye

Without reinventing the wheel, for both Python 2 and Python 3:

import re
int(re.sub(r'[^\d\-]', '', '- 11'))

Tests

>>> int(re.sub(r'[^\d\-]', '', '- 11'))
-11
>>> int(re.sub(r'[^\d\-]', '', '+ 11'))
11
>>> int(re.sub(r'[^\d\-]', '', '+ 11easd'))
11
>>> int(re.sub(r'[^\d\-]', '', '+ 11easd3325'))
113325
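
As the last two tests show, stripping characters also turns malformed input into a number. If such input should be rejected instead, one possible variant is to match the whole string against a pattern and raise on failure. This is a sketch of a stricter alternative, not the approach above; the names _SIGNED_INT and strict_int are hypothetical:

```python
import re

# Optional sign, optional whitespace after it, then digits;
# surrounding whitespace is allowed, anything else is not.
_SIGNED_INT = re.compile(r'\s*([+-]?)\s*(\d+)\s*\Z')

def strict_int(text):
    """Parse a signed integer, allowing whitespace after the sign only."""
    match = _SIGNED_INT.match(text)
    if match is None:
        raise ValueError('invalid integer literal: %r' % (text,))
    # Rejoin sign and digits so int() sees a form it accepts.
    return int(match.group(1) + match.group(2))
```

Here strict_int('- 11') returns -11, while strict_int('+ 11easd') raises ValueError instead of returning 11.
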
Kamo Petrosyan