Using Python 3.

I have a string such as 128kb/s, 5mb/s, or something as simple as 42!. There's no space between the digits and the suffix, so I can't just call int(text) directly.

I just want to extract the values 128, 5, and 42 as integers.

At the moment, I have a helper function that accumulates the leading digits into a string and stops at the first non-digit character.

def read_int_from_string(text):
    s = ""
    val = 0
    for c in text:
        if (c >= '0') and (c <= '9'):
            s += c
        else:
            break
    if s:
        val = int(s)
    return val

The above works fine, but is there a more Pythonic way to do this?

ShadowRanger
selbie

2 Answers

This is one of those scenarios where a regex seems reasonable:

 import re

 leadingdigits = re.compile(r'^\d+')

 def read_int_from_string(text):
     return int(leadingdigits.match(text).group(0))
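One caveat: `match` returns `None` when the string has no leading digits, so the function above raises `AttributeError` for input like `kb/s`, whereas the loop in the question returns 0. A minimal sketch of a variant that keeps that fallback follows; the `default` parameter and the `[0-9]` character class are assumptions added for illustration, not part of the original answer:

```python
import re

# re.match already anchors at the start of the string, so '^' is not needed.
# [0-9] restricts the match to ASCII digits, like the loop in the question.
LEADING_DIGITS = re.compile(r'[0-9]+')

def read_int_from_string(text, default=0):
    """Return the leading run of digits in text as an int, or default if there is none."""
    m = LEADING_DIGITS.match(text)
    return int(m.group(0)) if m else default
```

With this version, `read_int_from_string('kb/s')` falls back to `default` instead of raising.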

If you hate regex, you can do this to basically push your original loop's logic to the C layer, though it's likely to be slower:

 from itertools import takewhile

 def read_int_from_string(text):
     return int(''.join(takewhile(str.isdigit, text)))
ShadowRanger

You can use str.isdigit. In Python 2, where filter on a string returns a string:

>>> int(filter(str.isdigit, '128kb/s'))
128

In Python 3, filter returns an iterator, so join the result first:

int(''.join(filter(str.isdigit, '128kb/s')))
Asav Patel
  • Flaw with that approach is that it will keep going when the digits stop, instead of breaking; if there are digits later in the string, they're all silently grouped together (so `"128 foo/10bar"` is parsed as `12810`, where the original code would get `128`). You could fix it by using `itertools.takewhile` (wrapped in `''.join` on all Python versions, which you'd need with `filter` too on Python 3) instead of `filter`, but it's still going to be somewhat slow. – ShadowRanger Dec 22 '16 at 19:56
  • Side-note: Don't wrap `filter` in `list` on Py3 if you're just going to `''.join` it anyway; `''.join` will take any arbitrary iterable, so `list` wrapping is pointless busywork. – ShadowRanger Dec 22 '16 at 19:58
  • @ShadowRanger Yes, you are right about wrapping `filter` in `list`. And if you don't want all the numbers joined together, you can always use `int(list(filter(str.isdigit, '128kb/100s'))[0])`, which returns `128` – Asav Patel Dec 22 '16 at 20:01
  • Umm... No. `int(list(filter(str.isdigit, '128kb/100s'))[0])` returns `1`, because the predicate is applied on a character by character basis, it doesn't group runs of a given type. The `list` would be just `['1', '2', '8', '1', '0', '0']` with no contextual information to determine where the first run of digits ended. `itertools.groupby` could give you the necessary contextual information, but it's a fairly heavyweight solution that's unnecessary since `itertools.takewhile` covers the required behaviors anyway. – ShadowRanger Dec 22 '16 at 20:07
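
The behaviours discussed in these comments can be checked directly. A short sketch, assuming Python 3; the helper name `first_digit_run` is hypothetical and only illustrates the `itertools.groupby` idea mentioned above:

```python
from itertools import groupby, takewhile

text = '128kb/100s'

# filter keeps every digit in the string, so separate runs are merged.
merged = ''.join(filter(str.isdigit, text))        # '128100'

# Indexing the filtered list yields a single character, not the first run.
first_char = list(filter(str.isdigit, text))[0]    # '1'

# takewhile stops at the first non-digit, like the loop in the question.
leading = ''.join(takewhile(str.isdigit, text))    # '128'

def first_digit_run(s):
    """Return the first run of digits anywhere in s, or '' if there is none."""
    for is_digit, run in groupby(s, key=str.isdigit):
        if is_digit:
            return ''.join(run)
    return ''
```

Unlike `takewhile`, the `groupby` helper also finds a run that does not start at the beginning of the string, which is more than the question asks for.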