Is using `str` the correct idiom for working with digits in Python?

Question

I understand that one way to work with the digits of a number in Python is to convert the number to a string, and then use string methods to slice the resulting "number" into groups of "digits". For example, assuming I have a function prime that tests primality, I can confirm that an integer n is both a left and right truncatable prime with

all(prime(int(str(n)[:-i])) and prime(int(str(n)[i:])) for i in range(1, len(str(n))))

This method involves first converting n to a string so that it can be sliced, and converting that slice back to an integer so that its primality can be checked. Perhaps it's my history with statically typed languages, or some vague idea that strings are "expensive", or experience with languages that include builtin functionality for similar manipulations (like Mathematica's IntegerDigits and FromDigits); but I'm left wondering whether this is the right way to go about such tasks.
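
For readers who want to run the check above, here is a minimal sketch of the same string-slicing test written as a function. The `prime` helper below is a simple trial-division stand-in (an assumption, since the question does not show its implementation), and `is_two_sided_truncatable` is a hypothetical name.

```python
def prime(n):
    """Trial-division primality test (stand-in for the question's `prime`)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def is_two_sided_truncatable(n):
    """True if every proper left and right truncation of n is prime.

    Like the one-liner in the question, this does not test n itself.
    """
    s = str(n)  # convert once and reuse the string
    return all(
        prime(int(s[:-i])) and prime(int(s[i:]))
        for i in range(1, len(s))
    )
```

With these definitions, `is_two_sided_truncatable(3797)` is true (797, 97, 7, 379, 37 and 3 are all prime), while `is_two_sided_truncatable(6349)` is false because 634 is even.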

Is conversion back and forth between strings and numbers the correct, or even the only, approach for accessing digits in Python? Are there more efficient approaches?

Out of curiosity, is this for problems on [Project Euler](http://projecteuler.net)? I love that site! — Jud, Nov 15 '13 at 23:29
For very large numbers and/or numerous repeats, one way to cut expense is to make a variable that holds the string version of the number. That way, you don't call `str` twice. — , Nov 15 '13 at 23:29
Python is strongly typed - it's usually not manifestly or statically typed though. — dstromberg, Nov 15 '13 at 23:40
@dstromberg: Nothing inherently. The question is what's idiomatic, and what's efficient. — orome, Nov 15 '13 at 23:48
For efficiency, just do modulo and integer division by 10. No need to use `str()`. `str()` is one of the poisons of Python, it's one of the slowest operations there is. Same for converting a str type to an int with `int()`. — Shashank, Nov 15 '13 at 23:53
For a left-truncatable prime, you'll need to subtract the leftmost digit multiplied by the largest power of 10 that does not exceed the number. EX: if you have the number 6349, the largest power of 10 is 10^3 = 1000. You take 1000 * 6, since 6 is the leftmost digit, and you get 6000. 6349 - 6000 = 349. All that is necessary is one O(log n) pass to find the greatest power of 10 that does not exceed your given number. Then you can just store that number to avoid repeated computations. EX: After you get 349 from 6349, you know that the next power of 10 to use is 10^2. — Shashank, Nov 16 '13 at 00:01
@ShashankGupta: So it looks like my answer may bifurcate: `str`, and the approach taken above, seems to be the most Pythonic, but it's "poison" and very slow. What do you think of [Blckknght's answer](http://stackoverflow.com/a/20012696/656912)? Can you work your comment above into a new answer? — orome, Nov 16 '13 at 00:09
@raxacoricofallapatorius, I edited my original response to give a fully general solution that involves the mod/div technique to extract any subset of consecutive digits. I also added timeit results. As you can see, if you're doing anything more than simple single digit extractions, the str() method isn't *that* bad. I certainly don't think it's "poison". — Jud, Nov 16 '13 at 00:19
@raxacoricofallapatorius I added an answer for testing left truncatable primes using only math operations on integers. :) Also what I mean by `str()` being poison is that its a slow operation and should be used minimally. I'm not saying never use it! I'm just saying if you can use math instead, use math :P Try to avoid the "poison". Obviously if you are aiming for simplicity of code, str() is the way to go, but if you want speed, it's generally not. — Shashank, Nov 16 '13 at 00:30

score 6 · Answer 1 · answered Nov 15 '13 at 23:52

In your example code, you could get away using divmod rather than string slicing the digits. divmod(x, y) returns the tuple x//y, x%y, which for y values that are 10**i is exactly what you want for the left and right pieces of your number. This isn't necessarily more Pythonic, though it might be a bit faster.

sn = str(n)
all(prime(int(sn[:i])) and prime(int(sn[i:])) for i in range(1, len(sn))) # original
all(all(map(prime, divmod(n, 10**i))) for i in range(1, len(sn))) # variant using divmod

I think for more general digit operations, using str is probably pretty sensible, as doing lots of math on powers of your numerical base is likely to be harder to understand than doing stuff directly on the digits in a string.

Write code to be read, unless it's really performance sensitive.
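
To see why `divmod` with a power of ten splits a number the same way slicing does, here is a small sketch. `split_at` is a hypothetical helper used only for illustration; it is not part of the answer above.

```python
def split_at(n, i):
    """Split non-negative n into (everything but the last i digits, last i digits)."""
    return divmod(n, 10 ** i)


n = 6349
s = str(n)
for i in range(1, len(s)):
    left, right = split_at(n, i)
    # The same two pieces, obtained by slicing i characters off the end of the string
    assert (left, right) == (int(s[:-i]), int(s[-i:]))
    print(i, left, right)
# prints:
# 1 634 9
# 2 63 49
# 3 6 349
```

Both approaches drop leading zeros in the right-hand piece: `divmod(1009, 100)` gives `(10, 9)`, just as `int("09")` gives `9`.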

answered Nov 15 '13 at 23:52

Blckknght


Another great answer. I'm impressed by how much I've learned from this question in just a few minutes! – orome Nov 15 '13 at 23:59
I like this. It is a fast and idiomatic way to partition a number into a right part and a left part by number of available digits. – steveha Nov 16 '13 at 00:18
"Write code to be read, unless it's really performance sensitive." - Wise words! – Jud Nov 16 '13 at 00:20
@steveha: Note that I cheated a little bit in the divmod variant by using the length of the string from the first version to determine the range to iterate over. You can find the number of digits with just math operations too, but it's an ugly thing: `int(math.ceil(math.log10(n+1)))` (though it looks like the `int` call is not needed in Python 3). – Blckknght Nov 16 '13 at 02:08
Using `str` to get the number of digits is hardly cheating. If anything I think it reinforces the general appropriateness of using `str` in problems like this one. – orome Nov 16 '13 at 15:21
Interestingly, using these as alternatives in the context of the full algorithm (not reproduced here, it is indeed -- @Jud -- a Project Euler problem) results in little difference in performance (6.4s vs 6.0s on an old iMac), but the `divmod` version is slower, with `map` responsible for almost all the additional time. Also, oddly, [I see no calls to `str` in the output generated by `cProfile.run`](http://stackoverflow.com/q/20023602/656912), in any version (the above two or the original), so I'm not sure how `str` impacts performance. – orome Nov 16 '13 at 20:40
Which is why I advocate using str() when doing digit manipulation. There might be specific cases where you can beat it with a mathematical approach performance-wise, but it provides a much clearer and more general solution. – Jud Nov 16 '13 at 20:47
Also, I have a BS in Mathematics so I'm hardly biased against it. :) – Jud Nov 16 '13 at 20:48

score 4 · Answer 2 · answered Nov 16 '13 at 01:07

Python's native integers are stored in a power-of-2 base, so it takes real work to convert them to or from a decimal notation. In many kinds of "puzzle" ;-) problems, which require frequent access to the decimal digits of very large integers, it can make a world of difference to use the decimal module instead. That stores values in a power-of-10 base, so that "converting" to/from decimal digits is a trivial expense.

>>> import decimal
>>> x = decimal.Decimal('1e20') - 1
>>> x
Decimal('99999999999999999999')
>>> x.as_tuple().digits
(9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9)

That takes time linear in the number of digits. Converting a native integer to/from decimal takes time quadratic in the number of digits.

Best I can guess about your specific application here, though, using divmod() is really the best approach.
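
As a rough sketch of how the `decimal` approach might look in practice, the helpers below read digits from a `Decimal` and rebuild an integer from a digit tuple. The function names are hypothetical. Note that building the `Decimal` from a native `int` still pays the base conversion once; the saving described above comes from keeping the value as a `Decimal` across many digit accesses.

```python
from decimal import Decimal


def decimal_digits(n):
    """Return the decimal digits of a non-negative int, most significant first."""
    # Constructing a Decimal from an int is exact, whatever the context precision.
    return Decimal(n).as_tuple().digits


def int_from_decimal_digits(digits):
    """Rebuild an int from digits given most significant first."""
    # Decimal accepts a (sign, digits, exponent) tuple; sign 0 means non-negative.
    return int(Decimal((0, tuple(digits), 0)))


ds = decimal_digits(6349)
print(ds)                               # (6, 3, 4, 9)
print(int_from_decimal_digits(ds[1:]))  # 349
```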

steveha · Answer 3 · 2013-11-16T01:48:41.857

2

How about this?

def digits(n):
    if n == 0:
        yield 0
    else:
        while n:
            yield n % 10
            n //= 10

for digit in digits(123):
    print(digit)  # do something with digit; prints 3, then 2, then 1

This should be both more convenient and more efficient than the example you showed.

EDIT: I want to add two more things.

0) You can extend the technique as needed. Suppose you want to test for right-truncatable primes. Assuming you have a function is_prime():

def right_trunc_int_values(n):
    if n == 0:
        yield 0
    else:
        while n:
            yield n
            n //= 10

assert all(is_prime(n) for n in right_trunc_int_values(317))

1) To solve the general problem of conveniently working with digits, you might be able to use the decimal module. I'll look into this more. But meanwhile, you can use my digits() function to make an indexable list of digits:

d = list(digits(123))
print(d[2])  # prints 1: digits come least significant first, so d is [3, 2, 1]

EDIT: It's pretty easy to convert a series of digits to an integer value.

def int_from_digits(digits):
    # digits are least significant first, so walk them in reverse
    result = 0
    for digit in reversed(digits):
        result = result * 10 + digit
    return result

def is_right_trunc_prime(n):
    d = list(digits(n))
    # d is least significant first, so d[i:] drops the i rightmost digits of n
    return all(is_prime(int_from_digits(d[i:])) for i in range(len(d)))

# example from question of left and right truncatable check
d = list(digits(n))
all(prime(int_from_digits(d[:-i])) and prime(int_from_digits(d[i:])) for i in range(1, len(d)))
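
Because `digits()` yields the least significant digit first, it is easy to mix up which slice removes which end of the number. The sketch below restates the two helpers so that it runs on its own (the `divmod` form of the loop is a small variation, not the code above) and prints the truncations of 6349.

```python
def digits(n):
    """Yield the decimal digits of non-negative n, least significant first."""
    if n == 0:
        yield 0
    while n:
        n, d = divmod(n, 10)
        yield d


def int_from_digits(ds):
    """Inverse of digits(): ds holds the least significant digit first."""
    result = 0
    for d in reversed(ds):
        result = result * 10 + d
    return result


d = list(digits(6349))  # [9, 4, 3, 6]
# Dropping items from the END of d removes the most significant (leftmost) digits.
left_truncs = [int_from_digits(d[:-i]) for i in range(1, len(d))]
# Dropping items from the START of d removes the least significant (rightmost) digits.
right_truncs = [int_from_digits(d[i:]) for i in range(1, len(d))]
print(left_truncs)   # [349, 49, 9]
print(right_truncs)  # [634, 63, 6]
```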

edited Nov 16 '13 at 01:48

answered Nov 15 '13 at 23:47

steveha


This is clearly useful in any cases where digits are wanted on their own, but it's not clear to me how to incorporate this into a case like the example in the question where a sequence of digits needs to be converted back to a single number. – orome Nov 15 '13 at 23:57
@raxacoricofallapatorius, have I answered your question adequately? P.S. I had to post all the above without testing it. If I have made any mistake, please let me know and I'll fix it for you. – steveha Nov 16 '13 at 00:25
It's all very well saying `digits` "should" be more efficient than `str`, but according to `timeit`, `str(1234567890)` is nearly 4 times as fast as `list(digits(1234567890))` on my machine (0.923 usec vs 3.55 usec). Presumably because any loop in Python is liable to be slow compared with a builtin function implemented in C. Of course you'd want to test the real code before choosing which to use, since the conversion isn't the whole performance cost. – Steve Jessop Nov 16 '13 at 01:03
@SteveJessop, I am not at all surprised that `str()` is a faster operation. However, did you also account for the overhead of slicing the string and then converting a digit back to an `int`? Once you have paid that up-front cost to get a digits list, you can simply index it to get each digit value, and I predict that `d[i]` when `d` is a digits list will be faster than `int(s[i])` where `s` is a `str`. I predict it won't take very much processing for the digits list to pay for itself. However, as you noted, in performance one must always measure rather than guess. – steveha Nov 16 '13 at 01:16
@steveha: I did two other tests so far. One compared converting int -> str -> int vs just getting the digits. int -> str -> int was still twice as fast, even though it's doing more of the problem. I didn't test a string slice, I just guessed it would be the same or better than a list slice so I didn't bother. The other test I did was of Shashank's code, so not relevant to yours. If I come back to this question tomorrow I might do a "timeit shootout" answer. It's true that the need to convert each digit again would add to the cost of `str` for problems that need it: this one happens not to. – Steve Jessop Nov 16 '13 at 01:23
@SteveJessop my own tests confirm your results. The ratio I saw was 6:10, so the straight string manipulation is about 40% faster, once again showing that in performance, guessing isn't as good as measuring. The remaining question is whether the convenience is worth the overhead. – steveha Nov 16 '13 at 01:46

Jud · Accepted Answer · 2013-11-18T03:17:16.120

This has always been my approach and it's worked fine, though I've never done much testing for speed. It works particularly well when needing to iterate over permutations/combinations of digits, since you can build up such strings with the functions in the itertools module.

There are certainly other methods that involve less straightforward mathematical operations, but unless speed is absolutely crucial I feel the string method is the most Pythonic.

Here, for example, is a more mathematical approach, where a and b are indexed from the right (ie ones place is 0, tens place is 1, etc):

def getSubdigits(n, a, b):
    n %= 10 ** a
    n //= 10 ** b
    return n

For this to work with the same indexing as string slicing, you'd need to find the total number of digits first, and the function becomes:

import math

def getSubdigits2(n, a, b):
    # number of decimal digits; n + 1 keeps exact powers of ten (e.g. 1000) correct
    l = int(math.ceil(math.log10(n + 1)))
    n %= 10 ** (l - a)
    n //= 10 ** (l - b)
    return n

And the string slicing equivalent:

def subDigits3(n, a, b):
    return int(str(n)[a:b])

Here are the timing results:

subDigits: 0.293327726114
subDigits2: 0.850861833337
subDigits3: 0.990543234267

My takeaway from that result is that the slicing method is fine unless you really care about speed, in which case you need to use the first method and think about the indices in the other direction.
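
The setup behind the timings above is not shown, so the following is only a sketch of how such a comparison could be reproduced with `timeit`. The function names, the test value and the repeat count are all assumptions, and the numbers will differ by machine and Python version.

```python
import math
import timeit


def from_right(n, a, b):
    """Digits in positions b..a-1 counted from the right (ones place is 0)."""
    return (n % 10 ** a) // 10 ** b


def from_left(n, a, b):
    """Same digits as str(n)[a:b], using arithmetic only (n must be positive)."""
    # Digit count via log10; float rounding could misjudge extremely large n.
    length = int(math.ceil(math.log10(n + 1)))
    return (n % 10 ** (length - a)) // 10 ** (length - b)


def by_slicing(n, a, b):
    """Same digits as str(n)[a:b], via a string round trip."""
    return int(str(n)[a:b])


n = 1234567890
# All three should agree: the digits at indices 2..5 from the left are 3456.
assert from_right(n, 8, 4) == from_left(n, 2, 6) == by_slicing(n, 2, 6) == 3456

cases = [(from_right, (n, 8, 4)), (from_left, (n, 2, 6)), (by_slicing, (n, 2, 6))]
for func, args in cases:
    seconds = timeit.timeit(lambda: func(*args), number=100000)
    print("%-12s %.3f s" % (func.__name__, seconds))
```

The equality check before the timing loop matters: a benchmark only means something if the variants being compared return the same result.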

Yes, I'm assuming that in some cases there may be some mathematics that will change the problem, but assuming that's been taken care of and I've reduced the mathematics, the question here is whether this is the Pythonic (and perhaps only feasible) approach. — orome, Nov 15 '13 at 23:39
Now you've got me genuinely curious (since I do this kind of fiddling all the time). Let me whip up a general solution using simple arithmetic and see what I find. — Jud, Nov 15 '13 at 23:42
Why not just `n = n % 1000` instead of `n -= (n // 1000) * 1000`? — Shashank, Nov 15 '13 at 23:49
@Jud: I introduced a small typo — "fore" for "for" — in my edit (which I can't correct). — orome, Nov 18 '13 at 03:04

Shashank · Answer 5 · 2013-11-16T03:21:56.263

1

Testing left truncatable primes without str() and slicing:

def is_prime(n):
    if n < 2:
        return False
    elif n == 2:
        return True
    elif n % 2 == 0:
        return False
    return all(n % x for x in range(3, int(pow(n, 0.5)) + 1, 2))

def is_left_truncatable_prime(n):
    largest_power_of_ten = 1
    while largest_power_of_ten < n:
        largest_power_of_ten *= 10
    while True:
        largest_power_of_ten //= 10  # floor division keeps this an int on Python 2 and 3
        if is_prime(n):
            n %= largest_power_of_ten
            if n == 0:
                return True
        else:
            return False

print(is_left_truncatable_prime(167))  # True
print(is_left_truncatable_prime(173))  # True
print(is_left_truncatable_prime(171))  # False

I haven't extensively tested this so sorry if there are any errors. Let me know if there are and I will fix them.
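
The subtraction described in the comments (6349 - 6000 = 349) gives the same result as taking the remainder modulo the leading power of ten. As a sketch of that idea, the two generators below list the truncations of a number using integer arithmetic only. The names `left_truncations` and `right_truncations` are hypothetical and are not part of the code above.

```python
def left_truncations(n):
    """Yield n, then n with its leading digits removed one at a time."""
    power = 1
    while power * 10 <= n:
        power *= 10  # ends as the largest power of ten not exceeding n
    while power:
        yield n
        n %= power    # strip the current leading digit
        power //= 10


def right_truncations(n):
    """Yield n, then n with its trailing digits removed one at a time."""
    while n:
        yield n
        n //= 10  # strip the last digit


print(list(left_truncations(6349)))   # [6349, 349, 49, 9]
print(list(right_truncations(6349)))  # [6349, 634, 63, 6]
```

A primality test such as `is_prime` can then be applied with `all(is_prime(t) for t in left_truncations(n))`.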

EDIT: Fixed up the code a bit.

edited Nov 16 '13 at 03:21

answered Nov 16 '13 at 00:29

Shashank


Hmm. On my machine, using `173` as the test and changing the questioner's code to use your `is_prime` function and to only test left-ness rather than both left-ness and right-ness, this code is about 5% slower than the questioner's. If that's borne out on other machines, other versions of Python and for other values (which of course is far from settled by my single test), then `str` would be preferred on all counts: performance, brevity, readability. – Steve Jessop Nov 16 '13 at 01:18
@Steve Well his algorithm is O((log_10(n))^2) since slicing is a linear time operation and he slices every number in the sequence (1+2+...+log(n)). From the [time complexity page for Python](https://wiki.python.org/moin/TimeComplexity) you can see that "Get Slice" is O(k) where k is the length of the slice. So yes maybe my function is 5 percent slower for 173 but I think it would be faster for large numbers since it's O(log_10(n)) instead of O((log_10(n))^2). Maybe Python defies me though :) – Shashank Nov 16 '13 at 02:25
It should be a bit faster now. I fixed some of my logic which was equivalent to simply `n %= largest_power_of_ten`. – Shashank Nov 16 '13 at 03:23
