19

What would be the complexity of converting a string to its equivalent number or vice versa? Does it change depending on programming language?

On the face of it, one needs to traverse the entire string to convert it to a number, so it is O(n), or is some typecasting used?

This question came up when I was writing a routine to check whether a given number is a palindrome. One approach is to keep dividing the number by the base (here 10), accumulate the remainders as digits, and put them together at the end. Example: 309/10 = rem 9, 30/10 = rem 0, 3/10 = rem 3; we get 903.
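The division-and-accumulate approach can be sketched in Python (the function name is mine):

```python
def is_palindrome_numeric(n, base=10):
    """Check whether n reads the same backwards, using only arithmetic."""
    if n < 0:
        return False
    original, reversed_value = n, 0
    while n > 0:
        n, digit = divmod(n, base)              # e.g. 309 -> (30, 9)
        reversed_value = reversed_value * base + digit
    return reversed_value == original
```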

Another approach I took was to convert the number into a string; since strings have loads of member functions to split, reverse, etc., the code was a lot shorter and cleaner. But is this the best way to do it?
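In Python, for example, the string-based version collapses to a couple of lines:

```python
def is_palindrome_string(n):
    # Convert once, then let string reversal do the work.
    s = str(n)
    return s == s[::-1]
```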

Svante
Srikar Appalaraju

5 Answers

21

Numeric strings are numbers formatted in positional notation - so the value of each digit multiplied by a power of the base needs to be taken into account in order to convert the number into a binary format.

So yeah, it's an O(N) operation because the running time increases linearly as more digits are added. However, in practice N may be limited by whatever numeric data-types the language supports (e.g. int32_t, int64_t). But if arbitrary precision number types are used (which some languages, like Python, use by default) then there is no limit to the number of digits (other than available memory obviously).
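As an illustration, in Python (whose default `int` is arbitrary-precision) the digit count is bounded only by memory, not by a fixed-width type:

```python
# Python's built-in int is arbitrary-precision: parsing a numeric string
# is limited only by available memory, not by a fixed-width type.
digits = "9" * 100          # a 100-digit numeric string
n = int(digits)             # each digit contributes digit * 10**position
```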

Charles Salvia
4

To convert to a number you always have to read all digits. So it's at least O(n).

Now doing something like (pseudocode)

a = 0
foreach digit in string
do
   a = 10 * a + digit
end

is O(n), so the overall complexity is O(n).
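A runnable Python rendering of that pseudocode (assuming the input contains only ASCII digits, with no sign or whitespace):

```python
def parse_int(s):
    """Horner's rule: one pass over the string, O(n) in the digit count."""
    a = 0
    for ch in s:
        a = 10 * a + (ord(ch) - ord('0'))   # digit value of the character
    return a
```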

heijp06
  • Strictly speaking, it's wrong: for `O(n)` notation, `n` must tend to infinity. – Vlad Dec 19 '10 at 13:58
    @Vlad, N does tend towards infinity if arbitrary precision numbers are used. – Charles Salvia Dec 19 '10 at 14:00
  • @Charles: you are right. So this makes sense only for the languages with built-in arbitrary-precision numbers (this excludes C/C++ and C#). – Vlad Dec 19 '10 at 14:02
  • @Vlad: who said the bigints must be "built-in"? – Fred Foo Dec 19 '10 at 14:03
    @Charles and @Vlad: but for arbitrary-precision this becomes O(n^2) since calculating the multiplication 10*a is O(n) operation. – Yakov Galka Dec 19 '10 at 14:07
    @larsmans: the OP talks about the built-in conversions. The built-in conversions cannot work with non-built-in types. :-P – Vlad Dec 19 '10 at 14:09
2

If you're converting a number N to a string, it takes O(log(N)) steps with base 10 (divide by 10 and keep the remainder at each step). If you're converting a string of length N, it takes O(N) steps (using the algorithm which keeps adding digit(i) * 10^i to your number).
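The number-to-string direction can be sketched like this (a hand-rolled version of what `str()` does, for illustration):

```python
def int_to_string(n):
    """Emit digits by repeated division: O(log10(n)) iterations."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 10)                 # peel off the last digit
        digits.append(chr(ord('0') + r))
    return ''.join(reversed(digits))         # digits come out backwards
```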

If you use functions which are not yours (say, the string library's), you can only expect them to be slower.

synepis
    Both take O(N) where N is the number of digits. This is the proper way to analyze arithmetic algorithms. – Fred Foo Dec 19 '10 at 14:02
  • If the number is N=10^5. How many steps to convert it to a string? I think O(log(N)), because that is the number of digits. – synepis Dec 19 '10 at 16:44
1

C# and C/C++ don't have any special information in the strings which represents the (possible) numerical value. Therefore they need to parse the string digit by digit when converting.

However, the number of digits is limited, so we've got just O(1): the conversion time is bounded (usually by the conversion of the biggest number). For a 32-bit int, the conversion has to consider a maximum of 10 decimal digits (and possibly a sign).

Conversion from string is actually O(1) as well, because during parsing it's enough to consider only a bounded number of characters (10+1 in the case of a 32-bit int).

Strictly speaking, we cannot use O-notation for the case of int-to-string conversion, since the maximal value of int is bounded. Anyway, the time needed for conversion (in both directions) is limited by a constant.

As @Charles suggests, other languages (Python) actually use arbitrary-precision numbers. For such numbers, the conversion time is O(number of digits), which is O(string length) for parsing and O(log(number)) for printing, respectively. With arbitrary-precision numbers, one cannot do it faster, since for both conversions every digit must be considered. For conversions to/from limited-precision numbers, the same O(1) reasoning applies. However, I didn't profile the parsing in Python myself, so maybe a less efficient algorithm is used there.


EDIT: following @Steve's suggestion, I checked that parsing in C/C++ and C# skips the initial whitespace, so the time for string->int conversion is actually O(input length). If it's known that the string is trimmed, the conversion is again O(1).
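For comparison, Python's built-in `int()` shows the same behaviour, accepting (and scanning past) surrounding whitespace:

```python
# Like the C/C# parsers discussed above, Python's int() skips surrounding
# whitespace before parsing, so the scan covers the whole input length,
# not just the digits.
assert int("   42") == 42
assert int("42\n") == 42
```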

Vlad
  • Of course we can use big O notation, we just get O(1) when not using bigints. – Fred Foo Dec 19 '10 at 14:05
  • @larsmans: I am explicitly speaking about C/C++ and C#, which don't have built-in bigints. – Vlad Dec 19 '10 at 14:07
  • @Vlad: for arbitrary-precision it's O((number of digits)^2), see the comment to @Peters answer. – Yakov Galka Dec 19 '10 at 14:11
  • @ybungalobill: I don't see why it's true. After all, it depends on the internal representation of bigints. If bigints consist of several chunks (corresponding to the representation N = n_0 + 10^k*n_1 + 10^2k*n_2 etc.), the multiplication by 10 can be very efficient. Did you measure the performance of parsing in Python? – Vlad Dec 19 '10 at 14:15
  • @Vlad: when you're talking about such a conversion you usually assume that the base of the string (10) is different from the base used by the internal representation. Otherwise it's possible to make O(1). Ad we are talking about complexity, not performance. – Yakov Galka Dec 19 '10 at 14:19
  • @ybungalobill: I would personally choose an internal representation which allows the typical operations to run fast (even sacrificing several bits). Parsing is definitely one of such operations. So, did you test the Python's performance yourself, or you have another source of information? – Vlad Dec 19 '10 at 14:24
  • @Vlad: What you say is that you prefer to avoid the conversion by storing the decimal string itself... Well, I'm sure that's what's done in some cases, but it's sort of cheating when answering this question. I don't use python, but some applications (e.g. GIMPS) must use binary representation, so there the complexity of the actual conversion is interesting. – Yakov Galka Dec 19 '10 at 14:30
  • @ybungalobill: not necessarily the decimal representation. You can store chunks which have values in the interval 0..10^10-1 (which fits into a 64-bit int). Anyway, the OP asks not about the theoretical minimum (which seems to be better with my representation), but about the real performance of the language. – Vlad Dec 19 '10 at 14:34
  • @Vlad: "it's enough to consider only a bounded number of characters" - unless your conversion skips initial whitespace ;-) – Steve Jessop Dec 19 '10 at 14:43
  • @Steve: well, that's true. So, for C (sscanf) the performance is O(string size). :-( – Vlad Dec 19 '10 at 14:47
  • @Vlad: sure, but I was mostly joking. O(number of initial spaces in the string + constant) is still O(1) as long as the input doesn't have any whitespace to skip. If the input does have whitespace to skip then that's necessarily a linear-complexity operation whether we do it or our caller does. So skipping whitespace doesn't really "cost" anyone any complexity, just shifts it to us from our caller. – Steve Jessop Dec 19 '10 at 15:01
  • @Steve: well, but theoretically this indeed makes the performance `O(string size)`. Practically, there is a nice use case of trimmed strings :-) – Vlad Dec 19 '10 at 15:04
  • @Vlad: and in theory it's the theoretical performance that matters, but in practice it's the practical performance :-) – Steve Jessop Dec 19 '10 at 15:07
  • @Steve: ... and for the OP's case we can be sure that there are no leading spaces, since the string is obtained by converting from the number. – Vlad Dec 19 '10 at 15:09
1

I'm reasonably certain that working with pure numeric operators (in C++ and C# that would include the "%" modulus operator) will be more efficient if coded correctly. At some level you have to check for the same property either way (does the end match the beginning?), and converting between string and number can only add to the complexity of the operation when the same thing can be done without that conversion.

That said, I would not worry about the performance impact of converting between numbers and strings, because it's probably insignificant compared to the performance impact of most other areas of a program. Built-in numeric types are limited to 64 bits, which puts a relatively low cap on the number of digits you can expect to parse anyway, unless you are implementing or using custom-coded large-number types.

You don't have to worry about the complexity being O(n) where n is the magnitude of the number. It is more like O(n) where n is the number of digits (which has the low cap I mentioned) or, as mentioned in another reply, O(log(n)) where n is the magnitude of the number. Either way, a relatively insignificant performance impact.

Now if, as you suggest, there were no limit on N (not possible in practice: with 2 GB of RAM you can only store numbers with up to about 2 billion digits), then we would have to think harder about the performance of the mathematical operators. Consider the cost of the "%" and "/" operators on such a large-number type. But then realize that converting the number to a string uses those same operators anyway. Once again, you can't beat handling it as a number directly if you do it right.

BlueMonkMN