
I'm trying to find the length of a string by comparing the string to different strings. Python compares strings as follows:

if (op == Py_EQ) {
    /* Supporting Py_NE here as well does not save
       much time, since Py_NE is rarely used.  */
    if (Py_SIZE(a) == Py_SIZE(b)
        && (a->ob_sval[0] == b->ob_sval[0]
        && memcmp(a->ob_sval, b->ob_sval, Py_SIZE(a)) == 0)) {
        result = Py_True;
    } else {
        result = Py_False;
    }
    goto out;
}

The way I see it (maybe I'm wrong), comparing strings of different lengths is supposed to take less time than comparing strings of the same length. I've built this function:

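import timeit

def find_length(string, possible_length=xrange(1, 33)):
    # Time the comparison against a candidate of each length and
    # guess the length whose comparison took the longest.
    l = []
    for i in possible_length:
        temp = '*' * i
        l.append(timeit.timeit(lambda: temp == string, number=10**5))
    return l.index(max(l)) + 1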

When I use it like this: print find_length('test'), I expect to get 4, but instead I got (over 5 runs): 20, 10, 26, 22, 8. At first I thought that, because I'm dealing with such short times, 10^5 iterations might not be enough, but a larger count behaved the same way (not 20, 10, 26... but equally inconsistent results). Can anyone spot a mistake in my code or logic?

pystudent
  • Yes, your test string is *way too short* to say anything meaningful about a speed difference. – Martijn Pieters Jul 23 '15 at 17:47
  • @MartijnPieters So there is no way to do this at all? It's really important for what I've been trying to achieve over the last couple of days... – pystudent Jul 23 '15 at 17:53
  • To achieve *what* exactly? To show that string comparisons of differing length are fast vs. comparing strings of equal length? – Martijn Pieters Jul 23 '15 at 17:57

1 Answer


You are testing way too short a string to say anything useful about a speed difference between testing for the length and the contents. Moreover, your temp string can be seen to be unequal by testing the first character.

So in most cases only len(temp) == len(string) is tested, and when the lengths happen to match, the only extra work is one test of temp[0] == string[0]. That's a very small difference indeed.
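To make that concrete, here is a minimal sketch in plain Python that mirrors the order of checks in the C snippet from the question and reports which check decided the outcome. The helper name eq_stage is hypothetical and exists only for illustration; it is not part of CPython, and Python's own == stands in for memcmp.

```python
def eq_stage(a, b):
    """Mirror the order of checks in the C snippet from the question.

    Hypothetical helper for illustration; returns (result, deciding_stage).
    """
    # Check 1: different sizes are rejected without looking at any content.
    if len(a) != len(b):
        return False, "length"
    # Check 2: same size, so compare just the first character.
    if a and a[0] != b[0]:
        return False, "first character"
    # Check 3: Python's own == stands in for memcmp over the full contents.
    return a == b, "full comparison"


# Candidates built the way the question builds them: '*' * i against 'test'.
for i in (3, 4, 5):
    print(i, eq_stage('*' * i, 'test'))
```

Only the candidate of length 4 gets past the length check, and it is rejected immediately afterwards by the first-character check, so none of the question's candidates ever reaches the full comparison.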

Use a long random string, and use slices of that same string to test against, so that they are almost equal:

>>> import random
>>> import string
>>> from timeit import timeit
>>> target = ''.join(random.choice(string.ascii_letters) for _ in range(10000))
>>> almost_equal = target[:-1]
>>> equal = almost_equal + target[-1]
>>> timeit(lambda: target == almost_equal)
0.11822915077209473
>>> timeit(lambda: target == equal)
0.48569512367248535

Now the difference is between testing a 10000 character string against one that is 9999 characters long, and is equal except for one missing character at the end, versus testing against an exactly equal string. The second test takes more than 4 times as long.

By testing for an equal string you hit the worst-case scenario; each and every one of those 10000 characters has to be compared to determine that the strings indeed have the same value.
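A rough way to explore this yourself is to time equal-length strings that differ at a chosen position. The sketch below is written for Python 3; differing_copy and time_comparison are hypothetical helper names, and the iteration count is an arbitrary assumption. The C snippet in the question belongs to one specific CPython string type, so other versions and string types may show a different pattern, and the numbers depend on your machine.

```python
import random
import string
from timeit import timeit


def differing_copy(target, position):
    """Return a same-length copy of target that differs only at position."""
    original = target[position]
    replacement = 'a' if original != 'a' else 'b'
    return target[:position] + replacement + target[position + 1:]


def time_comparison(target, candidate, number=10000):
    """Time `number` equality tests between target and candidate."""
    return timeit(lambda: target == candidate, number=number)


target = ''.join(random.choice(string.ascii_letters) for _ in range(10000))

# A mismatch at position 0 can be caught by a first-character check;
# later mismatches force the comparison to scan further into the data.
for position in (0, 1, 5000, 9999):
    candidate = differing_copy(target, position)
    print(position, time_comparison(target, candidate))
```

Each candidate has the same length as target, so the length check never short-circuits here and any timing differences come from how far the content comparison has to go.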

If you were hoping to deduce a string length by testing it against increasingly long strings and finding the comparison that took longest, then you simply can't. There is not enough difference between just testing the length and determining that an equally long string differs in value, at least not when it is trivial to detect that they are not equal from the first character(s) alone.

Martijn Pieters