
I have these three solutions to a Leetcode problem and do not really understand the difference in time complexity here. Why is the last function twice as fast as the first one?

68 ms:

```python
def numJewelsInStones(J, S):
    count = 0
    for s in S:
        if s in J:
            count += 1
    return count
```

40 ms:

```python
def numJewelsInStones(J, S):
    return sum(s in J for s in S)
```

32 ms:

```python
def numJewelsInStones(J, S):
    return len([x for x in S if x in J])
```
  • You usually don't measure time performance with only one experiment, but compute the average over several runs to take into account any random distortion. Also, what is J? The `in` operation can have different complexities depending on the data structure. – user2314737 Dec 22 '18 at 17:25

3 Answers


Why is the last function twice as fast as the first one?

The analytical time complexity in big O notation is the same for all three; they differ only in their constant factors. That is, O(n) really means O(c*n), where the constant c is ignored by convention when comparing time complexities.

Each of your functions has a different c. In particular

  • explicit Python-level loops are in general slower than generator expressions
  • sum of a generator is largely executed in C code (the sum part, adding the numbers)
  • len is a constant-time lookup of the list's stored length, a "single operation", whereas sum performs n add operations (note the list comprehension still takes O(n) to build the list in the first place)

Thus c(for) > c(sum) > c(len) where c(f) is the hypothetical fixed-overhead measurement of function/statement f.

You could check my assumptions by disassembling each function.
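As a minimal sketch of such a check, the standard `dis` module can show the bytecode of each version (the function names here are just illustrative):

```python
import dis
import io

def jewels_loop(J, S):
    count = 0
    for s in S:
        if s in J:
            count += 1
    return count

def jewels_sum(J, S):
    return sum(s in J for s in S)

# Capture the disassembly of each version as text so it can be compared.
loop_out = io.StringIO()
dis.dis(jewels_loop, file=loop_out)

sum_out = io.StringIO()
dis.dis(jewels_sum, file=sum_out)

# The explicit loop iterates at the bytecode level (FOR_ITER),
# while the sum version mostly delegates the work to the built-in sum.
print("FOR_ITER" in loop_out.getvalue())  # True
print("sum" in sum_out.getvalue())        # True
```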

Other than that, your measurements are likely influenced by variation due to other processes running on your system. To remove these influences from your analysis, take the average of execution times over at least 1000 calls to each function (you may find that c is smaller than this variation, though I don't expect that).
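For example, a sketch of such an averaged measurement with the standard `timeit` module (the test data here is made up for illustration):

```python
import timeit

J = "abc"
S = "aabbccdd" * 100  # arbitrary sample data

def jewels_loop(J, S):
    count = 0
    for s in S:
        if s in J:
            count += 1
    return count

def jewels_sum(J, S):
    return sum(s in J for s in S)

def jewels_len(J, S):
    return len([x for x in S if x in J])

# Time each function over 1000 calls and report the average per call.
for fn in (jewels_loop, jewels_sum, jewels_len):
    total = timeit.timeit(lambda: fn(J, S), number=1000)
    print(f"{fn.__name__}: {total / 1000:.2e} s per call")
```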

what is the time complexity of these functions?

Note that while all three functions share the same big O time complexity relative to each other, that complexity depends on the data type you use for J, S. If J, S are of type:

  • dict, the complexity of your functions will be in O(n)
  • set, the complexity of your functions will be in O(n)
  • list, the complexity of your functions will be in O(n*m), where n,m are the sizes of the J, S variables, respectively. Note if n ~ m this will effectively turn into O(n^2). In other words, don't use list.

Why is the data type important? Because Python's in operator is really just a proxy to membership testing implemented for a particular type. Specifically, dict and set membership testing works in O(1) that is in constant time, while the one for list works in O(n) time. Since in the list case there is a pass on every member of J for each member of S, or vice versa, the total time is in O(n*m). See Python's TimeComplexity wiki for details.
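A small sketch of the practical consequence: converting J to a set once up front keeps each membership test O(1), even if J arrives as a string or list (the variable name `jewels` is illustrative):

```python
def numJewelsInStones(J, S):
    jewels = set(J)  # one O(len(J)) pass; afterwards each `in` test is O(1)
    return sum(s in jewels for s in S)

# Sample input from the problem statement: jewels "a" and "A"
# appear three times in total among the stones.
print(numJewelsInStones("aA", "aAAbbbb"))  # 3
```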

miraculixx

With time complexity, big O notation describes how the running time of a solution grows as the input grows. If your solution is O(n), then as the input grows, the time to complete grows linearly. More concretely, if the solution is O(n) and it takes 10 seconds when the data set has 100 items, then it should take approximately 100 seconds when the data set has 1000.

Your first solution is O(n); we know this because of the for loop, `for s in S`, which iterates through the entire data set once. `if s in J`, assuming J is a set or a dictionary, will likely run in constant time, O(1) (the reasoning behind this is a bit beyond the scope of the question). As a result, the first solution overall is O(n), linear time.

The nuanced differences in time between the other solutions would very likely be negligible if you ran your tests on multiple data sets and averaged the results, accounting for startup time and other factors that affect the measurements. Additionally, big O notation discards coefficients, so for example O(3n) ~= O(n).

You'll notice in all of the other solutions you have the same concept, loop over the entire collection and check for the existence in the set or dict. As a result, all of these solutions are O(n). The differences in time can be attributed to other processes running at the same time, the fact that some of the built-ins used are pure C, and also to differences as a result of insufficient testing.
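To illustrate that it really is the same concept three times over, all three variants agree on any input (the sample data here is taken from the problem statement):

```python
def jewels_loop(J, S):
    count = 0
    for s in S:
        if s in J:
            count += 1
    return count

def jewels_sum(J, S):
    return sum(s in J for s in S)

def jewels_len(J, S):
    return len([x for x in S if x in J])

J, S = "aA", "aAAbbbb"
# All three perform one pass over S with a membership test per element,
# so they return the same count.
results = {f.__name__: f(J, S) for f in (jewels_loop, jewels_sum, jewels_len)}
print(results)  # every value is 3
```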

Kevin S

Well, the second function is faster than the first because it uses a generator expression instead of an explicit loop. The third function is faster than the second because the second sums the generator's output one element at a time, while the third builds a list and simply reads its length.
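A quick sketch of that distinction: the generator expression is lazy and yields booleans one at a time, while the list comprehension materializes every match up front (sample data invented for illustration):

```python
import types

J, S = "aA", "aAAbbbb"  # sample data

gen = (s in J for s in S)       # lazy: nothing has been computed yet
lst = [x for x in S if x in J]  # eager: the whole list exists in memory

print(isinstance(gen, types.GeneratorType))  # True
print(lst)                                   # ['a', 'A', 'A']
print(sum(gen), len(lst))                    # 3 3
```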

Anlis