
I am learning Python slice operations and I decided to write a simple function that iterates through a string with a window of size k and records each window in a dictionary along with its frequency. For example, if the input string is "abab" and k is 2, the dictionary will contain "ab: 2" and "ba: 1".

I am not sure what the time complexity of this function is:

In the code, s is the input string and k is the window size.

def test_func(s, k):
    d = {}
    for i in range(len(s) - k + 1):
        sub_str = s[i:i+k]
        if sub_str in d:
            d[sub_str] += 1
        else:
            d[sub_str] = 1
    return d

I am thinking that the time complexity will be O(n * k) and the space complexity will be O(n), where n is the length of the string and k is the size of the window, but I am not sure if that is right. Can you please review the function and let me know if my analysis is correct? Thank you!

  • 1
    Time and space are both `O(n*k)` – Barmar Mar 03 '22 at 01:12
  • 1
    @Barmar I fail to see how the space complexity is *O(n x k)*. The two variables for storing the sliced string and a dictionary of frequencies require only space proportional to the window size and the number of distinct substrings, respectively. The size of the string itself is never a factor in the space complexity. – blhsing Mar 03 '22 at 01:16
  • 1
    If there are no duplicate slices, there will be `n*k` different keys in the dictionary. – Barmar Mar 03 '22 at 01:18
  • 1
    @Barmar Well yes but that's the worst case scenario. Usually when asked without further qualification complexity is meant to refer to the average case. – blhsing Mar 03 '22 at 01:20
  • 2
    The average case is that half the slices are unique. Dividing by a constant doesn't change big-O. – Barmar Mar 03 '22 at 01:28

1 Answer


Time and space should both be O(n*k).

Looking up a dictionary key of size k is O(k), because you have to hash the key, which requires reading all k of its characters. While we often treat dictionary and set lookup as amortized constant time, we can't use that simplification when the key size is itself one of the parameters.

Since you do these lookups O(n) times, the time complexity is O(n*k).

Since all the keys have to be stored in the dictionary, and in the worst case there are no duplicate slices, the dictionary may have to hold n - k + 1 keys of length k each, i.e. O(n*k) characters in total.
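To see that worst case concretely, here is a small sketch using a string with no repeated characters, so every window is unique and the dictionary stores one key per window:

```python
import string

s = string.ascii_lowercase   # 26 distinct characters -> every window is unique
k = 3

counts = {}
for i in range(len(s) - k + 1):
    w = s[i:i+k]
    counts[w] = counts.get(w, 0) + 1

print(len(counts))                          # n - k + 1 = 24 distinct keys
print(sum(len(key) for key in counts))      # (n - k + 1) * k = 72 characters stored
```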

Barmar
  • Thanks, this is very helpful! Is it possible to reduce the time complexity for this function to O(n)? – newbie_coder Mar 03 '22 at 01:23
  • 1
    I don't think so. You could use your own hash function that hashes a fixed number of characters, but that's likely to increase collisions, resulting in worse performance of the hash table. There's no free lunch. – Barmar Mar 03 '22 at 01:30
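As an illustration of the trade-off discussed in the comments, here is a sketch of a polynomial rolling hash (the idea behind Rabin-Karp; the function name, base, and modulus are my own choices, not from the question). It computes each window's fingerprint in O(1) after the first, O(n) in total, but distinct substrings can collide, so a correct frequency count would still have to store and compare the actual substrings, which brings the O(k) factor back:

```python
def rolling_fingerprints(s, k, base=257, mod=(1 << 61) - 1):
    """Yield one hash per length-k window of s, in O(1) per window
    after the first (O(n) total), instead of O(k) per window."""
    if len(s) < k:
        return
    h = 0
    for ch in s[:k]:                           # hash the first window: O(k)
        h = (h * base + ord(ch)) % mod
    yield h
    pow_k = pow(base, k - 1, mod)              # weight of the outgoing character
    for i in range(k, len(s)):
        h = (h - ord(s[i - k]) * pow_k) % mod  # drop the outgoing character
        h = (h * base + ord(s[i])) % mod       # append the incoming character
        yield h

fps = list(rolling_fingerprints("abab", 2))
print(fps[0] == fps[2], fps[0] != fps[1])  # True True: "ab" == "ab", "ab" != "ba"
```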