Why is this code taking longer time than expected?

Question

Question: Given a list of unsorted elements, we have to find the length of longest consecutive elements sequence.

Expected time complexity: O(N)

for Ex: [4,7,1,100,28,2,3]

output: 4 since the longest consecutive elements sequence is [1,2,3,4] from the above list

I've written the code in Python. I've used set and map (defaultdict) to solve this problem.

from collections import defaultdict as maps
class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        nums = set(nums); # to remove duplicates
        n = len(nums);
        if n == 0: return 0;
        if n == 1: return 1;
        present = maps(int);
        # Inserting values into map (defaultdict) as key
        for i in nums:
            present[i] = 1; # to check for presence of element
        i = 0; curr = 0; cnt, maxx = 0, 0; visset = {i for i in nums}
        while n > 0:
              #I had to use this while loop since the for-loop below will get interrupted if we make any changes to the iterating set.
            
            cnt = 1;
            for ind in visset:
                curr = ind; n -= 1; 
                visset.remove(ind); break; #remove is O(1) complexity for set
            # values less than curr::
            curr, temp = curr, curr;
            while n > 0:
                if present[curr - 1]:
                    curr -= 1; n -= 1; 
                    visset.remove(curr); cnt += 1; #remove is O(1) complexity for set
                else:
                    break;
            # values greater than curr::
            while n > 0:
                if present[temp + 1]:
                    temp += 1; n -= 1; 
                    visset.remove(temp); cnt += 1; #remove is O(1) complexity for set
                else:
                    break;
            maxx = max(cnt, maxx);
        maxx = max(cnt, maxx);
        return maxx

for more reference and runtime

Can anybody explain why this code's runtime is higher than intended ?

UPDATE: the runtime of this program is around 7000ms and if I replace set with a dictionary and use del dictt[i] instead of remove() in case of set, the runtime comes down to 2000ms

OT: what's up with all the semicolons? And docstrings instead of comments? — B Remmelzwaal, Apr 27 '23 at 16:18
This seems way complicated — surely a simple iteration would be faster? — Samwise, Apr 27 '23 at 16:20
Please include information necessary for understanding the question in the question, not in links. — khelwood, Apr 27 '23 at 16:22
@BRemmelzwaal well i use semicolon out of habit and it doesn't do any bad to my code. — Avinash Doddi, Apr 27 '23 at 16:28
@KennyOstrom updated code; I used set(nums) to remove duplicates — Avinash Doddi, Apr 27 '23 at 16:29
@khelwood I have added the problem statement and updated the code as well. — Avinash Doddi, Apr 27 '23 at 16:33
@KennyOstrom well, the question is to find the longest consecutive elements sequence. so i removed duplicates using set just in case. It doesn't really matter though. — Avinash Doddi, Apr 27 '23 at 16:34
Thanks for the update. The question makes sense now. It feels more like a stackreview question. There seem to be a bunch of extra maps that ... I don't think you need? And the maps themselves would have de-duped anyway. — Kenny Ostrom, Apr 27 '23 at 16:39
@KennyOstrom thankyou. the expected time complexity of this question was O(N) and i've tried to do the same. I just don't know why this takes a lotta time.. — Avinash Doddi, Apr 27 '23 at 16:41
That's something you probably should have mentioned first. I was thinking why not a naive sort and iterate? But that obviously wouldn't be O(n) — Kenny Ostrom, Apr 27 '23 at 16:42
I tried running it with range(10**i) for i from 4 to 7 and timing it and it appears to be running in O(N) time, so that suggests that your solution just has a worse O than the more optimal solutions. You can click on the time histogram to see how faster solutions were implemented. — Michael Cao, Apr 27 '23 at 17:00
@KellyBundy i can say so just by the performance of this code when i submit it. [view here](https://assets.leetcode.com/users/images/ca2d25d1-16b4-4cc0-91ca-765a4b763be3_1682610878.5933664.png) — Avinash Doddi, Apr 27 '23 at 17:11
Ah, so it takes 7 seconds? How long do the other solutions take? — Kelly Bundy, Apr 27 '23 at 17:12
@MichaelCao thanks. the reason why i asked this question was to understand the complexity differences. This code (Assuming i'm right and it's O(N)) is performing way worse than my O(N logN) code when submitted. — Avinash Doddi, Apr 27 '23 at 17:14
@KellyBundy yes, it takes around 7000+ ms, while for the same question i have initially solved this at O(N LogN) and it took around 333ms — Avinash Doddi, Apr 27 '23 at 17:16
Hmm, the reason is interesting enough that I decided to share it in a more focused [question](https://stackoverflow.com/q/76123245/12671057), but now I don't really know what to do with this one :-). I can imagine both are a duplicate, it's something I've known from another context. — Kelly Bundy, Apr 27 '23 at 17:44
@KellyBundy Interestingly enough, if i replace set and use dictionary in place of it and remove elements from dictionary using del d[i] the runtime of 7000ms comes down to 2000ms. I don't understand this ambiguous behavior of set since we take remove as O(1) — Avinash Doddi, Apr 27 '23 at 17:47
Yeah it's not the remove. Try using `collections.OrderedDict` (that's that other context where I know this from, and that's one reason I've sometimes used that instead of `dict`). — Kelly Bundy, Apr 27 '23 at 17:55
hint: If the list were a `set()`, then for any element of the set, could you tell me if that element +1 was also in the set? — JonSG, Apr 27 '23 at 17:59
@JonSG THAT is the reason why I have used a dictionary (defaultdict) call **present** to keep track of what elements are present in the set() — Avinash Doddi, Apr 27 '23 at 18:02
I got called away. My mistake, I see you're checking predecessor already. So I ran this on leetcode and it passed, unmodified. — Kenny Ostrom, Apr 27 '23 at 18:09
@KennyOstrom yeah, i did pass but as you might have noticed it was just a little quick than the runtime required for **time limit exceeded** — Avinash Doddi, Apr 27 '23 at 18:16
@JonSG I don't see how that's a hint. They're not even using any list in the slow part (after the initial setup). — Kelly Bundy, Apr 27 '23 at 18:21
@KellyBundy Perhaps their code is not an efficient attempt (given that they beleive there is an O(n) solution out there). I believe my hint would potentially have lead to a more efficient solution. — JonSG, Apr 27 '23 at 18:32
@JonSG I can't post anymore now that it's closed, could you perhaps put [this](https://dpaste.org/tUa1u) into your answer? Maybe make it community wiki. — Kelly Bundy, Apr 27 '23 at 19:07
@KellyBundy My answer was downvoted likely as it did not directly answer why/what part of the OP's code was slow. It was more of a "you might want a different approach" non-answer. So I just deleted it. Timing it, I believe it was the fastest solution but not really an answer. — JonSG, Apr 27 '23 at 19:17
@JonSG Yes, the question is explicitly asking for an explanation of their slowness, not for alternatives. Hence my downvote. If you add my explanation above it, I'd change it to an upvote. — Kelly Bundy, Apr 27 '23 at 19:22
@KellyBundy I undeleted and added the great advice you pointed to. I'm not sure my contribution really adds anything other than an afterthought though :-) — JonSG, Apr 27 '23 at 19:28
@JonSG As an afterthought, it's perfectly fine to include. And it's a good solution (except you only need to return the length). Also the most popular one [at LeetCode](https://leetcode.com/problems/longest-consecutive-sequence/discuss/41057/Simple-O(n)-with-Explanation-Just-walk-each-streak). — Kelly Bundy, Apr 27 '23 at 19:32

score 1 · Answer 1 · edited Apr 27 '23 at 20:30

It's slow on inputs like list(range(0, 200000, 2)). You do this 100000 times then:

for ind in visset:
    curr = ind; n -= 1; 
    visset.remove(ind); break; #remove is O(1) complexity for set

It's not the removal that's slow but the for loop iterating to find ind, as explained here. Takes quadratic time overall.

Use ind = visset.pop() instead, that's optimized for repeated removal of arbitrary elements. Or you could use a collections.OrderedDict instead of your set. Thanks to its additional linked list, it's fast for your style of find-one-and-remove-it.

Here is an alternate approach you might consider as well.

Building on the fact that we can test if a prior number and next number is in the list, we can build and keep track of runs of items.

def longest_consecutive(numbers):
    numbers = set(numbers)
    best_length = 0
    best_start = -1

    for number in numbers:
        ## ----------------
        ## this run is already/will be accounted for by some real 'start'
        ## ----------------
        if number - 1 in numbers:
            continue
        ## ----------------

        ## ----------------
        ## find consecutive numbers
        ## ----------------
        i = number
        while i in numbers:
            i += 1
        ## ----------------

        ## ----------------
        ## record the streak if better than what we have seen
        ## ----------------
        if i - number > best_length:
            best_length = i - number
            best_start = number
        ## ----------------

    return list(range(best_start, best_start + best_length))

based on timeit this will do lists of a million random integers in about a 1/4 of a second.

yeah, pop() method did the trick. Thanks a lot. will there be a problem in the while loops since I used remove() in them? can it be furthur optimized? — Avinash Doddi, Apr 28 '23 at 13:22
@AvinashDoddi Why would there be a problem with them? What kind of problem are you thinking about? — Kelly Bundy, Apr 28 '23 at 15:25
@KellyBundy Well, since for-loop previously slowed down the runtime of code, I thought that we could further optimize the while loop condition. — Avinash Doddi, Apr 28 '23 at 16:52
@AvinashDoddi Hmm, did you read the linked answer explaining *why* the for-loop was slow? Nothing like that is in the rest of the code. Sure, you can maybe micro-optimize, but with pop(), the solution is no longer quadratic time but already linear time, i.e., optimal complexity class. — Kelly Bundy, Apr 28 '23 at 17:07

Alain T. · Answer 2 · 2023-04-28T13:43:51.967

The problem may stem from for ind in visset. I observed this before (for dictionaries) but never bothered to measure it. There seems to be some overhead involved in iterating over a set after removing elements from it.

from time import time

s = set(range(1_000_000))
x = set(range(1_000_000))
start = time()
for i in range(100_000):
    x.remove(i)
    for i in s: break
print(time()-start)  # 0.017419099807739258


start = time()
for i in range(100_000):
    s.remove(i)
    for i in s: break
print(time()-start)  # 2.6430912017822266

Perhaps a solution without a for-loop would run faster

# using pop to get a random number out of the set
# instead of the for-loop-break
curr = 1
if visset:
   curr = visset.pop()
   n   -= 1

Or a different algorithm altogether to avoid removing items completely:

def longestConsec(nums):
    forward  = dict(zip(nums,nums))
    backward = forward.copy()
    for n in forward:
        f = forward.get(n+1,n)
        b = backward.get(n-1,n)
        forward[b]  = f
        backward[f] = b            
    return max(f-b+1 for b,f in forward.items())

but, wouldn't the iteration stop when n becomes 0? I mean even though the element is present in **present** dictionary, we are basically removing the element from the set and decrementing n till 0. so wouldn't it just take N iterations ? — Avinash Doddi, Apr 27 '23 at 17:31
The interesting thing is that, when i replace the set with defaultdict, the runtime which was around 7000ms was reduced to around 2000ms — Avinash Doddi, Apr 27 '23 at 17:32
I have tried as you said but that didn't make much difference in the overall runtime. — Avinash Doddi, Apr 27 '23 at 18:06

score -1 · Accepted Answer · answered Apr 27 '23 at 18:18

Yours passes unmodified on leetcode, but it was on the slower end. Here's a simplification of the same code that does a little better than average:

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        present = set(nums)
        curr = 0
        cnt, maxx = 0, 0
        while present:
            cnt = 1
            start = curr = present.pop()
            while start - 1 in present:
                start -= 1
                cnt += 1
                present.remove(start)
            while curr + 1 in present:
                curr += 1
                cnt += 1
                present.remove(curr)
            maxx = max(cnt, maxx)
        return maxx

You did have a lot of unnecessary data structures, and dumplicate maps where the value wasn't even used. I replaced them all with one set.

Now this is hella good. But I still can't understand why mine was a lot slower. Thanks though. — Avinash Doddi, Apr 27 '23 at 18:24

Why is this code taking longer time than expected?

3 Answers3

Linked