6
def filter(f, lst):     
    if lst == []: return []
    if f(lst[0]): return [lst[0]] + filter(f, lst[1:])
    return filter(f, lst[1:]) 

def my_reverse(lst):        # Reverse the list
    def reverse_helper(x,y):
        if x == []: return y
        return reverse_helper(x[1:], [x[0]] + y)
    return reverse_helper(lst, []) 

def revfilter_alpha(f, lst):    # Reverse and filter ...
    return my_reverse(filter(f, lst))

def revfilter_beta(f, lst): # Reverse and filter ...
    if lst == []: return []
    return revfilter_beta(f, lst[1:]) + ([lst[0]]  if f(lst[0])  else [])   

Could someone explain to me how to determine the running time in Big Θ notation for these? I've read quite a few things but still have no idea where to begin.

In filter, I think it is Θ(n^2) because it checks each element in list of size n with the predicate function f with n recursive calls so n*n.

revfilter_beta looks pretty similar just reversing while filtering so wouldn't this also be Θ(n^2)?

revfilter_alpha does filter then a reverse, so wouldn't this be n^2*n^2 = Θ(n^4)?

Does anyone have any thoughts?

inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
atkayla
  • 8,143
  • 17
  • 72
  • 132
  • How many recursive calls are made in `filter`? Is it really `n * n`? – Ja͢ck Oct 11 '12 at 01:37
  • btw, as `filter` is just as much a built in as `reverse` you might as well call it `my_filter` as well – Claudiu Oct 11 '12 at 01:46
  • Now I'm not sure whether I was correct or not to begin with... :P – atkayla Oct 11 '12 at 01:58
  • note: it is usually called [Big O notation](http://en.wikipedia.org/wiki/Big_O_notation). `Θ(n)` provides tighter bounds than `O(n)`. – jfs Oct 11 '12 at 03:49

2 Answers2

5

filter has n recursive calls, but you also do a copy operation on each iteration which takes n, so you end up having Θ(n^2). If you implemented it 'properly' it should be Θ(n) though.

Same for my_reverse.

Same for revfilter_beta.

revfilter_alpha just does a filter and then a reverse, so thats Θ(n^2 + n^2) = Θ(n^2).


EDIT: Let's look into filter a bit more.

What you want to figure out is how many operations are performed relative to the size of the input. O(n) means that at the very worst case, you will do on the order of n operations. I say "on the order of" because you could, for example, do O(n/2) operations, or O(4n), but the most important factor is n. That is, as n grows, the constant factor becomes less and less important, so we only look at the non-constant factor (n in this case).

So, how many operations does filter perform on a list of size n?

Let's take it from the bottom up. What if n is 0 - an empty list? Then it will just return an empty list. So let's say that's 1 operation.

What if n is 1? It will check whether lst[0] should be included - that check takes however long it takes to call f - and then it will copy the rest of the list, and do a recursive call on that copy, which in this case is an empty list. so filter(1) takes f + copy(0) + filter(0) operations, where copy(n) is how long it takes to copy a list, and f is how long it takes to check whether an element should be included, assuming it takes the same amount of time for each element.

What about filter(2)? It will do 1 check, then copy the rest of list and call filter on the remainder: f + copy(1) + filter(1).

You can see the pattern already. filter(n) takes 1 + copy(n-1) + filter(n-1).

Now, copy(n) is just n - it takes n operations to slice the list in that way. So we can simplify further: filter(n) = f + n-1 + filter(n-1).

Now you can try just expanding out filter(n-1) a few times to see what happens:

filter(n) = f + n-1 + filter(n-1)
          = 1 + n-1 + (f + n-2 + filter(n-2))
          = f + n-1 + f + n-2 + filter(n-2)
          = 2f + 2n-3 + filter(n-2)
          = 2f + 2n-3 + (f + n-3 + filter(n-3))
          = 3f + 3n-6 + filter(n-3)
          = 3f + 3n-6 + (f + n-4 + filter(n-4))
          = 4f + 4n-10 + filter(n-4)
          = 5f + 5n-15 + filter(n-5)
          ...

Can we generalize for x repetitions? That 1, 3, 6, 10, 15... sequence is the triangle numbers - that is, 1, 1+2, 1+2+3, 1+2+3+4, etc. The sum of all numbers from 1 to x is x*(x-1)/2.

          = x*f + x*n - x*(x-1)/2 + filter(n-x)

Now, what is x? How many repetitions will we have? Well, you can see that when x = n, you have no more recursion - filter(n-n)=filter(0)=1. So our formula is now:

filter(n) = n*f + n*n - n*(n-1)/2 + 1

Which we can simplify further:

filter(n) = n*f + n^2 - (n^2 - n)/2 + 1
          = n*f + n^2 - n^2/2 + n/2 + 1
          = n^2 - n^2/2 + f*n + n/2 + 1
          = (1/2)n^2 + (f + 1/2)n + 1

So there ya have it - a rather detailed analysis. That would be Θ((1/2)n^2 + (f + 1/2)n + 1)... assuming f is insignificant (say f=1) that gets to Θ((1/2)n^2 + (3/2)n + 1).

Now you'll notice, if copy(n) took a constant amount of time instead of a linear amount of time (if copy(n) was 1 instead of n), then you wouldn't get that n^2 term in there.

I'll admit, when I said Θ(n^2) initially, I didn't do this all in my head. Rather, I figured: ok, you have n recursive steps, and each step will take n amount of time because of the copy. n*n = n^2, thus Θ(n^2). To do that a bit more exactly, n shrinks at each step, so you really have n + (n-1) + (n-2) + (n-3) + ... + 1, which ends up being that same figure as above: n*n - (1 + 2 + 3 + ... + n) = n*n - n*(n-1)/2 = (1/2)n^2 + (1/2)n, which is the same if I had used 0 instead of f, above. Likewise, if you had n steps but each step took 1 instead of n (if you didn't have to copy the list), then you'd have 1 + 1 + 1 + ... + 1, n times, or simply n.

But, that requires a bit more intuition so I figured I'd also show you the brute force method that you can apply to anything.

Claudiu
  • 224,032
  • 165
  • 485
  • 680
  • Wait, so they are all Θ(n^2)? Could you explain to me your reasoning in a bit more detail and how to approach these kinds of problems? I still feel very lost. – atkayla Oct 11 '12 at 01:54
  • @TidusSmith: There ya go... does that make any sense? heh – Claudiu Oct 11 '12 at 03:24
  • It does! Thanks! My professor explained it in about 2 sentences, so I think I was just confused because it looks like there's a lot more to it haha. – atkayla Oct 11 '12 at 03:47
2

All of your functions are O(N^2) because they take O(N) time per recursive step and there will be N steps on a list of length N.

There are two expensive (that is, O(N)) operations that you're doing in your functions. The first is slicing (e.g. lst[1:]). The second is list concatenation (using the + operator).

Both of those may be more expensive than you are expecting, largely because Python's lists are not like list data types in other languages. Under the hood they're arrays, not linked lists. It is possible to perform the operations above on linked lists in O(1) time (though O(1) slicing is destructive). In Lisp, for instance, the algorithms you used would be O(N), rather than O(N^2).

Recursion is also often sub-optimal in Python because there is no tail call elimination. Python's default recursion limit is 1000 in recent versions so long lists will break purely recursive solutions unless you mess around in the sys module to increase the limit.

It is possible to do an O(N) version of those algorithms in Python too, but you'll want to avoid the expensive list operations above as much as possible. Rather than recursion, I suggest using generators, which are a much more "pythonic" style of programming.

Filtering using a generator is very easy to do. The built-in filter function does it already, but you can write your own in just a few lines:

def my_filter(f, iterable):
    for e in iterable:
        if f(e):
            yield e

Reversing the order of things is a bit more complicated, since you need to either be able to do random access on the source or use O(N) extra space (your algorithm uses the stack for that space, even though lists follow the sequence protocol and can be randomly accessed). The built-in reversed function only works on sequences, but here's a version that works on any iterable (such as another generator):

def my_reversed(iterable):
    storage = list(iterable)  # consumes all the input!
    for i in range(len(storage)-1, -1, -1):
        yield storage[i]

Note that unlike many generators, this on consumes all of its input at once before it starts to yield output. Don't run it on an infinite input!

You can compose these in either order, and my_reversed(filter(f, lst)) should be equivalent to filter(f, my_reversed(lst)) (though for the latter, using the built-in reversed function is probably better).

Running time for both of the generators above (and their composition in either order) will be O(N).

Blckknght
  • 100,903
  • 11
  • 120
  • 169