16

Fill array a from a[0] to a[n-1]: generate random numbers until you get one that is not already in the previous indexes.

This is my implementation:

static Random r = new Random(); // r is a java.util.Random instance

public static int[] first(int n) {
    int[] a = new int[n];
    int count = 0;

    while (count != n) {
        boolean isSame = false;
        int rand = r.nextInt(n) + 1;

        for (int i = 0; i < n; i++) {
            if(a[i] == rand) isSame = true;
        }

        if (!isSame) {
            a[count] = rand;
            count++;
        }
    }

    return a;
}

I thought it was O(N^2), but it's apparently O(N^2 log N), and I'm not sure where the log factor comes in.

r3mainer
btrballin
  • Looks to me like it doesn't have an upper bound on time complexity. You can keep failing to generate a unique number. Can you give a reference where it says this is *O(n^2 log n)*? – Willem Van Onsem Feb 18 '15 at 20:33
  • are we talking about average time? – sodik Feb 18 '15 at 20:34
  • Can't say I have much interest in analyzing the big-O of a hideous algorithm for which a trivial O(n) replacement exists. – Lee Daniel Crocker Feb 18 '15 at 20:41
  • If you are looking for a more efficient algorithm, it can be done in *O(n)* (worst case). – Willem Van Onsem Feb 18 '15 at 20:44
  • it can be done using a shuffle, for example, in O(n). – ryanpattison Feb 18 '15 at 20:48
  • This is a textbook probabilistic algorithm, you wouldn't actually implement it – dfb Feb 18 '15 at 20:53
  • @rpattiso what if the range over which you are selecting random numbers greatly exceeds the size of a list you are willing to create? – Random832 Feb 18 '15 at 21:21
  • @Random832 well, It seems the problem statement is more general than the code: `r.nextInt(n) + 1` is between 1 and n and n is the size of the array so shuffle is a good replacement in the code. good catch :) – ryanpattison Feb 18 '15 at 21:32
  • This code is not guaranteed to run in any given running time because `r.nextInt(n)` may never give you the value you want. It's unlikely, but if you want to analyze the *worst case* scenario, you can't. Worst case, it gets X its first iteration and continues to get X's for the rest of eternity. – corsiKa Feb 18 '15 at 23:20
  • This is a randomized algorithm, in fact a [Las Vegas algorithm](http://en.wikipedia.org/wiki/Las_Vegas_algorithm). Analysis of such algorithms is not commonly taught at undergraduate level. http://cs.stackexchange.com is probably a better place to ask such questions. – Fizz Feb 19 '15 at 03:49
  • [cs.SE] would expect more of an own attempt, maybe following [our reference question](http://cs.stackexchange.com/questions/23593/is-there-a-system-behind-the-magic-of-algorithm-analysis) on algorithm analysis. – Raphael Feb 19 '15 at 08:12

4 Answers

36

The 0-th entry is filled immediately. The 1st entry has probability 1 - 1/n = (n - 1)/n of being filled by a random number, so we need on average n / (n - 1) random numbers to fill the second position. In general, for the k-th entry we need on average n / (n - k) random numbers, and for each number we need k comparisons to check whether it's unique.

So we need

n * 1 / (n - 1) + n * 2 / (n - 2) + ... + n * (n - 1) / 1

comparisons on average. If we consider the right half of the sum, we see that this half is greater than

n * (n / 2) * (1 / (n / 2) + 1 / (n / 2 - 1) + ... + 1 / 1)

The sum of the fractions is known to be Θ(log(n)) because it's a partial sum of the harmonic series. So the whole sum is Ω(n^2 log(n)). In a similar way, we can show the sum is O(n^2 log(n)). This means on average we need

Θ(n^2*log(n))

operations.
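As a quick numerical sanity check (my own sketch, not part of the original answer; the class and method names are made up), the expected-comparison sum above can be evaluated exactly and compared against n^2 ln n. The ratio creeps toward 1 as n grows, as the Θ(n^2 log n) estimate predicts:

```java
public class ExpectedComparisons {
    // S(n) = sum over k = 1 .. n-1 of n * k / (n - k),
    // the expected total number of comparisons from the analysis above.
    static double expectedComparisons(int n) {
        double s = 0;
        for (int k = 1; k < n; k++) {
            s += (double) n * k / (n - k);
        }
        return s;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            double ratio = expectedComparisons(n)
                    / ((double) n * n * Math.log(n));
            // ratio slowly approaches 1 from below
            System.out.printf("n=%d  S(n)/(n^2 ln n) = %.3f%n", n, ratio);
        }
    }
}
```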

JuniorCompressor
15

This is similar to the Coupon Collector problem. You pick from n items until you get one you don't already have. On average, you need O(n log n) attempts (see the link; the analysis is not trivial), and in the worst case you examine n elements on each of those attempts. This leads to an average complexity of O(n^2 log n).
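For reference, the expected number of draws in the coupon-collector setting has a closed form: n · H_n, where H_n is the n-th harmonic number, and H_n = Θ(log n). A small sketch (class and method names are my own) evaluating it:

```java
public class CouponCollector {
    // Expected number of random draws to collect all n distinct values:
    // n * H_n, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number.
    static double expectedDraws(int n) {
        double h = 0;
        for (int i = 1; i <= n; i++) {
            h += 1.0 / i;
        }
        return n * h;
    }

    public static void main(String[] args) {
        // n * H_n grows like n ln n, i.e. O(n log n) attempts on average
        System.out.printf("n=1000: expected draws = %.0f, n ln n = %.0f%n",
                expectedDraws(1000), 1000 * Math.log(1000));
    }
}
```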

dfb
  • Why? This analysis doesn't have anything to do with the inner loop. The outer loop runs on average O(n lg n) times, The extra n factor is from the inner loop, whether it breaks or not – dfb Feb 18 '15 at 20:44
  • @JohnPirie - n^2 is a lower bound, but it's not very tight. The outer loop is counting the number of attempts, there is no guarantee that the number of attempts is only n - it will be at least n and on average O(n lg n) – dfb Feb 18 '15 at 21:01
  • 2
    "average" "O(n lg n)" uhhh, Big O notation is for the *worst case* not the *average case*. =\ – corsiKa Feb 18 '15 at 23:17
  • 8
    @corsiKa - Big-O is an asymptotic bound on a function. It's perfectly valid to say that the expected runtime of a function is upper bounded, just like you might bound the worst case runtime. – dfb Feb 18 '15 at 23:35
  • 3
    There is nothing in the definition of Big-O notation that says it's about the worst case. [Tilde notation](http://introcs.cs.princeton.edu/java/41analysis/), which is not widely used, is actually *more* restrictive; 2x is O(x), but it is not the case that 2x ~ x. – user2357112 Feb 19 '15 at 00:49
  • 1
    @JeroenVannevel [You are wrong](http://cs.stackexchange.com/questions/23068/how-do-o-and-relate-to-worst-and-best-case). – Raphael Feb 19 '15 at 08:10
  • It is not very proper to use a worst-case estimate (for the inner loop) to find the average complexity. But of course we know that, for linear search, the two differ only by a constant factor 2, so that makes this answer correct anyway. – Marc van Leeuwen Feb 19 '15 at 10:06
2

The algorithm you have is not O(n^2 lg n), because it may loop forever and never finish. Imagine that on your first pass you get some value X, and on every subsequent pass, while trying to get the second value, you continue to get X forever. We're talking worst case here, after all. That would loop forever. So since your worst case never finishes, you can't really analyze it.

In case you're wondering, if you know that n is always both the size of the array and the upper bound of the values, you can simply do this:

Random rand = new Random();

int[] vals = new int[n];
for (int i = 0; i < n; i++) {
    vals[i] = i + 1; // values 1..n, matching r.nextInt(n) + 1
}
// Fisher-Yates shuffle
for (int i = n - 1; i > 0; i--) {
    int idx = rand.nextInt(i + 1);
    int t = vals[idx];
    vals[idx] = vals[i];
    vals[i] = t;
}

One loop down, one loop back. O(n). Simple.
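For completeness, here is a self-contained sketch of the same idea (class and method names are mine, not from the answer), plus a quick check that the result really is a permutation of 1..n:

```java
import java.util.Arrays;
import java.util.Random;

public class ShuffleDemo {
    // Returns a uniformly random permutation of the values 1..n
    // using a Fisher-Yates shuffle: O(n) time, no rejection loop.
    static int[] shuffled(int n, Random rand) {
        int[] vals = new int[n];
        for (int i = 0; i < n; i++) {
            vals[i] = i + 1; // 1..n, like r.nextInt(n) + 1 in the question
        }
        for (int i = n - 1; i > 0; i--) {
            int idx = rand.nextInt(i + 1); // uniform in [0, i]
            int t = vals[idx];
            vals[idx] = vals[i];
            vals[i] = t;
        }
        return vals;
    }

    public static void main(String[] args) {
        int[] a = shuffled(10, new Random());
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        // Sorting a permutation of 1..n must give back exactly 1..n.
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] != i + 1) throw new AssertionError("not a permutation");
        }
        System.out.println(Arrays.toString(a));
    }
}
```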

corsiKa
  • A random number generator that generates the same number every time is not a random number generator. – Jack Aidley Feb 19 '15 at 11:07
  • @JackAidley a random number generator that _cannot_ generate the same number every time is not a random number generator. (But a PRNG would not have this behaviour). – Davidmh Feb 19 '15 at 11:50
  • David is correct that modern implementations of it *would* not behave that way, but we cannot guarantee it. After all, we're talking about the worst possible case, not the worst expected case. – corsiKa Feb 19 '15 at 15:18
-1

If I'm not mistaken, the log N part comes from this part:

for(int i = 0; i < count; i++){
    if(a[i] == rand) isSame = true;
}

Notice that I changed n to count, because you know that you have only count elements in your array on each iteration of the loop.

Manuel Ramírez