Closest pair of points (CLRS pg 1043): Running time of splitting a sorted array into two sorted arrays

Question

In finding the closest pair of points in O(nlgn) time, the pseudocode for splitting a sorted list into two sorted lists (CLRS 3rd ed pg 1043) is said to run in O(n) time.

[Image: the splitting pseudocode from CLRS pg 1043]

However, this assumes that line 4 runs in constant time, which I find hard to believe. I'd assume it runs in O(lg n) time if PL were stored as a binary tree, giving a total running time of O(n lg n).

Y is a sorted array, YL and YR are the two new sub-arrays. PL is a subset of Y in random order, and YL is the same subset, but in sorted order.
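To make the reasoning above concrete, here is a minimal Python sketch of the split loop in which the `Y[i] ∈ PL` test is a binary search over a sorted copy of PL. It assumes points are distinct `(x, y)` tuples; `split_by_sorted_lookup` is a hypothetical name, and binary search over a sorted list stands in for the binary tree mentioned above. Each lookup costs O(lg k) for k = |PL|, so the loop is O(n lg n) rather than O(n).

```python
from bisect import bisect_left

Point = tuple[int, int]


def split_by_sorted_lookup(Y: list[Point], PL: list[Point]) -> tuple[list[Point], list[Point]]:
    """Split Y (sorted by y) into (YL, YR), testing membership in PL by binary search."""
    PL_sorted = sorted(PL)              # O(k lg k) once; tuples compare lexicographically
    YL: list[Point] = []
    YR: list[Point] = []
    for p in Y:                         # n iterations
        j = bisect_left(PL_sorted, p)   # O(lg k) per membership test
        if j < len(PL_sorted) and PL_sorted[j] == p:
            YL.append(p)                # Y's order is preserved, so YL stays sorted by y
        else:
            YR.append(p)
    return YL, YR


YL, YR = split_by_sorted_lookup([(3, 1), (2, 2), (4, 3), (1, 4), (6, 5)], [(1, 4), (3, 1)])
print(YL)  # [(3, 1), (1, 4)]
print(YR)  # [(2, 2), (4, 3), (6, 5)]
```

The result is correct, but the per-point search is exactly the cost the question is worried about.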

Where am I going wrong with my reasoning?

When adding an element of Y to PL, mark it as belonging to PL. (Just a guess, I don't know how PL is formed). — n. m. could be an AI, Dec 26 '16 at 07:13
If PL is stored as a reasonably large hash map/hash set, the expected average lookup time can be O(1), but the worst case is another story... — Alexander Anikin, Dec 26 '16 at 09:56
@AlexanderAnikin We are actually dealing with worst case for the big O notation. — max_max_mir, Dec 30 '16 at 04:26
@n.m. PL (for the sake of this algorithm) can be assumed to be a subset of Y in a random order. — max_max_mir, Dec 30 '16 at 04:27
The question is not what it is, but what kind of process is used to form it. Is it formed by picking elements of Y in some order? — n. m. could be an AI, Dec 30 '16 at 07:47
@n.m. PL is formed by first sorting the points by x coordinate and then taking the left half. For example, if the set of points P (sorted by x coord) is [(1,2), (2,5), (3,4), (4,2), (5,1)], then PL is [(1,2), (2,5)], and Y is the set of points P sorted by y coord. — max_max_mir, Dec 30 '16 at 22:21
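The hash-set idea from the comments can be sketched in Python using the example points from the last comment. This is a minimal sketch, not the book's method; `split_with_hash_set` is a hypothetical name, and it assumes points are distinct hashable tuples. Python's `set` gives expected O(1) membership tests, so the loop is expected O(n), but, as noted in the comments, that is not a worst-case bound.

```python
Point = tuple[int, int]


def split_with_hash_set(Y: list[Point], PL: list[Point]) -> tuple[list[Point], list[Point]]:
    in_left = set(PL)                # expected O(k) to build
    YL: list[Point] = []
    YR: list[Point] = []
    for p in Y:                      # n iterations, expected O(1) lookup each
        (YL if p in in_left else YR).append(p)
    return YL, YR


P = [(1, 2), (2, 5), (3, 4), (4, 2), (5, 1)]   # sorted by x, as in the comment
PL = P[:2]                                     # left half as given in the comment
Y = sorted(P, key=lambda p: p[1])              # sorted by y; sorted() is stable for ties
YL, YR = split_with_hash_set(Y, PL)
print(YL)  # [(1, 2), (2, 5)]
print(YR)  # [(5, 1), (4, 2), (3, 4)]
```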

score 1 · Answer 1 · answered Dec 26 '16 at 06:19

For simplicity's sake, we're assuming the list holds integers, not strings or integers larger than a machine word, which can complicate things greatly here.

There are two calculations to consider here:

The for loop: this runs Y.length times, which I'm assuming is N here.
The tricky part: comparing Y[i] with PL. (Note: comparing two numbers takes constant time if they fit in a machine word.) Accessing Y[i] takes constant time, since we're working with a random-access machine. However, comparing it against every element of an array PL of length, say, k takes O(k) time. If k is very small and independent of the size of the input array Y, this would ideally be constant.

More precisely, you have to account for the k comparisons (k being the length of PL), so the total time of this pseudocode is O(Nk). But if the assumption that k is small and independent of N holds, it really is O(N).
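The O(Nk) bound above can be checked by counting comparisons. This is a minimal Python sketch, assuming membership is tested by scanning PL element by element; `split_with_linear_scan` is a hypothetical name. With k = N/2, as the reply below points out, the count grows quadratically: doubling N from 8 to 16 takes it from 26 to 100 comparisons.

```python
Point = tuple[int, int]


def split_with_linear_scan(Y: list[Point], PL: list[Point]) -> tuple[list[Point], list[Point], int]:
    """Split Y into (YL, YR) by scanning PL for each point.

    Also returns the number of point comparisons performed.
    """
    YL: list[Point] = []
    YR: list[Point] = []
    comparisons = 0
    for p in Y:                  # N iterations
        found = False
        for q in PL:             # up to k = len(PL) comparisons per iteration
            comparisons += 1
            if q == p:
                found = True
                break
        (YL if found else YR).append(p)
    return YL, YR, comparisons


Y = [(i, i) for i in range(8)]   # N = 8 points, already sorted by y
PL = Y[:4]                       # k = N/2
YL, YR, c = split_with_linear_scan(Y, PL)
print(c)  # 26
```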

PL is dependent on N - you can assume k = N/2, but that's… — max_max_mir, Dec 30 '16 at 04:25

score 0 · Answer 2 · md5 · 2016-12-31T15:40:26.463

I don't know how it is supposed to work in the book, but thinking about what the algorithm looks like, you can come up with the following idea:

Y[i], X[i], YL[i], XL[i], YR[i] and XR[i] are integers: each is the index of a point (so you just have to store some global arrays which, given the index, return the x or y coordinate).
PL[i] is a boolean, true if the i-th point is in the left part, false otherwise.

At each recursion step, you can compute PL[i] from X, the array of indices sorted by x coordinate, by marking the points in its first half (O(n) time). Then you separate the set of points into the two sets "left" and "right" using the algorithm from the book, replacing the line `if Y[i] in PL` with `if PL[Y[i]]`. Such an access is O(1), so overall we get O(n).
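A minimal Python sketch of this idea, under a few assumptions not stated in the answer: indices are 0-based, the left part is taken to be the first ⌈n/2⌉ points of X, and the flag array is allocated once and shared across calls, so each call only sets and then clears the flags of its own points. Clearing keeps every call O(n) instead of O(total number of points). `split_by_flags` is a hypothetical name; the example points are the ones from the asker's later comment.

```python
def split_by_flags(X: list[int], Y: list[int], in_left: list[bool]) -> tuple[list[int], list[int]]:
    """Split the y-sorted index array Y into (YL, YR).

    X holds the same point indices sorted by x. in_left is a shared flag
    array indexed by global point index; it must be all False on entry
    and is restored to all False before returning.
    """
    half = (len(X) + 1) // 2    # assumption: left part = first ceil(n/2) points of X
    left = X[:half]
    for i in left:              # O(n): mark the left-half points
        in_left[i] = True
    YL: list[int] = []
    YR: list[int] = []
    for i in Y:                 # O(n): one O(1) flag lookup per point
        # Y's order is kept, so both halves stay sorted by y.
        (YL if in_left[i] else YR).append(i)
    for i in left:              # O(n): clear the flags for reuse in later calls
        in_left[i] = False
    return YL, YR


# Points (3,1), (2,2), (4,3), (1,4), (6,5), stored as global coordinate arrays.
xof = [3, 2, 4, 1, 6]
yof = [1, 2, 3, 4, 5]
n = len(xof)
X = sorted(range(n), key=lambda i: xof[i])   # [3, 1, 0, 2, 4]
Y = sorted(range(n), key=lambda i: yof[i])   # [0, 1, 2, 3, 4]
flags = [False] * n                          # allocated once: O(n) memory
YL, YR = split_by_flags(X, Y, flags)
print(YL, YR)  # [0, 1, 3] [2, 4]
```

The left part, indices 0, 1 and 3, is the points (3,1), (2,2) and (1,4), i.e. the three smallest x coordinates, listed in y order.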

This has O(n) time complexity and uses O(n) memory.

Thus, the closest pair problem is solved this way in T(n) = O(n log n) overall.
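The O(n log n) total follows from the usual divide-and-conquer recurrence. Here is a sketch, assuming each half has about n/2 points, all non-recursive work at a step (including the O(n) split) is at most cn for some constant c, and n is a power of 2:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 2^{j}\,T(n/2^{j}) + j\,cn \\
     &= n\,T(1) + cn\lg n \qquad (j = \lg n) \\
     &= O(n \lg n).
\end{aligned}
```

If the split instead cost O(n lg n) per step, the same unrolling would give O(n lg² n).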

answered Dec 26 '16 at 13:22 by md5 · edited Dec 31 '16 at 15:40


I agree that it has O(n log n) time, but the book claims it has O(n) time. – max_max_mir Dec 30 '16 at 04:30
@max_max_mir: I meant `O(n log n)` overall for the whole closest pair problem (this particular step is indeed `O(n)`). – md5 Dec 30 '16 at 11:12
Can you provide an example of this? Let's say Y = [(3,1), (2,2), (4,3), (1,4), (6,5)], sorted by y coord, and PL is [(1,4), (3,1)], sorted by x coord. If I search the list PL for a point of Y, it takes O(n) time. I think you're saying there's a way to do a reverse lookup of sorts to determine whether a point is in PL in O(1) time, but I don't quite see how. – max_max_mir Dec 30 '16 at 22:18
@max_max_mir: I changed the definitions of `Y` and `PL` a bit. Let's say we have some global arrays `xof = [2, 3, 4, 1, 6]` and `yof = [2, 1, 3, 4, 5]` (the points in any order). Then `Y = [2, 1, 3, 4, 5]` is the array of **indices** (here 1-indexed) of the points sorted by y. `PL = [1, 0, 0, 1, 1]` means, for example, that `(2, 2)`, `(1, 4)` and `(6, 5)` belong to the left part. – md5 Dec 31 '16 at 15:44
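md5's example can be run directly. In this minimal Python sketch, slot 0 of each array is an unused placeholder (an assumption, added only to keep the comment's 1-based indexing). The membership test is a single array access `PL[i]`, with no search through a list of points.

```python
# 1-indexed as in the comment; slot 0 is an unused placeholder.
xof = [None, 2, 3, 4, 1, 6]
yof = [None, 2, 1, 3, 4, 5]
Y = [2, 1, 3, 4, 5]             # point indices sorted by y
PL = [None, 1, 0, 0, 1, 1]      # PL[i] == 1 iff point i is in the left part

YL: list[int] = []
YR: list[int] = []
for i in Y:
    (YL if PL[i] else YR).append(i)   # O(1) array access per point

print(YL)                              # [1, 4, 5]
print(YR)                              # [2, 3]
print([(xof[i], yof[i]) for i in YL])  # [(2, 2), (1, 4), (6, 5)]
```

Because the loop walks Y in order, YL and YR come out already sorted by y: their y coordinates are 2, 4, 5 and 1, 3.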
