
I'm given an array and a list of queries of the form L R, which means: find the smallest absolute difference between any two array elements whose indices are between L and R inclusive (the array is indexed starting at 1 instead of 0).

For example, take the array a with elements 2 1 8 5 11. For the query 1-3, which covers (2 1 8), the answer would be 1 = 2 - 1; for the query 2-4, which covers (1 8 5), the answer would be 3 = 8 - 5.

Now this is easy if you only have to look at one interval: you sort the interval, compare the i-th element with the (i+1)-th, and keep the minimum difference over all i.
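
In Python-like terms, the per-interval idea would be something like this (just an illustration, not my actual code; assumes R > L):

def naive_query(a, L, R):              # L, R are 1-based and inclusive
    window = sorted(a[L - 1:R])        # copy and sort only the queried slice
    return min(y - x for x, y in zip(window, window[1:]))

a = [2, 1, 8, 5, 11]
print(naive_query(a, 1, 3))            # 1
print(naive_query(a, 2, 4))            # 3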

The problem is that I'll have a lot of intervals to check, and I have to keep the original array intact.

What I've done is construct a new array b of indices into the first one such that a[b[i]] <= a[b[j]] for i <= j. Now for each query I loop through the whole array and check whether b[j] is between L and R; if it is, I compare its value's absolute difference to the next element that is also between L and R, keep the minimum, and then repeat with that element until I reach the end.
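
Roughly, in Python (again only to illustrate what I described; b holds the indices of a sorted by value):

def build_b(a):
    return sorted(range(len(a)), key=lambda i: a[i])     # 0-based indices

def query_with_b(a, b, L, R):                            # L, R are 1-based
    best = float('inf')
    prev = None                                          # previous in-range value
    for i in b:                                          # scans the whole array
        if L - 1 <= i <= R - 1:
            if prev is not None:
                best = min(best, a[i] - prev)
            prev = a[i]
    return best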

This is inefficient because for each query I have to check all elements of the array, which is especially wasteful when the query range is small compared to the size of the array. I'm looking for a time-efficient approach.

EDIT: The numbers don't have to be consecutive; perhaps I gave a bad array as an example. What I meant is that, for example, if the array is 1 5 2 then the smallest difference is 1 = 2 - 1. In a sorted array the smallest difference is guaranteed to be between two consecutive elements, which is why I thought of sorting.

kingW3
  • Do you have bounds on your list size or number of queries? – k_ssb Apr 22 '18 at 14:23
  • @pkpnd At most 10^5 queries, while the size of the array is about 10^5 to 10^6. I'm looking for a solution which runs in a few seconds. – kingW3 Apr 22 '18 at 14:32
  • Are the array elements bounded integers? Do you have the query list offline? – David Eisenstat Apr 22 '18 at 14:52
  • @DavidEisenstat They fit into a 32-bit int, though otherwise no; the queries are read from standard input and may be different. – kingW3 Apr 22 '18 at 15:04
  • I mean, can you read the entire query list before printing results? – David Eisenstat Apr 22 '18 at 15:05
  • @DavidEisenstat Yes, sorry for the misunderstanding. – kingW3 Apr 22 '18 at 15:07
  • I'm confused. If n were 100 and our series of intervals was [1, 100], [2, 50], [3, 100], [4, 50], ... [49, 50], could someone please explain to me how David Eisenstat's method would reduce the number of operations from O(n^2) to O(n √n)? – גלעד ברקן Apr 24 '18 at 14:31
  • @גלעדברקן I'll give it a try: the intervals you posted are sorted by L but not by the wacky sort. For example, here [2,50] comes before [1,100] because floor(l/sqrt(100))=0 in both cases but 50<100, so the given queries sorted could possibly be (there are multiple combinations) [4,50],[8,50],[6,50],[2,50],[1,100],[9,100],[5,100],[7,100],[3,100],[17,50] etc. Anyway I recommend reading [this](https://www.geeksforgeeks.org/mos-algorithm-query-square-root-decomposition-set-1-introduction/), especially the Time Complexity Analysis chapter below, for a more thorough explanation (it helped me). – kingW3 Apr 24 '18 at 14:52
  • Thanks for trying :) I assumed "sort them in order of lexicographically nondecreasing `(floor(l / sqrt(n)), r)`" means they are sorted by x in (x, y), but that's not what your example shows. – גלעד ברקן Apr 24 '18 at 15:35
  • Ah, I see the link in your comment mentions each block of L is sorted by R. So for our first block, we'd have something like, `[2, 50], [4, 50],... [10, 50], [1, 100], [3, 100]...[9, 100]`. That's O(sqrt n * sqrt n + n) for this one block, no? (sqrt n items, each with an update of at most sqrt n items in the tree on the left side of the interval, with at most a change on the order of n for the total change in the right side of the interval.) – גלעד ברקן Apr 24 '18 at 15:45
  • Ah so O(n) for one block means O(n sqrt n) for all of them. Nice :) – גלעד ברקן Apr 24 '18 at 15:52

2 Answers


I'll sketch an O(n √n log n)-time solution, which might be fast enough? When I gave up sport programming, computers were a lot slower.

The high-level idea is to apply Mo's trick to a data structure with the following operations.

insert(x) - inserts x into the underlying multiset
delete(x) - deletes one copy of x from the underlying multiset
min-abs-diff() - returns the minimum absolute difference
                 between two elements of the multiset
                 (0 if some element has multiplicity >1)

Read in all of the query intervals [l, r] and sort them in order of lexicographically nondecreasing (floor(l / sqrt(n)), r), where n is the length of the input. Then, to process an interval I, insert the elements in I - I' (where I' is the previous interval), delete the elements in I' - I, and report the minimum absolute difference. (The point of the funny sort order is to reduce the number of insert/delete operations from O(n^2) to O(n √n), assuming n queries.)
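
A driver for this ordering might look roughly like the following Python sketch; it assumes a data structure object ds exposing insert(x), delete(x) and min_abs_diff() as described above (the names answer_queries and ds are mine, purely for illustration).

import math

def answer_queries(a, queries, ds):
    # queries: list of (l, r) pairs, 1-based and inclusive
    n = len(a)
    block = max(1, int(math.sqrt(n)))
    order = sorted(range(len(queries)),
                   key=lambda q: (queries[q][0] // block, queries[q][1]))
    answers = [None] * len(queries)
    cur_l, cur_r = 0, -1                  # current window [cur_l, cur_r], 0-based
    for q in order:
        l, r = queries[q][0] - 1, queries[q][1] - 1
        while cur_l > l:                  # grow on the left
            cur_l -= 1
            ds.insert(a[cur_l])
        while cur_r < r:                  # grow on the right
            cur_r += 1
            ds.insert(a[cur_r])
        while cur_l < l:                  # shrink on the left
            ds.delete(a[cur_l])
            cur_l += 1
        while cur_r > r:                  # shrink on the right
            ds.delete(a[cur_r])
            cur_r -= 1
        answers[q] = ds.min_abs_diff()
    return answers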

There are a couple ways to implement the data structure to have O(log n)-time operations. I'm going to use a binary search tree for clarity of exposition, but you could also sort the array and use a segment tree (less work if you don't have a BST implementation that lets you specify decorations).

Add three fields to each BST node: min (minimum value in the subtree rooted at this node), max (maximum value in the subtree rooted at this node), min-abs-diff (minimum absolute difference between values in the subtree rooted at this node). These fields can be computed bottom-up like so.

if node v has left child u and right child w:
    v.min = u.min
    v.max = w.max
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max,
                         w.min - v.value, w.min-abs-diff)

if node v has left child u and no right child:
    v.min = u.min
    v.max = v.value
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)

if node v has no left child and right child w:
    v.min = v.value
    v.max = w.max
    v.min-abs-diff = min(w.min - v.value, w.min-abs-diff)

if node v has no left child and no right child:
    v.min = v.value
    v.max = v.value
    v.min-abs-diff = ∞

This logic can be implemented pretty compactly.

if v has a left child u:
    v.min = u.min
    v.min-abs-diff = min(u.min-abs-diff, v.value - u.max)
else:
    v.min = v.value
    v.min-abs-diff = ∞
if v has a right child w:
    v.max = w.max
    v.min-abs-diff = min(v.min-abs-diff, w.min - v.value, w.min-abs-diff)
else:
    v.max = v.value

insert and delete work as usual, except that the decorations need to be updated along the traversal path. The total time is still O(log n) for reasonable container choices.

min-abs-diff is implemented by returning root.min-abs-diff where root is the root of the tree.
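
For concreteness, here is one way the segment-tree variant could be realized in Python (my own sketch, not the only option): sort the distinct values once, keep a count per leaf, and store (min present value, max present value, min-abs-diff) per node; a leaf with count > 1 contributes a difference of 0.

import bisect

INF = float('inf')

class MinAbsDiffDS:
    def __init__(self, values):
        self.vals = sorted(set(values))          # coordinate-compressed values
        self.m = len(self.vals)
        self.count = [0] * self.m
        self.size = 1
        while self.size < self.m:
            self.size *= 2
        # each node is (min value present, max value present, min abs diff)
        self.node = [(INF, -INF, INF)] * (2 * self.size)

    def _pull(self, i):
        lo, hi = self.node[2 * i], self.node[2 * i + 1]
        cross = hi[0] - lo[1] if lo[1] > -INF and hi[0] < INF else INF
        self.node[i] = (min(lo[0], hi[0]), max(lo[1], hi[1]),
                        min(lo[2], hi[2], cross))

    def _update_leaf(self, pos):
        c = self.count[pos]
        if c == 0:
            leaf = (INF, -INF, INF)
        else:
            v = self.vals[pos]
            leaf = (v, v, 0 if c > 1 else INF)   # duplicates give difference 0
        i = pos + self.size
        self.node[i] = leaf
        i //= 2
        while i:
            self._pull(i)
            i //= 2

    def insert(self, x):
        pos = bisect.bisect_left(self.vals, x)
        self.count[pos] += 1
        self._update_leaf(pos)

    def delete(self, x):
        pos = bisect.bisect_left(self.vals, x)
        self.count[pos] -= 1
        self._update_leaf(pos)

    def min_abs_diff(self):
        return self.node[1][2]

Combined with the Mo's-style driver sketched earlier, each insert/delete costs O(log n) and min_abs_diff is O(1), matching the bound above.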

David Eisenstat

EDIT #2: My answer determines the smallest difference between any two adjacent values in a sequence, not the smallest difference between any two values in the sequence.

When you say that you have a lot of intervals to check, do you happen to mean that you have to perform checks of many intervals over the same sequence of numbers? If so, what if you just pre-computed the differences from one number to the next? E.g., in Python:

elements = [2, 1, 8, 5, 11]

def get_differences(sequence):
    """Yield absolute differences between each pair of items in the sequence"""
    it = iter(sequence)
    sentinel = object()
    previous = next(it, sentinel)
    if previous is sentinel:
        return  # nothing to yield for an empty sequence
    for current in it:
        yield abs(previous - current)
        previous = current

differences = list(get_differences(elements)) # differences = [1, 7, 3, 6]

Then when you have to find the minimum difference, just return `min(differences[start_index:stop_index-1])`.
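
For instance, reading start_index as 0-based and stop_index as exclusive (my assumption about the indexing in the line above):

differences = [1, 7, 3, 6]            # computed above from elements
print(min(differences[0:3 - 1]))      # elements[0:3] = [2, 1, 8]  -> 1
print(min(differences[1:4 - 1]))      # elements[1:4] = [1, 8, 5]  -> 3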

EDIT: I missed your paragraph:

Now this is easy if you have to look at one interval you sort the interval and then compare i-th element with i+1-th and store the minimum difference for each i.

But I still think what I'm saying makes sense; you don't have to sort the entire collection but you still need to do an O(n) operation. If you're dealing with numeric values on a platform where the numbers can be represented as machine integers or floats, then as long as you use an array-like container, this should be cache friendly and relatively efficient. If you happen to have repeated queries, you might be able to do some memoization to cache pre-computed results.

Sean
  • I can't be sure but I don't think OP can afford O(n^2) precomputation. This seems more like an O(n log n) precomputation, O(n log n) query kind of problem, although I can't yet figure out how... – k_ssb Apr 22 '18 at 14:23
  • There's a much quicker list comprehension for getting the differences – Adi219 Apr 22 '18 at 14:35
  • This doesn't find all pairwise differences, only consecutive differences – k_ssb Apr 22 '18 at 14:45