5

Here is the problem: I am given an integer array whose elements are all distinct, say:

int[] data = {21, 34, 12, 88, 54, 73};

I would like to know whether a subarray (an index range) contains at least one number that lies in a given value range. For instance, suppose I have a function check(int a, int b, int l, int r), where a and b are the inclusive index bounds of the subarray and l and r are the inclusive bounds of the value range.

For the array above, check(0, 2, 20, 50) should return true: indices 0 to 2 hold 21, 34, 12, and two of those numbers, 21 and 34, are in the range 20 to 50.

As another example, check(2, 3, 20, 80) should return false: indices 2 to 3 hold 12 and 88, and neither is in the range 20 to 80.

I'm thinking about using a Segment Tree, since, as far as I know, RMQ (range minimum query) can be solved with one, so I think a Segment Tree would also work for this problem. However, every "get" function of a Segment Tree I have seen returns a single value (perhaps not the best word), so I would like to know what the nodes of the Segment Tree should hold. Is there any algorithm that can answer each query in O(log(n)) while the "build" time is better than O(n^2), where n is the size of the array?

Note: Using Segment Tree is just my own thought, any other approach is appreciated.

lier wu
  • If you are open to an approach other than a segment tree, I can show how to find the answer. – Wing Kui Tsoi Dec 18 '21 at 03:33
  • What is the range of `array[i]`? – Abhinav Mathur Dec 18 '21 at 04:17
  • A segment tree could go wrong _if_ you are thinking in terms of min and max. For example `check(0, 2, 20, 22)` should return true. – nice_dev Dec 18 '21 at 04:44
  • @WingKuiTsoi, Sure using segment tree is just my thought, not needed. – lier wu Dec 18 '21 at 15:59
  • @AbhinavMathur less than 2E5, all numbers are integer and distinct. – lier wu Dec 18 '21 at 16:01
  • @nice_dev I'm not thinking in terms of min and max but Segment Tree. A Segment Tree is not limited to getting the min and max. I'm thinking this problem is related to the Segment Tree since RMQ is similar to my question(In some way). – lier wu Dec 18 '21 at 16:04
  • @lierwu But what would you store in segments? – nice_dev Dec 18 '21 at 16:24
  • @nice_dev Two integers: in `check(int a, int b, int l, int r)`, I would pass in `a` as the left endpoint and `b` as the right endpoint. – lier wu Dec 18 '21 at 16:27
  • So for `[21,34,12]`, the `a` and `b` would be 12 and 34? – nice_dev Dec 18 '21 at 16:30
  • @nice_dev, wait, does your `segments` mean the segments in the segment tree? I'm not sure if I understand correctly. – lier wu Dec 18 '21 at 16:31
  • @nice_dev, no, for `[21,34,12]`, `a` and `b` would be `0` and `2`, since indices 0 to 2 of `[21, 34, 12, 88, 54, 73]` are `[21,34,12]`. – lier wu Dec 18 '21 at 16:33
  • Ok I mean what would the nodes in the seg tree hold? If you are storing indexes, how will you decide if a value exists in the range l and r given to you during querying? – nice_dev Dec 18 '21 at 16:39
  • 1
    @nice_dev, I understand now, that is a problem, actually, that is the problem I have if I want to use a segment tree for this. That is also the problem I'm asking for. I'll update the question to be more clear, thank you. – lier wu Dec 18 '21 at 16:42
  • I can't see how O(log(n)) relates to your problem of searching an array for numbers within a range, so I don't know what to suggest unless you can be more specific. – Wing Kui Tsoi Dec 20 '21 at 15:24
  • @WingKuiTsoi Specify what? I want to answer each query in `O(log(n))`, and there are a lot of queries. If preprocessing is needed, it should complete in time complexity better than `O(n^2)`. – lier wu Dec 20 '21 at 16:05
  • 1
    Can you process all the queries at the same time? It makes this problem a lot easier. – Matt Timmermans Dec 22 '21 at 14:00
  • @MattTimmermans `int l` and `int r` can be any integers, so there is no way to process the queries all at once. – lier wu Dec 22 '21 at 16:36
  • I mean if you have access to all the queries in advance, then you can simplify the problem. – Matt Timmermans Dec 22 '21 at 16:36
  • @MattTimmermans Could you explain a little more? I don't understand, sorry. – lier wu Dec 22 '21 at 16:38
  • Could you, for example, sort the queries and process them in order of their `l` values? – Matt Timmermans Dec 22 '21 at 16:41
  • @MattTimmermans Yes, you can... But I don't see how that can simplify the problem. – lier wu Dec 22 '21 at 16:43
  • 1
    ...and that's why you ask questions on SO :) I added a comment to the accepted answer. – Matt Timmermans Dec 22 '21 at 17:03
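On the point raised in the comments above about what the segment tree nodes could hold: one known option, not proposed by anyone in this thread, is a "merge sort tree", in which each node stores the sorted values of its index range. The sketch below is a minimal illustration under that assumption; the class name `MergeSortTree` and its methods are hypothetical. It builds in O(n log n) time and space and answers a query in O(log^2 n), which is close to, but not quite, the O(log n) asked for.

```java
// Hypothetical sketch: segment tree whose nodes hold the sorted values
// of their index range ("merge sort tree").
class MergeSortTree {
    private final int n;
    private final int[][] sorted; // sorted[node] = sorted values of that node's index range

    MergeSortTree(int[] data) {
        n = data.length;
        sorted = new int[4 * Math.max(n, 1)][];
        if (n > 0) build(1, 0, n - 1, data);
    }

    private void build(int node, int lo, int hi, int[] data) {
        if (lo == hi) {
            sorted[node] = new int[] {data[lo]};
            return;
        }
        int mid = (lo + hi) >>> 1;
        build(2 * node, lo, mid, data);
        build(2 * node + 1, mid + 1, hi, data);
        sorted[node] = merge(sorted[2 * node], sorted[2 * node + 1]);
    }

    private static int[] merge(int[] x, int[] y) {
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        while (i < x.length && j < y.length) out[k++] = x[i] <= y[j] ? x[i++] : y[j++];
        while (i < x.length) out[k++] = x[i++];
        while (j < y.length) out[k++] = y[j++];
        return out;
    }

    // True if some data[i] with a <= i <= b satisfies l <= data[i] <= r.
    boolean check(int a, int b, int l, int r) {
        if (n == 0 || a > b || l > r) return false;
        return query(1, 0, n - 1, a, b, l, r);
    }

    private boolean query(int node, int lo, int hi, int a, int b, int l, int r) {
        if (b < lo || hi < a) return false; // node range is disjoint from [a, b]
        if (a <= lo && hi <= b) {
            // Node range is fully inside [a, b]: binary search for the first value >= l.
            int[] s = sorted[node];
            int left = 0, right = s.length;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (s[mid] < l) left = mid + 1; else right = mid;
            }
            return left < s.length && s[left] <= r;
        }
        int mid = (lo + hi) >>> 1;
        return query(2 * node, lo, mid, a, b, l, r)
            || query(2 * node + 1, mid + 1, hi, a, b, l, r);
    }

    public static void main(String[] args) {
        MergeSortTree tree = new MergeSortTree(new int[] {21, 34, 12, 88, 54, 73});
        System.out.println(tree.check(0, 2, 20, 50)); // true
        System.out.println(tree.check(2, 3, 20, 80)); // false
    }
}
```

Each query touches O(log n) nodes and runs one binary search per node, which is where the extra log factor comes from.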

3 Answers

4

It's a bit exotic, but a persistent red-black tree, or a persistent variant of any other self-balancing tree, would work.

A persistent data structure allows one to (time- and space-)efficiently take "snapshots" of the structure at different times, and then query those snapshots later, receiving results based on the structure's state as of the snapshot time. For this use case, the particular query we would want to do would be to count all the contained elements within a given range (which can be performed in O(log n) if each node is annotated with the number of its descendants).

In this case, you would start with an empty structure, and at time i, insert data[i] and then store a snapshot as snapshot[i]. Since the question treats a and b as inclusive indices, check(a,b,l,r) would be implemented as return snapshot[b].countInRange(l,r) > snapshot[a-1].countInRange(l,r), with snapshot[-1] being the empty structure. That is, if there were more elements in the target range as of time b than there were just before time a, then some element in the target range must have been added between a and b and thus satisfies your constraints.

If optimally implemented, the precomputation would take time O(n log n) and space O(n), and queries would take time O(log n).
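As a rough illustration of the snapshot idea, the sketch below swaps the persistent red-black tree for a persistent segment tree indexed by value, which is usually shorter to write. It assumes all values are non-negative and at most a known bound `maxValue` (the asker mentions values below 2E5 in the comments; non-negativity is an assumption here). The class name `PersistentCounter` and its fields are hypothetical. In this sketch `root[i]` is the version holding the first i elements, so the inclusive index range a..b is answered by comparing `root[b + 1]` with `root[a]`.

```java
// Hypothetical sketch: persistent counting segment tree over the value range.
// Assumes 0 <= data[i] <= maxValue and maxValue < Integer.MAX_VALUE.
class PersistentCounter {
    private final int maxValue;
    private final int[] left, right, cnt; // node pools; node 0 is the shared empty tree
    private final int[] root;             // root[i] = version containing data[0..i-1]
    private int used = 1;

    PersistentCounter(int[] data, int maxValue) {
        this.maxValue = maxValue;
        // Each insertion copies one root-to-leaf path.
        int perInsert = 34 - Integer.numberOfLeadingZeros(maxValue + 1);
        int capacity = 1 + data.length * perInsert;
        left = new int[capacity];
        right = new int[capacity];
        cnt = new int[capacity];
        root = new int[data.length + 1];
        for (int i = 0; i < data.length; i++) {
            root[i + 1] = insert(root[i], 0, maxValue, data[i]);
        }
    }

    // Returns a new version with `value` added; the old version stays valid.
    private int insert(int prev, int lo, int hi, int value) {
        int node = used++;
        left[node] = left[prev];
        right[node] = right[prev];
        cnt[node] = cnt[prev] + 1;
        if (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (value <= mid) left[node] = insert(left[prev], lo, mid, value);
            else right[node] = insert(right[prev], mid + 1, hi, value);
        }
        return node;
    }

    // Number of stored values v with l <= v <= r in the given version.
    private int count(int node, int lo, int hi, int l, int r) {
        if (node == 0 || r < lo || hi < l) return 0;
        if (l <= lo && hi <= r) return cnt[node];
        int mid = (lo + hi) >>> 1;
        return count(left[node], lo, mid, l, r) + count(right[node], mid + 1, hi, l, r);
    }

    // True if some data[i] with a <= i <= b satisfies l <= data[i] <= r.
    boolean check(int a, int b, int l, int r) {
        return count(root[b + 1], 0, maxValue, l, r) > count(root[a], 0, maxValue, l, r);
    }

    public static void main(String[] args) {
        PersistentCounter pc = new PersistentCounter(new int[] {21, 34, 12, 88, 54, 73}, 100);
        System.out.println(pc.check(0, 2, 20, 50)); // true
        System.out.println(pc.check(2, 3, 20, 80)); // false
    }
}
```

With this variant, building takes O(n log V) time and space and each query takes O(log V), where V is `maxValue + 1`.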


If you were willing to relax the O(log n) requirement for queries, a simpler and potentially more practical approach would be a 2-dimensional k-D tree. Simply insert each data[i] as the point (i, data[i]), and then do a range search for a<=x<=b, l<=y<=r (inclusive bounds, to match the question). This gives you a query time of O(sqrt(n)), which is not as efficient, but a lot easier to code up (or to find existing code for).
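A minimal sketch of the k-D tree idea, assuming the inclusive bounds used in the question. The tree is stored implicitly in an array (the median of each subrange sits at its middle), and the build simply re-sorts each subrange, which costs O(n log^2 n) rather than the optimal O(n log n). The class name `KdRangeCheck` and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: implicit 2-D k-d tree over the points (index, value).
class KdRangeCheck {
    private final int[][] pts; // pts[k] = {index, value}

    KdRangeCheck(int[] data) {
        pts = new int[data.length][];
        for (int i = 0; i < data.length; i++) pts[i] = new int[] {i, data[i]};
        build(0, pts.length - 1, 0);
    }

    // Sort pts[lo..hi] along `axis` so the median lands at the middle,
    // then recurse on both halves with the other axis.
    private void build(int lo, int hi, int axis) {
        if (lo >= hi) return;
        Arrays.sort(pts, lo, hi + 1, (p, q) -> Integer.compare(p[axis], q[axis]));
        int mid = (lo + hi) >>> 1;
        build(lo, mid - 1, 1 - axis);
        build(mid + 1, hi, 1 - axis);
    }

    // True if some data[i] with a <= i <= b satisfies l <= data[i] <= r.
    boolean check(int a, int b, int l, int r) {
        return search(0, pts.length - 1, 0, new int[] {a, l}, new int[] {b, r});
    }

    private boolean search(int lo, int hi, int axis, int[] min, int[] max) {
        if (lo > hi) return false;
        int mid = (lo + hi) >>> 1;
        int[] p = pts[mid];
        if (min[0] <= p[0] && p[0] <= max[0] && min[1] <= p[1] && p[1] <= max[1]) return true;
        // Points left of mid have coordinate <= p[axis]; points right of mid have >= p[axis].
        if (min[axis] <= p[axis] && search(lo, mid - 1, 1 - axis, min, max)) return true;
        return max[axis] >= p[axis] && search(mid + 1, hi, 1 - axis, min, max);
    }

    public static void main(String[] args) {
        KdRangeCheck kd = new KdRangeCheck(new int[] {21, 34, 12, 88, 54, 73});
        System.out.println(kd.check(0, 2, 20, 50)); // true
        System.out.println(kd.check(2, 3, 20, 80)); // false
    }
}
```

The search stops at the first point found inside the query rectangle, since only existence matters here, not the count.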

Sneftel
  • If a persistent data structure is used (as far as I know, it keeps its history/versions) and you keep making snapshots over time, how can the space complexity be only `O(n)`? Please let me know if I am misinterpreting anything. – miiiii Dec 20 '21 at 18:52
  • 1
    @miiiii snapshots are stored as part of the structure itself (that’s what I mean about them being space efficient). Each one takes only O(1) amortised additional space. – Sneftel Dec 20 '21 at 18:58
  • Thanks for the reply. But how can `countInRange` be implemented then? Will it precompute/store data for each `a`,`b`,`l`,`r` combo? – miiiii Dec 20 '21 at 19:07
  • 1
    @miiiii It's implemented just as you would implement it for a regular red-black tree, by recursively visiting nodes that overlap the range. It's not precomputed for any particular set of inputs. – Sneftel Dec 20 '21 at 19:09
  • RB trees are BSTs, right? A BST doesn't allow duplicates and is an ordered tree. So how can it be useful for the case here? Please correct me if I'm going off track. You've been a good explainer so far :) I appreciate it. – miiiii Dec 20 '21 at 19:19
  • @miiiii I’m not sure what “duplicates” have to do with this. After the precomputation, you effectively have `n` separate trees, with tree `i` holding the first `i` elements. You do the query on the two trees you’re interested in. But since the trees are implemented as a persistent structure, they only take `O(n)` space in total. I’d suggest reading up on the basics of persistent data structures if you’re still confused about how this could work. – Sneftel Dec 20 '21 at 19:24
  • That's a really clever solution! – Alexey Veleshko Dec 20 '21 at 20:08
  • That's not only a clever solution, but a really elegant one! – ciamej Dec 21 '21 at 12:59
  • IIRC there is a persistent red-black tree that takes O(n) space, but it's quite complicated and won't work if you need to maintain counts. A persistent red-black order statistic tree will take O(n log n) space. Still a pretty good solution, though. – Matt Timmermans Dec 22 '21 at 13:43
  • @MattTimmermans Hm, you're right -- of course you end up modifying `O(log n)` nodes for each insertion. – Sneftel Dec 22 '21 at 14:49
  • 1
    Since the OP indicates in comments that you have all the queries in advance, you don't need to use a persistent tree. Make two lists of the queries sorted by `l` and `r`. Then insert points in order into an ordinary order statistic tree. When you cross a query `l`, count the number of values in the range. When you cross a query `r`, count the values in the range and subtract the previous count. If the answer is >0 then the query is satisfied. – Matt Timmermans Dec 22 '21 at 16:59
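The offline idea in the last comment can be sketched as follows. This is an illustration under assumptions, not the commenter's code: it uses a Fenwick (binary indexed) tree over the value range instead of an order statistic tree, it names the index bounds `a` and `b` as in the question, and it assumes all values lie in `[0, maxValue]`. The class name `OfflineRangeCheck` and the query layout `{a, b, l, r}` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: answer all queries offline with one sweep over the array.
class OfflineRangeCheck {
    // queries[q] = {a, b, l, r}; assumes 0 <= data[i] <= maxValue.
    static boolean[] answer(int[] data, int[][] queries, int maxValue) {
        int m = queries.length;
        // Each event is {prefixLength, queryIndex, sign}.
        int[][] events = new int[2 * m][];
        for (int q = 0; q < m; q++) {
            events[2 * q] = new int[] {queries[q][0], q, -1};        // count over data[0..a-1]
            events[2 * q + 1] = new int[] {queries[q][1] + 1, q, 1}; // count over data[0..b]
        }
        Arrays.sort(events, (x, y) -> Integer.compare(x[0], y[0]));

        int[] fenwick = new int[maxValue + 2]; // slot v + 1 represents value v
        int[] diff = new int[m];
        int inserted = 0;
        for (int[] e : events) {
            while (inserted < e[0]) add(fenwick, data[inserted++]);
            int[] q = queries[e[1]];
            diff[e[1]] += e[2] * (countAtMost(fenwick, q[3]) - countAtMost(fenwick, q[2] - 1));
        }
        boolean[] out = new boolean[m];
        for (int q = 0; q < m; q++) out[q] = diff[q] > 0;
        return out;
    }

    private static void add(int[] fenwick, int value) {
        for (int i = value + 1; i < fenwick.length; i += i & -i) fenwick[i]++;
    }

    // Number of inserted values that are <= value.
    private static int countAtMost(int[] fenwick, int value) {
        int i = value >= fenwick.length - 1 ? fenwick.length - 1 : value + 1;
        int sum = 0;
        for (; i > 0; i -= i & -i) sum += fenwick[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {21, 34, 12, 88, 54, 73};
        int[][] queries = {{0, 2, 20, 50}, {2, 3, 20, 80}};
        System.out.println(Arrays.toString(answer(data, queries, 100))); // [true, false]
    }
}
```

With N queries, the sweep costs O((n + N) log V) plus O(N log N) for sorting the events, where V is `maxValue + 1`.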
0

O(N) is easy:

public static boolean check(int[] data, int a, int b, int l, int r) {
    return Arrays.stream(data, a, b + 1).anyMatch(n -> n >= l && n <= r);
}

I suspect that any more big-O efficient approach would spend enough time building the needed data structure that it's not worth the effort unless you're doing a lot of lookups on a huge dataset. Even then, maybe a parallel version of the above might be good enough.
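For reference, the parallel version mentioned above might look like the sketch below. It only adds `.parallel()` to the stream; the class name `ParallelCheck` is hypothetical. Whether this is actually faster depends on the subarray size and the machine, and for small ranges the thread overhead will likely outweigh any gain.

```java
import java.util.Arrays;

// Hypothetical sketch: the same linear scan, run as a parallel stream.
class ParallelCheck {
    // True if some data[i] with a <= i <= b satisfies l <= data[i] <= r.
    static boolean check(int[] data, int a, int b, int l, int r) {
        return Arrays.stream(data, a, b + 1)
                     .parallel()
                     .anyMatch(n -> n >= l && n <= r);
    }

    public static void main(String[] args) {
        int[] data = {21, 34, 12, 88, 54, 73};
        System.out.println(check(data, 0, 2, 20, 50)); // true
        System.out.println(check(data, 2, 3, 20, 80)); // false
    }
}
```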

Shawn
  • Thanks, but I am going to need a lot of lookups... So if a data structure is needed to answer queries in `O(log(n))`, I would want to build it in a time complexity better than `O(n^2)`. – lier wu Dec 18 '21 at 16:06
-1

UPDATED:

import java.util.Arrays;

public static void main(String[] args) {
    int[] data = {21, 34, 12, 88, 54, 73, 99, 100};
    System.out.println(searchRange(0, 2, 20, 50, data));
    System.out.println(searchRange(2, 3, 20, 80, data));
    System.out.println(searchRange(0, 2, 20, 22, data));
}

public static boolean searchRange(int from, int to, int min, int max, int[] data) {
    // slice array
    data = Arrays.copyOfRange(data, from, to + 1);
    Arrays.sort(data);
    // System.out.println(Arrays.toString(data));
    int index = findInBoundaries(data, min, max);
    // System.out.println(index);
    return index != -1;
}

// return -1: no elements found.
static int findInBoundaries(int[] data, int min, int max) {
    int start = 0;
    int end = data.length - 1;
    int ans = -1;
    while (start <= end) {
        int mid = (start + end) / 2;
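        // Found an element inside [min, max]: stop.
        if (data[mid] >= min && data[mid] <= max) {
            ans = mid;
            break;
        }
        // Element is below min (it is <= max but not in range): search the right half.
        else if (data[mid] <= max) {
            start = mid + 1;
        }
        // Element is above max: search the left half.
        else {
            end = mid - 1;
        }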
    }
    return ans;
}

Output

true
false
true

This code has been tested on more cases. Unlike my first answer, which checked the min and max boundaries independently, this version searches for an element inside the target range to determine whether the subarray contains an eligible number.

Explanation:

To simplify the question, I define it as: does any number in the subarray fall within the given range, using a method with time complexity better than O(n^2)?

Once the subarray is sorted, a binary search is straightforward. The search starts from the middle element (int mid = (start + end) / 2) and looks for a number within the given range. When an element meets the range requirement, the loop terminates. If the element is less than or equal to the max value (and therefore below min), the search moves to the right (larger) half; otherwise, it moves to the left (smaller) half. The loop itself therefore runs at most O(log n) times, where n is the size of the subarray.

Example:

I modified the code to compare this solution with a plain linear loop by adding counters. In some cases, the plain loop has to go through the whole array. Sorting does not matter for the plain loop, so I do not sort separately for it.

// Prints the index found (-1 if none) and the iteration count for both methods.
static void findBoundaryCompareMethods(int[] data, int min, int max) {
    int start = 0;
    int end = data.length - 1;
    int ans = -1;
    int count = 0;
    while (start <= end) {
        int mid = (start + end) / 2;
        count++;
        // Found an element inside [min, max]: stop.
        if (data[mid] >= min && data[mid] <= max) {
            ans = mid;
            break;
        }
        // Element is below min: search the right half.
        else if (data[mid] <= max) {
            start = mid + 1;
        }
        // Element is above max: search the left half.
        else {
            end = mid - 1;
        }
    }
    System.out.println("Method 1 Find: " + ans);
    System.out.println("Method 1 Count: " + count);
    ans = -1;
    count = 0;
    for (int i = 0; i < data.length; i++) {
        count++;
        if (data[i] >= min && data[i] <= max) {
            ans = i;
            break;
        }
    }
    System.out.println("Method 2 Find: " + ans);
    System.out.println("Method 2 Count: " + count);
}

The test output is below. Method 1 is the binary search from this answer and Method 2 is the plain linear loop.

Output

Array: [12, 21, 34]
Min: 20 Max: 50
Method 1 Find: 1
Method 1 Count: 1
Method 2 Find: 1
Method 2 Count: 2

Array: [12, 88]
Min: 20 Max: 80
Method 1 Find: -1
Method 1 Count: 2
Method 2 Find: -1
Method 2 Count: 2

Array: [12, 21, 34]
Min: 20 Max: 22
Method 1 Find: 1
Method 1 Count: 1
Method 2 Find: 1
Method 2 Count: 2

Array: [12, 21, 34, 54, 73, 88, 99, 100]
Min: 70 Max: 73
Method 1 Find: 4
Method 1 Count: 3
Method 2 Find: 4
Method 2 Count: 5
Wing Kui Tsoi
  • 3
    `System.out.println(searchRange(0, 2, 20, 22, data));` returns `false` but expected is `true`. `min` and `max` decisions won't always help. – nice_dev Dec 20 '21 at 18:31
  • Please check my updated answer, thanks. – Wing Kui Tsoi Dec 21 '21 at 12:47
  • 2
    Your code does a sort for each query, leading to an `O(n log n)` query time. There's no reason to do that. It would be faster (`O(n)`) to just scan through linearly. – Sneftel Dec 21 '21 at 13:50
  • @WingKuiTsoi Could you add some explanation? Code only answers should be avoided. I could test further only after that. – nice_dev Dec 21 '21 at 18:26
  • @nice_dev The answer is updated. – Wing Kui Tsoi Dec 22 '21 at 13:26
  • @WingKuiTsoi I got your approach now, but it will be time consuming to sort for each subarray. Your worst case scenario time complexity will be `O(N * n * log(n))` where `N` is the no. of queries to answer and `n` is the size of the whole array. So, it is quadratic in nature if range given to the function is the entire array as a subarray. This way, a simple `O(n)` pass for each query would be more performant. – nice_dev Dec 22 '21 at 13:59
  • I don't understand why the worst case is O(N * n * log(n)); I'm not sure what I am misunderstanding. If possible, can you give an example? @nice_dev – Wing Kui Tsoi Dec 22 '21 at 14:16
  • Include this one: Arrays.sort(data) ? – Wing Kui Tsoi Dec 22 '21 at 14:26
  • 2
    @WingKuiTsoi For each query, it takes `O(n log n)` time to run the sorting routine. So if there's `N` queries, the total complexity is `O(N * n log n)`. If you just linearly searched through the array for each query, it would only take `O(N * n)` time. – Sneftel Dec 22 '21 at 14:28
  • But I am doing a binary search, and binary search is said to take O(log n) time. Did I get something wrong? @Sneftel – Wing Kui Tsoi Dec 22 '21 at 14:35
  • The sort takes `O(n log n)` time, and the binary search takes `O(log n)` time, so the total per-query time is `O(n log n + log n) = O(n log n)`. This is basic computer science stuff. – Sneftel Dec 22 '21 at 14:48