Finding the largest array index with negative value

Question

Given an array of positive elements (1-based indexing), you have to process two types of queries:

(V): find the sum of the elements in the range 1:V (both ends inclusive).
(V, X): subtract X from every element in the range 1:V, then report the largest index i in 1:V whose value is negative; the answer is 0 if no such index exists.

I can do the first query using a Fenwick tree or a segment tree, but how do I support the second query? I have already tried an O(n)-per-query approach that just checks each element in the range 1...V, but it times out. I need to process 10^5 such queries over an array of size 10^5.
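For reference, here is a sketch of that naive O(n)-per-query approach in C++, the language the asker mentions. It assumes (the question does not say so explicitly) that the subtraction persists, so later queries see the reduced values, and it uses `long long` because up to 10^5 subtractions of up to 10^7 can push an element down to roughly -10^12. The class `NaiveArray` and its method names are hypothetical; the sketch is mainly useful as a correctness oracle when testing a faster structure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical naive reference: O(V) work per query.
// Assumes the subtraction in query type 2 persists for later queries.
class NaiveArray {
 public:
  explicit NaiveArray(std::vector<long long> values) : a_(std::move(values)) {}

  // Query type 1: sum of a[1..V] (1-based, inclusive).
  long long prefixSum(std::size_t V) const {
    assert(V <= a_.size());
    long long s = 0;
    for (std::size_t i = 0; i < V; ++i) s += a_[i];
    return s;
  }

  // Query type 2: subtract X from a[1..V], then return the largest 1-based
  // index i <= V with a[i] < 0, or 0 if there is none.
  std::size_t subtractAndFind(std::size_t V, long long X) {
    assert(V <= a_.size());
    std::size_t answer = 0;
    for (std::size_t i = 0; i < V; ++i) {
      a_[i] -= X;
      if (a_[i] < 0) answer = i + 1;  // remember the latest negative position
    }
    return answer;
  }

 private:
  std::vector<long long> a_;
};
```

For example, starting from {4, 1, 6, 2, 3}, `subtractAndFind(4, 3)` leaves {1, -2, 3, -1, 3} and returns 4. With 10^5 queries each scanning up to 10^5 elements, this is the 10^10-operation worst case that times out.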

Since you mention a Fenwick tree, your question seems incomplete: can your array be updated, or is it static? — Richard, Jun 08 '17 at 18:33
Also: do you have a target time complexity? Otherwise, the simplest answer is to copy the array subset, subtract `X`, and search, all in _O(n)_ time. — Richard, Jun 08 '17 at 18:36
@Richard The array needs to be updated. I meant that if there were only queries of type 1, I could use a Fenwick tree. I need to process 10^5 queries over an array of length 10^5 in under 3 seconds. — csdteb, Jun 08 '17 at 18:37
And, to clarify, your data is not guaranteed to be in sorted order? — Richard, Jun 08 '17 at 18:41
Are the elements of the array guaranteed to be integers? Do they have a lower and/or upper bound? — Richard, Jun 08 '17 at 18:52
@Richard Every element in the array is initially positive, with a maximum value of 10^5; the number X can be up to 10^7. I tried solving it in C++. — csdteb, Jun 08 '17 at 19:02

Richard · Answer 1 · 2017-06-08T19:44:23.480

The most straightforward approach is to find the element in O(n) time by simply searching through the array, like so:

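arr = [0,5,2,3,10]
V = 4   # the query range is 1..V (1-based)
X = 7
largest_i = -1
for i in range(V):
  arr[i] -= X   # the subtraction persists for later queries
  if arr[i] < 0:
    largest_i = i
print(largest_i+1)  # Your answer (shifting from a 0-based to a 1-based index; 0 if none)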

Given the small size of the array and the time constraints, this should work in any compiled and most interpreted languages.

EDIT

I stand corrected (and it should have been obvious that a worst case of 10^10 operations was pretty bad). Since you say this times out, here's a more sophisticated approach.

Create an AVL tree, or another self-balancing binary tree which supports insertion and removal.
Create key items with attributes value and index. value is the value of the item whereas index is the item's position in the flat array.
Create a hashmap that links to tree nodes based on their index.
Add items to the tree so that it balances by value, but remove items based on index.
When your flat array is updated, update the tree accordingly.
To answer Query 2, do an in-order traversal of the tree while value-X<0, stopping at the first node for which value-X>=0. Return the largest index among the nodes visited before stopping.

Insertion into and deletion from the tree both take O(log n). The in-order traversal has a worst case of O(n), but it only visits elements which may become negative.

A possible implementation for this is as follows:

#include <map>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <cstdlib>
#include <cassert>

class MagicArray {
 private:
  typedef std::multimap<int, int> arr_idx_t;
  std::vector<int> arr; //Array of possibly negative integers
  arr_idx_t arr_sorted; //Self-balancing tree of (Integer, Index) pairs
  std::unordered_map<int, arr_idx_t::iterator> arr_idx; //Hash table linking Index to a (Integer, Index)
  void indexElement(const int idx){
    auto ret = arr_sorted.emplace(arr.at(idx),idx);
    arr_idx[idx] = ret;
  }
 public:
  void insert(const int i){
    arr.emplace_back(i);
    const auto idx = arr.size()-1; //Index of most recently inserted element
    indexElement(idx);
  }
  void alter(const int idx, const int newval){
    arr.at(idx) = newval;
    arr_sorted.erase(arr_idx[idx]); //Remove old value from tree
    indexElement(idx);
  }
  int findMatch(const int X){
    //The next two lines reduce run-time from 3s to 0.031s
    if(!arr_sorted.empty() && arr_sorted.rbegin()->first-X<0) //Even largest element is less than zero
      return arr.size();

    int foundi = -1;
    for(const auto &kv: arr_sorted){
      if(kv.first-X<0){
        foundi = std::max(foundi,kv.second);
      } else {
        break;
      }
    }
    return foundi+1;
  }
};

int main(){
  assert(RAND_MAX>10000000); //Otherwise code below will not work

  MagicArray ma;
  for(unsigned int i=0;i<10000;i++)
    ma.insert(rand()%10000);

  for(unsigned int i=0;i<10000;i++){
    ma.alter(rand()%10000,rand()%10000);
    ma.findMatch(rand()%1000000);
  }
}

If you leave out the first two code lines of findMatch, this takes 3s on my Intel(R) Core(TM) i5 CPU M480@2.67GHz and 2.359s on an Intel(R) Xeon(R) CPU E5-2680v3@2.50GHz on a supercomputer.

If you include the first two code lines of findMatch then the code takes <0.035s on both machines.

This is why it's very important to consider the ranges of the values. You said that the array holds values in the range [0,10⁵] whereas X is in the range [0,10⁷]. If X is spread uniformly over its range, about 99% of its values will be larger than any value in the array, and the answer will then simply be the size of the array.

So the trick is to use an inexpensive check to see if we know the answer simply and, if not, to then perform the more expensive search.

I have already tried it and it exceeds the time limit even for a sub task having array length 10^5 and 10^4 queries. — csdteb, Jun 08 '17 at 18:51
@csdteb: You should have stated in your question that you had tried this, so as to avoid wasting people's time. You should edit your question now to clearly state this. — Richard, Jun 08 '17 at 18:51
@csdteb: I've edited my answer with an alternative approach. — Richard, Jun 08 '17 at 19:10

four_lines · Accepted Answer · 2017-06-09T00:10:03.597

Use a segment tree in which every node stores the minimum value in its range and the sum of the elements in its range. The first query can then be answered directly in O(log n) time. For the second query, first subtract the given value from every element in the range (O(log n) again), then query for the rightmost index in 1:V whose value is less than 0 (O(log n) too).

EDIT: A better explanation

So first build a segment tree whose leaves store the original array values. Every other (internal) node stores two values, totalsum and minval, which are easily computed from its children with these equations:

segment_tree[id].minval = min(segment_tree[id*2].minval, segment_tree[id*2+1].minval)
segment_tree[id].totalsum = segment_tree[id*2].totalsum + segment_tree[id*2+1].totalsum

The build takes O(n).

Query A: Finding the sum over a range is easy: find the topmost segment-tree nodes whose ranges lie entirely inside the query range and add up their totalsum values. There are O(log n) such nodes, so each query takes O(log n) time.
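A small sketch of what "the topmost ranges" means, assuming the usual recursive split at mid = (lo + hi) / 2 with 0-based positions; the function name `decompose` is hypothetical. It only lists the node ranges; Query A would add up the totalsum stored at each of them.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: collects the maximal ("topmost") segment-tree nodes
// whose ranges lie entirely inside the query range [qlo, qhi]. The node being
// visited covers [lo, hi]; the root covers [0, n-1]. The split point must
// match the one used when building the tree.
void decompose(int lo, int hi, int qlo, int qhi,
               std::vector<std::pair<int, int>>& out) {
  assert(lo <= hi);
  if (qhi < lo || hi < qlo) return;  // no overlap with the query range
  if (qlo <= lo && hi <= qhi) {      // fully covered: use this node, stop here
    out.emplace_back(lo, hi);
    return;
  }
  int mid = (lo + hi) / 2;           // partial overlap: recurse into children
  decompose(lo, mid, qlo, qhi, out);
  decompose(mid + 1, hi, qlo, qhi, out);
}
```

For n = 8 and the prefix 1:6 (0-based [0,5]), it collects the two nodes [0,3] and [4,5]. In general at most two nodes per tree level are collected, which is where the O(log n) bound comes from.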

Query B: Separate this query into two operations:

A) Subtracting X from a range: say you subtract X from some range [a,b]. The total sum of [a,b] becomes old_totalsum[a,b] - (b+1-a)*X, and the new minimum becomes old_minval[a,b] - X. The key is to apply this only to the topmost segment-tree nodes that lie inside the query range, so the operation costs only O(log n); the children of those nodes are brought up to date later, only when a subsequent operation needs to descend into them. There's slightly more to this technique, so read up on it if you aren't already familiar with it (it's called lazy propagation).
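The node-level bookkeeping of lazy propagation can be sketched as follows. The `pending` field (often called a lazy tag) and the helper names `applySubtract` and `pushDown` are assumptions for illustration, not part of the answer: a fully covered node is updated in O(1) with the formulas above, and its pending amount is handed to the children only when a later operation descends below it.

```cpp
#include <cassert>
#include <vector>

// Per-node data from the answer (minval, totalsum) plus a hypothetical
// `pending` field: an amount already subtracted from this node's own fields
// but not yet passed on to its children.
struct Node {
  long long minval = 0;
  long long totalsum = 0;
  long long pending = 0;
};

// Subtract X from every element covered by `node` (which covers `len`
// elements) without visiting its children: the sum drops by len*X and the
// minimum by X.
void applySubtract(Node& node, long long len, long long X) {
  node.totalsum -= len * X;
  node.minval -= X;
  node.pending += X;
}

// Before descending into the children of node `id` (covering [lo, hi] in a
// 1-based heap layout), hand its pending subtraction down and clear it.
void pushDown(std::vector<Node>& tree, int id, int lo, int hi) {
  assert(lo <= hi);
  if (tree[id].pending == 0 || lo == hi) return;  // nothing to do, or a leaf
  int mid = (lo + hi) / 2;
  applySubtract(tree[2 * id], mid - lo + 1, tree[id].pending);
  applySubtract(tree[2 * id + 1], hi - mid, tree[id].pending);
  tree[id].pending = 0;
}
```

For a two-element array {5, 2}, subtracting 3 from the root changes only the root (sum 7 → 1, minimum 2 → -1); the leaves become 2 and -1 once `pushDown` is called on the root.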

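B) Check the rightmost index with value < 0: start your query at the root of the segment tree, considering only nodes that overlap [1,V]. Is the minval of the right child < 0? Then go there. Otherwise, is the minval of the left child < 0? Then go to the left child. If neither child has minval < 0 (i.e. both are >= 0), return -1, which corresponds to the answer 0 in the question's 1-based convention. Once you reach a leaf, just return the index the leaf corresponds to. A right child that only partly overlaps [1,V] may hold its negative values beyond V, so if the search there fails, fall back to the left child; such failures can only happen along the single path towards V, so the descent is still O(log n).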
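Putting the pieces together, here is a minimal C++ sketch of the whole structure, assuming 1-based queries, persistent subtraction and `long long` values; the class `SegTree` and its method names are hypothetical. The descent only looks at indices in 1:V, because elements beyond V may already be negative from earlier queries and must not be reported.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: segment tree with minval/totalsum per node and a
// lazy `pending` subtraction. Public methods use 1-based positions.
class SegTree {
 public:
  explicit SegTree(const std::vector<long long>& a)
      : n_(static_cast<int>(a.size())), t_(4 * a.size()) {
    assert(n_ > 0);
    build(1, 0, n_ - 1, a);
  }

  // Query A: sum of a[1..V].
  long long prefixSum(int V) {
    assert(V <= n_);
    return V <= 0 ? 0 : sum(1, 0, n_ - 1, V - 1);
  }

  // Query B: subtract X from a[1..V], then return the largest i <= V with
  // a[i] < 0, or 0 if no such index exists.
  int subtractAndFind(int V, long long X) {
    assert(V <= n_);
    if (V <= 0) return 0;
    subtract(1, 0, n_ - 1, V - 1, X);
    return rightmostNegative(1, 0, n_ - 1, V - 1) + 1;  // -1 becomes 0
  }

 private:
  struct Node { long long minval = 0, totalsum = 0, pending = 0; };
  int n_;
  std::vector<Node> t_;

  // Subtract X from the whole node covering [lo, hi] in O(1).
  void apply(int id, int lo, int hi, long long X) {
    t_[id].totalsum -= static_cast<long long>(hi - lo + 1) * X;
    t_[id].minval -= X;
    t_[id].pending += X;
  }
  void pushDown(int id, int lo, int hi) {
    if (t_[id].pending == 0) return;
    int mid = (lo + hi) / 2;
    apply(2 * id, lo, mid, t_[id].pending);
    apply(2 * id + 1, mid + 1, hi, t_[id].pending);
    t_[id].pending = 0;
  }
  // The two equations from the answer.
  void pull(int id) {
    t_[id].minval = std::min(t_[2 * id].minval, t_[2 * id + 1].minval);
    t_[id].totalsum = t_[2 * id].totalsum + t_[2 * id + 1].totalsum;
  }
  void build(int id, int lo, int hi, const std::vector<long long>& a) {
    if (lo == hi) { t_[id].minval = t_[id].totalsum = a[lo]; return; }
    int mid = (lo + hi) / 2;
    build(2 * id, lo, mid, a);
    build(2 * id + 1, mid + 1, hi, a);
    pull(id);
  }
  // The helpers below work on the 0-based prefix [0, r].
  long long sum(int id, int lo, int hi, int r) {
    if (lo > r) return 0;
    if (hi <= r) return t_[id].totalsum;  // topmost fully covered node
    pushDown(id, lo, hi);
    int mid = (lo + hi) / 2;
    return sum(2 * id, lo, mid, r) + sum(2 * id + 1, mid + 1, hi, r);
  }
  void subtract(int id, int lo, int hi, int r, long long X) {
    if (lo > r) return;
    if (hi <= r) { apply(id, lo, hi, X); return; }  // lazy: stop here
    pushDown(id, lo, hi);
    int mid = (lo + hi) / 2;
    subtract(2 * id, lo, mid, r, X);
    subtract(2 * id + 1, mid + 1, hi, r, X);
    pull(id);
  }
  // Rightmost 0-based index <= r whose value is < 0, or -1 if none.
  int rightmostNegative(int id, int lo, int hi, int r) {
    if (lo > r || t_[id].minval >= 0) return -1;
    if (lo == hi) return lo;
    pushDown(id, lo, hi);
    int mid = (lo + hi) / 2;
    int found = rightmostNegative(2 * id + 1, mid + 1, hi, r);  // right first
    return found != -1 ? found : rightmostNegative(2 * id, lo, mid, r);
  }
};
```

For example, after `subtractAndFind(4, 3)` on {4, 1, 6, 2, 3} the values are {1, -2, 3, -1, 3}; a following `subtractAndFind(3, 0)` returns 2 rather than 4, because index 4 lies outside 1:3.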

So the total complexity of the program is O(n + Q log n), where Q is the number of queries.
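To put this bound in numbers for the sizes in the question (n = Q = 10^5), next to the worst case of the naive scan; constant factors (a segment-tree operation touches a few nodes per level) are ignored, so treat these as orders of magnitude only:

```latex
\begin{aligned}
\text{naive scan:}\quad   & n \cdot Q = 10^5 \cdot 10^5 = 10^{10} \\
\text{segment tree:}\quad & n + Q \lceil \log_2 n \rceil = 10^5 + 10^5 \cdot 17 = 1.8 \times 10^6
\end{aligned}
```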

A segment tree is a static data structure. In one of the comments on the question OP states that the array needs to be updated. — Richard, Jun 08 '17 at 19:46
Updated: It seems that there are dynamic segment trees (see [here](https://pdfs.semanticscholar.org/fde4/9b0068173360df3a7955084f0810c2b5d500.pdf)) and the wiki page should probably be updated (*sigh*). — Richard, Jun 08 '17 at 19:49
