27

Given an array of integers and a range (low, high), find all contiguous subsequences of the array whose sum lies in the range.

Is there a solution better than O(n^2)?

I tried a lot but couldn't find a solution that does better than O(n^2). Please help me find a better solution or confirm that this is the best we can do.

This is what I have right now; I'm assuming the range is the closed interval [lo, hi].

public static int numOfCombinations(final int[] data, final int lo, final int hi, int beg, int end) {
    int count = 0, sum = data[beg];

    while (beg < data.length && end < data.length) {
       if (sum > hi) {
          break;
       } else {
          if (lo <= sum && sum <= hi) {
            System.out.println("Range found: [" + beg + ", " + end + "]");
            ++count;
          }
          ++end;
          if (end < data.length) {
             sum += data[end];
          }
       }
    }
    return count;
}

public static int numOfCombinations(final int[] data, final int lo, final int hi) {
    int count = 0;

    for (int i = 0; i < data.length; ++i) {
        count += numOfCombinations(data, lo, hi, i, i);
    }

    return count;
}
Delgan
user1071840
  • Does `sum > hi .. break` assume that the integers are non-negative? (Otherwise, why break if the sum can decrease as we continue?) – AlexD Jul 03 '14 at 01:50
  • A [Segment Tree](http://en.wikipedia.org/wiki/Segment_tree) can be rather helpful for an output sensitive approach, though thinking it's still O(N^2) in the worst case where you return all continuous subsequences. – Nuclearman Jul 03 '14 at 02:07
  • 4
    Given an array of all zeroes and the range (-1, 1), there are O(n^2) solutions, and you clearly require O(n^2) time just to print the answers. – Raymond Chen Jul 03 '14 at 02:36
  • 2
    @RaymondChen I think in his code, he only return `count` ? – Pham Trung Jul 03 '14 at 02:42
  • 1
    Can all integers only be positive? or can be positive or negative? – notbad Jul 03 '14 at 03:08
  • @PhamTrung I saw this question posted on a site and it said 'find' all continuous sub-sequences. I'm doing both to test my result. – user1071840 Jul 03 '14 at 16:26
  • 1
    @notbad integers can be positive or negative – user1071840 Jul 03 '14 at 16:27
  • If item can be positive or negative so your code is wrong, as AlexD has mentioned. – Pham Trung Jul 03 '14 at 16:38
  • @Pham Trung: It's obviously possible to write code that doesn't run in O (n^2) if you give up the requirement that it solves the problem asked. I can reduce it to O (1) by not giving the correct count. – gnasher729 Jul 04 '14 at 08:54
  • @gnasher729 I don't understand? you reduce it to O(1) by not giving correct count? This problem is clear that it cannot be better than O(n^2) with an appropriate [low, high]. So yesterday, when everything was not so clear, we try to give a solution, and now thing changed, and what is your point? – Pham Trung Jul 04 '14 at 09:00
  • @PhamTrung Yeah, I realized that this code is wrong if numbers are both positive and negative. – user1071840 Jul 04 '14 at 23:50

7 Answers

18

O(n) time solution:

You can extend the 'two pointer' idea from the 'exact' version of the problem. This relies on all numbers being non-negative, so that a sum can only grow when its interval is extended. We maintain variables a and b such that exactly the intervals of the form xs[i,a), xs[i,a+1), ..., xs[i,b-1) have a sum in the sought-after range [lo, hi].

a, b = 0, 0
for i in range(n):
    while a != (n+1) and sum(xs[i:a]) < lo:
        a += 1
    while b != (n+1) and sum(xs[i:b]) <= hi:
        b += 1
    for j in range(a, b):
        print(xs[i:j])

This is actually O(n^2) because of the sum, but we can easily fix that by first calculating the prefix sums ps such that ps[i] = sum(xs[:i]). Then sum(xs[i:j]) is simply ps[j]-ps[i].
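
To make the prefix-sum fix concrete, here is a minimal sketch of the same two-pointer loop with `ps` in place of the repeated `sum` calls. The function name `matching_slices` is made up for illustration, the input is assumed to be non-negative, and the pointers are clamped to `i + 1` so that empty slices are never reported (a small deviation from the loop above).

```python
from itertools import accumulate


def matching_slices(xs, lo, hi):
    """Return all (i, j) with lo <= sum(xs[i:j]) <= hi and i < j.

    Assumes every element of xs is non-negative.
    """
    n = len(xs)
    ps = [0] + list(accumulate(xs))  # ps[k] == sum(xs[:k])
    a = b = 0
    result = []
    for i in range(n):
        # Never report the empty slice xs[i:i].
        a = max(a, i + 1)
        b = max(b, i + 1)
        # a: first end index whose sum reaches lo (n + 1 if there is none).
        while a <= n and ps[a] - ps[i] < lo:
            a += 1
        # b: first end index whose sum exceeds hi (n + 1 if there is none).
        while b <= n and ps[b] - ps[i] <= hi:
            b += 1
        result.extend((i, j) for j in range(a, b))
    return result
```

Both pointers only ever move forward, so the loop bodies run O(n) times in total, plus one step per reported pair.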

Here is an example of running the above code on [2, 5, 1, 1, 2, 2, 3, 4, 8, 2] with [lo, hi] = [3, 6]:

[5]
[5, 1]
[1, 1, 2]
[1, 1, 2, 2]
[1, 2]
[1, 2, 2]
[2, 2]
[2, 3]
[3]
[4]

This runs in time O(n + t), where t is the size of the output. As some have noticed, the output can be as large as t = n^2, namely if all contiguous subsequences are matched.

If we allow writing the output in a compressed format (for each i, output the pair a, b, meaning that every xs[i:j] with a <= j < b matches), we get a pure O(n) time algorithm.
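
A sketch of that compressed variant, under the same non-negative assumption. Instead of a prefix-sum array it keeps two running sums, which is the O(1) extra space idea suggested in the comments. The name `compressed_matches` and the `(i, a, b)` triple format are illustrative choices, not something fixed by the answer.

```python
def compressed_matches(xs, lo, hi):
    """Yield (i, a, b) meaning xs[i:j] matches for every a <= j < b.

    Assumes every element of xs is non-negative.
    """
    n = len(xs)
    a = b = 0
    sum_a = sum_b = 0  # sum(xs[i:min(a, n)]) and sum(xs[i:min(b, n)])
    for i in range(n):
        # Keep at least one element in each window.
        if a <= i:
            a, sum_a = i + 1, xs[i]
        if b <= i:
            b, sum_b = i + 1, xs[i]
        # Advance a until the sum reaches lo; a == n + 1 means "no such end".
        while a <= n and sum_a < lo:
            if a < n:
                sum_a += xs[a]
            a += 1
        # Advance b until the sum exceeds hi; b == n + 1 means "no such end".
        while b <= n and sum_b <= hi:
            if b < n:
                sum_b += xs[b]
            b += 1
        if a < b:
            yield (i, a, b)
        # Drop xs[i] from both windows before i moves on.
        sum_a -= xs[i]
        sum_b -= xs[i]
```

On the example array with [lo, hi] = [3, 6] this yields seven triples whose ranges expand to the ten slices listed above.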

spuleri
Thomas Ahle
  • 1
    I think it is actually possible to solve it even with **O(1)** space. Instead of computing an array of prefix sums, we can maintain only two sums, `sum(xs[i:a])` and `sum(xs[i:b])`. When the start position moves, i.e. `i` increments, just subtract the value at the old start from the two sums. – wlnirvana Sep 10 '16 at 18:13
  • @RameshwarBhaskaran Unfortunately yes. With negative numbers we no longer have that the sequence is guaranteed to increase with b and decrease with a. – Thomas Ahle Nov 02 '16 at 12:56
  • 1
    Can you explain the intuition behind this solution? – Huey Aug 01 '17 at 00:11
  • Will not work with **negative and zero** numbers. Ex: {5, 10, 2, 3, 5, -5} for range [15, 20]. The sum of all elements in the array equals 20, but it will not get captured by your algorithm. Your algorithm will work fine with positive integers though. – Saurav Sahu Sep 09 '20 at 13:26
8

Start from this problem: find all contiguous sub-sequences that sum to x. What we need is something similar.

For every index i, we can calculate the sum of the segment from 0 to i; call it x. The problem now is to count how many of the earlier prefix sums (those ending at indices 0 to i - 1, plus the empty prefix with sum 0) fall in the range [x - high, x - low], and each such query should be faster than O(n). Several data structures can answer it in O(log n), for example a Fenwick tree or an interval tree.

So what we need to do is:

  • Insert 0 (the sum of the empty prefix) into the tree, then iterate through all indices from 0 to n - 1 (n is the size of the array).

  • At index i, calculate the sum x of the elements from index 0 to i, then query the tree for the number of stored values that fall in the range [x - high, x - low].

  • Add x to the tree.

So the time complexity will be O(n log n)
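
As a rough sketch of these steps (not necessarily how the answer's author would write it), the version below uses a Fenwick tree over the sorted distinct prefix sums. Coordinate compression via `bisect` is an added assumption so that the tree can index arbitrary, possibly negative, values. The function name and the nested helpers are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def count_subarrays_in_range(xs, lo, hi):
    """Count non-empty subarrays with lo <= sum <= hi; negatives allowed."""
    if lo > hi:
        return 0
    prefix = [0] + list(accumulate(xs))  # prefix[k] == sum(xs[:k])
    values = sorted(set(prefix))         # compressed coordinates
    tree = [0] * (len(values) + 1)       # 1-based Fenwick tree of counts

    def add(rank):
        """Record one occurrence of values[rank]."""
        pos = rank + 1
        while pos < len(tree):
            tree[pos] += 1
            pos += pos & -pos

    def count_below(k):
        """Number of recorded values whose rank is < k."""
        total = 0
        while k > 0:
            total += tree[k]
            k -= k & -k
        return total

    count = 0
    for p in prefix:
        # Earlier prefix sums q with p - hi <= q <= p - lo.
        left = bisect_left(values, p - hi)
        right = bisect_right(values, p - lo)
        count += count_below(right) - count_below(left)
        add(bisect_left(values, p))
    return count
```

Each prefix sum costs one query and one update, both O(log n). For example, `count_subarrays_in_range([2, -1], -1, 0)` counts the single subarray `[-1]`.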

Community
Pham Trung
5

You can use prefix sums (a simple form of dynamic programming) and binary search. To find the count:

    from bisect import bisect_left, bisect_right

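    def solve(A, start, end):
        """
        O(n lg n) prefix sums + binary search.

        With f[i] = sum(A[:i]), the subarray A[j:i] has its sum in
        [start, end] exactly when f[i] - end <= f[j] <= f[i] - start.

        :param A: an integer array (non-negative, see the note at f.sort())
        :param start: lower bound
        :param end: upper bound
        :return: number of subarrays whose sum lies in [start, end]
        """
        n = len(A)
        cnt = 0
        f = [0 for _ in range(n+1)]

        for i in range(1, n+1):
            f[i] = f[i-1]+A[i-1]  # sum from left

        # For non-negative input f is already sorted, so this is a no-op.
        # With negative numbers, sorting breaks the link between position
        # and value, and the count below is no longer correct.
        f.sort()
        for i in range(n+1):
            lo = bisect_left(f, f[i]-end, 0, i)     # first j < i with f[j] >= f[i]-end
            hi = bisect_right(f, f[i]-start, 0, i)  # one past the last j < i with f[j] <= f[i]-start
            cnt += hi-lo

        return cnt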

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

To find the results rather than the count, you just need another hash table that maps each original (unsorted) f[i] to its list of indexes.

Cheers.

Daniel
  • Good solution! just f might not need to sort – spiralmoon Jul 12 '15 at 01:36
  • If the array contains negative numbers, f needs to be sorted. – Daniel Jan 11 '16 at 04:28
  • @ThinkRecursively if the array contains only non-negative numbers, the sum array is monotone and doesn't need sorting for binary search to work. If it includes negative numbers, the values might drop and it is no longer monotone, so you need to sort, but I'm not sure whether the rest of the algorithm works for negative numbers. – FazeL Jan 11 '16 at 10:10
  • 1
    It doesn't work when the array contains negative numbers. For example, consider [2,-1] with low=-1 and high=0. There is one subsequence (1,1) with sum -1, but the above algorithm will return 0. – Satvik Jan 30 '16 at 11:32
  • @Satvik if the algorithm does not work with negative numbers, why is the sort needed? – Daniele Jul 04 '16 at 13:28
0

Here is a way to get O(nlogn) if there are only positive numbers:

1. Compute the cumulative sums of the array (sum[0] = 0, sum[k] = total of the first k elements).
2. For each i, use binary search to count the indices j > i with sum[j] in [sum[i]+low, sum[i]+high].
3. Total = Total + count
4. Repeat steps 2 and 3 for all i.

Time complexity:-

Cumulative sum is O(N)
Finding sums in range is O(logN) using binary search
Total Time complexity is O(NlogN)
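
The steps above can be sketched with Python's `bisect` module. This is a minimal illustration that assumes non-negative input (so the cumulative sums are already sorted); the function name is made up.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def count_in_range_nonnegative(xs, low, high):
    """Count non-empty subarrays with low <= sum <= high.

    Assumes every element of xs is non-negative, so the cumulative
    sums are non-decreasing and can be binary searched directly.
    """
    sums = [0] + list(accumulate(xs))  # sums[k] == sum(xs[:k])
    total = 0
    for i in range(len(xs)):
        # Indices j > i with sums[i] + low <= sums[j] <= sums[i] + high.
        first = bisect_left(sums, sums[i] + low, i + 1)
        last = bisect_right(sums, sums[i] + high, i + 1)
        total += max(0, last - first)
    return total
```

The third argument to `bisect_left` and `bisect_right` restricts the search to positions after i, so only non-empty subarrays starting at i are counted.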
Vikram Bhat
0

If all integers are non-negative, then it can be done in O(max(size-of-input,size-of-output)) time. This is optimal.

Here's the algorithm in C.

void interview_question (int* a, int N, int lo, int hi)
{
  int sum_bottom_low = 0, sum_bottom_high = 0,
      bottom_low = 0, bottom_high = 0,
      top = 0;
  int i;

  if (lo == 0) printf ("[0 0) ");
  while (top < N)
  {
    sum_bottom_low += a[top];
    sum_bottom_high += a[top];
    top++;
    while (sum_bottom_high >= lo && bottom_high <= top)
    {
      sum_bottom_high -= a[bottom_high++];
    }
    while (sum_bottom_low > hi && bottom_low <= bottom_high)
    {
      sum_bottom_low -= a[bottom_low++];
    }
    // print output
    for (i = bottom_low; i < bottom_high; ++i)
      printf ("[%d %d) ", i, top);
  }
  printf("\n");
}

Except for the last loop marked "print output", each operation is executed O(N) times; the last loop is executed once for each interval printed. If we only need to count the intervals and not print them, the entire algorithm becomes O(N).

If negative numbers are allowed, then O(N^2) is hard to beat (might be impossible).

n. m. could be an AI
0
Yes, in my opinion it can be done in O(n):

struct subsequence
{
    int first, last, sum;
} s;

function(array, low, high)
{
    int till_max = 0;
    s.first = 0; s.last = 0; s.sum = 0;
    for (i = low; i < high; i++)
    {
        if (till_max + array[i] > array[i])
        {
            s.first = s.first;
            s.last = i;
            till_max += array[i];
        }
        else
        {
            s.first = i;
            s.last = i;
            till_max = array[i];
        }
        if (till_max in range)
        {
            s.sum = till_max;
            printf("print values between first=%d and last=%d and sum=%d", s.first, s.last, s.sum);
        }
    }
}
0

O(NlogN) with simple data structures is sufficient.

By contiguous subsequences, I assume the question means subarrays.

We maintain a prefix sum list, where prefix[i] is the sum of the first i elements. How do we check whether there is a range sum within [low, high]? We can use binary search. So,

prefix = sorted container holding only 0   // the empty prefix, so subarrays starting at index 0 are found
running = 0
for i in range(0, N)
  running = running + array[i];
  idx1 = lowerBound(prefix, running - high);  // position of the first stored value >= running - high
  idx2 = upperBound(prefix, running - low);   // position of the first stored value > running - low
  // every stored prefix sum at positions [idx1, idx2) begins a subarray ending at i whose sum is within [low, high]
  insert(prefix, running)

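The only catch is that we also need to insert new values in sorted order, so a plain array or linked list is NOT okay. We can use a TreeSet or implement our own AVL tree, so that both binary search and insertion run in O(logN). Note that counting the matches by position requires a tree that also tracks subtree sizes, and that a set-based container discards duplicate prefix sums.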
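
For illustration only, here is the same loop in Python with a plain sorted list and `bisect.insort` standing in for the balanced tree. This is a simplification: the searches are O(log N), but inserting into a list is O(N), so this sketch is O(N^2) in the worst case and only shows the logic. It keeps duplicates and also handles negative numbers. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort


def count_with_sorted_prefixes(array, low, high):
    """Count non-empty subarrays with low <= sum <= high.

    A sorted list stands in for the balanced tree described above.
    """
    seen = [0]   # sorted prefix sums seen so far, seeded with the empty prefix
    running = 0  # sum of the elements processed so far
    count = 0
    for value in array:
        running += value
        # Stored prefix sums q with running - high <= q <= running - low.
        first = bisect_left(seen, running - high)
        last = bisect_right(seen, running - low)
        count += max(0, last - first)
        insort(seen, running)
    return count
```

Swapping the list for a structure with O(logN) insertion and rank queries would give the stated O(NlogN) bound.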

Harry