Google Interview: Find all contiguous subsequences in a given array of integers whose sum falls in the given range. Can we do better than O(n^2)?

Question

Given an array of integers and a range (low, high), find all contiguous subsequences of the array whose sum is in the range.

Is there a solution better than O(n^2)?

I tried a lot but couldn't find a solution that does better than O(n^2). Please help me find a better solution or confirm that this is the best we can do.

This is what I have right now. I'm assuming the range is the closed interval [lo, hi].

public static int numOfCombinations(final int[] data, final int lo, final int hi, int beg, int end) {
    int count = 0, sum = data[beg];

    while (beg < data.length && end < data.length) {
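       if (sum > hi) {
          // Early exit: valid only if all elements are non-negative,
          // since otherwise the sum could drop back into range later.
          break;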
       } else {
          if (lo <= sum && sum <= hi) {
            System.out.println("Range found: [" + beg + ", " + end + "]");
            ++count;
          }
          ++end;
          if (end < data.length) {
             sum += data[end];
          }
       }
    }
    return count;
}

public static int numOfCombinations(final int[] data, final int lo, final int hi) {
    int count = 0;

    for (int i = 0; i < data.length; ++i) {
        count += numOfCombinations(data, lo, hi, i, i);
    }

    return count;
}
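For comparison, here is a brute-force sketch in Python (not the asker's Java; the name `brute_force_intervals` and the half-open `(i, j)` output format are illustrative choices). It extends every start index to the end of the array instead of breaking once the sum exceeds `hi`, so it is still O(n^2) but makes no assumption about the signs of the elements.

```python
def brute_force_intervals(xs, lo, hi):
    """Return every half-open interval (i, j) with lo <= sum(xs[i:j]) <= hi.

    No sign assumption: each start i is extended to the end of the array
    instead of stopping once the sum exceeds hi, so this is O(n^2) time but
    stays correct when elements can be negative.
    """
    out = []
    for i in range(len(xs)):
        s = 0
        for j in range(i + 1, len(xs) + 1):
            s += xs[j - 1]  # s == sum(xs[i:j])
            if lo <= s <= hi:
                out.append((i, j))
    return out
```

For example, `brute_force_intervals([5, 10, 2, 3, 5, -5], 15, 20)` includes `(0, 6)`, the whole array with sum 20, which the early `break` above skips because the running sum passes through 25 first.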

Does `sum > hi .. break` assume that the integers are non-negative? (Otherwise, why break if the sum can decrease as we continue?) — AlexD, Jul 03 '14 at 01:50
A [Segment Tree](http://en.wikipedia.org/wiki/Segment_tree) can be rather helpful for an output-sensitive approach, though I think it's still O(N^2) in the worst case, where you return all contiguous subsequences. — Nuclearman, Jul 03 '14 at 02:07
Given an array of all zeroes and the range (-1, 1), there are n(n+1)/2 solutions, so you clearly need Ω(n^2) time just to print the answers. — Raymond Chen, Jul 03 '14 at 02:36
Are all the integers positive, or can they be positive or negative? — notbad, Jul 03 '14 at 03:08
@PhamTrung I saw this question posted on a site and it said 'find' all continuous sub-sequences. I'm doing both to test my result. — user1071840, Jul 03 '14 at 16:26
If items can be positive or negative, then your code is wrong, as AlexD mentioned. — Pham Trung, Jul 03 '14 at 16:38
@Pham Trung: It's obviously possible to write code that doesn't run in O(n^2) if you give up the requirement that it solves the problem asked. I can reduce it to O(1) by not giving the correct count. — gnasher729, Jul 04 '14 at 08:54
@gnasher729 I don't understand. You reduce it to O(1) by not giving the correct count? It is clear that this problem cannot be solved faster than O(n^2) for an appropriate [low, high]. Yesterday, when everything was not so clear, we tried to give a solution; now things have changed, so what is your point? — Pham Trung, Jul 04 '14 at 09:00
@PhamTrung Yeah, I realized that this code is wrong if numbers are both positive and negative. — user1071840, Jul 04 '14 at 23:50

score 18 · Answer 1 · edited Aug 07 '16 at 21:14

O(n) time solution:

You can extend the 'two pointer' idea from the 'exact' version of the problem, assuming all elements are non-negative (so a sum can only grow as an interval is extended). We maintain variables a and b such that all intervals of the form xs[i,a), xs[i,a+1), ..., xs[i,b-1) have a sum in the sought-after range [lo, hi].

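xs = [2, 5, 1, 1, 2, 2, 3, 4, 8, 2]
lo, hi = 3, 6
n = len(xs)

# assumes every element of xs is non-negative
a, b = 0, 0
for i in range(n):
    a = max(a, i + 1)  # only non-empty intervals xs[i:j] with j > i
    while a != (n+1) and sum(xs[i:a]) < lo:
        a += 1
    while b != (n+1) and sum(xs[i:b]) <= hi:
        b += 1
    for j in range(a, b):
        print(xs[i:j])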

This is actually O(n^2) because of the sum, but we can easily fix that by first calculating the prefix sums ps such that ps[i] = sum(xs[:i]). Then sum(xs[i:j]) is simply ps[j]-ps[i].

Here is an example of running the above code on [2, 5, 1, 1, 2, 2, 3, 4, 8, 2] with [lo, hi] = [3, 6]:

[5]
[5, 1]
[1, 1, 2]
[1, 1, 2, 2]
[1, 2]
[1, 2, 2]
[2, 2]
[2, 3]
[3]
[4]

This runs in time O(n + t), where t is the size of the output. As some have noticed, the output can be as large as t = n^2, namely if all contiguous subsequences are matched.

If we allow writing the output in a compressed format (for each start i, output the pair a, b, meaning that every xs[i,j) with a <= j < b qualifies), we get a pure O(n) time algorithm.
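A minimal Python sketch of the compressed-output variant, assuming non-negative elements and using the prefix sums `ps` described above; the function name `compressed_ranges` and the `(i, a, b)` triple format are my own choices. Each triple means that every `xs[i:j]` with `a <= j < b` has a sum in `[lo, hi]`, so the loop does O(n) work no matter how many intervals qualify.

```python
def compressed_ranges(xs, lo, hi):
    """Return (i, a, b) triples: every xs[i:j] with a <= j < b has lo <= sum <= hi.

    Assumes all elements are non-negative. ps[k] == sum(xs[:k]), so
    sum(xs[i:j]) == ps[j] - ps[i]. Both a and b only move forward: O(n) total.
    """
    n = len(xs)
    ps = [0] * (n + 1)
    for k, v in enumerate(xs):
        ps[k + 1] = ps[k] + v
    out = []
    a = b = 0
    for i in range(n):
        a = max(a, i + 1)                      # non-empty intervals only
        while a <= n and ps[a] - ps[i] < lo:   # first end j with sum >= lo
            a += 1
        while b <= n and ps[b] - ps[i] <= hi:  # first end j with sum > hi
            b += 1
        if a < b:
            out.append((i, a, b))
    return out
```

On the example array, `sum(b - a for _, a, b in compressed_ranges(xs, 3, 6))` gives 10, the number of lines in the listing above, and expanding each triple into `xs[i:j]` slices reproduces that listing in O(n + t) time.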

edited Aug 07 '16 at 21:14

spuleri

answered Aug 16 '14 at 15:17

Thomas Ahle

I think it is actually possible to solve it even with **O(1)** space. Instead of computing an array of prefix sums, we can maintain only two sums, `sum(xs[i:a])` and `sum(xs[i:b])`. When the start position moves, i.e. `i` increments, just subtract the value from the two sums. – wlnirvana Sep 10 '16 at 18:13
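A sketch of that constant-space idea in Python, assuming non-negative elements and counting rather than printing (the function name `count_constant_space` is hypothetical). `sum_a` and `sum_b` are the running sums of `xs[i:a]` and `xs[i:b]`; when `i` advances, `xs[i]` is subtracted from both instead of consulting a prefix-sum array.

```python
def count_constant_space(xs, lo, hi):
    """Count non-empty xs[i:j] with lo <= sum <= hi, assuming xs is non-negative.

    O(n) time and O(1) extra space: sum_a == sum(xs[i:a]), sum_b == sum(xs[i:b]).
    """
    n = len(xs)
    a = b = 0          # window ends; both are >= i at the top of each iteration
    sum_a = sum_b = 0
    count = 0
    for i in range(n):
        if a == i:                      # keep xs[i:a] non-empty
            sum_a += xs[i]
            a += 1
        while a <= n and sum_a < lo:    # smallest a with sum(xs[i:a]) >= lo
            if a < n:
                sum_a += xs[a]
            a += 1
        while b <= n and sum_b <= hi:   # smallest b with sum(xs[i:b]) > hi
            if b < n:
                sum_b += xs[b]
            b += 1
        count += max(0, b - a)          # ends j in [a, b) qualify
        # Move the start past xs[i]: drop it from both running sums.
        sum_a -= xs[i]
        if b > i:
            sum_b -= xs[i]
        else:
            b = i + 1                   # empty window stays empty (sum 0)
    return count
```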
@RameshwarBhaskaran Unfortunately yes. With negative numbers we no longer have that the sequence is guaranteed to increase with b and decrease with a. – Thomas Ahle Nov 02 '16 at 12:56

Can you explain the intuition behind this solution? – Huey Aug 01 '17 at 00:11
Will not work with **negative and zero** numbers. For example: {5, 10, 2, 3, 5, -5} with range [15, 20]. The sum of all elements in the array equals 20, but it will not be captured by your algorithm. Your algorithm will work fine with positive integers, though. – Saurav Sahu Sep 09 '20 at 13:26

score 8 · Answer 2 · edited May 23 '17 at 12:10

Start from a simpler problem: find all contiguous subsequences that sum to exactly x. What we need is something similar.

For every index i, we can calculate the sum of the segment from 0 to i; call it x. A segment ending at i then has sum x minus an earlier prefix sum, so the problem becomes counting how many earlier prefix sums (for indexes 0 to i - 1, plus the empty prefix 0) lie in the range [x - high, x - low], and each such query has to be faster than O(n). Several data structures can answer it in O(log n), for example a Fenwick tree or a segment tree.

So what we need to do is:

1. Start with a tree containing the prefix sum 0 (the empty prefix), then iterate through the indexes from 0 to n - 1 (n is the size of the array).
2. At index i, calculate the sum x of elements 0 to i, and query the tree for the number of stored sums that fall in the range [x - high, x - low].
3. Add x to the tree.

So the time complexity is O(n log n) for counting the segments; listing them additionally costs time proportional to the size of the output.
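A minimal Python sketch of this approach, assuming a Fenwick tree built over the sorted distinct prefix sums (that coordinate-compression step and the name `count_in_range` are my additions, not part of the answer). It only counts, and unlike the two-pointer answers it does not need the elements to be non-negative.

```python
from bisect import bisect_left, bisect_right


def count_in_range(xs, low, high):
    """Count subarrays of xs whose sum lies in [low, high]; negatives allowed."""
    prefix = [0]                    # prefix[k] = sum(xs[:k]); prefix[0] is the empty prefix
    for v in xs:
        prefix.append(prefix[-1] + v)
    values = sorted(set(prefix))    # coordinate compression of all prefix sums
    tree = [0] * (len(values) + 1)  # 1-based Fenwick tree of counts per compressed value

    def add(rank):                  # record one prefix sum with 1-based rank
        while rank < len(tree):
            tree[rank] += 1
            rank += rank & -rank

    def stored_at_most(rank):       # how many recorded sums have rank <= rank
        total = 0
        while rank > 0:
            total += tree[rank]
            rank -= rank & -rank
        return total

    count = 0
    add(bisect_left(values, prefix[0]) + 1)
    for x in prefix[1:]:
        # an earlier prefix p gives a qualifying segment iff x - high <= p <= x - low
        below = bisect_left(values, x - high)  # ranks of values < x - high
        upto = bisect_right(values, x - low)   # ranks of values <= x - low
        if upto > below:
            count += stored_at_most(upto) - stored_at_most(below)
        add(bisect_left(values, x) + 1)
    return count
```

For instance, `count_in_range([2, -1], -1, 0)` returns 1 (the segment `[-1]`).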

edited May 23 '17 at 12:10

Community

answered Jul 03 '14 at 02:11

Pham Trung

An interval tree and a segment tree are two different things. – John Kurlak Jul 09 '16 at 21:56

An interval tree is not what you think it is. The data structures that support the operation you desire are Fenwick trees and Segment trees. – John Kurlak Jul 21 '16 at 16:57

score 5 · Answer 3 · answered May 31 '15 at 15:18

You can use simple dynamic programming (prefix sums) and binary search. To find the count:

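    from bisect import bisect_left, bisect_right

    def solve(A, start, end):
        """
        O(n lg n) prefix sums + binary search.
        With f[i] = sum(A[:i]), the subarray A[j:i] has sum f[i] - f[j],
        so for each i we count the j < i with f[i] - end <= f[j] <= f[i] - start.
        Only guaranteed correct when A is non-negative (f is then already sorted).

        :param A: an integer array
        :param start: lower bound
        :param end: upper bound
        :return: number of subarrays whose sum lies in [start, end]
        """
        n = len(A)
        cnt = 0
        f = [0] * (n + 1)

        for i in range(1, n+1):
            f[i] = f[i-1]+A[i-1]  # sum from left

        f.sort()  # no-op for non-negative A; sorting loses index order, so it does not fix negatives
        for i in range(n+1):
            lo = bisect_left(f, f[i]-end, 0, i)
            hi = bisect_right(f, f[i]-start, 0, i)
            cnt += hi-lo

        return cnt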

https://github.com/algorhythms/LintCode/blob/master/Subarray%20Sum%20II.py

To find the results rather than the count, you just need another hash table to store the mapping from the original (unsorted) f[i] -> list of indexes.
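If the input is non-negative, the prefix sums `f` are already in index order, so the bisect bounds are themselves start indexes and the extra hash table is not needed. A sketch under that assumption (the name `list_intervals` and the half-open `(j, i)` output are illustrative, not from the answer):

```python
from bisect import bisect_left, bisect_right


def list_intervals(A, start, end):
    """Return half-open intervals (j, i) with start <= sum(A[j:i]) <= end.

    Assumes A is non-negative, so f is non-decreasing and position j in f
    is also the original index j.
    """
    f = [0]
    for v in A:
        f.append(f[-1] + v)  # f[i] = sum(A[:i])
    out = []
    for i in range(1, len(f)):
        lo = bisect_left(f, f[i] - end, 0, i)     # first j with f[j] >= f[i] - end
        hi = bisect_right(f, f[i] - start, 0, i)  # first j with f[j] > f[i] - start
        out.extend((j, i) for j in range(lo, hi))
    return out
```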

Cheers.

answered May 31 '15 at 15:18

Daniel

Good solution! Just that f might not need to be sorted. – spiralmoon Jul 12 '15 at 01:36
If the array contains negative numbers, f needs to be sorted. – Daniel Jan 11 '16 at 04:28
@ThinkRecursively if the array contains only non-negative numbers, the sum array is monotone and doesn't need sorting for binary search to work; but if it includes negative numbers, the values might drop, so it is not monotone and you need to sort. I'm not sure whether the rest of the algorithm works for negative numbers, though. – FazeL Jan 11 '16 at 10:10

It doesn't work when the array contains negative numbers. For example, consider [2,-1] with low=-1 and high=0. There is one subsequence (1,1) with sum -1, but the above algorithm will return 0. – Satvik Jan 30 '16 at 11:32
@Satvik if the algorithm does not work with negative numbers, why is the sort needed? – Daniele Jul 04 '16 at 13:28

Vikram Bhat · Answer 4 · 2014-07-03T06:07:00.997

Here is a way to get O(n log n) if there are only positive numbers:

1. Compute the cumulative sums of the array, with a leading 0 for the empty prefix.
2. For each i, count the j > i with sum[j] in [sum[i] + low, sum[i] + high] using binary search.
3. Total = Total + count.
4. Repeat steps 2 and 3 for all i.

Time complexity:

Cumulative sum is O(N)
Finding the sums in range is O(log N) per i using binary search
Total time complexity is O(N log N)
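A Python sketch of these steps, assuming non-negative numbers (the answer requires positive ones; non-negative also works because the cumulative sums are then non-decreasing). `count_positive` is a hypothetical name, and the `lo` argument of `bisect` restricts each search to `j > i`.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def count_positive(xs, low, high):
    """Count subarrays with low <= sum <= high, assuming xs is non-negative."""
    cum = [0] + list(accumulate(xs))  # cum[k] = sum(xs[:k]), non-decreasing
    total = 0
    for i in range(len(cum)):
        # subarray xs[i:j] has sum cum[j] - cum[i]; only search j > i
        lo = bisect_left(cum, cum[i] + low, i + 1)
        hi = bisect_right(cum, cum[i] + high, i + 1)
        total += max(0, hi - lo)
    return total
```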

edited Jul 03 '14 at 06:07

answered Jul 03 '14 at 04:57

Vikram Bhat

Binary search? The cumulative sum may not be in sorted order. – Pham Trung Jul 03 '14 at 06:30
@PhamTrung it is only for positive integers, please check. – Vikram Bhat Jul 03 '14 at 06:32

n. m. could be an AI · Answer 5 · 2014-07-03T10:09:37.127

If all integers are non-negative, then it can be done in O(max(size-of-input,size-of-output)) time. This is optimal.

Here's the algorithm in C.

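#include <stdio.h>

/* Prints every half-open interval [i, top) whose sum lies in [lo, hi].
   Assumes all elements of a are non-negative. */
void interview_question (int* a, int N, int lo, int hi)
{
  /* sum_bottom_low == sum of a[bottom_low..top), sum_bottom_high == sum of a[bottom_high..top) */
  int sum_bottom_low = 0, sum_bottom_high = 0,
      bottom_low = 0, bottom_high = 0,
      top = 0;
  int i;

  if (lo <= 0 && 0 <= hi) printf ("[0 0) ");  /* the empty interval has sum 0 */
  while (top < N)
  {
    sum_bottom_low += a[top];
    sum_bottom_high += a[top];
    top++;
    /* afterwards, every start i < bottom_high has sum(a[i..top)) >= lo */
    while (sum_bottom_high >= lo && bottom_high < top)
    {
      sum_bottom_high -= a[bottom_high++];
    }
    /* afterwards, every start i < bottom_low has sum(a[i..top)) > hi */
    while (sum_bottom_low > hi && bottom_low < bottom_high)
    {
      sum_bottom_low -= a[bottom_low++];
    }
    // print output
    for (i = bottom_low; i < bottom_high; ++i)
      printf ("[%d %d) ", i, top);
  }
  printf("\n");
}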

Except for the last loop marked "print output", each operation is executed O(N) times; the last loop is executed once for each interval printed. If we only need to count the intervals and not print them, the entire algorithm becomes O(N).

If negative numbers are allowed, then O(N^2) is hard to beat (might be impossible).

score 0 · Answer 6 · answered Mar 28 '15 at 17:15

Yes, in my opinion it can be done in O(n):

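#include <stdio.h>

struct subsequence
{
    int first, last, sum;
} s;

/* Kadane-style scan: keeps one running interval [s.first, s.last] and prints
   it whenever its sum lies in [low, high]. Note that it reports at most one
   interval ending at each index, not every qualifying interval. */
void function(int array[], int n, int low, int high)
{
    int i, till_max = 0;
    s.first = 0; s.last = 0; s.sum = 0;
    for (i = 0; i < n; i++)
    {
        if (till_max + array[i] > array[i])   /* i.e. till_max > 0: extend */
        {
            s.last = i;
            till_max += array[i];
        }
        else                                  /* restart the interval at i */
        {
            s.first = i;
            s.last = i;
            till_max = array[i];
        }
        if (low <= till_max && till_max <= high)
        {
            s.sum = till_max;
            printf("print values between first=%d and last=%d and sum=%d\n", s.first, s.last, s.sum);
        }
    }
}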

score 0 · Answer 7 · answered Jan 03 '19 at 20:51

O(N log N) is achievable with simple data structures.

By contiguous subsequences, I think the question means subarrays.

We maintain the prefix sums seen so far in a sorted structure, where a prefix sum is the sum of the first k elements. How do we find the ranges whose sum is between low and high? We can use binary search:

sorted = multiset {0}           // prefix sums seen so far; 0 is the empty prefix
prefix = 0
for i in range(0, N)
  prefix = prefix + array[i]    // sum of array[0..i]
  // a stored p = sum of array[0..k-1] with prefix - high <= p <= prefix - low
  // means the range [k, i] has a sum within [low, high]
  idx1 = lowerBound(sorted, prefix - high)
  idx2 = upperBound(sorted, prefix - low)
  // the stored sums at sorted positions [idx1, idx2) are exactly the matches
  insert(sorted, prefix)

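The only thing to take care of is that we also need to insert new values, so a plain array or linked list is NOT okay (insertion would be O(N)). We can use a balanced search tree instead, e.g. a TreeSet-like structure that keeps duplicate prefix sums, or our own AVL tree; both the binary search and the insertion are then O(log N), plus the time to report the matches.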
