Point out potential overflow bugs

Question

Here's my solution to an InterviewBit problem (link).

You are given a read only array of n integers from 1 to n. Each integer appears exactly once except A which appears twice and B which is missing. Return A and B. Note: Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory? Note that in your output A should precede B. N <= 10^5

It looks like there's an overflow problem somewhere. Could you point out such places and suggest fixes?

  typedef long long int unit;

vector<int> Solution::repeatedNumber(const vector<int> &A) {
    unit n = A.size();
    unit sum = n*(n+1)/2;
    unit sumsq = n*(n+1)*(2*n+1)/6;
    unit arrsum = std::accumulate(A.begin(), A.end(), 0);


    unit arrsq = 0;
    for(int item : A) {
        arrsq += (unit)item*item;
    }

    unit c1 = arrsum - sum;

    unit c2 = arrsq - sumsq;

    unit a = (c2/c1 + c1);
    a/=2;

    unit b = (c2/c1 - c1);
    b/=2;

    return {a, b};
}

P.S. It has to be an overflow problem, because the same solution works in Python.

Update: Here's the solution provided by the authors of the problem. It's interesting how it fights the overflow problem in the summation by subtracting as it goes.

 class Solution {
public:
    vector<int> repeatedNumber(const vector<int> &V) {
       long long sum = 0;
       long long squareSum = 0;
       long long temp;
       for (int i = 0; i < V.size(); i++) {
           temp = V[i];
           sum += temp;
           sum -= (i + 1);
           squareSum += (temp * temp);
           squareSum -= ((long long)(i + 1) * (long long)(i + 1));
       }
       // sum = A - B
       // squareSum = A^2 - B^2 = (A - B)(A + B)
       // squareSum / sum = A + B
       squareSum /= sum;

       // Now we have A + B and A - B. Lets figure out A and B now. 
       int A = (int) ((sum + squareSum) / 2);
       int B = squareSum - A;

       vector<int> ret;
       ret.push_back(A);
       ret.push_back(B);
       return ret;
    }
};

Could you please remove the comments? They are not causing the problem but they make the code less readable. Also, I wonder how strict "no additional memory" is meant, as you are using extra memory. — 463035818_is_not_an_ai, Jun 07 '15 at 11:34
@tobi303, it means the asymptotic memory usage should be O(1) (it shouldn't depend on the input length). — Dmitry S., Jun 07 '15 at 12:16
I think it should be b=(c1-c2/c1)/2. your solution outputs -b instead of b — Ophir Gvirtzer, Jun 07 '15 at 13:26
@OphirGvirtzer, I believe it's correct as far as formulas go. It calculates the result correctly. — Dmitry S., Jun 07 '15 at 13:55
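
The sign question raised in the comments above can be checked by working the formulas through on a small input. The following is only a sketch: the function name `repeatedAndMissing` and the sample array are made up for illustration, and it accumulates the differences in a `long long` loop instead of reusing the question's code. With c1 = A - B and c2 = A^2 - B^2, the quotient c2/c1 is A + B, so (c2/c1 + c1)/2 gives A and (c2/c1 - c1)/2 gives B.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the post): returns {A, B}, where A is the
// repeated value and B is the missing one.
std::pair<int, int> repeatedAndMissing(const std::vector<int>& v) {
    long long c1 = 0;  // ends up as sum(v) - sum(1..n)     = A - B
    long long c2 = 0;  // ends up as sum(v^2) - sum(1^2..n^2) = A^2 - B^2
    for (std::size_t i = 0; i < v.size(); ++i) {
        const long long x = v[i];
        const long long k = static_cast<long long>(i) + 1;
        c1 += x - k;
        c2 += x * x - k * k;
    }
    // c1 is zero only if the input does not match the problem statement.
    assert(c1 != 0 && "expected one repeated and one missing value");
    const long long sumAB = c2 / c1;  // (A^2 - B^2) / (A - B) = A + B
    return {static_cast<int>((sumAB + c1) / 2),   // A
            static_cast<int>((sumAB - c1) / 2)};  // B
}
```

For the array {3, 1, 2, 5, 3} this gives c1 = -1 and c2 = -7, hence A + B = 7, A = 3 and B = 4, which agrees with the formulas as written in the question.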

IVlad · Answer 1 · 2015-06-07T13:31:20.597

2

The problem is this:

unit arrsum = std::accumulate(A.begin(), A.end(), 0);

You need to use 0LL to make it accumulate the values as long long.

Code that demonstrates the problem:

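#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

int main()
{
    vector<int> A;
    for (int i = 0; i < 1000000; ++i)
        A.push_back(1000000);

    // 0LL makes accumulate sum in long long; with a plain 0 it sums in int.
    long long arrsum = accumulate(A.begin(), A.end(), 0LL);
    cout << arrsum;

    return 0;
}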

Outputs -727379968 without the LL (the int sum wraps around on typical platforms with 32-bit int) and the correct result, 1000000000000, with it.
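
The reason can be made visible at compile time: `std::accumulate` uses the type of its third argument as the accumulator and return type, regardless of the variable the result is assigned to. A small sketch, where the alias `IntIter` and the helper `wideSum` are names invented for this example:

```cpp
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

using IntIter = std::vector<int>::const_iterator;

// With a plain 0 the accumulator (and the return type) is int ...
static_assert(std::is_same<
    decltype(std::accumulate(std::declval<IntIter>(), std::declval<IntIter>(), 0)),
    int>::value, "init 0 gives an int accumulator");

// ... and with 0LL it is long long.
static_assert(std::is_same<
    decltype(std::accumulate(std::declval<IntIter>(), std::declval<IntIter>(), 0LL)),
    long long>::value, "init 0LL gives a long long accumulator");

// Hypothetical helper: sums int elements in a 64-bit accumulator.
long long wideSum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

Assigning the result to a `long long` happens only after the sum has already been computed in the narrower type, which is why the typedef in the question does not help on that line.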

Note that you can also use accumulate to compute the sum of squares:

unit arrsq = accumulate(A.begin(), A.end(), 0LL, 
                             [](unit x, unit y) { return x + y*y; });

edited Jun 07 '15 at 13:31

answered Jun 07 '15 at 13:26

IVlad


This is a horrible example of how C++ can shoot an innocent programmer in the leg. I hope that the compiler at least generated a warning. – Ophir Gvirtzer Jun 07 '15 at 14:04
@OphirGvirtzer - why? I use C++ very little but it immediately jumped at me that he was using `accumulate` on `int`s. What else could it return other than an `int`? Making the initial value `0LL` might not be so intuitive, but changing the initial array to `long long` is intuitive and would have worked too. – IVlad Jun 07 '15 at 14:14
2

@IVlad You're right on this example. But the same happens when the array is long long if you don't use 0LL; I consider this quite terrible (and VC++ doesn't warn). – Ophir Gvirtzer Jun 07 '15 at 14:20
@OphirGvirtzer oh, I thought it would work with just long long... that is pretty weird indeed. – IVlad Jun 07 '15 at 14:25
Only a few can understand the rules for template type deduction... – Ophir Gvirtzer Jun 07 '15 at 14:28
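
The point made in these comments, that a `long long` array does not change the accumulator type, can be checked the same way. This is a sketch under assumptions: `WideIter` and `sumElements` are hypothetical names, and the helper is just one possible way to tie the accumulator to the element type.

```cpp
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

using WideIter = std::vector<long long>::const_iterator;

// Even over long long elements, an int init value means an int accumulator.
static_assert(std::is_same<
    decltype(std::accumulate(std::declval<WideIter>(), std::declval<WideIter>(), 0)),
    int>::value, "the element type does not influence the accumulator type");

// Hypothetical helper: uses the iterator's value_type as the accumulator,
// so a container of long long is summed in long long.
template <class It>
typename std::iterator_traits<It>::value_type sumElements(It first, It last) {
    using T = typename std::iterator_traits<It>::value_type;
    return std::accumulate(first, last, T());
}
```

Note that such a helper would still sum a `vector<int>` in `int`, so for the code in the question the explicit `0LL` remains the simpler fix.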

score 0 · Answer 2 · answered Jun 07 '15 at 11:36


The potential overflow problems are:

unit sum = n*(n+1)/2;

Here the maximum value of n is 10^5, so n*(n+1) yields about 10^10 before the division is applied, due to operator precedence.

The second place is

unit sumsq = n*(n+1)*(2*n+1)/6;

The intermediate value computed here goes up to about 10^15.

There is also integer overflow in the place where you compute the sum of squares of all the numbers.

answered Jun 07 '15 at 11:36

Nivetha


`unit` is typedef-ed to `long long`, so those shouldn't overflow. – IVlad Jun 07 '15 at 13:19
There shouldn't be an overflow in these 3 expressions. long long int is defined as "Not smaller than long. At least 64 bits." A signed 64-bit integer can hold about 10^18.9, while (n+1)*(n+1)*(2n+1) is about 10^16 in the worst case. – Ophir Gvirtzer Jun 07 '15 at 13:23
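
The magnitudes quoted in these comments can be verified at compile time. A minimal sketch, assuming the bound N <= 10^5 from the problem statement; the function names and `kMaxN` are invented for this example:

```cpp
#include <limits>

// Intermediate product n*(n+1), before the division by 2.
constexpr long long sumProduct(long long n) { return n * (n + 1); }

// Intermediate product n*(n+1)*(2n+1), before the division by 6.
constexpr long long squaresProduct(long long n) {
    return n * (n + 1) * (2 * n + 1);
}

constexpr long long kMaxN = 100000;  // N <= 10^5 from the problem statement

static_assert(sumProduct(kMaxN) == 10000100000LL, "about 10^10");
static_assert(squaresProduct(kMaxN) == 2000030000100000LL, "about 2 * 10^15");

// Both stay more than three orders of magnitude below the long long maximum.
static_assert(squaresProduct(kMaxN) < std::numeric_limits<long long>::max() / 1000,
              "fits comfortably in long long");

// The array's sum of squares is at most n * n^2 = 10^15, which also fits.
static_assert(kMaxN * kMaxN * kMaxN < std::numeric_limits<long long>::max() / 1000,
              "sum of squares fits in long long");
```

So with `unit` defined as `long long`, none of the three expressions named in this answer overflows for the given constraints.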
