28

I have the following code that returns the number of nodes in a tree when a complete Binary Tree is layer layers tall:

public static long nNodesUpToLayer(int layer) {
        if (layer < 0) throw new IllegalArgumentException(
            "The layer number must be positive: " + layer );

        //At layer 0, there must be 1 node; the root.
        if (layer == 0) return 1;

        //Else, there will be 1 + 2 * (the number of nodes in the previous layer) nodes.
        return 1 + (2 * nNodesUpToLayer(layer - 1));

The odd thing is, when I input 63 into the function (the minimum value that produces this), it gives me back -1. At 62, it gives back 9223372036854775807, so this seems to be caused by an overflow.

Shouldn't it give me back the minimum value of Java's long + the amount that it was overflowed by? Regardless of the input that I give it though (passed 62), it will always return -1 instead of a seemingly random number that I'd expect from an overflow.

I'm not entirely sure how to debug this, since it's recursive, and the value I'm interested in will only be evaluated once the the function has reached the base-case.

8bittree
  • 1,769
  • 2
  • 18
  • 25
Carcigenicate
  • 43,494
  • 9
  • 68
  • 117
  • 3
    I will just leave **[this](https://docs.oracle.com/javase/8/docs/api/java/math/BigInteger.html)** here and walk away... –  Jul 09 '15 at 20:36
  • 7
    @Snowman Thanks for the suggestion. Now I know that a tree of height `1000` will have `21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336138751` nodes. I'm actually amazed at how quick it was able to calculate that. I expected `BigInteger` to introduce massive overhead (or so I've heard), but that finished almost instantly. – Carcigenicate Jul 09 '15 at 23:10
  • BigDecimal has a reputation for being slower with certain types of math, but BigInteger generally performs well on modern hardware (amd64) with its gigantic caches and extra registers. (Yes I know BD uses BI internally, but that scale can really screw up performance for some combinations of numbers and mathematical operations). –  Jul 10 '15 at 00:57
  • 1
    @Snowman I tested the limits of the function as a whole, and it manages to handle heights up to ~9000 (which gives a stupidly huge result) before it gives a stack Overflow. Even at the size, it's still nearly instantaneous, which is even more amazing. I wish doing math with it wasn't so ugly though. I had to split the return over 3 lines to help readability. Java really needs operator overloading! – Carcigenicate Jul 10 '15 at 01:03
  • With static imports: `return ONE.add(TWO.multiply(nNodesUpToLayer(layer - 1)));` Not much more verbose. Actually, TWO is not a static constant on BigInteger. But you could make one and store it in your class and refer to it as TWO the same way. –  Jul 10 '15 at 01:11
  • It has static constants? Yes, that definitely helps. I should have read the docs more thoroughly. Thanks. – Carcigenicate Jul 10 '15 at 01:13
  • 3
    the function is technically `2^(layer+1) - 1`, so why do you write a recursive function instead of simply `return (1L << (layer+1)) - 1`? That's tens or hundreds of times faster. With that formula it's also easier to see why it overflows at layer = 63 – phuclv Jul 10 '15 at 01:22
  • @LưuVĩnhPhúc Because I figured out the function from looking at examples of trees; I didn't look up what it's supposed to be. Plus, honestly I find my version far more understandable that some bit-shifting concoction (plus, bit manipulation is something that I have yet to get into). And, speed really doesn't matter. Like I said above, it's nearly instantaneous for a tree with a height of 9000, so unless this is being run in a tight loop, it's run time shouldn't matter. – Carcigenicate Jul 10 '15 at 01:42
  • Mm, the bit shift is just to raise it to a power. So it's not that much of a "concoction". I still find my version better unless you require absolute speed and the ability to handle massive trees. – Carcigenicate Jul 10 '15 at 01:45
  • @Snowman Unfortunately, I can't use `BigInteger` in the end anyways. I need to use `log` on the result, and after tons of searching, outside of Guava (which I can't use), there doesn't appear to be any way to find the base-10 log of `BigInteger`. – Carcigenicate Jul 10 '15 at 02:27
  • 2
    As I said above it's really just `pow(2, layer) - 1` – phuclv Jul 10 '15 at 03:47
  • @LưuVĩnhPhúc Ya, I made a second comment after my first with the realization that that's what the shift was for. Please tell me you didn't downvote because I didn't like your suggestion. – Carcigenicate Jul 10 '15 at 03:49
  • 2
    No, there's no reason for me to downvote it. But personally to me a single-line expression is much more clearer than a long function – phuclv Jul 10 '15 at 04:06
  • 1
    So you want log_10(2^n -1) ? That's almost exactly n * log_10(2) – kevin cline Jul 10 '15 at 06:32
  • 1
    This is somewhat tangential, but when you are dealing with a puzzle that involves some specific but non-obvious number such as 9223372036854775807, looking at its binary representation may provide a clue as to what is going on (as may its interpretation as an ASCII or Unicode string, taking the endian-ness of your architecture into account.) – sdenham Jul 10 '15 at 12:48
  • @kevincline Unfortunately, I don't think "almost exactly" will work. If I switched to Luu's function, and used `BigInteger`, this will return numbers many times larger than the number I posted in the above comment. I can see the error multiplying pretty fast for numbers that big. – Carcigenicate Jul 10 '15 at 12:48
  • @Carcigenicate Consider this: [log(2^50)](https://www.google.com/search?q=log%282^50%29&oq=log%282^50%29&gs_l=serp.3..19j0i10i30l2j0i8i30l4.2477.2637.0.2891.2.2.0.0.0.0.76.151.2.2.0....0...1c.1.64.serp..0.2.151.Ia7WdNCpFgk) vs. [log(2^50-1)](https://www.google.com/search?q=log%282^50-1%29&oq=log%282^50-1%29&gs_l=serp.3..0i8i7i30.7580.7677.0.8248.2.2.0.0.0.0.93.183.2.2.0....0...1c.1.64.serp..0.2.182.tin-zg65IbI). Google gives exactly the same result. The difference is much smaller than the scale of the result, so the error just from floating point truncation is much larger than the difference. – jpmc26 Jul 10 '15 at 20:42
  • That may or may not be the case for your situation, but keep in mind that forcing yourself to include the `-1` isn't necessarily going to make your answer more accurate. You would have to check. Like it or not, you're probably dealing with inexact results just by using a computer at all. The question then is, "*How* accurate do you need your result to be and how do you get that level of accuracy with a computer, given the constraints of values you must work with?" That's the nature of math and computers, I'm afraid. – jpmc26 Jul 10 '15 at 20:43
  • @jpmc26 Thanks. I've actually finished the program now, but if I ever decide to come back to it to allow for larger input, I'll come back to your suggestion. – Carcigenicate Jul 10 '15 at 22:30

3 Answers3

57

You are correct, it is an overflow error of a 64 bit signed integer. The reason it goes to -1 instead of the minimum integer value is because you are doubling it, not simply adding one.

9223372036854775807 in Two's Complement is 63 1s:

0111 1111 ... 1111 1111

To double it in binary, simply perform a left shift:

1111 1111 ... 1111 1110

However, this number in Two's Complement is not twice 9223372036854775807, but rather -2. Then, of course, you add 1 to that before returning to get your -1 result.

8bittree
  • 1,769
  • 2
  • 18
  • 25
15

Actually, it is returning you the correct amount. It's just that "the amount that it was overflowed by" is exactly right to make the answer -1 :)

Consider this:
The number of nodes in a complete binary tree is 2^n - 1 for n layers. Hence its binary representation is 0000...00111...111 where the number of 1s is precisely the number of layers minus 1. As soon as you reach the length of the long you're stuck at the truncated 11...11, which is precisely -1

Ordous
  • 3,844
  • 15
  • 25
  • 1
    I like this answer because it makes it very clear how the initial -1 comes about, and incidentally that it is not dependent on the size of the integer type being used. Once -1 has been reached, however, the recursion is no longer calculating the size of the tree, so I don't think you can extend this argument to subsequent cases. Instead, in those cases, it is clear that the recursion is simply calculating 1 + ( 2 * -1 ) repeatedly. – sdenham Jul 10 '15 at 12:35
10

I always prefer the visualizations with things like this.

                      (min long)
                      v 
<--------------------||--------------------------------------->
                     ^                                   ^
               (max long, n)                            -1

Where n is 9223372036854775807 - the value you have right before you multiply by 2. Instead of multiplication, though, think of it as addition. n + n. By seeing it on a number line, you can see that you'd end up at -2. You're basically overflowing past most of the negative numbers.

So that my answer contributes something meaningful compared to the others, one helpful tool in situations like this is to break your arithmetic into multiple lines in order to debug. You could write:

int a = nNodesUpToLayer(layer - 1);
int b = 2 * a;
int c = 1 + b;
return c;

You're essentially enforcing order-of-operations as you'd expect it(which may help you realize the program is doing things out of the order you want them in), but it also lets you go into the debugger and see the intermediate values of your calculations. Here you would have noticed b == -2. Why is b == -2? Well, it must be because 2 * a == -2, etc.

Shaz
  • 1,376
  • 8
  • 18
  • I was clueless and couldn't understand anything from other answers, until you showed the visualization! Thank you! – Alexus Jul 09 '15 at 23:18