23

My mini benchmark:

import java.math.*;
import java.util.*;
import java.io.*;
public class c
{
    static Random rnd = new Random();
    public static String addDigits(String a, int n)
    {
        if(a==null) return null;
        if(n<=0) return a;
        for(int i=0; i<n; i++)
            a+=rnd.nextInt(10);
        return a;
    }
    public static void main(String[] args) throws IOException
    {
        int n = 10000; \\number of iterations
        int k = 10;    \\number of digits added at each iteration

        BigInteger a;
        BigInteger b;

        String as = "";
        String bs = "";
        as += rnd.nextInt(9)+1;
        bs += rnd.nextInt(9)+1;
        a = new BigInteger(as);
        b = new BigInteger(bs);
        FileWriter fw = new FileWriter("c.txt");
        long t1 = System.nanoTime();
        a.multiply(b);
        long t2 = System.nanoTime();
        //fw.write("1,"+(t2-t1)+"\n");
        if(k>0) {
            as = addDigits(as, k-1);
            bs = addDigits(as, k-1);
        }
        for(int i=0; i<n; i++)
        {
            a = new BigInteger(as);
            b = new BigInteger(bs);
            t1 = System.nanoTime();
            a.multiply(b);
            t2 = System.nanoTime();
            fw.write(((i+1)*k)+","+(t2-t1)+"\n");
            if(i < n-1)
            {
                as = addDigits(as, k);
                bs = addDigits(as, k);
            }
            System.out.println((i+1)*k);
        }       

        fw.close();
    }
}

It measures multiplication time of n-digit BigInteger

Result: enter image description here

You can easily see the trend but why there is so big noise above 50000 digits? It is because of garbage collector or is there something else that affects my results? When performing the test, there were no other applications running.

Result from test with only odd digits. The test was shorter (n=1000, k=100)

enter image description here

Odd digits (n=10000, k=10) enter image description here

As you can see there is a huge noise between 65000 and 70000. I wonder why...

Odd digits (n=10000, k=10), System.gc() every 1000 iterations enter image description here Results in noise between 50000-70000

Jan Ajan
  • 1,461
  • 3
  • 12
  • 15
  • 3
    There is no way to know with the info you have shown. Why don't you try a ` System.gc()` every 500-1000 iterations to see if that smooths it out. – hvgotcodes May 24 '12 at 19:54
  • "There were no other applications running." Are you on a real-time computing environment without a time-sharing operating system (like Windows)? It's impossible to guarantee every CPU cycle is being allocated only to your application. – mellamokb May 24 '12 at 19:55
  • 1
    Sry for getting off topic but, what program are you plotting with? – keyser May 24 '12 at 19:57
  • "There were no other applications running." equals "There were no other applications running except of an operating system (and system services etc)." – Jan Ajan May 24 '12 at 19:57
  • @Keyser, I use Graph from http://padowan.dk – Jan Ajan May 24 '12 at 19:57
  • @ulu5 do you mead http://en.wikipedia.org/wiki/Karatsuba_algorithm ? – Jan Ajan May 24 '12 at 19:59
  • 2
    @ulu5 your link, consistent with the source code, says that JDK *does not* change algorithms. – Affe May 24 '12 at 20:07
  • BTW that link calls O(nm) *exponential*?? – Marko Topolnik May 24 '12 at 20:09
  • Java(TM) SE Runtime Environment (build 1.7.0_02-b13) Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode) – Jan Ajan May 24 '12 at 20:14
  • @ulu5, I'll try putting only odd digits into BigInteger but full test will take some time. I'll post results later. – Jan Ajan May 24 '12 at 20:19
  • 1
    It is the gc, that the effect is "slowing down" is probably an effect of the jvm "warming up". Call System.gc() outside of the timing block every now and then to confirm. – esej May 24 '12 at 20:47
  • 1
    @hvgotcodes @esej From the documentation of [System.gc()](http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#gc()) "Calling the gc method _suggests_ that the Java Virtual Machine expend effort toward recycling unused objects [...]". In other words, `System.gc()`is just a hint to the JVM, there is no way of knowing if, when or how the GC actually executes. – matsev May 24 '12 at 20:57
  • @matsev Not gonna argue, but there are ways to know. Otherwise correct. It is fully possible such a test would provide the answer to this question though. – esej May 24 '12 at 21:02
  • @matsev You can just verify it with VisualVM. – Andrew May 24 '12 at 21:03
  • @matsev The doc of `gc` basically says "I'll probably do a gc, just don't take my word for it", it doesn't say "You just call me all you want, I'll be happy to ignore you" :) When I played with it, running gc() twice in a row reliably resized the heap to only the live objects -- and that's all it takes to make the test results correct. – Marko Topolnik May 24 '12 at 21:13
  • @Andrew Well, you can verify the result of what happend with aid of VisualVM. However, when you write your code, you cannot predict if the GC will execute before the next statement is executed, when an object has lost its last reference or when the JVM shuts down. – matsev May 24 '12 at 21:16
  • @esej I guess by "ways to know" you mean "what happend"? I agree that you can monitor what happend (e.g. by using VisualVM) and that may be enough to solve the problem – matsev May 24 '12 at 21:43
  • @Marko I like your quote "I'll probably do a gc, just don't take my word for it", that was basically the key message in my original comment. It's good that you found a solution to your problem, however it may or may not behave in the same way if you use a JVM developed by another vendor. In fact, different JVMs from the same vendor may behave differently, e.g. Oracle changed their GC implementation in the recent [JDK 7 update 4 release](http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html). – matsev May 24 '12 at 21:44
  • Remember that via the GarbageCollectorMXBean class you can glean information about whether GC took place. It doesn't necessarily have to be a mystery. – Neil Coffey May 24 '12 at 21:50
  • 1
    You should obviously average over several measurements for each data point when taking this kind of benchmark-- I can't quite tell from your code and then subsequent priocessing if you're doing this. – Neil Coffey May 24 '12 at 21:58
  • I would guess that this specific range of numbers lines up with some specific GC threshold thing -- perhaps generating a specific amount of garbage that triggers GCish things? – Louis Wasserman May 25 '12 at 01:10
  • @MarkoTopolnik - *"When I played with it, running gc() twice in a row reliably resized the heap to only the live objects"*. The problem is that your experience CANNOT be generalized to other JVMs, or even to the same JVM with different GC options. – Stephen C May 25 '12 at 02:21
  • @StephenC That's not a problem, but a caveat. OP can convince himself for his concrete setup what works, and when it does work, well that's it then. I gave an example of how it worked for me. – Marko Topolnik May 25 '12 at 07:43
  • @matsev But you are testing on one concrete setup. After all, the results you are getting are just as specific to that setup as the behavior of gc. It's another thing how to convince yourself that gc() is doing a gc. I repeated the same memory allocation procedure, followed by two gc's, followed by Runtime.freeMemory, many times over, and the results were exactly the same every time. Without that check, however, you can't know anything in advance. – Marko Topolnik May 25 '12 at 07:52

2 Answers2

9

I also suspect this is a JVM warmup effect. Not warmup involving classloading or the JIT compiler, but warmup of the heap.

Put a (java) loop around the whole benchmark, and run it a number of times. (If this gives you the same graphs as before ... you will have evidence that this is not a warmup effect. Currently you don't have any empirical evidence one way or the other.)


Another possibility is that the noise is caused by your benchmark's interactions with the OS and/or other stuff running on the machine.

  • You are writing your timing data to an unbuffered stream. That means LOTS of syscalls, and (potentially) lots of fine-grained disc writes.
  • You are making LOTS of calls to nanoTime(), and that might introduce noise.
  • If something else is running on your machine (e.g. you are web browsing) that will slow down your benchmark for a bit and introduce noise.
  • There could be competition over physical memory ... if you've got too much running on your machine for the amount of RAM.

Finally, a certain amount of noise is inevitable, because each of those multiply calls generates garbage, and the garbage collector is going to need to work to deal with it.


Finally finally, if you manually run the garbage collector (or increase the heap size) to "smooth out" the data points, what you are actually doing is concealing one of the costs of multiply calls. The resulting graphs looks nice, but it is misleading:

  • The noisiness reflects what will happen in real life.
  • The true cost of the multiply actually includes the amortized cost of running the GC to deal with the garbage generated by the call.

To get a measurements that reflect the way that BigInteger behaves in real life, you need to run the test a large number of times, calculate average times and fit a curve to the average data-points.

Remember, the real aim of the game is to get scientifically valid results ... not a smooth curve.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
3

If you do a microbenchmark, you must "warm up" the JVM first to let the JIT optimize the code, and then you can measure the performance. Otherwise you are measuring the work done by the JIT and that can change the result on each run.

The "noise" happens probably because the cache of the CPU is exceeded and the performance starts degrading.

Emmanuel Bourg
  • 9,601
  • 3
  • 48
  • 76
  • 1
    OK, but case of not-warmed-up JVM is at the very beginning of simulation. Later, we can assume warmed-up state. But the noise is at the end of simulation, not at the beginning – Jan Ajan May 24 '12 at 22:20