I have the following main function:

    public static void main(String[] args) {

        int times = 100;

        double startTime = System.nanoTime();
        for (int a = 1; a <= times; a++) {
            BTree<Integer> testTree = new BTree<Integer>(1, 3); // Creates a BTree with root key 1 and minimum degree T = 3
            for (int i = 2; i <= 500; i++) {
                testTree.BTreeInsert(i); // Insert keys 2 through 500
            }
        }
        double stopTime = System.nanoTime();

        System.out.println("Time: " + ((stopTime - startTime) / 1000000 / times) + "ms");
    }

What I'm trying to do here is measure how long it takes to insert 500 keys into a B-Tree. The problem is that the measured time is very small, so my teacher told me to repeat the whole run a number of times and average.

However, the results I'm getting don't seem to add up: with this configuration (`times = 100`) the program prints `Time: 0.167(..) ms`, while with `times = 1` it prints `Time: 2.1401 ms`.

I feel like I'm missing something important, but I can't figure out what.

Mark Rotteveel
Sanzor
  • What is `BTree` defined as? Are you sure it has no bugs and that the compiler optimizer isn't getting in the way? – ggorlen Jan 21 '21 at 23:49
  • 2
    Well, you divide the time by `times` when you print it out. So this is telling you that when you do this a bunch of times in a row, it's faster. This makes sense, because there is a lot of overhead in preparing the JVM – Charlie Armstrong Jan 21 '21 at 23:49
  • @CharlieArmstrong, I see. I don't know how to feel about it, since I'm trying to find its execution time without any kind of overhead, but at least now I know what's going on. Thank you! – Sanzor Jan 21 '21 at 23:54
  • 1
    @Sanzor Testing for execution time is a difficult task to do properly, especially in a high-level language like Java. If you're interested, [here's](https://stackoverflow.com/questions/21947452/why-is-printing-b-dramatically-slower-than-printing) an even harder to diagnose example. There are several Java libraries out there you can use which will help you minimize this kind of interference as much as possible. If this is a school project, though, I can only imagine your teacher is planning to make a point on this, and they likely want you to find this discrepancy. – Charlie Armstrong Jan 21 '21 at 23:59

2 Answers

That's because the JVM sees that the loop runs many times (profiling) and hands the hot code to its optimising compilers (C1 and C2, if that's a HotSpot JVM). If you run your class with `-XX:+PrintCompilation` (like `java -XX:+PrintCompilation ClassName`) you can see that in the looped version, more methods are compiled. For example, my OpenJDK 15.0.1 `java` compiles `ArrayList::add` several times, with C1 and afterwards C2 (2 and 3 in the third column are different optimisation levels of C1; 4 is C2):

 45   48       3       java.util.ArrayList::add (25 bytes)
 46   51       4       java.lang.Number::<init> (5 bytes)
 46   49       2       java.util.ArrayList::add (23 bytes)
 47   46       3       java.lang.Number::<init> (5 bytes)   made not entrant
 47   52       4       java.util.ArrayList::add (25 bytes)
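To reproduce a log like the one above for your own program, compile it and run it with the flag from this answer (`ClassName` is a placeholder for your own class):

```shell
# Compile, then run with JIT compilation logging enabled.
javac ClassName.java
java -XX:+PrintCompilation ClassName
```

Each line of the output shows a timestamp, a compilation id, the tier, and the method being compiled.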

So, when the loop with many iterations runs, after the first several iterations it executes compiled and highly optimised code, while the single-iteration version is left to the interpreter, which is slower.
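One common workaround is to warm the JIT up before starting the clock, so that the measured iterations all run compiled code. Here is a sketch of that pattern; since the asker's `BTree` class isn't shown, an `ArrayList` stands in as the workload (the measurement pattern, not the data structure, is the point):

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupTiming {

    // The workload we want to time: build a structure of 500 elements.
    static int workload() {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 500; i++) {
            list.add(i);
        }
        return list.size(); // return a result so the work isn't trivially dead code
    }

    public static void main(String[] args) {
        // Warmup: run the workload enough times for the JIT to profile
        // and compile it before any measurement starts.
        int sink = 0;
        for (int i = 0; i < 10_000; i++) {
            sink += workload();
        }

        // Measurement: time many iterations and report the average.
        int times = 100;
        long start = System.nanoTime(); // nanoTime returns a long
        for (int i = 0; i < times; i++) {
            sink += workload();
        }
        long stop = System.nanoTime();

        System.out.println("Average: " + ((stop - start) / 1_000_000.0 / times) + " ms");
        if (sink < 0) System.out.println(sink); // keep sink observable
    }
}
```

With the warmup in place, the average per-iteration time should be close to the "fast" number the asker saw with `times = 100`, and should no longer depend much on how many measured iterations are run.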

Also, I would suggest looking into JMH, a tool designed for exactly what you want to do here: microbenchmarking. It handles warmup, forking, and dead-code elimination for you.

Andrew Vershinin
Also bear in mind that the B-tree data structure itself is a wild card in this timing picture, and that it's necessarily affected by the data itself. Any time you insert or delete records in such a structure, the amount of time required can vary considerably!

B-tree index structures are designed to optimise retrieval of items by key. To that end, they keep themselves fairly balanced, so that retrieval time stays consistent as the tree grows. The price they pay for this is in insertion and deletion, which trigger rebalancing work and are presumed to occur much less often than retrieval.
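This trade-off can be seen with `java.util.TreeMap`, a self-balancing (red-black) tree from the JDK, used here only as a stand-in for a B-tree since the asker's `BTree` class isn't available: inserting keys in ascending order forces repeated rebalancing, but lookups stay fast because the tree's height remains logarithmic.

```java
import java.util.TreeMap;

public class BalancedTreeDemo {
    public static void main(String[] args) {
        TreeMap<Integer, Integer> tree = new TreeMap<>();

        // Ascending-order insertion is a worst case for a naive BST,
        // but a self-balancing tree rebalances as it goes.
        for (int i = 1; i <= 100_000; i++) {
            tree.put(i, i);
        }

        // Because the tree stays balanced, retrieval is consistently
        // fast regardless of insertion order.
        System.out.println(tree.containsKey(1));       // true
        System.out.println(tree.containsKey(100_000)); // true
        System.out.println(tree.firstKey() + " " + tree.lastKey());
    }
}
```

The inserts pay the rebalancing cost up front so that every later lookup walks only O(log n) nodes.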

Mike Robinson