3

For a project I worked on, I was tasked with timing the search times for two different search algorithms: binary search and sequential search. For each algorithm, I was supposed to record the time for both sorted input and unsorted input. I came across something weird when I compared the search times for sequential search on the sorted input vs. the unsorted input. Depending on which one I sort first, that search time will be significantly greater than than the second. So if I sequential search on the sorted first, it will take much longer than the sequential search on the unsorted.

This doesn't make sense to me and is the source of my confusion. The keys searched for are guaranteed to be found in the data input (by the sequential search), since the keys are taken from the input.

Here is the code that creates the problem. In this case, the seqOnUnsorted search times would be much greater than seqOnSorted, which it shouldn't be.

public void sequentialSearchExperiment(){
    seqOnUnsorted = sequentialSearchSet(keys, unsortedArray);
    writeOutExperimentResults(seqOnUnsorted, seqOnUnsortedFilename, "Sequential Sort on Unsorted: ");

    seqOnSorted = sequentialSearchSet(keys, sortedArray);
    writeOutExperimentResults(seqOnSorted, seqOnSortedFilename, "Sequential Sort on Sorted: ");

}

The sequentialSearchSet() method is as follows:

public SearchStats[] sequentialSearchSet(int[] keys, int[] toSearch){
    SearchStats[] stats = new SearchStats[keys.length];

    for (int i = 0; i < keys.length; i++){
        stats[i] = sequentialSearch(keys[i], toSearch);
    }

    return stats;
}

Here is sequentialSearch():

public SearchStats sequentialSearch(int key, int[] toSearch){

    long startTime = System.nanoTime(); // start timer

    // step through array one-by-one until key found
    for (int i = 0; i < toSearch.length; i++){
        if (toSearch[i] == key){
            return new SearchStats(key, i, System.nanoTime() - startTime);
        }
    }

    // did not find key
    return new SearchStats(key, -1, System.nanoTime() - startTime);
}

and here is the SearchStats constructor:

public SearchStats(int keySearchedFor, int indexOfFound, long searchTime){
    this.keySearchedFor = keySearchedFor;
    this.indexOfFound = indexOfFound;
    this.searchTime = searchTime;
}

If I do a test run, the average search times I get:

sequential search on sorted: 21,080 ns
sequential search on unsorted: 2,137,465 ns

As you can see, because I search on the unsorted first, the search time was significantly longer. Can anyone explain why this is so? And furthermore, how I could avoid such weirdness?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
jtan
  • 129
  • 8
  • 1
    Trying running the test repeatedly until you don't see any performance improvement. Typically a method/loop needs to be run 10,000x before it is fully optimised. Search for the `-XX:CompileThreshold=` option for more details. – Peter Lawrey Nov 07 '11 at 17:52

2 Answers2

9

This is due to the VM "warming up". As a brief summary, modern VMs compile common code paths to native code and optimise them as they're running. So the first few iterations around the loop, the code is being interpreted and is many orders of magnitude slower than the code once optimisation kicks in.

This is a common problem when profiling Java, and the general solution is to exercise the code under test a few (million) times before perfoming either measured test.

For more details and suggestions, you should read Anatomy of a flawed micro-benchmark.

Andrzej Doyle
  • 102,507
  • 33
  • 189
  • 228
  • 1
    +1. Additionally, it would be best not to time on a single loop run after warm-up, but to do multiple runs and take the average of the measurements. That should reduce the possible impact of stuff beyond your control like other processes in the OS taking precedence and hogging CPU time. Overall, it'd be better to check this kind of thing with a profiler so you only get method timings for time actually allotted to the JVM. – G_H Nov 07 '11 at 17:43
  • Thanks! That explains everything. – jtan Nov 07 '11 at 18:45
1

Microbenchmark is hard: http://www.ibm.com/developerworks/java/library/j-jtp02225/index.html

ptyx
  • 4,074
  • 1
  • 19
  • 21