My program in a nutshell:

I have a program that successively runs several sort algorithms against an array of ints, timing each. The GUI allows the user to select array size and a variety of random number ranges with which to fill the array to be sorted. Each click of the "Sort" button grabs the user's array values, constructs a new array, then creates for each sort algorithm a clone of the array, using .clone().

The problem:

When the "Sort" button is clicked a second time, every sort runs noticeably faster than it did on the first click.

Somewhere there's optimization happening that I don't understand.

The reason this is an issue: if the user doesn't change the array settings and runs the sorts again, a new array is constructed with new random numbers, but the array size and random number range stay the same. The run time over a list of 125k elements should therefore stay about the same, not improve by 300%.

Here is a demo program of what I am facing. It uses only one sort, Java's native sort, to demonstrate the issue. It also uses hard-coded values for constructing the random int array to be sorted, but it builds a new array on each "enter" press. I think this simulation accurately reflects my program because the same "error" happens here, too.

Why is the sort faster the second time?

...the array is rebuilt with new values for each run, so how can it get faster?

package sortTooFast;

import java.util.Arrays;
import java.util.Scanner;

public class SortTooFast {
    public static final int ARRAY_SIZE = 500000;
    public static final int MIN_RANGE = 0;
    public static final int MAX_RANGE = 100;
    public static final int INCLUSIVE = 1;
    
    int[] sortingArray;

    public static void main(String[] args) {
        SortTooFast test = new SortTooFast();
        test.run();
    }
    
    // Run program.
    public void run(){
        while(true){    
            
            // Assign int[] filled with random numbers.
            sortingArray = getArray();                  

            // Inform user.
            System.out.println("\nPress return key to run sort!");
            
            // Wait for user.
            new Scanner(System.in).nextLine();
            
            System.out.println("First 15 elements to be sorted:");
            
            // Print a small section of the array; prove not sorted
            for (int i = 0; i < 15; i++){
                System.out.printf("%4d", sortingArray[i]);
            }
            
            // Perform sort.
            runNativeSort(sortingArray);
        }
    }

    // Run native java sort.
    private void runNativeSort(int[] array) {
        // Start timer
        long startTime = System.currentTimeMillis();
        
        // Perform sort.
        Arrays.sort(array);
        
        // End timer
        long finishTime = System.currentTimeMillis();
        
        // Running time.
        long runTime = finishTime - startTime;
        
        // Report run time.
        System.out.println("\nRun time: " +runTime);        
    }

    // Obtain an array filled with random int values.
    private int[] getArray() {
        // Make int array.
        int[] mArray = new int[ARRAY_SIZE];
        
        // Length of array.
        int length = mArray.length;
        
        // Fill array with random numbers.
        for(int counter = 0; counter < length; counter++){
            int random = MIN_RANGE + (int)(Math.random() * ((MAX_RANGE - MIN_RANGE) + INCLUSIVE));
            mArray[counter] = random;
        }
        return mArray;
    }
}
ross studtman
  • If you want to seriously microbenchmark anything on the JVM, you should not try to bake your own tool, you'll never cover each and every issue involved. Use Google's Caliper or Oracle's jmh, which are both industrial-grade microbenchmark tools. Both are quite easy to set up. – Marko Topolnik Jul 21 '13 at 16:03

1 Answer

Why is the sort faster the second time?

Because by that time, the JIT has optimized the bytecode into faster native code.

There are two effects you need to counter when benchmarking this sort of thing:

  1. Time taken to JIT the code in the first place
  2. The way that the native code improves over time as it is optimized harder and harder by the JIT.

Typically you can reduce the effect of this to achieve a steady state by running the code for long enough to get it fully optimized before you start timing.

Additionally, you should use System.nanoTime instead of System.currentTimeMillis when benchmarking: System.currentTimeMillis is meant to give you a reasonably accurate "wall clock" time, which may be adjusted by the operating system if it notices that the clock is out of sync, whereas nanoTime is specifically designed for measuring elapsed time since a particular instant, regardless of changes to the system clock.
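
As a rough illustration of both points, here is a minimal sketch that runs the sort a number of untimed times before taking a measurement with System.nanoTime. The class name WarmupSketch, the helper names, and the warm-up count of 20 are assumptions made up for this example; the right number of warm-up runs depends on the JVM and the code, and a dedicated harness will handle this more reliably than a hand-rolled loop.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: warm up the JIT first, then time a single sort with nanoTime.
class WarmupSketch {
    static final int ARRAY_SIZE = 500_000;
    static final int MAX_VALUE = 100;   // inclusive upper bound, as in the question
    static final int WARMUP_RUNS = 20;  // assumption: arbitrary, tune for your JVM

    // Builds a fresh array of random ints in [0, MAX_VALUE].
    static int[] randomArray(Random rng, int size) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = rng.nextInt(MAX_VALUE + 1);
        }
        return a;
    }

    // Sorts the array in place and returns the elapsed time in nanoseconds.
    static long timeSortNanos(int[] array) {
        long start = System.nanoTime();
        Arrays.sort(array);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random rng = new Random();

        // Untimed runs: give the JIT a chance to compile and optimize the sort.
        for (int i = 0; i < WARMUP_RUNS; i++) {
            timeSortNanos(randomArray(rng, ARRAY_SIZE));
        }

        // Timed run on fresh data, measured after warm-up.
        long nanos = timeSortNanos(randomArray(rng, ARRAY_SIZE));
        System.out.printf("Run time after warm-up: %.3f ms%n", nanos / 1_000_000.0);
    }
}
```

The array is generated outside the timed region, so only Arrays.sort itself is measured.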

Jon Skeet
  • @RanEldan: Yup, I receive plenty of downvotes. Some deserved (and then hopefully undone when I correct the post) - some less so, IMO ;) – Jon Skeet Jul 21 '13 at 15:59
  • OP is also allocating a new array each time. This will skew the measurements due to GC cycles. – Marko Topolnik Jul 21 '13 at 16:00
  • @Jon thank you, I had not considered those effects. Hmm, maybe I can run the sorts like 3 times on different clones and then record the nanoTime on the fourth run? If that seems fair then the issue is the user having to wait for the JIT-optimization runs and the timed run, which doesn't seem like an issue with java's native sort but one of the sorts being used in my program is Insertion sort (I don't use 500k array sizes!) and it may take 2 seconds to perform...but now that I think about it, that sort algorithm is NOT improved by the JIT so perhaps not an issue. Now to implement this :) – ross studtman Jul 21 '13 at 16:04
  • @rossstudtman: You run it multiple times in order to improve the stability of benchmarks; you wouldn't run it multiple times for a user-facing application... you just need to expect that the first time through when you execute the code, it won't be as fast as later. And I would expect *all* kinds of code to be improved by the JIT... – Jon Skeet Jul 21 '13 at 16:22
  • @ross studtman: Instead of cloning, keep one original and one working array, and use `System.arraycopy`. Getting rid of the allocation makes you measure more of what you want to measure and eliminates the GC related problems. – maaartinus Sep 12 '13 at 06:51
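
The suggestion in the last comment could look roughly like the sketch below: one original array is kept untouched, and a single preallocated working array is refilled with System.arraycopy before each timed sort, so no new array is allocated per run. The class name ReuseBufferSketch and the method timeSortOnCopy are hypothetical names chosen for this example, and the tiny input array is only there to keep the output readable.

```java
import java.util.Arrays;

// Hypothetical sketch: reuse one working buffer instead of cloning per run.
class ReuseBufferSketch {

    // Copies original into working, sorts working, and returns the
    // elapsed time of the sort alone in nanoseconds.
    // Assumes both arrays have the same length.
    static long timeSortOnCopy(int[] original, int[] working) {
        // Refill the existing buffer; no allocation happens here.
        System.arraycopy(original, 0, working, 0, original.length);

        long start = System.nanoTime();
        Arrays.sort(working);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] original = {42, 7, 19, 3, 88};
        int[] working = new int[original.length]; // allocated once, reused for every run

        for (int run = 1; run <= 3; run++) {
            long nanos = timeSortOnCopy(original, working);
            System.out.println("Run " + run + ": " + nanos + " ns, sorted = "
                    + Arrays.toString(working));
        }
        System.out.println("Original untouched: " + Arrays.toString(original));
    }
}
```

Because the copy happens before the timer starts, each algorithm sees the same unsorted input and only the sort is timed.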