Reducing the time of execution of the following code

Question

I am working on a problem. Out of 17 test cases 10 works fine and gives the result in less than a second but in 7 cases it is taking 2 seconds which are beyond the time limit. Following is the code

import java.util.*;
import java.io.*;

class TestClass 
{
    static PrintWriter wr = new PrintWriter(System.out);

    public static void func1(int arr[], int n)
    {
        int temp = arr[0];
        for (int jj = 0; jj < n; jj++)
        {
            if (jj == (n - 1))
                arr[jj] = temp;
            else
                arr[jj] = arr[jj + 1];
        }
    }

    public static void func2(int arr[], int n, int rt) 
    {
        int count = 0;
        for (int a = 0; a < n; a++)
        {
            for (int b = a; b < n; b++)
            {
                if (arr[a] > arr[b])
                    count++;
            }
        }

        if (rt == (n - 1))
            wr.print(count);
        else
            wr.print(count + " ");
    }

    public static void main(String args[]) throws Exception 
    {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        String str = br.readLine().trim();
        StringTokenizer st = new StringTokenizer(str);

        int t = Integer.parseInt(st.nextToken());

        for (int i = 0; i < t; i++)     //for test cases
        {
            str = br.readLine().trim();
            st = new StringTokenizer(str);
            int n = Integer.parseInt(st.nextToken());
            int arr[] = new int[n];
            str = br.readLine().trim();
            st = new StringTokenizer(str);

            for (int j = 0; j < n; j++)     //to take input of array for each test case
            {
                arr[j] = Integer.parseInt(st.nextToken());
            }

            for (int rt = 0; rt < n; rt++)    //for number of times circular shifting of array is done
            {
                func1(arr, n);    //circularly shifts the array by one position
                func2(arr, n, rt);  //prints the number of inversion counts
            }

            if (i != (t - 1))
                wr.println();
        }

        wr.close();
        br.close();
    }
}

Can someone suggest how to optimize the code so that it takes less time in execution. I know BufferReader and PrintWriter takes less time as compared to Scanner and System.out.print. I was using scanner and System.out.print earlier but changed it later in hope of getting less time but it didn't help. Also I did it earlier without the use of func1 and func2 and did all the operations in main only. The time in both the cases remains the same. I am getting the currect output in all the cases so code is correct, I just need help in optimizing it.

Now would be a good time to learn to use a *profiler*. There is probably one in your IDE. — slim, Dec 19 '17 at 15:39
Never omit braces `{}` around loop bodies. Lost the habit ASAP -- it'll save you a ton of confusion in future. — slim, Dec 19 '17 at 15:41
There is a n^3 loop in main and func2. I would start by looking at how to remove that. If n is large, that would be very slow. — clinomaniac, Dec 19 '17 at 15:47
Consider using other data-structures like LinkedList instead of an array (should make a huge difference in func1). Profile your code as suggested earlier. And please o please !!! Use meaningful names in your variables and functions. It goes to make a big difference reading the lines you have written. — Veera, Dec 19 '17 at 15:58
n^3 loops got my attention too, but I can't find any way of reducing them as all of them are performing some task and I can't skip any of them. I am a bit confused about how to reduce the complexity of it. — Samarth Tholia, Dec 19 '17 at 15:58
It might be useful to say what is the problem you tying to solve? what is expected result? might be a better way of solving it. — justMe, Dec 19 '17 at 16:01

Edwin Buck · Answer 1 · 2017-12-19T17:26:07.743

The website you are using acquires questions from past programming competitions. I recognize this as a familiar problem

Like most optimization questions, the preferred steps are:

Do less.
Do the same in fewer instructions.
Don't use functions.
Use faster instructions.

In your case, you have an array, and you wish to rotate it a number of times, and then to process it from the rotated position.

Rotating an array is an incredibly expensive operation, because you typically need to copy every element in the array into a new location. What is worse for you is that you are doing it the simplest way, you are rotating the array one step for every step needing rotation.

So, if you have a 100 element array that needs rotated 45 steps, you would then have (3 copies per element swap) 100 * 45 * 3 copies to perform your rotation.

In the above example, a better approach would be to figure out a routine that rotates an array 45 elements at a time. There are a number of ways to do this. The easiest is to double the RAM requirements and just have two arrays

b[x] = a[(mod(x+45), a.length)]

An even faster "do less" would be to never rotate the array, but to perform the calculation in reverse. This is conceptually the function of the desired index in the rotated array to the actual index in the pre-rotated array. This avoids all copying, and the index numbers (by virtue of being heavily manipulated in the math processing unit) will already be stored in the CPU registers, which is the fastest RAM a computer has.

Note that once you have the starting index in the original array, you can then calculate the next index without going through the calculation again.

I might have read this problem a bit wrong; because, it is not written to highlight the problem being solved. However, the core principles above apply, and it will be up to you to apply them to the exact specifics of your programming challenge.

An example of a faster rotate that does less
    public static void func1(int arr[], int shift) {
        int offset = shift % arr.length;

        int [] rotated = new int[arr.length];
        // (arr.length - 1) is the last index, walk up till we need to copy from the head of arr
        for (int index = 0; index < (arr.length - 1) - offset; index++) {
            rotated[index] = arr[index+offset];
        }
        // copy the end of the array back into the beginning
        for ( int index = (arr.length - 1) - offset; index < arr.length; index++ ) {
            rotated[index] = (offset - ((arr.length - 1)  - index) - 1);
        }

        System.arraycopy(rotated, 0, arr, 0, arr.length);
    }

This copies the array into its rotated position in one pass, instead of doing a pass per index to be rotated.

Except that most CPUs can copy a chunk of RAM in a single instruction, and `System.arrayCopy()` will take advantage of this. — slim, Dec 19 '17 at 16:14
@slim Unless the entire array can fit into a word, it cannot be copied in a single clock cycle. Even if it can be fit into a single word, odds are you'll have to shift it, then mask it, the store that in a buffer, then shift it again, mask it, and add the result into the same buffer. High level functions / keywords are not the same as CPU instructions! Some knowledge of how the CPU works under the language is required to really write fast code. — Edwin Buck, Dec 19 '17 at 16:17
Not a single clock cycle, but often a single CPU instruction, and if not that, a handful per page. https://stackoverflow.com/questions/17498743/how-does-the-internal-implementation-of-memcpy-work -- certainly orders of magnitude faster than moving array elements one-by-one. — slim, Dec 19 '17 at 16:36
@slim I've read the implementation of memcpy out of glibc before, and I've programmed in assembly, and **The post you linked to says the same thing I say in the first sentence**, "In general, you couldn't physically copy anything larger than the largest usable register in a single cycle" then makes some sort of hopeful guess that the CPU might support a faster way. If you can show me the Intel CPU instruction, I'll change my mind. Until then, keep in mind that the memory bus doesn't grow 1000 times its width to accommodate 1000 simultaneous memory moves in one clock. — Edwin Buck, Dec 19 '17 at 16:52

slim · Answer 2 · 2017-12-19T16:32:23.817

The first rule of optimisation (having decided it is necessary) is to use a profiler. This counts how many times methods are invoked, and measures the accumulated time within each method, and gives you a report.

It doesn't matter if a method is slow if you only run it a few times. If you run it hundreds of thousands of times, you need to either make it faster, or run it fewer times.

If you're using a mainstream IDE, you already have a profiler. Read its documentation and use it.

The other first rule of optimisation is, if there's already literature about the problem you're trying to solve, read it. Most of us might have invented bubble-sort independently. Fewer of us would have come up with QuickSort, but it's a better solution.

It looks as if you're counting inversions in the array. Your implementation is about as efficient as you can get, given that naive approach.

for(int i=0; i< array.length; i++) {
   int n1 = array[i];
   for(int j=i+1; j< array.length; j++) {
       n2 = array[j];
       if(n1 > n2) {
           count++;
       }
   }
}

For an array of length l this will take ( l - 1) + ( l - 2 ) ... 1 -- that's a triangular number, and grows proportionally to the square of l.

So for l=1000 you're doing ~500,000 comparisons. Then since you're repeating the count for all 1000 rotations of the array, that would be 500,000,000 comparisons, which is definitely the sort of number where things start taking a noticeable amount of time.

Googling for inversion count reveals a more sophisticated approach, which is to perform a merge sort, counting inversions as they are encountered.

Otherwise, we need to look for opportunities for huge numbers of loop iterations. A loop inside a loop makes for big numbers. A loop inside a loop inside another loop makes for even bigger numbers.

You have:

for (int i = 0; i < t; i++) {
    // stuff removed
    for (int rt = 0; rt < n; rt++) {
        // snip
        func2(arr, n, rt);  //prints the number of inversion counts
    }
    // snip
}

public static void func2(int arr[], int n, int rt) {
    // snip
    for (int a = 0; a < n; a++) {
        for (int b = a; b < n; b++) {
            // stuff
        }
    }
    // snip
}

That's four levels of looping. Look at the input values for your slow tests, and work out what n * n * n * t is -- that an indicator of how many times it'll do the work in the inner block.

We don't know what your algorithm is supposed to achieve. But think about whether you're doing the same thing twice in any of these loops.

It looks as if func1() is supposed to rotate an array. Have a look at System.arrayCopy() for moving whole chunks of array at a time. Most CPUs will do this in a single operation.

The first rule of optimization is: don't do it. The 2nd is: don't do it yet (for experts only). The 3rd is: use a profiler. — Robert, Dec 19 '17 at 16:08

Reducing the time of execution of the following code

2 Answers2