
Why does the same simple for loop run roughly 5 times faster in Java than in C++? In Java this code completes in 700-800 ms, while in C++ it takes 4-5 SECONDS. Yet C++ is usually considered much faster than Java, especially for CPU-bound workloads. Have I missed something important?

Java:

import java.time.Duration;
import java.time.Instant;

public class Main {

    public static void main(String[] args) {

        long x = 0;

        Instant start = Instant.now();

        for (long i = 0; i < 2147483647; i++)
            x += i;

        Instant end = Instant.now();

        Duration result = Duration.between(start, end);
        System.out.println("TIME: " + result.toMillis());
        System.out.println("X = " + x);
    }
}

Output:

TIME: 799
X = 2305843005992468481

C++:

#include <iostream>
#include <ctime>

int main() 
{ 
    long long x = 0;

    clock_t begin = clock();

    for (long long i = 0; i < 2147483647; i++)
        x += i;

    clock_t end = clock();

    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Time elapsed: " << elapsed_secs << std::endl;
    std::cout << "x = " << x << std::endl;
    
    return 0; 
}

Output:

Time elapsed: 4.59629
x = 2305843005992468481
Ivan
  • How did you compile the C++ code? – selalerer Oct 14 '20 at 07:57
  • What is the performance if you change to i++ in both? – DHa Oct 14 '20 at 07:59
  • The Java benchmark is flawed too ... because it doesn't take account of JVM warmup. – Stephen C Oct 14 '20 at 07:59
  • Please re-read the G++ manual about the `-O` flag. Don't bother benchmarking anything if it is not at least compiled with `-O2`. – Botje Oct 14 '20 at 08:00
  • Try adding `-O4` – selalerer Oct 14 '20 at 08:00
  • @DHa it remains the same – Ivan Oct 14 '20 at 08:03
  • @Botje yeah, with -O2 gives `Time elapsed: 2e-06`. Thanx)) – Ivan Oct 14 '20 at 08:05
  • The optimiser turns your code into a single print of `2305843005992468481`: https://godbolt.org/z/ohzjY3 – Alan Birtles Oct 14 '20 at 08:07
  • Discussing performance related to Java needs information about the JVM (what version). Discussing performance related to C++ needs information about the compiler and library (version, compilation option, etc). Discussion of both requires information about the target/host operating system and hardware. You've provided no such information, so your comparison is meaningless. There is also rarely any point in discussing performance of C++ in non-release (debugging) builds, which is often the default. – Peter Oct 14 '20 at 08:10
  • In order to force the `C++` code to do some work you could generate a set of random numbers before the test and then add those together. Random numbers are opaque to the optimizer and thus will force it to generate the loop. Be sure to print out the result afterwards to make sure the optimizer doesn't just decide to not bother calculating a result that is never used. – Galik Oct 14 '20 at 08:13
  • btw "Although C++ usually considered much faster than Java" is mainly folklore and myths. Trying to measure the difference is the right way. Unfortunately micro measurements are not trivial to get right. I suggest using a benchmarking library for that. Anyhow, the better way is to measure the real application instead of toy examples – 463035818_is_not_an_ai Oct 14 '20 at 08:17
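Galik's suggestion (sum values the optimizer cannot know in advance, then print the result) can be sketched roughly as follows. This is a minimal sketch assuming C++11 or later; the names `make_random_values`, `timed_sum` and `TimedSum`, the seed, and the value range 0..1000 are illustrative choices, not something from the thread. It also swaps the question's `clock()`, which reports processor time, for `std::chrono::steady_clock`, a monotonic wall clock.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Result of one timed run: the sum (print it, so the optimizer cannot
// discard the work) and the elapsed wall-clock time in milliseconds.
struct TimedSum {
    std::int64_t sum;
    double millis;
};

// Fill a vector with pseudo-random values *before* starting the clock.
// The values are only known at run time, so the compiler cannot replace
// the summation with a constant computed at compile time.
std::vector<std::int64_t> make_random_values(std::size_t count, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::int64_t> dist(0, 1000);
    std::vector<std::int64_t> values(count);
    for (auto& v : values) v = dist(rng);
    return values;
}

// Time only the summation itself, using a monotonic clock.
TimedSum timed_sum(const std::vector<std::int64_t>& values) {
    const auto start = std::chrono::steady_clock::now();
    std::int64_t sum = 0;
    for (std::int64_t v : values) sum += v;
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return {sum, elapsed.count()};
}
```

A caller would build with `-O2`, call something like `timed_sum(make_random_values(10000000, 42))`, and print both `sum` and `millis`. For anything beyond a quick check, a dedicated benchmarking library (as suggested in the comments) handles warmup and repetition more carefully.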

1 Answer

C++ can be fast, but only when you turn on optimizations. By default GCC compiles without optimizations (-O0), because optimizing C++ takes extra compile time. You need to pass an -O flag; -O2 should be fine.
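To keep numbers from an unoptimized build out of a comparison, a benchmark can report how it was compiled. This is a minimal sketch assuming GCC or Clang, which predefine `__VERSION__` and define `__OPTIMIZE__` only in optimizing compilations; other compilers (e.g. MSVC) may define neither. `build_info` and `built_with_optimizations` are hypothetical helper names.

```cpp
#include <string>

// Describe how this translation unit was compiled, so benchmark output
// can be tagged with it. Relies on GCC/Clang predefined macros.
std::string build_info() {
    std::string info;
#ifdef __VERSION__
    info += "compiler: ";
    info += __VERSION__;
#else
    info += "compiler: unknown";
#endif
#ifdef __OPTIMIZE__
    info += ", optimizations: on";
#else
    info += ", optimizations: off (timings are not meaningful)";
#endif
    return info;
}

// True when GCC/Clang report an optimizing build (e.g. -O1, -O2, -O3).
bool built_with_optimizations() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

Printing `build_info()` next to the timing makes it obvious when a result came from a plain `g++ main.cpp` build rather than `g++ -O2 main.cpp`.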

The loop you measure is a common pattern and has a closed-form solution:

int sum = 0;
for (int i = 0; i < n; ++i) sum += i;
// this gives the same result, since the loop adds 0 + 1 + ... + (n-1):
sum = (n * (n - 1)) / 2;
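The closed form can also be written so that its intermediate product never exceeds the final result. This is a minimal sketch assuming `n >= 0` and a 64-bit result; `sum_below` is a hypothetical name, and this is not necessarily the formula a compiler actually emits. Since one of `n` and `n - 1` is always even, that factor is halved before multiplying.

```cpp
#include <cstdint>

// Sum of 0 + 1 + ... + (n - 1) for n >= 0, i.e. n * (n - 1) / 2.
// Halve whichever factor is even first, so the only multiplication
// produces exactly the final result and nothing larger.
constexpr std::int64_t sum_below(std::int64_t n) {
    return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}

// The value printed by both programs in the question, checked at compile time.
static_assert(sum_below(2147483647) == 2305843005992468481LL,
              "closed form must match the loop's result");
```

With `n = 2147483647` this computes `2147483647 * 1073741823`, which is the `2305843005992468481` both programs print.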

Compilers know about this trick (they probably don't use exactly that formula, because the intermediate product n*(n-1) can overflow even when the final result does not). With gcc and optimizations enabled, this:

#include <iostream>

int main(){
    long long x = 0;
    for (long long i = 0; i < 2147483647; ++i)
        x += i;
    std::cout << x;
}

translates to:

main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:_ZSt4cout
        movabs  rsi, 2305843005992468481
        call    std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<long long>(long long)
        xor     eax, eax
        add     rsp, 8
        ret
_GLOBAL__sub_I_main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:_ZStL8__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZStL8__ioinit
        mov     edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        add     rsp, 8
        jmp     __cxa_atexit

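Note that there is no loop! The result was computed at compile time, and the program just loads the constant 2305843005992468481 (the `movabs` instruction) and prints it. It is unlikely this takes seconds to execute.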

See here what happens when you turn off optimizations (the default): https://godbolt.org/z/8KncEj

463035818_is_not_an_ai