5

Hello Stack Overflow users. This is my first question, so if there are any errors in the way I have expressed it, please point them out. Thank you.

I wrote this simple calculation in both Java and C++:

Java:

long start = System.nanoTime();
long total = 0;
for (int i = 0; i < 2147483647; i++) {
    total += i;
}
System.out.println(total);
System.out.println(System.nanoTime() - start);

C++:

auto start = chrono::high_resolution_clock::now();
register long long total = 0;
for (register int i = 0; i < 2147483647; i++)
{
    total += i;
}
cout << total << endl;
auto finish = chrono::high_resolution_clock::now();
cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;

Software: JDK 8u11 and the Microsoft Visual C++ Compiler (2013)

results:

Java: total 2305843005992468481, elapsed 1096361110 ns

C++: total 2305843005992468481, elapsed 6544374300 ns

The calculation results are the same, which is good. However, the printed nanosecond times show that the Java program takes about 1 second, while the C++ program takes about 6.5 seconds to execute.

I've been using Java for quite some time, but I am new to C++. Is there a problem in my code? Or is it a fact that C++ is slower than Java for simple calculations?

Also, I used the "register" keyword in my C++ code, hoping it would improve performance, but the execution time doesn't differ at all. Could someone explain this?

EDIT: My mistake was that the C++ compiler settings were not optimized and the output target was set to 32-bit. After applying /O2, targeting WIN64 and removing DEBUG, the program took only 0.7 seconds to execute.

The JVM applies optimizations by default (in its JIT compiler, at run time), but this is not the case for VC++, whose default settings favor compilation speed. Different C++ compilers also vary in their results: some calculate the loop's result at compile time, leading to extremely short execution times (around 5 microseconds).

NOTE: Given the right conditions, the C++ program performs better than Java in this simple test. However, I noticed that many runtime safety checks are skipped, violating its debug intention as a "safe language". I believe C++ would outperform Java even more in a large array test, as it does not have bounds checking.

shingekinolinus
  • 5
    How did you compile? With which compiler and optimization flags? On which system? – Basile Starynkevitch Jul 20 '14 at 06:38
  • 1
    `register` is deprecated in C++11 and is routinely ignored by compilers anyway. – T.C. Jul 20 '14 at 06:40
  • 1
    Possibly the I/O - take the finish time before printing the total. – cup Jul 20 '14 at 06:42
  • 1
    My tests show that g++ optimizes away the loop entirely at anything above `-O0`. – T.C. Jul 20 '14 at 06:45
  • 3
    *or is it a fact that C++ is slower than Java with simple calculations?* No, that much is sure. – deviantfan Jul 20 '14 at 06:45
  • What is `long long`? Is that the same type as the Java `long`? I think cup's comment about IO is a good one, definitely should remove that from the timing calculation. Also: micro-benchmarks are crap. – markspace Jul 20 '14 at 06:49
  • @user2338547: Yes, a 64-bit number (63 bits and a sign). There could be differences because of unusual processors etc. (like ones' complement), but on usual systems... and on such an unusual one, there won't be a Java anyway – deviantfan Jul 20 '14 at 06:54
  • @BasileStarynkevitch thanks for your quick answer, I compiled my C++ program with /Od and /Oy – shingekinolinus Jul 20 '14 at 07:21
  • Almost every single C++ performance question on Stack Overflow points out that unoptimized debug performance is meaningless. Did you read any C++ [tag:performance] posts on Stack Overflow before posting your question? – Yakk - Adam Nevraumont Jul 20 '14 at 10:40
  • Did you select the X64 (64 bit) option under the project's properties? – rcgldr Jul 20 '14 at 20:24
  • 1
    For such trivial code, I would expect both to produce identical machine code. Your tiny runtime difference in favor of C++ is very likely caused only by JVM needing to warm up. "I believe C++ will even more outperform Java in a large array test, as it does not have bound checking" - nope, because HotSpot eliminates bound checking more often than not and additionally has more accurate information on pointer aliasing which can sometimes lead to much faster code. You can be surprised both ways. E.g. see this: http://lemire.me/blog/archives/2012/07/23/is-cc-worth-it/ – Piotr Kołaczkowski Aug 08 '14 at 15:42
  • 1
    That's not a fair test of Java, since it doesn't "warm up" the JITC. A better way would be to make a separate test routine (not in `main`) and call it twice. It's not until the routine gets called a second time that most JITCs will fully compile the code. – Hot Licks Aug 16 '14 at 21:56

3 Answers

8

On Linux/Debian/Sid/x86-64, using OpenJDK 7 with

// file test.java
class Test {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < 2147483647; i++) {
            total += i;
        }
        System.out.println(total);
        System.out.println(System.nanoTime() - start);
    }
}

and GCC 4.9 with

// file test.cc
#include <iostream>
#include <chrono>

int main(int argc, char** argv) {
    using namespace std;
    auto start = chrono::high_resolution_clock::now();
    long long total = 0;
    for (int i = 0; i < 2147483647; i++) {
        total += i;
    }
    cout << total << endl;
    auto finish = chrono::high_resolution_clock::now();
    cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count()
         << endl;
}

Then compiling and running test.java with

javac test.java
java Test

I'm getting the output

2305843005992468481
774937152

when compiling test.cc with optimizations

g++ -O2 -std=c++11 test.cc -o test-gcc

and running ./test-gcc it goes much faster

2305843005992468481
40291

Of course, without optimizations (g++ -std=c++11 test.cc -o test-gcc) the run is slower

2305843005992468481
5208949116

By looking at the assembler code using g++ -O2 -fverbose-asm -S -std=c++11 test.cc I see that the compiler computed the result at compile time:

    .globl  main
    .type   main, @function
  main:
  .LFB1530:
    .cfi_startproc
    pushq   %rbx    #
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    movabsq $2305843005992468481, %rsi  #,
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rbx  #, start
    call    _ZNSo9_M_insertIxEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    subq    %rbx, %rax  # start, D.35008
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rsi  # D.35008, D.35008
    call    _ZNSo9_M_insertIlEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    xorl    %eax, %eax  #
    popq    %rbx    #
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
  .LFE1530:
            .size   main, .-main

So you just need to enable optimizations in your compiler (or switch to a better compiler, like GCC 4.9)
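To make the effect of that constant folding easier to reason about, here is a minimal sketch (it is not the code measured above). The names `sum_below`, `sum_below_closed_form`, `timed_sum` and `TimedSum` are hypothetical helpers introduced only for illustration. The loop bound is a run-time parameter instead of a literal, only the loop sits between the two clock reads, and the closed form n(n-1)/2 gives a value to check the loop against. An optimizer is still free to replace the loop with something cheaper, so this only structures the measurement; it does not guarantee what ends up being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Sums 0 + 1 + ... + (limit - 1), the same loop as in the question,
// but with the bound passed in at run time instead of written as a literal.
std::int64_t sum_below(std::int32_t limit) {
    std::int64_t total = 0;
    for (std::int32_t i = 0; i < limit; ++i) {
        total += i;
    }
    return total;
}

// Closed form n(n-1)/2 of the same sum. This is the kind of value an
// optimizer can precompute when the bound is a compile-time constant.
std::int64_t sum_below_closed_form(std::int32_t limit) {
    if (limit <= 0) {
        return 0;
    }
    const std::int64_t n = limit;
    return n * (n - 1) / 2;  // fits in 64 bits for every positive 32-bit n
}

// Hypothetical result type: the computed total and the elapsed time.
struct TimedSum {
    std::int64_t total;
    std::int64_t nanoseconds;
};

// Times only the loop; any printing is left to the caller so that
// stream I/O is not part of the measured interval.
TimedSum timed_sum(std::int32_t limit) {
    assert(limit >= 0);
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t total = sum_below(limit);
    const auto finish = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start);
    return TimedSum{total, static_cast<std::int64_t>(elapsed.count())};
}
```

Calling `timed_sum` from `main` with a bound parsed from `argv`, and printing `total` and `nanoseconds` afterwards, keeps the output outside the timed region. For the bound 2147483647 the closed form evaluates to 2305843005992468481, the total reported above.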

BTW, in Java the low-level optimizations happen in the JIT compiler of the JVM. I don't know Java well, but I don't think I need to switch them on. I do know that with GCC you need to enable optimizations, which of course happen ahead of time (e.g. with -O2).

PS: I never used any Microsoft compiler in this 21st century, so I cannot help you on how to enable optimizations in it.

Finally, I don't believe that such microbenchmarks are significant. Benchmark your real applications, then optimize them.

Basile Starynkevitch
  • Right, in Java it's not necessary to switch optimization on. Quite the opposite, it's pretty much impossible (or very complicated) to turn it *off*. @OP: In a comparison between unoptimized C++ and unoptimized Java, C++ will be faster anyway, because the JVM is written in...? Right. Without optimization, the JVM overhead is just more work for the processor => slower. – deviantfan Jul 20 '14 at 07:11
  • @deviantfan: the language in which the JVM is coded is not relevant, as soon as it use JIT techniques to generate machine code on the fly. The performance is related to the quality of the JIT-emitted machine code, and that machine code emission done by the JVM happens at run time. – Basile Starynkevitch Jul 20 '14 at 07:12
  • "Without" any optimization techniques...? In my opinion, this includes machine code quality-rising JIT things too. – deviantfan Jul 20 '14 at 07:13
  • By definition JVM & JIT are doing some optimizations. – Basile Starynkevitch Jul 20 '14 at 07:17
  • I see what you mean here. Now that I have added the /O2 and /GL flags, the same code takes 1.7 seconds to execute. Much faster, but why is it still slower than Java? – shingekinolinus Jul 20 '14 at 07:32
  • I'll also try the GCC compiler, maybe that will help – shingekinolinus Jul 20 '14 at 07:34
  • If using GCC, be sure to install the latest version (today, July 21st 2014, it is 4.9.1) – Basile Starynkevitch Jul 20 '14 at 07:38
  • by switching to WIN64, execution only took 0.7 seconds – shingekinolinus Jul 20 '14 at 20:59
0

Takes about 0.6 seconds (0.592801000 seconds) on my system (Intel 2600K, 3.40 GHz) with MSVC Express 2013, 64-bit mode, standard release build. I moved the cout to after setting finish so that the overhead of cout is not included.

#include <iostream>
#include <chrono>

using namespace std;

int main()
{
    auto start = chrono::high_resolution_clock::now();
    register long long total = 0;
    for (register int i = 0; i < 2147483647; i++)
    {
        total += i;
    }
    auto finish = chrono::high_resolution_clock::now();
    cout << total << endl;
    cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;
    return 0;
}
rcgldr
-2

I think the easiest way to describe why C/C++ will ALWAYS be faster than Java is to understand how Java works.

From the beginning, Java has been developed to facilitate cross-platform software. Before Java, one had to compile their program on each machine family separately. Even now, with the variety of hardware architectures, accepted standards, and OS's out there, one cannot get around this hurdle. Java accomplishes this through its Compiler and JVM. The compiler applies whatever optimizations it can and assembles this into a Java bytecode, which is like a shorthand for the optimized source that was compiled. However, this bytecode cannot be understood by the processor yet.

This is where the Java Virtual Machine comes in. First the JVM figures out what environment it is being run in and loads the appropriate translation table. Then the bytecode is read into the JVM and each code is looked up on the table and translated into the environment's native machine code, and then executed.

As you know, this all takes a tiny bit of time per instruction. But with a compiled C/C++ program, it is already in the proper machine code and is executed immediately.

Interesting note- All OS's and most device drivers are written in C for performance reasons.

David Bowman
  • "All OS's and most device drivers are written in C for performance reasons." And that is not true. – Hot Licks Aug 17 '14 at 01:07
  • Really? Then please tell me how Java works then. And please tell me which operating systems or drivers were not written in C? And also don't disrespect people on these boards. – David Bowman Aug 18 '14 at 02:10
  • 1
    A number of operating systems for IBM boxes were written in PL/S. And I personally wrote much of the JVM (including the interpreter, the verifier, and the "static translator") for the IBM iSeries version of Java, so I have somewhat better idea of how Java works than you do. – Hot Licks Aug 18 '14 at 03:03