Why is Erlang slower than Java on all these small math benchmarks?

Question

While considering alternatives to Java for a distributed/concurrent/failover/scalable backend environment I discovered Erlang. I've spent some time on books and articles, and nearly all of them (even those by Java devotees) say that Erlang is a better choice in such environments, as many useful things come out of the box in a less error-prone way.

I was sure that Erlang would be faster in most cases, mainly because of a different garbage collection strategy (per process), the absence of shared state (between threads and processes) and more compact data types. But I was very surprised when I found comparisons of Erlang vs Java math samples where Erlang is slower by one to two orders of magnitude, i.e. from 10x to 100x.

Even on concurrent tasks, both on several cores and a single one.

What are the reasons for that? These answers came to mind:

- Usage of Java primitives (=> no heap/GC) on most of the tasks
- Same number of threads in the Java code and processes in the Erlang code, so the actor model has no advantage here
- Or just that Java is statically typed, while Erlang is not
- Something else?

If that's because these are very specific math algorithms, can anybody show more realistic, practical performance tests?

UPDATE: The answers so far sum up to this: Erlang is not the right tool for such a specific "fast Java case". What is still unclear to me is the main reason for Erlang's inefficiency here: dynamic typing, GC, or poor native compilation?

Java compiles math fairly efficiently to native machine code and is often almost as fast as C++ ;) I suspect Erlang does not. You can try the command line option `-Xint` (interpreter only, no JIT) to see if this slows down Java to about the same. — Peter Lawrey, Nov 29 '12 at 16:09
_Java compiles math fairly efficiently to native machine code_ I'm sure you know there is no special byte code for math )) The efficiency here comes from using primitives, no objects etc. If I implement the same math with Objects/Wrappers, the results will be much worse — yetanothercoder, Nov 29 '12 at 19:45
btw why didn't Erlang's HiPE native compilation help here, or why is it still 10 times slower? — yetanothercoder, Nov 29 '12 at 20:14
Plain Erlang compared to HiPE http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=erlang&lang2=hipe — igouy, Nov 30 '12 at 00:25
compare math on numbers across 10,000 threads and see which is faster? then do it distributed, come back and tell us why you think this answer is constructive then ( hint benchmark questions rarely are constructive ) — , Nov 30 '12 at 05:28
@JarrodRoberson true, but if I code such distributed math with the number of Java threads = the number of cores (everything via executors), no synchronization, sending only primitives over the network etc., I doubt that Java will be slower. It may be more verbose, error-prone, less scalable etc., but still faster, don't you think? — yetanothercoder, Nov 30 '12 at 05:48
In both cases; I think you would still be writing and debugging code, days maybe weeks after the erlang programmer had finished their code, and moved on to another contract :-) My point that you missed is, remove the contrived "small benchmarks" and make them real world cases and it will show how non-constructive this benchmark is. — , Nov 30 '12 at 13:15
@Jarrod Roberson -- Seems to me that `yetanothercoder` is looking for a technical explanation of what exactly JVM and HiPE do differently in these cases that gives rise to the observed performance difference. afaict our *answers* have not been constructive because we simply don't know. — igouy, Nov 30 '12 at 16:53
@igouy That's how I read it too. "It's not built for it" isn't really an answer, is it. — biziclop, Nov 30 '12 at 18:24
why doesn't my Ferrari haul pizzas as efficiently as my Hyundai? I mean really why does my Ferrari burn so much more gas and cost so much more in upkeep just to deliver a few pizzas each night? And don't tell me *"it wasn't designed to do that"*! — , Nov 30 '12 at 22:32

score 34 · Accepted Answer · edited Dec 08 '16 at 11:25


Erlang was not built for math. It was built with communication, parallel processing and scalability in mind, so testing it on math tasks is a bit like testing whether your jackhammer gives you a refreshing massage.

That said, let's go a little off-topic:
If you want Erlang-style programming on the JVM, take a look at Scala Actors, the Akka framework, or Vert.x.


answered Nov 29 '12 at 15:53

npe


1

OK, good point, but what do you think is the main reason for the poor math performance here: dynamic typing? Scala and Akka, being on top of the JVM, have the same "JVM architecture issues": a global GC, which is a very serious issue for big heaps, and no valid hot redeploy option, only a restart if you want to update PROD with minimal "strange" issues – yetanothercoder Nov 29 '12 at 19:51
4

I do not know Erlang well enough to tell why. And of course the JVM has its issues. What I meant was: _"use the right tool for your problem"_. Erlang is great for sending messages. Not necessarily so great for processing them. If you do math calculations, use Matlab, or C, or assembler. If you do statistics, use R, and so on, and so on. – npe Nov 30 '12 at 14:09

score 15 · Answer 2 · answered Nov 29 '12 at 15:43


Benchmarks never tell you anything beyond what they actually test. If you feel that a benchmark is only testing primitives and a classic threading model, that is what you learn about. You can now say with some confidence that Java is faster than Erlang at math on primitives, and with the classic threading model, for those types of problems. You don't know anything about the performance with a large number of threads or for more involved problems, because the benchmark didn't test that.

If you are doing the types of math that the benchmark tested, go with Java because it is obviously the right tool for that job. If you want to do something heavily scalable with little to no shared state, find a benchmark for that or at least re-evaluate Erlang.

If you really need to do heavy math in Erlang, consider using HiPE (consider it anyway for that matter).


Emil Vikström


4

btw I see why Java is faster here: almost everything is compiled to native code, plus no heap, plus static typing. What about Erlang here: since it's HiPE, isn't it compiled to native code too? What about the heap? Or does only static vs dynamic typing play the crucial role here? – yetanothercoder Nov 29 '12 at 20:05
Here is a great example of why not to trust benchmarks from one of the performance guys at Netflix. http://dtrace.org/blogs/brendan/2014/02/11/another-10-performance-wins/ – SudoKid Dec 29 '16 at 23:16
@EmettSpeer Interesting post but I fail to see the connection to benchmarking. – Emil Vikström Dec 30 '16 at 10:24

stemm · Answer 3 · 2016-12-09T08:52:59.410

8

As pointed out in other answers, Erlang is designed to solve real-life problems effectively, and those are somewhat the opposite of benchmark problems.

But I'd like to highlight one more aspect: the conciseness of Erlang code (which in some cases means faster development), which is easy to see when comparing the benchmark implementations.

For example, k-nucleotide benchmark:
Erlang version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=hipe&id=3
Java version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=java&id=3

If you want more real-life benchmarks, I'd suggest "Comparing C++ And Erlang For Motorola Telecoms Software".


answered Nov 29 '12 at 16:20

stemm


1

You made a false comparison -- that Java program isn't included, it's listed under "'wrong' (different) algorithm / less comparable programs". The Java program from the comparison shows 13 secs and source code 1630, versus Erlang program 157 secs and source code 932. http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=java&id=3 – igouy Jul 07 '13 at 17:17

Vans S · Answer 4 · 2016-02-15T22:11:27.757

I took an interest in this because some of the benchmarks are a perfect fit for Erlang, such as gene sequencing. So on http://benchmarksgame.alioth.debian.org/ the first thing I did was look at the reverse-complement implementations, for both C and Erlang, as well as the testing details. I found that the test is biased because it does not discount the time it takes Erlang to start the VM with the schedulers; natively compiled C starts much faster. The way those benchmarks measure is basically: `time erl -noshell -s revcomp5 main < revcomp-input.txt`

Now the benchmark says Java took 1.4 seconds and Erlang with HiPE took 11. Running the (single-threaded) Erlang code took me 0.15 seconds, and if you discount the time it took to start the VM, the actual workload took only 3000 microseconds (0.003 seconds).

So I have no idea how that is benchmarked. If it's run 100 times then it makes no sense, as the cost of starting the Erlang VM will be multiplied by 100. If the input is a lot longer than the one given, it would make sense, but I see no details of that on the webpage. To make the benchmarks fairer for managed languages, have the code (Erlang/Java) send a Unix signal to the Python script (the one doing the benchmarking) when it reaches the startup function.
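A minimal Java sketch of the idea of reporting VM startup separately from the workload. The class name, the `timeMicros` helper, and the dummy loop are made up for illustration and are not part of the benchmarks game; `getUptime()` is only an approximation of the startup cost as seen from inside the process.

```java
import java.lang.management.ManagementFactory;

public class StartupVsWorkload {
    // Runs the workload once and returns its wall-clock time in microseconds.
    static long timeMicros(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        // Milliseconds since the JVM started, sampled on entry to main:
        // a rough stand-in for "time spent before the program does any work".
        long startupMillis = ManagementFactory.getRuntimeMXBean().getUptime();

        // Hypothetical workload; replace with the real benchmark body.
        long workMicros = timeMicros(() -> {
            long acc = 0;
            for (int i = 0; i < 10_000_000; i++) {
                acc += i % 7;
            }
            // Use the result so the loop is not trivially dead code.
            if (acc < 0) {
                System.out.println(acc);
            }
        });

        System.out.println("VM uptime at start of main: " + startupMillis + " ms");
        System.out.println("Workload only: " + workMicros + " us");
    }
}
```

An external `time java StartupVsWorkload` would report the sum of both figures plus shutdown, which is the kind of measurement the answer is objecting to.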

Now benchmark aside, the Erlang VM essentially just executes machine code in the end, as does the Java VM. So there is no way a math operation would take longer in Erlang than in Java.

What Erlang is bad at is data that needs to mutate often, such as a chained block cipher. Say you have the chars "0123456789"; now your encryption xors the first 2 chars by 7, then xors the next two chars by the result of the first two added, then xors the previous 2 chars by the results of the current 2 subtracted, then xors the next 4 chars, etc.

Because objects in Erlang are immutable, this means that the entire char array needs to be copied each time you mutate it. That is why Erlang has support for things called NIFs, which are C code you can call into to solve this exact problem. In fact all the encryption (ssl, aes, blowfish...) and compression (zlib, ...) that ship with Erlang are implemented in C, and there is near-zero cost associated with calling C from Erlang.
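A rough Java analogy of the cost described above, assuming a simple "xor every byte with a key" pass instead of the chained cipher. It only illustrates the difference between updating a buffer in place and producing a fresh copy for every single-element update; it is not a model of how the Erlang runtime actually handles binaries. All names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class InPlaceVsCopy {
    // Mutable style: one pass over the buffer, no allocation.
    static void xorInPlace(byte[] data, int key) {
        for (int i = 0; i < data.length; i++) {
            data[i] ^= key;
        }
    }

    // Immutable style: a single-element update returns a new array.
    static byte[] withXorAt(byte[] data, int index, int key) {
        byte[] copy = Arrays.copyOf(data, data.length);
        copy[index] ^= key;
        return copy;
    }

    // Applying the immutable update to every position copies the whole
    // buffer n times, so the total work grows quadratically with the length.
    static byte[] xorImmutably(byte[] data, int key) {
        byte[] current = data;
        for (int i = 0; i < data.length; i++) {
            current = withXorAt(current, i, key);
        }
        return current;
    }

    public static void main(String[] args) {
        byte[] input = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] copied = xorImmutably(input, 7); // input is left untouched
        xorInPlace(input, 7);                   // input is overwritten
        System.out.println(Arrays.equals(input, copied));
    }
}
```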

So using Erlang you get the best of both worlds, you get the speed of C with the parallelism of Erlang.

If I were to implement the reverse-complement in the FASTEST way possible, I would write the mutating code using C but the parallel code using Erlang. Assuming infinite input, I would have Erlang split on the `>`:

`<<Line/binary, ">", Rest/binary>> = read_stream`

Dispatch the block to the first available scheduler via round robin, consisting of infinite EC2 private networked hidden nodes, being added in real time to the cluster every millisecond.

Those nodes then call out to C via NIFs for processing (C was the fastest implementation for reverse-complement on the alioth website), then send the output back to the master node to send out to whoever supplied the input.
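For comparison, a single-machine Java sketch of the same split-and-dispatch shape, under several assumptions: the whole input fits in a string, a fixed thread pool stands in for the cluster of nodes (it hands each block to whichever thread is free rather than doing strict round robin), and the per-block work is done in plain Java instead of C. Class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitAndDispatch {
    // Reverse-complement of a DNA string; unknown symbols pass through unchanged.
    static String reverseComplement(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            switch (c) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // One block is "header\nsequence lines..."; the header is kept as is.
    static String processBlock(String block) {
        int nl = block.indexOf('\n');
        String header = nl < 0 ? block : block.substring(0, nl);
        String body = nl < 0 ? "" : block.substring(nl + 1).replace("\n", "");
        return ">" + header + "\n" + reverseComplement(body);
    }

    // Splits the input on '>' and processes each block on the pool,
    // returning the results in input order.
    static List<String> process(String input, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String block : input.split(">")) {
                if (block.isEmpty()) {
                    continue; // text before the first '>'
                }
                futures.add(pool.submit(() -> processBlock(block)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String record : process(">one\nAACG\n>two\nTTGA\n", 2)) {
            System.out.println(record);
        }
    }
}
```

Going from this to multiple machines is where the extra work in Java would start; the sketch deliberately stops at one process.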

To implement all this in Erlang I would have to write code as if I was writing a single threaded program, it would take me under a day to create this code.

To implement this in Java, I would have to write the single-threaded code, and I would have to take the performance hit of calling from managed to unmanaged code (as we will obviously be using the C implementation for the grunt work), then rewrite it to support 64 cores. Then rewrite it to support multiple CPUs. Then rewrite it again to support clustering. Then rewrite it again to fix memory issues.

And that is Erlang in a nutshell.

score 3 · Answer 5 · answered Mar 10 '15 at 20:25


The Erlang solution uses ETS, Erlang Term Storage, which is like an in-memory database running in a separate process. Consequent to it being in a separate process, all messages to and from that process must be serialized/deserialized. This would account for a lot of the slowness, I should think. For example, if you look at the "regex-dna" benchmark, Erlang is only slightly slower than Java there, and it doesn't use ETS.


pmarreck


And the reason why Erlang has to do this is because it is not statically typed. – Lothar May 13 '17 at 16:36
Not all terms in and out of ETS have to be serialized/deserialized. – Peter R Aug 17 '17 at 21:20

score 2 · Answer 6 · answered Dec 21 '12 at 22:41

The fact that Erlang has to allocate memory for every value, whereas in Java you will typically reuse variables if you want it to be fast, means Java will always be faster for 'tight loop' benchmarks.

It would be interesting to benchmark a java version using the -client flag and boxed primitives and compare that to erlang.
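A minimal sketch of the boxed-versus-primitive comparison suggested above, assuming a plain summation loop as the workload. The class and method names are made up for illustration. This is not a rigorous benchmark: there is no warm-up, and the JIT may eliminate some of the boxing, so any timings are indicative only.

```java
public class BoxedVsPrimitive {
    // Primitive accumulator: the loop works on a local long, no allocation.
    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total = total + i;
        }
        return total;
    }

    // Boxed accumulator: every iteration unboxes, adds, and boxes the
    // result again, which can allocate a new Long each time.
    static Long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total = total + i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 50_000_000;

        long t0 = System.nanoTime();
        long a = sumPrimitive(n);
        long t1 = System.nanoTime();
        Long b = sumBoxed(n);
        long t2 = System.nanoTime();

        System.out.println("primitive: " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("boxed:     " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```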

I believe using HiPE is unfair since it is not an active project. I would be interested to know if any mission-critical software is running on it.

score -8 · Answer 7 · answered Nov 29 '12 at 16:13


I don't know anything about Erlang, but this seems to be an apples-to-oranges comparison anyway. You must be aware that considerable effort was spent over more than a decade to improve Java performance to the point where it is today.

It's not surprising (to me) that a language implementation done by volunteers or a small company cannot match that effort.


Durandal


20

Ericsson, the creators and maintainers of Erlang, is one of the biggest providers of telecommunications equipment worldwide, and one of the largest companies in Sweden. Erlang is more than two decades old. None of your reasons are relevant to Java outperforming Erlang in this benchmark. – Emil Vikström Nov 29 '12 at 18:35
12

@EmilVikström Well, they sort of are. While Ericsson is a large company, it is only a small group within Ericsson, about 20 people, who support, maintain and develop Erlang. What is probably more important is that Erlang was designed for a different type of application than Java, specifically massively concurrent fault-tolerant applications. There are real-life products that run millions of TCP connections on one machine, http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/ – rvirding Nov 29 '12 at 22:21
1

How can you comment on something if you don't know anything about it? – zenw0lf Jan 03 '17 at 18:06
