58

We are going to write a concurrent program in Clojure that extracts keywords from a huge volume of incoming mail and cross-checks them against a database.

One of my teammates has suggested using Erlang to write this program.

I should note that I am new to functional programming, so I am unsure whether Clojure is a good choice for this program or whether Erlang would be more suitable.

Peer Stritzinger
Quazi Farhan
  • Wasn't string handling of Erlang suboptimal? (To avoid the concurrency discussions...) – kotarak Jun 05 '11 at 21:00
  • String handling in Erlang is only suboptimal if you misuse it, in my opinion. Working with binaries is _very_ efficient, for example. – Adam Lindberg Jun 06 '11 at 10:08
  • Erlang has weird syntax: http://damienkatz.net/2008/03/what_sucks_abou.html – Hamish Grubijan Jun 07 '11 at 20:07
  • It's surprising that Erlang's weird syntax would be an argument coming from someone who's a fan of Lisps (which I assume given this is Erlang vs. Clojure). Both have unfamiliar syntaxes that can hold their own while being entirely despised by a lot of people. Elegance is in the eye of the beholder and I would never be surprised to find dozens of programmers who think both languages look terrible (I turn out to like both Lisps and Erlang). – I GIVE TERRIBLE ADVICE Jun 17 '11 at 12:00

5 Answers

65

Do you really mean concurrent or distributed?

If you mean concurrent (multi-threaded, multi-core etc.), then I'd say Clojure is the natural solution.

  • Clojure's STM model is designed for multi-core concurrency: it is very efficient at storing and managing shared state between threads. If you want to understand more, it is well worth watching this excellent video.
  • Clojure's STM allows safe mutation of data by concurrent threads. Erlang sidesteps this problem by making everything immutable, which is fine in itself but doesn't help when you genuinely need shared mutable state. If you want shared mutable state in Erlang, you have to implement it with a set of message interactions, which is neither efficient nor convenient (that's the price of a shared-nothing model...).
  • You will get inherently better performance with Clojure in a concurrent setting on a large machine, since Clojure doesn't rely on message passing and communication between threads can therefore be much more efficient.
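
To make the shared-state point concrete, here is a minimal sketch in Java, used only because Clojure runs on the JVM. Java has no built-in STM, so this is not Clojure's `ref`/`dosync`; it imitates the simpler compare-and-swap retry loop behind Clojure's `atom`, with an immutable map as the shared value. The class `SharedKeywordCounts`, its methods and the keyword `"invoice"` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: many threads update one shared value without locks.
class SharedKeywordCounts {
    // The shared state is a reference to an immutable map, never a map mutated in place.
    private final AtomicReference<Map<String, Integer>> state =
            new AtomicReference<>(Map.of());

    // Build a new snapshot from the old one and publish it only if no other
    // thread published in between; otherwise retry against the newer snapshot.
    void increment(String keyword) {
        while (true) {
            Map<String, Integer> old = state.get();
            Map<String, Integer> next = new HashMap<>(old);
            next.merge(keyword, 1, Integer::sum);
            if (state.compareAndSet(old, Map.copyOf(next))) {
                return;
            }
        }
    }

    Map<String, Integer> snapshot() {
        return state.get();
    }

    // Runs `threads` workers that each record `perThread` hits for the same keyword.
    static int countConcurrently(int threads, int perThread) {
        SharedKeywordCounts counts = new SharedKeywordCounts();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.increment("invoice");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.snapshot().getOrDefault("invoice", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // prints 4000: no update is lost
    }
}
```

Readers always see a complete, consistent snapshot, and writers never block each other; the cost is that a writer may have to redo its work when it loses a race. Clojure's real STM goes further by coordinating changes across several references in one transaction, which this sketch does not attempt.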

If you mean distributed (i.e. many different machines sharing work over a network which are effectively running as isolated processes) then I'd say Erlang is the more natural solution:

  • Erlang's immutable, shared-nothing, message-passing style forces you to write code in a way that can be distributed. Idiomatic Erlang can therefore be distributed across multiple machines automatically and run in a distributed, fault-tolerant setting.
  • Erlang is therefore very well optimised for this use case, so would be the natural choice and would certainly be the quickest to get working.
  • Clojure could do it as well, but you will need to do much more work yourself (i.e. you'd either need to implement or choose some form of distributed computing framework) - Clojure does not currently come with such a framework by default.
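
The shared-nothing style can be imitated on a single JVM with queues, which may help if you have not seen it before. The sketch below is only an analogy in Java: Erlang processes are far lighter than threads, and nothing here provides Erlang's distribution or fault tolerance. The class `MailboxWorkers`, the `STOP` sentinel and the word-counting job are hypothetical stand-ins for real keyword extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a master hands mail bodies to workers through mailboxes.
class MailboxWorkers {
    // Sentinel message telling a worker to exit.
    private static final String STOP = "\u0000STOP";

    // Workers share no data structures; the only way in or out is a message on a queue.
    static int totalWords(List<String> mails, int workers) {
        BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> results = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String mail = jobs.take();
                        if (mail.equals(STOP)) {
                            return;
                        }
                        String body = mail.trim();
                        results.put(body.isEmpty() ? 0 : body.split("\\s+").length);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            for (String mail : mails) {
                jobs.put(mail);
            }
            for (int w = 0; w < workers; w++) {
                jobs.put(STOP); // one stop message per worker
            }
            int total = 0;
            for (int i = 0; i < mails.size(); i++) {
                total += results.take(); // exactly one reply per mail
            }
            for (Thread t : threads) {
                t.join();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        List<String> mails = List.of("please pay the invoice", "meeting at noon");
        System.out.println(totalWords(mails, 2)); // prints 7
    }
}
```

Because the workers only ever see messages, replacing the in-memory queues with network transport would not change their logic. That is the property that makes this style easy to distribute.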

In the long term, I hope that Clojure develops a distributed computing framework that matches Erlang - then you can have the best of both worlds!

mikera
  • Concurrency and parallelism aren't the same thing. Erlang does support very natural concurrency, and the approach to parallelism also works. STM, message passing, promises and futures are all valid options to get both concurrency and parallelism. Which one you need is left to be decided by the nature of the problem you want to solve. – I GIVE TERRIBLE ADVICE Jun 17 '11 at 11:57
  • In Erlang, an ETS table can be used for shared mutable state. – jtmoulia Jun 13 '13 at 22:28
  • Don't forget Erlang's per-process GC, though: even if Clojure "develops a distributed computing framework", the JVM's GC won't match Erlang's GC. – Erik Kaplun Mar 27 '16 at 17:08
51

The two languages and runtimes take different approaches to concurrency:

  • Erlang structures programs as many lightweight processes communicating with one another. In this case, you will probably have a master process sending jobs and data to many workers, and more processes to handle the resulting data.

  • Clojure favors a design where several threads share data and state using common data structures. It sounds particularly suitable for cases where many threads access the same data (read-only) and share little mutable state.

You need to analyze your application to determine which model suits you best. This may also depend on the external tools you use -- for example, the ability of the database to handle concurrent requests.
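
For the task in the question, the shared-data design could look roughly like the sketch below (again Java, as a JVM stand-in for Clojure): worker threads read one immutable dictionary and each returns its own result, so no locking is needed. The `KeywordMatcher` class, the dictionary contents and the tokenising rule are assumptions made for illustration, and the real database lookup is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: many threads read one immutable dictionary.
class KeywordMatcher {
    // Returns the dictionary words found in one mail; touches no shared mutable state.
    static Set<String> match(String mail, Set<String> dictionary) {
        Set<String> found = new TreeSet<>();
        for (String token : mail.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (dictionary.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    // Matches every mail on a fixed thread pool; results keep the order of `mails`.
    static List<Set<String>> matchAll(List<String> mails, Set<String> dictionary, int threads) {
        Set<String> readOnly = Set.copyOf(dictionary); // immutable snapshot shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Set<String>>> pending = new ArrayList<>();
            for (String mail : mails) {
                pending.add(pool.submit(() -> match(mail, readOnly)));
            }
            List<Set<String>> results = new ArrayList<>();
            for (Future<Set<String>> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while matching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a matching task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("invoice", "meeting");
        List<String> mails = List.of("Please pay the INVOICE.", "Lunch?");
        System.out.println(matchAll(mails, dictionary, 2)); // prints [[invoice], []]
    }
}
```

If the dictionary lives in a database rather than in memory, the database's ability to serve concurrent lookups becomes the limiting factor, which is the point made above.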

Another practical consideration is that Clojure runs on the JVM, where many open-source libraries are available.

nimrodm
  • Good answer, though note that Clojure's STM model is equally applicable when the shared state is mutable. Coordinating changes to shared mutable state is, in fact, the main motivation for Clojure's STM approach. – mikera Jun 06 '11 at 17:58
  • In my limited experience, "many threads access the same data (read-only) and share little mutable state" has covered 99% of the problems I have ever tried to solve. – FUD Nov 18 '14 at 07:04
10

Clojure is a Lisp running on the JVM. Erlang is designed from the ground up to be highly fault tolerant and concurrent.

I believe the task is doable with either of these languages and many others as well. Your experience will depend on how well you understand the problem and how well you know the language. If you are new to both, I'd say the problem will be challenging no matter which one you choose.

Have you thought about something like Lucene/Solr? It's great software for indexing and searching documents. I don't know what "cross checking" means for your context, but this might be a good solution to consider.

duffymo
  • I have heard a lot about Clojure's top-notch concurrency model, but Erlang has a much more solid reputation. That is why I was doubtful. Although I am a bit biased towards Clojure, I do not want to start with it and face pitfalls later. By cross-checking I meant that the keywords from the mails will be looked up in the dictionary to sort the mails. – Quazi Farhan Jun 05 '11 at 16:32
  • By the way, Lucene/Solr also looks interesting for this purpose. Thank you. – Quazi Farhan Jun 05 '11 at 16:41
  • You can even use Lucene/Solr with Clojure. It's the best of all worlds. – Quazi Irfan Jun 05 '11 at 16:53
0

My approach would be to write a simple test in each language and compare the performance. Both languages are quite different from C-style languages, and if you aren't used to them (and you don't have a team that is used to them) you may end up with a maintenance nightmare.

I'd also look at using something like Groovy 1.8. Groovy now includes GPars to enable parallel computing. String and file manipulation in Groovy is very easy indeed.

Fortyrunner
-4
  1. It depends on what you mean by huge.
  2. Strings in Erlang are painful.

but:

If huge means tens of distributed machines, then go with Erlang and write the workers in text-friendly languages (Python? Perl?). You will have a distributed layer on top, with highly concurrent local workers, each represented by an Erlang process. If you need more performance, rewrite your workers in C. In Erlang it is very easy to talk to other languages.

If huge still means one powerful machine, go with the JVM. That is not really huge.

If huge means hundreds of machines, I think you will need something stronger and Google-like (Bigtable, MapReduce), probably on a C++ stack. Erlang is still OK, but you will need good developers to code it.

user425720
  • Strings are painful in Erlang? I do not entirely agree on this one. Probably it is because you have to build things from the ground up each time, or because it lacks Python's cutting-edge string manipulation. I have done as much string processing as I needed in Erlang, just as I have in Python 2.x. – Muzaaya Joshua Jun 06 '11 at 10:28
  • I do not mind writing stuff, but encoding handling is especially broken. Also, a string is a list allocated on the heap; it does not use memory efficiently and is complex to analyze. – user425720 Jun 06 '11 at 16:36