
I recently started playing with Elixir, and some patterns remind me of Python, which is widely used in data science projects: for example, list comprehensions and anonymous functions.

Considering Elixir's high performance and its ability to run many processes and handle asynchronous tasks, it seems to me to be a very good fit for data science projects.

Am I missing something? Does anybody have experience with this?

Ole Spaarmann
  • You are confusing Elixir's processes with real multiprocessing. Elixir's processes are an abstraction used for concurrency, not actual operating system processes. Concurrency != parallelism. Elixir's place is more in orchestrating input/output (this is of course a simplification). Also consider Python's toolset for data science: NumPy, pandas, etc. all have some sort of optimizations in C. There's a good talk about concurrency and parallelism by Rob Pike: https://www.youtube.com/watch?v=cN_DpYBzKso – Ale Mar 01 '16 at 14:34
  • I just want to add this to the conversation: [Scientific Computing on the Erlang VM](http://blog.lfe.io/tutorials/2015/01/01/1215-scientific-computing-on-the-erlang-vm/), a Ports-based wrapper around NumPy and SciPy (among others) for the Erlang ecosystem. – Ole Spaarmann Mar 01 '16 at 14:43
  • Right, but that's using Ports, so it's communicating with an external Python program. One could ask a bunch of questions about the particular problem you want to solve. Do you want to implement your own algorithms? If yes, I probably wouldn't use Elixir or Erlang. Do you want to "add" data science to an existing Elixir or Erlang project? Then this is great! – Ale Mar 01 '16 at 14:49
  • At the moment I'm building an app to collect data, basically just the bucket; the analytics part will come later. So I'm very open to solutions, and it looks like the data science part will be external rather than part of my Elixir app, since it is a different problem. – Ole Spaarmann Mar 01 '16 at 15:05
  • I'm sorry, but you're asking a pretty broad question. Elixir is a general-purpose language, just like Python. Try to narrow your question. Voting to close. – Onorio Catenacci Mar 01 '16 at 15:20
  • To give you a flavour: I have looked at sparse matrix-vector multiplication in Elixir, and it is not an ideal match. For me the stumbling point was passing data between processes quickly enough. Elixir doesn't yet have support for remote direct memory access (RDMA) or InfiniBand, which means it just can't compete with Fortran/MPI on a cluster. – GavinBrelstaff Mar 01 '16 at 17:24

2 Answers


I'm an advocate for using the right tool for the job. There are typically two requirements for doing data science:

  • Libraries (because you don't want to reinvent the wheel at every corner)
  • Performance (particularly if dealing with large amounts of data)

Python and R are the right tools. They offer the largest number of high-quality libraries, and though slow on their own, they perform well thanks to libraries written and optimized in fast languages like C and Fortran.

Some people prefer alternatives such as Julia and Scala. These are faster languages on their own and have a decent number of libraries, though you might still run into situations where suitable libraries are available in Python or R but not in Julia or Scala.

With languages like Elixir, you're for the most part on your own. The number of data-science-specific libraries is limited, and the Elixir community, though wonderful, is mostly focused on distributed computing and web development, so don't count on much support there.

In short, can you? Technically yes, and there is no harm in experimenting, but you're making your life significantly harder.

Also keep in mind that, contrary to popular belief, Elixir is not a fast language when it comes to single-thread performance. Depending on the task at hand, you'll find that Ruby is just as fast, or even faster in some instances.

Don't get me wrong: Elixir is a great language, and it's amazing at what it does best. It's just not the kind of language I'd reach for first for mathematical computations.

Antonio Cangiano
  • Great answer. I just moved to Elixir from Ruby and have realised I'll probably have to take a short jump to Python if I want to get my hands dirty with data science. I considered R, but figured I might as well lean on what I already know. – Damien Roche Mar 09 '18 at 20:47

Data science is a very broad topic with many things involved. I would like to add my 2 cents: you can certainly do data science in Elixir, but it may not do certain things as well as some other tools do. Still, you can get pretty far; I use Elixir for data cleaning and data formatting.

Other folks are doing data-related work with Elixir/Erlang (https://moz.com/devblog/moz-analytics-db-free/), and there is Disco, an Erlang-based framework for running MapReduce jobs: https://github.com/discoproject/disco

allyraza