8

I am researching the possibility of starting a data mining project that will involve intensive calculations and transformations on data, and that should be relatively easy to scale.

In your experience, is the choice of programming language critical for said project?

For example, if I am already working in a JVM environment, should I prefer Clojure over plain Java? Does a functional language guarantee easier scalability? Better performance?

Put aside other factors such as familiarity with the language, toolchain, etc. In your experience, is the choice of language a critical one?

Yuval Adam
  • 3
    The factors you "put aside" *are* critical. If you start learning Clojure just for this project, you'll either fail to apply its strengths, in which case you could just stick with e.g. Java, or lose so much time that Clojure would have to be really awesome for this task to compensate. –  Nov 08 '10 at 21:50

4 Answers

16

There are a few good reasons for choosing functional programming for data mining projects.

  1. Data mining projects usually involve more algorithms and mathematics than other types of systems, and these can be expressed more easily in functional programming.
  2. Data mining projects involve aggregate functions, which are easier to express in a functional language such as Clojure.
  3. Data mining programs are also well suited to parallelism: definitely data parallelism, and possibly task parallelism too, which is again a forte of functional programming.
  4. Functional languages like Clojure can interface with Java anyway for I/O such as reading and writing files.
  5. I think one can learn the toolchain easily; it is not that different, so that shouldn't be a factor.
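To make points 2 and 3 concrete, here is a minimal sketch of an aggregate written as a pipeline of pure functions. It assumes Java 8 or later streams rather than Clojure, and the class name `AggregateSketch`, the helper `squaredStats`, and the sample data are all hypothetical. Because the pipeline has no shared mutable state, the same code can run sequentially or data-parallel.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical illustration, not taken from the answer above.
class AggregateSketch {

    // Squares every value, then aggregates count/sum/min/max in one pass.
    // The lambda is a pure function, so the runtime is free to split the work.
    static DoubleSummaryStatistics squaredStats(double[] values, boolean parallel) {
        DoubleStream stream = Arrays.stream(values);
        if (parallel) {
            stream = stream.parallel(); // opt in to data parallelism
        }
        return stream.map(v -> v * v).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] sample = {1.0, 2.0, 3.0, 4.0}; // made-up sample data
        DoubleSummaryStatistics stats = squaredStats(sample, true);
        System.out.println("count=" + stats.getCount()
                + " sum=" + stats.getSum()
                + " max=" + stats.getMax());
    }
}
```

Whether the parallel version is actually faster depends on the data size and the cost of each transformation; for tiny inputs like this one, the overhead usually dominates.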

I was asking myself the same question and came up with a big yes for Clojure. I am still thinking through how to include R in the mix.

halfer
Krishna Sankar
3

Use the most powerful language you are comfortable with.

In any case, if you want scalability, you need a map-reduce implementation that allows you to parallelize the work and collect the results.
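As a rough sketch of the map-reduce idea, the following word count maps each input record to a partial result and then reduces the partial results by merging them. It assumes Java 8 or later and runs inside a single JVM only; the class `MapReduceSketch` and its methods `mapRecord`, `merge`, and `wordCounts` are hypothetical names chosen for illustration, not part of any framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-JVM illustration of the map and reduce steps.
class MapReduceSketch {

    // Map step: turn one record (a line of text) into word -> count.
    static Map<String, Long> mapRecord(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    // Reduce step: combine two partial results into a new map.
    // Inputs are never mutated, so it is safe to call from several threads.
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> out = new TreeMap<>(a);
        b.forEach((word, count) -> out.merge(word, count, Long::sum));
        return out;
    }

    // Runs the map step per record in parallel, then merges the results.
    static Map<String, Long> wordCounts(List<String> lines) {
        Map<String, Long> empty = new TreeMap<>(); // identity for the reduction
        return lines.parallelStream()
                .map(MapReduceSketch::mapRecord)
                .reduce(empty, MapReduceSketch::merge);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "to be"); // made-up input
        System.out.println(wordCounts(lines));
    }
}
```

The reduce step works in parallel because `merge` is associative and the empty map acts as its identity, so partial results can be combined in any grouping. Spreading the same pattern across several machines needs a distributed framework on top; this sketch only shows the shape of the computation.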

Thorbjørn Ravn Andersen
2

No particular reason. Pick whatever language you feel most comfortable with.

See my answer to a similar question about natural language processing. I think that some of the features people believe make obscure languages well suited to AI are actually counterproductive.

Community
Ken Bloom
0

Often, functional programming solutions are more scalable.

BenMorel
keuleJ