I'd like to find a good, robust MapReduce framework to use from Scala.
8 Answers
To add to the answer on Hadoop: there are at least two Scala wrappers that make working with Hadoop more palatable.
Scala Map Reduce (SMR): http://scala-blogs.org/2008/09/scalable-language-and-scalable.html
SHadoop: http://jonhnny-weslley.blogspot.com/2008/05/shadoop.html
UPD 5 Oct '11
There is also the Scoobi framework, which is impressively expressive.

SHadoop is quite old--it uses the old MR framework. I updated the implicits at some point: https://github.com/schmmd/Hadoop-Scala-Commons – schmmd Dec 08 '11 at 22:57
Personally, I've become a big fan of Spark. It gives you the ability to do in-memory cluster computing, significantly reducing the overhead you would experience from disk-intensive MapReduce operations.
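To make the contrast concrete, here is a minimal word-count sketch against Spark's RDD API (the classic MapReduce example). The input and output paths are hypothetical placeholders; this assumes a Spark dependency on the classpath and a local master for testing.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Transformations build the lineage lazily; nothing executes
    // until an action (saveAsTextFile) is called.
    val counts = sc.textFile("input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                   // shuffled in memory, no per-stage HDFS writes

    counts.saveAsTextFile("counts")         // hypothetical output path
    sc.stop()
  }
}
```

The key difference from classic Hadoop MapReduce is that intermediate results between stages stay in memory rather than being spilled to HDFS between each map and reduce phase.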

http://hadoop.apache.org/ is language-agnostic.

I'm sorry, but I didn't ask for a Java implementation. Hadoop can indeed be plugged into Scala, but the boilerplate code has to be written in Java. – Roman Kagan Jun 08 '09 at 03:26
Write a ScalaHadoopAdapter which takes care of all the boilerplate and publish it as free/open-source? – yfeldblum Jun 12 '09 at 04:39
A while back, I ran into exactly this problem and ended up writing a little infrastructure to make it easy to use Hadoop from Scala. I used it on my own for a while, but I finally got around to putting it on the web. It's named (very originally) ScalaHadoop.

For a Scala API on top of Hadoop, check out Scoobi; it is still in heavy development but shows a lot of promise. There is also some effort to implement distributed collections on top of Hadoop in the Scala incubator, but that effort is not usable yet.
There is also a new Scala wrapper for Cascading from Twitter, called Scalding. After looking very briefly over the Scalding documentation, it seems that while it makes the integration with Cascading smoother, it still does not solve what I see as the main problem with Cascading: type safety. Every operation in Cascading operates on Cascading's tuples (basically a list of field values, with or without a separate schema), which means that type errors, e.g. joining on a key as a String on one side and as a Long on the other, lead to run-time failures.
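The type-safety issue can be illustrated in plain Scala, without Cascading's actual API. Untyped, field-based tuples behave roughly like `Map[String, Any]`: a join on mismatched key types compiles, but silently matches nothing (or fails) at run time, whereas typed records let the compiler catch the mismatch. The record shapes below are invented for illustration only.

```scala
// Untyped, Cascading-style tuples: field values are just Any, so a
// String key on one side and a Long key on the other only collide
// at run time.
val users: List[Map[String, Any]]  = List(Map("id" -> "42", "name" -> "ada"))
val orders: List[Map[String, Any]] = List(Map("id" -> 42L, "item" -> "book"))

// Compiles fine, but "42" != 42L, so the join silently matches nothing.
val untypedJoin = for {
  u <- users; o <- orders if u("id") == o("id")
} yield (u("name"), o("item"))
// untypedJoin is empty

// With typed records, a String/Long key mismatch is a compile error,
// not a silent run-time bug.
case class User(id: Long, name: String)
case class Order(id: Long, item: String)
val typedJoin = for {
  u <- List(User(42L, "ada")); o <- List(Order(42L, "book")) if u.id == o.id
} yield (u.name, o.item)
// typedJoin contains ("ada", "book")
```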

Scalding does have a type-safe API: https://github.com/twitter/scalding/wiki/Type-safe-api-reference and in the Fields API (which you are mentioning), joining a string to a long doesn't cause run-time exceptions (if they are both numbers). Of course, in the type-safe API such a join is prohibited by the compiler. – Oscar Boykin Feb 20 '13 at 05:56
To further jshen's point:
Hadoop Streaming simply uses Unix pipes: your code (in any language) just has to read from stdin and write tab-delimited records to stdout. Implement a mapper and, if needed, a reducer (and, if relevant, configure that as the combiner as well).
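A minimal Streaming-style mapper is just a stdin/stdout filter. The sketch below emits tab-delimited `word<TAB>1` pairs for word count; Hadoop would sort these by key before feeding them to the reducer process. The object name is an arbitrary choice.

```scala
// Minimal Hadoop-Streaming-style mapper: read lines from stdin,
// emit one tab-delimited (word, 1) pair per word on stdout.
object StreamingMapper {
  def main(args: Array[String]): Unit = {
    for {
      line <- scala.io.Source.stdin.getLines()
      word <- line.split("\\s+") if word.nonEmpty
    } println(s"$word\t1")
  }
}
```

You would wire this (and a matching reducer) into a job with the `hadoop-streaming` jar, passing the compiled program via the `-mapper` and `-reducer` options.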

I've added a MapReduce implementation using Hadoop, with a few test cases, on GitHub: https://github.com/sauravsahu02/MapReduceUsingScala. Hope that helps. Note that the application is already tested.
