23

Are there any alternative paradigms to MapReduce (Google, Hadoop)? Is there any other reasonable way to split and merge big problems?

Cartesius00
  • MapReduce is not an algorithm or a paradigm; it is a technology. – Luka Rahne Jan 01 '12 at 11:15
  • 4
    @ralu: There are many ways to deal with big problems. MapReduce is DEFINITELY only one of them, and it is DEFINITELY both a paradigm and an algorithm. Its implementation is also a technology, but I am interested in ideas rather than implementations. Thank you. – Cartesius00 Jan 01 '12 at 11:17
  • Why do you think about your problem as split and merge? You just need to solve the problem. For instance, Apache Pig deals with data using an SQL-like language, and there is no split-and-merge way of thinking, although it can run on a cluster of hundreds of machines and uses Hadoop as its platform. – Luka Rahne Jan 01 '12 at 12:30
  • 1
    @ralu: Hive has the SQL-like syntax. The Pig syntax is completely different. – Niels Basjes Jan 01 '12 at 12:46
  • 2
    @ralu: I am looking for ideas, you're completely on another level of implementation. – Cartesius00 Jan 01 '12 at 13:57
  • @Niels Basjes you are right, but my point is about how you view the problem. If the problem can be expressed as split/merge, MapReduce is the way to go, because it was made for this kind of thing. The point is that you need something in which the problem is easy to express and which can later be run on a computational device. A cluster is just a computational device, and optimization is a compiler/framework problem. Unfortunately, most of them are still pretty dumb. – Luka Rahne Jan 01 '12 at 17:56

5 Answers

13

Definitely. Check out, for example, Bulk Synchronous Parallel (BSP). Map/Reduce is in fact a very restricted way of reducing problems; however, that restriction is what makes it manageable in a framework like Hadoop. The question is whether it is less trouble to press your problem into a Map/Reduce setting, or whether it's easier to create a domain-specific parallelization scheme and take care of all the implementation details yourself. Pig, in fact, is only an abstraction layer on top of Hadoop that automates many standard problem transformations from not-Map-Reduce-y to Map-Reduce-compatible.
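
To make the BSP idea a bit more concrete, here is a minimal single-process sketch of the superstep cycle (local computation, message exchange, barrier). It is only an illustration under my own assumptions: the function name `bsp_max_propagation` and the toy graph are made up, and it does not use the API of Hama or any other BSP framework. Each vertex ends up with the largest value found in its connected component.

```python
from collections import defaultdict


def bsp_max_propagation(graph, values):
    """Simulate BSP supersteps on an undirected graph (hypothetical helper).

    graph:  dict mapping each vertex to a list of its neighbours
    values: dict mapping each vertex to its initial number
    Returns (final_values, number_of_supersteps).
    """
    state = dict(values)

    # Superstep 0: every vertex sends its own value to all neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(state[vertex])
    supersteps = 1

    # Keep running supersteps until no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        # Local computation: a vertex reads only its own state and inbox.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > state[vertex]:
                state[vertex] = best
                # Communication: queue messages for the NEXT superstep.
                for n in graph[vertex]:
                    outbox[n].append(best)
        # Barrier: queued messages become visible only now.
        inbox = outbox
        supersteps += 1

    return state, supersteps


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
    toy_values = {"a": 3, "b": 1, "c": 7, "d": 2}
    print(bsp_max_propagation(toy_graph, toy_values))
```

Unlike a Map/Reduce job, the vertices keep their state across supersteps, so an iterative algorithm does not have to be expressed as a chain of separate jobs.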

Edit 26.1.13: Found a nice up-to-date overview here

Nicolas78
  • 3
    [Apache Hama](http://incubator.apache.org/hama/) implements BSP. Hama has been ported to [YARN (Yet Another Resource Negotiator)](http://wiki.apache.org/hama/GettingStartedYARN), which is part of Hadoop 0.23. Check this [blog](http://codingwiththomas.blogspot.com/) on Apache Hama. – Praveen Sripati Jan 01 '12 at 17:10
  • Thanks Praveen ;) Please visit our website and wiki for more information about Hama: http://incubator.apache.org/hama/ – Thomas Jungblut Jan 02 '12 at 18:05
10

Phil Colella identified seven numerical methods for scientific computation, based on the patterns of scattering and gathering of data between processing nodes, and called them 'dwarfs'. Others have since added to these; a list is available at the Dwarf Mine:

  1. Dense Linear Algebra
  2. Sparse Linear Algebra
  3. Spectral Methods
  4. N-Body Methods
  5. Structured Grids
  6. Unstructured Grids
  7. MapReduce
  8. Combinational Logic
  9. Graph Traversal
  10. Dynamic Programming
  11. Backtrack and Branch-and-Bound
  12. Graphical Models
  13. Finite State Machines
Pete Kirkham
2

Update (August 2014): Stratosphere is now called Apache Flink (incubating).

Have a look at Stratosphere. It is another Big Data runtime that offers more operators (map, reduce, join, union, cross, iterate, ...). It also lets you define advanced data flow graphs (with Hadoop MR, you would have to chain jobs).
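
As a rough illustration of what a richer operator set buys you, here is a plain-Python sketch that composes join, map and reduce in a single data flow. All names (`map_op`, `join_op`, `reduce_op`, `cross_op`, `revenue_by_country`) and the sample data are hypothetical; this is not the Stratosphere/Flink API, just the shape of the idea.

```python
from collections import defaultdict
from functools import reduce
from itertools import product


def map_op(records, fn):
    """Apply fn to every record."""
    return [fn(r) for r in records]


def join_op(left, right):
    """Equi-join two lists of (key, value) pairs on the key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, ())]


def reduce_op(pairs, fn):
    """Group (key, value) pairs by key and fold each group with fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, vals) for key, vals in groups.items()}


def cross_op(left, right):
    """Cartesian product of two record lists."""
    return list(product(left, right))


def revenue_by_country(orders, customers):
    """One data flow: join -> map -> reduce, with no intermediate jobs."""
    joined = join_op(orders, customers)            # (name, (amount, country))
    keyed = map_op(joined, lambda kv: (kv[1][1], kv[1][0]))  # (country, amount)
    return reduce_op(keyed, lambda a, b: a + b)


if __name__ == "__main__":
    orders = [("alice", 30), ("bob", 5), ("alice", 12)]
    customers = [("alice", "DE"), ("bob", "US")]
    print(revenue_by_country(orders, customers))
```

With only map and reduce available, the join step would typically be written as its own job whose output feeds the next one; a runtime with a native join operator can plan the whole graph at once.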

Stratosphere also supports BSP with its graph processing abstraction (called Spargel).

If you like to read scientific papers, have a look at "Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing"; it explains the theoretical background of the system.

Another system in the field is Spark, which has its own model (RDDs). Since BSP has been mentioned here, also have a look at GraphLab; they offer an alternative to BSP.

Robert Metzger
0

Microsoft's Dryad is claimed to be more general than MapReduce.

DarenW
0

The best alternative to MapReduce is Spark, because it is claimed to be 10 to 100 times faster than MapReduce. It is also very easy to maintain and gives high performance with less code.