(I only have conceptual knowledge of NoSQL, no working experience)

I am aware of the following types of NoSQL databases:

  • key-value, column family, document databases (Aggregates)
  • graph databases

Is the Map-Reduce paradigm applicable to all of them? My guess would be no, since Map-Reduce is often discussed in terms of keys and values. But since the distinction between different NoSQL stores isn't so clear-cut, I am wondering where Map-Reduce is and isn't applicable. And since I'm in the process of evaluating which DB to use for a few app ideas I have, I should consider whether it's possible to achieve large-scale processing regardless of which store I use.

Sridhar Sarnobat

1 Answer

Support for map reduce probably shouldn't be the thing on which to base your choice of a datastore.

Firstly, map reduce isn't the only way to do large-scale data processing. For example, MongoDB implemented map reduce support early (in v1), but later added its Aggregation Framework, which covers many of the tasks that would otherwise make use of map reduce.

Map reduce is just one paradigm for processing large data sets. Use it only if your application needs to process a large number of data records with a mapper and then needs to combine results together with a reducer. That's all it really does. As to when the paradigm is applicable and when it is not, simply look at your use case. Do you need to manipulate all of your records consistently and then combine the results? Or is there another way to phrase your problem?
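To make the mapper/reducer split concrete, here is a minimal in-memory sketch in plain Python. It is not tied to any datastore; the `orders` records and the function names (`mapper`, `shuffle`, `reducer`, `map_reduce`) are hypothetical, and a real framework would distribute these phases across machines rather than run them in one process.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records: (customer, order amount).
orders = [("alice", 30), ("bob", 15), ("alice", 20), ("carol", 5), ("bob", 10)]


def mapper(record):
    """Map phase: turn one record into a (key, value) pair.

    Each record is handled independently of every other record.
    """
    customer, amount = record
    return (customer, amount)


def shuffle(pairs):
    """Group mapped values by key (a framework normally does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(records):
    mapped = map(mapper, records)
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


totals = map_reduce(orders)
print(totals)
```

Note that the mapper never sees more than one record at a time. If your problem needs records to interact during processing, it does not fit this shape well.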

Take a look at the Mongo aggregation framework for examples where aggregation is a simpler solution to problems that would be overkill to force into a map-reduce shape.

The aggregation framework should also give you insight into your question of whether you can do large-scale data processing without map-reduce, to which the answer is yes. Clearly map-reduce is good for making search indexes, but many problems on large data sets benefit from other paradigms.
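As a rough illustration of the declarative, aggregation-style alternative, the sketch below computes per-customer totals with a single grouped query instead of hand-written map and reduce functions. It uses SQLite from the Python standard library purely as a self-contained stand-in (an assumption made so the example runs without a server); the `orders` table and its data are hypothetical. In MongoDB, the analogous step would be a `$group` stage with a `$sum` accumulator in an aggregation pipeline.

```python
import sqlite3

# Hypothetical sample records: (customer, order amount).
orders = [("alice", 30), ("bob", 15), ("alice", 20), ("carol", 5), ("bob", 10)]

# In-memory database, used here only as a stand-in for a real datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# One declarative statement describes the grouping and the combination;
# the engine decides how to execute it.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

agg_totals = dict(rows)
print(agg_totals)
```

The point of the sketch is the shape of the request: you state what to group and how to combine, rather than writing the per-record and per-key functions yourself.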

A web search on "alternatives to map reduce" will also be helpful.

Ray Toal
  • That's very helpful Ray. I'll leave the question open a little longer before marking yours correct. – Sridhar Sarnobat Jun 06 '13 at 20:51
  • The (naive) reason I want Map-Reduce to be part of my arsenal is that the big boys do it (when in doubt, follow the crowd). I know I'm not trying to be Google or Amazon so I shouldn't blindly follow (it's like saying I should choose a key-value store because they do), but when solutions aren't talked about so frequently (I've never heard of Mongo Aggregation Framework) the tendency is to run away. But you're right. – Sridhar Sarnobat Jun 06 '13 at 20:54
  • Map reduce _is_ awesome and powerful and it does shine for many tasks. There will be cases where you will want it. But sometimes it is overkill and sometimes it is not appropriate because you need communication between elements, and the map phase of map-reduce does not do that. Did you see [this SO question yet](http://stackoverflow.com/questions/8692806/mapreduce-alternatives)? – Ray Toal Jun 06 '13 at 21:46
  • Thanks for the link. I'll need to go through it more carefully. I agree, Map Reduce is overkill especially when you consider relational databases give this for free and can scale to a significant extent. – Sridhar Sarnobat Jun 06 '13 at 21:59