3

I have a CouchDB database with a view whose values are paired numbers of the form [x,y]. For documents with the same key, I need (simultaneously) to compute the minimum of x and the maximum of y. The database I am working with contains about 50000 documents. Building the view takes several hours, which seems somewhat excessive. (The keys are themselves length-three arrays.) I show the map and reduce functions below, but the basic question is: how can I speed up this process?

Note that the built-in reduce functions won't work here because they require the values to be numbers, not length-two arrays. I could possibly make two different views (one for min(x) and one for max(y)), but it is unclear to me how to combine them to get both results simultaneously.

My current map function looks basically like

function(doc) {
  emit ([doc.a, doc.b, doc.c], [doc.x, doc.y])
}

and my reduce function looks like

function(keys, values) {
  var x = null;
  var y = null;
  for (i = 0; i < values.length; i++) {
    if (values[i][0] == null) break;
    if (values[i][1] == null) break;
    if (x == null) x = values[i][0];
    if (y == null) y = values[i][1];
    if (values[i][0] < x) x = values[i][0];
    if (values[i][1] > y) y = values[i][1];
  }
  emit([x, y]);
}

3 Answers

2

Just two more notes. Using Math.max() and Math.min() should be a little faster.

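function(keys, values) {
  // Seed with the identity elements: Infinity for min, -Infinity for max.
  var x = Infinity,
      y = -Infinity;
  for (var i = 0, v; v = values[i]; i++) {
    x = Math.min(x, v[0]); // minimum of x
    y = Math.max(y, v[1]); // maximum of y
  }
  return [x, y];
}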

And if CouchDB is treating the values as strings, it is because you are storing them as strings in the document.

Hope it helps.

Marcello Nuccio
  • I hadn't tried Math.max because I wasn't sure how it treated nulls. (I work a lot in R, where the nearest equivalent of null is NA, which absorbs all numbers in arithmetic operations.) Running a test, however, confirms that Math.max ignores nulls, and so does work here. – Kevin Coombes Mar 10 '11 at 19:54
  • Not exactly. Math.max does not ignore null; it treats it as 0. For example: "Math.max(null, 1) == 1", "Math.max(null, -1) == 0". All of the following tests are true: "-1 < null", "null > -1" and "null >= 0". This is why I used -Infinity as starting value. – Marcello Nuccio Mar 10 '11 at 22:32
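
The coercion discussed in these comments can be checked in any JavaScript shell. This is only a minimal sketch; the helper name foldMax and the sample numbers are hypothetical and not part of CouchDB.

```javascript
// null is coerced to 0 by Math.max and Math.min.
var maxWithNullAndPositive = Math.max(null, 1);   // null -> 0, so 1 wins
var maxWithNullAndNegative = Math.max(null, -1);  // null -> 0, so 0 wins
var minWithNullAndPositive = Math.min(null, 1);   // null -> 0, so 0 wins

// Seeding with the identity element avoids any null special case:
// Math.max(-Infinity, v) is v for every number v.
// foldMax is a hypothetical helper used only for this illustration.
function foldMax(numbers) {
  var best = -Infinity;
  for (var i = 0; i < numbers.length; i++) {
    best = Math.max(best, numbers[i]);
  }
  return best;
}

var allNegative = foldMax([-5, -2, -9]);
```

With a null seed instead of -Infinity, the all-negative input would fold to 0 rather than to its true maximum, which is the case the second comment warns about.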
1

This turned out to be a combination of two factors. One is obvious in the code posted above, where the reduce function calls "emit" when it should use "return".

The other factor is less obvious and was only found by making a smaller version of the database and logging the steps in the reduce function. Although the entries in "values" were meant to be integers, they were being treated by CouchDB as character strings. Using the parseInt function corrected that problem.
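
A minimal sketch of that kind of fix, applied in the map function so the view stores numbers rather than strings. The field names follow the question; the emit stub and the sample documents are hypothetical stand-ins so the snippet runs outside CouchDB, and doing the conversion in the map step (rather than in the reduce) is an assumption.

```javascript
// Stand-in for CouchDB's emit(), used only to run the map function locally.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function: coerce x and y to integers before emitting.
// The radix 10 is passed explicitly so the result does not depend on
// how an engine treats leading zeros.
function map(doc) {
  var x = parseInt(doc.x, 10);
  var y = parseInt(doc.y, 10);
  if (isNaN(x) || isNaN(y)) return; // skip documents without usable numbers
  emit([doc.a, doc.b, doc.c], [x, y]);
}

// Hypothetical documents whose numeric fields were stored as strings.
map({ a: "k1", b: "k2", c: "k3", x: "9", y: "10" });
map({ a: "k1", b: "k2", c: "k3", x: "n/a", y: "3" }); // skipped

// Strings compare character by character, so "9" > "10" is true;
// the parsed numbers compare numerically.
var stringCompare = "9" > "10";
var numberCompare = emitted[0].value[0] > emitted[0].value[1];
```

The last two lines show why string values can silently give wrong minima and maxima: the comparison still runs, but it orders the values lexicographically.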

After those two fixes, the entire build of the reduced view took about five minutes, so the speed problem evaporated.
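
For reference, a sketch of a reduce that returns its result and can be sanity-checked locally. CouchDB may call the reduce function again on the outputs of earlier reduce calls (rereduce), so the result has to keep the same [x, y] shape as the input values. The sample data and the way it is split into chunks are hypothetical.

```javascript
// Reduce: minimum of the first components, maximum of the second.
// The result is returned; a reduce function does not call emit().
function reduce(keys, values, rereduce) {
  var minX = Infinity;
  var maxY = -Infinity;
  for (var i = 0; i < values.length; i++) {
    minX = Math.min(minX, values[i][0]);
    maxY = Math.max(maxY, values[i][1]);
  }
  return [minX, maxY];
}

// Hypothetical sample values, already numeric.
var sample = [[4, 7], [2, 9], [6, 1], [3, 8]];

// One pass over everything ...
var direct = reduce(null, sample, false);

// ... versus reducing two chunks and then reducing the partial results.
var partials = [
  reduce(null, sample.slice(0, 2), false),
  reduce(null, sample.slice(2), false)
];
var combined = reduce(null, partials, true);
```

Because the partial results have the same shape as the mapped values, the same loop handles both the first reduce and the rereduce step, and both paths give the same answer here.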

0

Please check http://www.geeksforgeeks.org/archives/4583 . This may be extended to your application.

enthusiasticgeek