
I'm trying to run PageRank using MapReduce in MongoDB.

My documents are in this format:

{
        "_id" : "u: 10000",
        "value" : [
                [
                        "u: 10000",
                        "s: 985272",
                        1
                ],
                [
                        "s: 985272",
                        "u: 10000",
                        1
                ],
                [
                        "u: 10000",
                        "s: 303770",
                        1
                ],
                [
                        "s: 303770",
                        "u: 10000",
                        1
                ]
        ]
}

I think the first step is to collect the links by key. However, I have several outbound links per document (these all happen to be bidirectional).

Here are my map and reduce functions:

m = function () {
    for (var i = 0; i < this.value.length; i++){
        var out = {};
        out.out = this.value[i][1];
        out.weight = this.value[i][2];
        emit(this.value[i][0], [out]);
    }
}

r = function(key, values){
    var result = {
      value: [] 
    };
    values.forEach(function(val) {
    result.value.push({out: val.out, weight: val.weight});
    });
    return result;
}

The problem is that I'm not sure emit is producing multiple emissions per document, because I get results like:

{
        "_id" : "s: 1000082",
        "value" : [
                {
                        "out" : "u: 37317",
                        "weight" : 1
                }
        ]
}

I would expect multiple items per document.

Anyone have any ideas? Help would be appreciated!

EDIT:

I'm not completely satisfied. For example, how do things like this work: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/? The reduce result doesn't look at all like the emit output.

toofarsideways
  • could you clarify what the field values are? what is "s" and "u", etc? – Asya Kamsky Jul 01 '12 at 22:10
  • They are just different types of documents (webpages) with accompanying id's... – toofarsideways Jul 01 '12 at 22:30
  • the answer below is correct - if you are having trouble adding more fields to the emitted value, I would suggest starting a new question. – Asya Kamsky Jul 03 '12 at 15:17
  • I added some clarification in the answer, but what I describe is exactly how the example you linked to works, so I'm not sure what your dissatisfaction is related to. In that example map emits author as key and "{votes: this.votes}" as value. In reduce it returns "{votes: sum}" which is exactly the structure of the value. – Asya Kamsky Jul 04 '12 at 07:10
  • Sorry, a day away from it makes it much clearer. Thank you for your help. – toofarsideways Jul 05 '12 at 01:18

1 Answer


The issue is that you are not mapping an array but your reduce is trying to push to an array.

If you want to have each key map to an array of "out" and "weight" pairs, then you need to emit an array containing that, and in your reduce you need to concat the arrays together.

Remember, the structure of the object returned by the reduce function must be identical to the structure of the map function's emitted value.

That means that when your map emits (key, value) the structure of "value" must be identical to the structure of what your reduce function returns as a result.
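The reason for this rule is that MongoDB may call reduce more than once for the same key, feeding an earlier reduce result back in alongside newly emitted values. Here is a minimal sketch of that in plain JavaScript (no MongoDB needed). The names reduceSameShape, reduceMismatched and reduceInBatches are made up for illustration, and the two-batch split is an assumption chosen only to force a re-reduce:

```javascript
// Reduce whose return value has the same shape as each emitted value:
// {value: [{out, weight}, ...]}
function reduceSameShape(key, values) {
    var result = { value: [] };
    values.forEach(function (v) {
        result.value = result.value.concat(v.value);
    });
    return result;
}

// Reduce that expects bare {out, weight} values but returns {value: [...]}.
function reduceMismatched(key, values) {
    var result = { value: [] };
    values.forEach(function (v) {
        result.value.push({ out: v.out, weight: v.weight });
    });
    return result;
}

// Simulate reducing in two batches: the first partial result is passed
// back into reduce together with a later emitted value.
function reduceInBatches(reduceFn, key, emitted) {
    var partial = reduceFn(key, emitted.slice(0, 2));
    return reduceFn(key, [partial].concat(emitted.slice(2)));
}

var wrapped = [
    { value: [{ out: "s: 1", weight: 1 }] },
    { value: [{ out: "s: 2", weight: 1 }] },
    { value: [{ out: "s: 3", weight: 1 }] }
];
var bare = [
    { out: "s: 1", weight: 1 },
    { out: "s: 2", weight: 1 },
    { out: "s: 3", weight: 1 }
];

var good = reduceInBatches(reduceSameShape, "u: 1", wrapped);
var bad = reduceInBatches(reduceMismatched, "u: 1", bare);

console.log(JSON.stringify(good)); // all three links survive
console.log(JSON.stringify(bad));  // the first two links collapse into one empty entry
```

With matching shapes, a partial result is indistinguishable from an emitted value, so it does not matter how the values are batched.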

If you change your map function to this, so that value is a document with field "value" which is an array of documents each having field "out" and field "weight":

function () {
    for (var i = 0; i < this.value.length; i++) {
        // key is the link source; value wraps a one-element array of {out, weight}
        var key = this.value[i][0];
        var value = {value: [{out: this.value[i][1], weight: this.value[i][2]}]};
        emit(key, value);
    }
}

and your reduce function to this, which constructs result to have identical structure to the value you emit above (since it just concatenates what it gets passed in for each key):

function (key, values) {
    // same structure as the emitted value: {value: [...]}
    var result = {value: []};
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

you will then get the result you are expecting:

{
    "_id" : "s: 303770",
    "value" : {
        "value" : [
            {
                "out" : "u: 10000",
                "weight" : 1
            }
        ]
    }
}
{
    "_id" : "s: 985272",
    "value" : {
        "value" : [
            {
                "out" : "u: 10000",
                "weight" : 1
            }
        ]
    }
}
{
    "_id" : "u: 10000",
    "value" : {
        "value" : [
            {
                "out" : "s: 303770",
                "weight" : 1
            },
            {
                "out" : "s: 985272",
                "weight" : 1
            }
        ]
    }
}
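If you want to check this without a database, here is a rough stand-in written in plain JavaScript. It is only a sketch: the emit function and the groups object below are my own substitutes for what MongoDB provides, the map and reduce bodies are adapted from the ones above, and the sample document is the one from the question.

```javascript
// Emitted values collected per key (stand-in for MongoDB's internal grouping).
var groups = {};

// Stand-in for the emit() that MongoDB makes available inside map.
function emit(key, value) {
    if (!groups[key]) {
        groups[key] = [];
    }
    groups[key].push(value);
}

function map() {
    for (var i = 0; i < this.value.length; i++) {
        var key = this.value[i][0];
        var value = { value: [{ out: this.value[i][1], weight: this.value[i][2] }] };
        emit(key, value);
    }
}

function reduce(key, values) {
    var result = { value: [] };
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

var doc = {
    _id: "u: 10000",
    value: [
        ["u: 10000", "s: 985272", 1],
        ["s: 985272", "u: 10000", 1],
        ["u: 10000", "s: 303770", 1],
        ["s: 303770", "u: 10000", 1]
    ]
};

// map runs with the document bound to `this`.
map.call(doc);

var output = Object.keys(groups).sort().map(function (key) {
    var values = groups[key];
    // reduce is skipped for keys that have exactly one emitted value.
    var reduced = values.length === 1 ? values[0] : reduce(key, values);
    return { _id: key, value: reduced };
});

console.log(JSON.stringify(output, null, 4));
```

Note that the keys with a single emitted value still come out in the right shape, because map already emits the same structure that reduce returns.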
Asya Kamsky
  • Does every document have to go through the reduction step? I only ask because I added a rank value into the first "value" map, but it only appears on some of the documents. `r = function (key, values) { result = {rank:1.0, value:[]}; for (var i in values) { result.value = values[i].value.concat(result.value); } return result; }` – toofarsideways Jul 02 '12 at 02:42
  • every document gets mapped - you must emit from map the same format your reduce function returns. – Asya Kamsky Jul 02 '12 at 03:44
  • Wait, then how do things like this work? -> http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/. The reduce result doesn't at all look like the emit output. – toofarsideways Jul 02 '12 at 12:35
  • map puts out (key, value) pairs - reduce has to return same format as "value" format - the key part is implicit. – Asya Kamsky Jul 03 '12 at 15:14
  • I'm guessing rank is something you need to compute in the finalize stage. http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-FinalizeFunction but it would be easier to sort out as a separate question. If you didn't add "rank" to what map is emitting with each key, it's unlikely that the reduce function you posted is correct. – Asya Kamsky Jul 03 '12 at 16:17
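On the rank question in the comments above: MongoDB does not call reduce for a key that has only one emitted value, which would explain why a field added only inside reduce shows up on some documents and not others. A finalize function runs once per key after reduction. Below is a rough plain-JavaScript sketch of the difference; simulate and emittedByKey are hypothetical stand-ins of mine, not MongoDB APIs, and the constant rank of 1.0 is just the value from the comment.

```javascript
// reduce from the comment thread: adds rank, but only when reduce runs
function reduceWithRank(key, values) {
    var result = { rank: 1.0, value: [] };
    for (var i = 0; i < values.length; i++) {
        result.value = values[i].value.concat(result.value);
    }
    return result;
}

// finalize has the signature (key, reducedValue) and is applied to every key
function finalize(key, reducedValue) {
    return { rank: 1.0, value: reducedValue.value };
}

// Stand-in for the engine: skip reduce for single-value keys, then finalize.
function simulate(groups, reduceFn, finalizeFn) {
    var out = {};
    Object.keys(groups).forEach(function (key) {
        var values = groups[key];
        var reduced = values.length === 1 ? values[0] : reduceFn(key, values);
        out[key] = finalizeFn ? finalizeFn(key, reduced) : reduced;
    });
    return out;
}

var emittedByKey = {
    "s: 1": [{ value: [{ out: "u: 1", weight: 1 }] }],
    "u: 1": [
        { value: [{ out: "s: 1", weight: 1 }] },
        { value: [{ out: "s: 2", weight: 1 }] }
    ]
};

var withoutFinalize = simulate(emittedByKey, reduceWithRank, null);
var withFinalize = simulate(emittedByKey, reduceWithRank, finalize);

console.log(JSON.stringify(withoutFinalize)); // "s: 1" has no rank
console.log(JSON.stringify(withFinalize));    // every key has rank
```

In the shell, a function like this would be passed as the finalize option of mapReduce. The other way to get rank on every document is to emit it from map as well, so the emitted value and the reduce result keep the same structure.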