3

My Mongo collection : Impressions has docs in the following format:-

   {
        _uid: 10,
        "impressions": [
            {
                "pos": 6,
                "id": 123,
                "service": "furniture"
            },
            {
                "pos": 0,
                "id": 128,
                "service": "electronics"
            },
            {
                "pos": 2,
                "id": 127,
                "service": "furniture"
            },
            {
                "pos": 2,
                "id": 125,
                "service": "electronics"
            },
            {
                "pos": 10,
                "id": 124,
                "service": "electronics"
            }
        ]
      },
     {
        _uid: 11,
        "impressions": [
            {
                "pos": 1,
                "id": 124,
                "service": "furniture"
            },
            {
                "pos": 10,
                "id": 124,
                "service": "electronics"
            },
            {
                "pos": 1,
                "id": 123,
                "service": "furniture"
            },
            {
                "pos": 21,
                "id": 122,
                "service": "furniture"
            },
            {
                "pos": 3,
                "id": 125,
                "service": "electronics"
            },
            {
                "pos": 10,
                "id": 121,
                "service": "electronics"
            }
            ]
         },
            .
            .
            .
            .
            .

Each of the doc in the collection has "impressions" key which is an array of dictionaries. In each dictionary "id" is the id of the entity, "service" is the service type and "pos"is the position of the item in the search page results. My aim is to find out the count of number of impressions for every "id" in each category. So for the above data for "service" == "furniture", I want to have this as my aggregation results:-

[
{"id": 123,"impressions_count":2},
{"id": 127,"impressions_count":1},
{"id": 124,"impressions_count":1},
{"id": 122,"impressions_count":1}
]

I tried to aggregate on the "id" using MAPREDUCE via following function in a python script

def fetch_impressions():
    try:
        imp_collection = get_mongo_connection('Impressions')
        map = Code("""
                function(){
                    for( x in this.impressions){
                        var flat_id = x['id'];
                        var service_type = x['service']
                        emit(parseInt(flat_id),1);
                        }
                    };
                """)

                        """)
        reduce = Code("""
                        function(a,b){
                            return Array.sum(b);
                            };
                        """)

        results = imp_collection.map_reduce(map, reduce, 'aggregation_result')
        return results
    except Exception as e:
        raise Exception(e)

But I'm getting the results as None, probably because of the faulty map function.I'm new to Javascript and Mongo kindly help!

styvane
  • 59,869
  • 19
  • 150
  • 156
nimeshkiranverma
  • 1,408
  • 6
  • 25
  • 48

2 Answers2

3

You can use the aggregation framework

import pymongo
conn = pymongo.MongoClient()
db = conn.test
col =  db.collection

for doc in col.aggregate([{'$unwind': '$impressions'}, 
    {'$match': {'impressions.service': 'furniture'}}, 
    {'$group': {'_id': '$impressions.id', 'impressions_count': {'$sum': 1}}}, 
    ]):
    print(doc)

Or more efficiently using the $map and the $setDifference operators.

col.aggregate([
    { "$project": { "impressions": {"$setDifference": [{ "$map": { "input": "$impressions", "as": "imp", "in": { "$cond": { "if": { "$eq": [ "$$imp.service", "furniture" ] }, "then": "$$imp.id", "else": 0 }}}}, [0]]}}}, 
    { "$unwind": "$impressions" }, 
    { "$group": { "_id": "$impressions", "impressions_count": { "$sum": 1 }}}
])

Which yields:

{'_id': 122.0, 'impressions_count': 1}
{'_id': 124.0, 'impressions_count': 1}
{'_id': 127.0, 'impressions_count': 1}
{'_id': 123.0, 'impressions_count': 2}
styvane
  • 59,869
  • 19
  • 150
  • 156
  • 1
    This still does not answer the question of how would we approach this using `map_reduce()` API – Ayush Nov 04 '17 at 05:53
  • Why would you want to use `map_reduce` when there is a better way to do this? @AyushKSingh – styvane Nov 04 '17 at 08:27
  • 2
    completely agree on that, but we should also answer the question using map-reduce before offering a better solution so that the user who asks question leaves the site learning not only a new and better way but also realizes the problem in their solution and it's corresponding fix. – Ayush Nov 04 '17 at 16:47
1

I made a tool that lets you run MongoDB Map/Reduce in Python

https://mreduce.com

import random
import threading

import bson
import pymongo

import mreduce


mongo_client = pymongo.MongoClient("mongodb://your_mongodb_server")

def map_func(document):
    for impression in document["impressions"]:
        yield document["id"], 1

def reduce_func(id, prices):
    return sum(prices)

worker_functions = {
    "exampleMap": map_func,
    "exampleReduce": reduce_func
}

api = mreduce.API(
    api_key = "...",
    mongo_client = mongo_client
)

project_id = "..."

thread = threading.Thread(
    target=api.run,
    args=[project_id, worker_functions]
)
thread.start()

job = api.submit_job(
    projectId=project["_id"],
    mapFunctionName="exampleMap",
    reduceFunctionName="exampleReduce",
    inputDatabase="db",
    inputCollection="impressions",
    outputDatabase="db",
    outputCollection="impressions_results"
)
result = job.wait_for_result()
for key, value in result:
    print("Key: " + key, ", Value: " + str(value))
  • 1
    Welcome to Stack Overflow! Just linking to your own library or tutorial is not a good answer. Linking to it, explaining why it solves the problem, providing code on how to do so and disclaiming that you wrote it makes for a better answer. See: [**What signifies “Good” self promotion?**](//meta.stackexchange.com/q/182212) – NathanOliver Jan 14 '20 at 20:34
  • 1
    I did explain why it solves the problem, it allows you to run map/reduce in Python like OP is asking. Also I did say that I wrote the tool – Colton Leekley-Winslow Jan 14 '20 at 20:39
  • 1
    @Dharman ErlVolton is my other account I was accidentally signed into. Also I updated the code so it solves the specific problem OP had, totalling impressions per ID – Colton Leekley-Winslow Jan 14 '20 at 20:58