
I have a large list of about 100,000 elements and need to map a function over it as follows:

    from multiprocessing import Pool
    import itertools

    def mark_diff(args):
        item = args[0]  # the document to update
        pi = args[1]    # the goal triple, repeated for every item
        item.marked_diff = (item.p/pi[0] + item.c/pi[1] + item.f/pi[2] - 3)**2
        return item

    def mark(f_set, goal):
        with Pool(3) as p:
            data = p.map(mark_diff, zip(f_set, itertools.repeat(goal)))
        return data

The default value of `item.marked_diff` is 0, and `item` is a mongoengine document.

I am resorting to multiprocessing because `mark_diff` is substantially more complicated than shown here and involves a lot of exponents and logarithms, for which I am using numpy.

Now for the problem: the returned `data` still has `item.marked_diff` equal to 0. Yet if I add a print statement at the bottom of `mark_diff`, it shows that the correct, non-zero values are being assigned.

Definition of `item`:

    import random, mongoengine

    class F(mongoengine.Document):
        p = mongoengine.FloatField()
        c = mongoengine.FloatField()
        f = mongoengine.FloatField()
        marked_diff = 0  # plain class attribute, not a declared mongoengine field

    f_sets = F.objects.all()
    goal = [0.2, 0.35, 0.45]
xssChauhan

1 Answer


So something is going on in what you didn't show. When I flesh this out into a complete, executable program, it appears to work fine. Here's the output from one run under Python 3.6.1:

    0.7024116548559156
    13.468354599594324
    6.036133666404753
    0.16520292241977205
    0.17073749475275496
    1.903674418518389
    0.2432159511273063
    7.743326563037492
    4.1990243814914425
    19.36243187965931

And here's the full program:

    from multiprocessing import Pool
    import random
    import itertools

    class F:
        def __init__(self):
            self.p = random.random()
            self.c = random.random()
            self.f = random.random()

    def mark_diff(args):
        item = args[0]
        pi = args[1]
        item.marked_diff = (item.p/pi[0] + item.c/pi[1] + item.f/pi[2] - 3)**2
        return item

    def mark(f_set, goal):
        with Pool(3) as p:
            data = p.map(mark_diff, zip(f_set, itertools.repeat(goal)))
        return data

    if __name__ == "__main__":
        f_set = [F() for _ in range(10)]
        goal = [0.2, 0.35, 0.45]
        xs = mark(f_set, goal)
        for x in xs:
            print(x.marked_diff)

Is it possible that you're looking at marked_diff in the original f_set instead of in the items returned by mark()?
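The point behind that question can be shown without starting any processes. `Pool.map` pickles each argument to ship it to a worker and pickles each result to ship it back, so the worker only ever modifies a copy. The sketch below emulates those two round trips with `pickle.dumps`/`pickle.loads` in a single process; `Item` and `simulated_map` are hypothetical names used for illustration, not part of `multiprocessing`.

```python
import pickle


class Item:
    """Hypothetical plain object standing in for an element of f_set."""

    def __init__(self, p):
        self.p = p
        self.marked_diff = 0


def mark_diff(item):
    # Mutates whatever object it receives and returns it, like the question's worker
    item.marked_diff = item.p * 2
    return item


def simulated_map(func, items):
    """Single-process emulation of the pickling that Pool.map performs."""
    results = []
    for item in items:
        worker_copy = pickle.loads(pickle.dumps(item))      # what a worker would receive
        result = func(worker_copy)                          # only the copy is modified
        results.append(pickle.loads(pickle.dumps(result)))  # what the parent gets back
    return results


originals = [Item(1.0), Item(2.5)]
returned = simulated_map(mark_diff, originals)

print([x.marked_diff for x in originals])  # [0, 0]
print([x.marked_diff for x in returned])   # [2.0, 5.0]
```

Reading `marked_diff` from the original list shows only the defaults; the new values exist only on the objects that come back from the map.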

Tim Peters
  • In the real application, `F` is a mongoengine document. The calculations in `mark_diff` are correct, because when I print `item.marked_diff` at the bottom of `mark_diff`, it shows the expected values. I checked, and yes, I am looking at the values returned from `mark_diff`. – xssChauhan Apr 11 '17 at 17:32
  • And I just demonstrated that it works fine for the code you posted. Does the code I posted also work for you? If so, then there's no _general_ problem with the code - in which case it's probably an implementation problem very specific to the way "mongoengine documents" behave under Python's pickle protocol. In which case, it's about pickle, not really about multiprocessing. But in the absence of you posting code that actually fails for other people, all anyone else can do is guess blindly. – Tim Peters Apr 11 '17 at 17:42
  • I just added `marked_diff` as a `FloatField` in the mongoengine model definition, and it works now. I'll update the question with the mongoengine model so it's clear that the problem was with the mongoengine document. I would still like to know why it behaves this way. – xssChauhan Apr 11 '17 at 17:45
  • Yes, your code works. That's what triggered me to test with a `FloatField`. Thanks for your time. – xssChauhan Apr 11 '17 at 17:52
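The pickle explanation can be reproduced with a plain class. My understanding, stated here as an assumption and not checked against a specific mongoengine release, is that mongoengine documents customize pickling so that only internal state such as the declared field data is carried across, which would silently drop an ad-hoc instance attribute like `marked_diff`. The hypothetical `FieldsOnlyDoc` below imitates that behaviour with `__getstate__`/`__setstate__`; it is a sketch, not mongoengine's code.

```python
import pickle


class FieldsOnlyDoc:
    """Hypothetical stand-in for a document class that pickles declared fields only."""

    _fields = ("p", "c", "f")  # the "declared" fields
    marked_diff = 0            # class-level default, like the question's model

    def __init__(self, p, c, f):
        self.p, self.c, self.f = p, c, f

    def __getstate__(self):
        # Serialize only declared fields; any other instance attribute is dropped
        return {name: self.__dict__[name] for name in self._fields if name in self.__dict__}

    def __setstate__(self, state):
        self.__dict__.update(state)


class DeclaredDiffDoc(FieldsOnlyDoc):
    # Analogue of declaring marked_diff as a FloatField: it becomes a pickled field
    _fields = FieldsOnlyDoc._fields + ("marked_diff",)


def round_trip(obj):
    # The same kind of pickle round trip a Pool result goes through
    return pickle.loads(pickle.dumps(obj))


before = FieldsOnlyDoc(0.1, 0.2, 0.3)
before.marked_diff = 4.2               # instance attribute shadows the class default
print(round_trip(before).marked_diff)  # 0 (falls back to the class default)

fixed = DeclaredDiffDoc(0.1, 0.2, 0.3)
fixed.marked_diff = 4.2
print(round_trip(fixed).marked_diff)   # 4.2
```

If the assumption about mongoengine holds, this matches what was observed: the worker's print sees the value, the copy sent back loses it, and declaring `marked_diff` as a field makes it survive. Another option along the same lines would be to have the worker return only the computed float and assign it to the documents in the parent process, so no document has to be pickled on the way back.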