
I use Faker in a multiprocessing environment. I insert test data into MongoDB using pymongo's bulk write, and for each bulk I create fresh fake data. This works in a single-process environment, but not with multiprocessing.

The code logic looks like the following:

from faker import Faker
import multiprocessing
from bson import Decimal128

def insert_bulk(bulk_size):
    fake = Faker()

    fake_data = [get_fake_dataset(fake) for i in range(bulk_size)]

    #write to db

def get_fake_dataset(fake):
    Faker.seed(0)
    return {
        "lat": Decimal128(fake.latitude()),
        "lon": Decimal128(fake.longitude()),
    }


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
    pool.map(insert_bulk, [1000])

    

I have played around with it but haven't found a solution: all elements in the resulting list are the same.

I know that processes don't share memory, and my guess is that, because each instance is created in a separate process's memory, the Faker class whose seed() method is used can no longer reference the new instance just created in that process. That is my (possibly wrong) guess so far.

Can someone tell me what I am doing wrong? :) And what would a correct solution look like?

I also followed this approach:

from faker import Factory

Faker = Factory.create
fake = Faker()
fake.seed(0)

creating the factory inside the insert_bulk method, but that didn't help either (see the sketch below).
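Paraphrased, that attempt inside insert_bulk looked roughly like this (just a sketch of what I tried; note that the per-instance fake.seed() call is the older Faker API and may not exist in newer versions, which only accept the class-level Faker.seed()):

from faker import Factory
from bson import Decimal128

def insert_bulk(bulk_size):
    Faker = Factory.create
    fake = Faker()
    fake.seed(0)  # per-instance seeding (older Faker API)
    fake_data = [
        {"lat": Decimal128(fake.latitude()), "lon": Decimal128(fake.longitude())}
        for i in range(bulk_size)
    ]
    # write to db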

Thanks.

1 Answer

OK, the question was in fact not very smart: since I always seed with 0, the result is always the same. I had misunderstood the documentation there.

Calling Faker.seed() without the fixed value 0 solves the issue.
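For reference, a minimal sketch of the working version (same names as in the question; it seeds once per worker without a fixed value instead of reseeding with 0 for every record, and adds a __main__ guard so it also runs on spawn-based platforms):

from faker import Faker
import multiprocessing
from bson import Decimal128

def get_fake_dataset(fake):
    # no Faker.seed(0) here any more -- reseeding with a constant
    # made every generated record identical
    return {
        "lat": Decimal128(fake.latitude()),
        "lon": Decimal128(fake.longitude()),
    }

def insert_bulk(bulk_size):
    fake = Faker()
    Faker.seed()  # seed once per worker, without a fixed value
    fake_data = [get_fake_dataset(fake) for i in range(bulk_size)]
    # write to db

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
    pool.map(insert_bulk, [1000])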
