I use Faker in a multiprocessing environment. I insert test data into MongoDB using pymongo's bulk insert, and for each bulk I create fresh fake data. This works fine in a single-process environment, but not with multiprocessing.
The code logic looks like the following:
from faker import Faker
import multiprocessing
from bson import Decimal128


def insert_bulk(bulk_size):
    fake = Faker()
    fake_data = [get_fake_dataset(fake) for i in range(bulk_size)]
    # write to db


def get_fake_dataset(fake):
    Faker.seed(0)
    return {
        "lat": Decimal128(fake.latitude()),
        "lon": Decimal128(fake.longitude()),
    }


pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
pool.map(insert_bulk, [1000])
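For completeness, the "# write to db" step is pymongo's bulk insert, roughly like this (the connection string and database/collection names here are just placeholders):

from pymongo import MongoClient

# inside insert_bulk, after fake_data has been built:
client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
collection = client["testdb"]["locations"]         # placeholder db/collection names
collection.insert_many(fake_data)                  # pymongo's bulk insert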
I played around with it but didn't get anywhere: all elements in the resulting list are the same.
I know that processes don't share memory, so my guess is that because the instance is created in a separate process's memory, the Faker class on which seed() is called can no longer reference the new instance created inside that process. That guess may or may not be correct.
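To make the seeding part concrete, this is my understanding of the two calls involved (I may be wrong here, which is partly why I'm asking):

from faker import Faker

fake = Faker()

# class-level seed, as used in get_fake_dataset above; my understanding is that
# this reseeds a random generator shared across instances
Faker.seed(0)

# instance-level seed; as far as I can tell it should only affect this one fake object
fake.seed_instance(0)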
Can someone tell me what I'm doing wrong? :) And what would a correct solution look like?
I also followed this approach:
Faker = Factory.create
fake = Faker()
fake.seed(0)
I also tried creating the factory inside the insert_bulk method, but that didn't help either.
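By "creating the factory inside insert_bulk" I mean roughly this (get_fake_dataset is the same function as above):

from faker import Factory

def insert_bulk(bulk_size):
    fake = Factory.create()  # build the generator inside the worker process
    fake.seed(0)
    fake_data = [get_fake_dataset(fake) for i in range(bulk_size)]
    # write to db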
Thanks.