I have a Bloom filter object created with pybloom, a Python module. Assume I have over 10 million strings waiting to be added to this object; the usual way to do so is:
from pybloom import BloomFilter

# initialize a Bloom filter sized for ~20 million items
bf = BloomFilter(int(2e7))
for s in string_list:
    bf.add(s)
But this takes too much time, especially when string_list is really long. My computer (Windows 7) has a 4-core CPU, and I want to know whether there is a multiprocess way to make full use of the CPU and speed up the add calls.
I know a little about multiprocessing, but I cannot solve the problem of exchanging custom objects, such as bf above, between processes.
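The pattern I have in mind is: split the strings into one chunk per core, let each worker process build its own filter, and merge the results in the parent. Below is a minimal sketch of that pattern. Since I'm not sure a pybloom BloomFilter pickles cleanly across processes, it uses a toy filter (a plain integer bitmask with MD5-derived bit positions) as a stand-in; the chunking and merging logic is what I'm asking about, not the toy filter itself.

```python
import hashlib
from multiprocessing import Pool

M = 1 << 16   # number of bits in the toy filter
K = 4         # bit positions set per item

def _positions(s):
    # derive K bit positions from slices of the item's MD5 digest
    digest = hashlib.md5(s.encode("utf-8")).hexdigest()
    return [int(digest[i * 8:(i + 1) * 8], 16) % M for i in range(K)]

def build_filter(chunk):
    """Each worker builds its own filter over one chunk of strings."""
    bits = 0
    for s in chunk:
        for p in _positions(s):
            bits |= 1 << p
    return bits   # a plain int pickles cheaply back to the parent

def might_contain(bits, s):
    return all(bits & (1 << p) for p in _positions(s))

if __name__ == "__main__":
    strings = ["item%d" % i for i in range(10000)]
    chunks = [strings[i::4] for i in range(4)]   # one chunk per core
    with Pool(4) as pool:
        parts = pool.map(build_filter, chunks)
    # merging filters of identical size is just a bitwise OR
    merged = 0
    for part in parts:
        merged |= part
    assert all(might_contain(merged, s) for s in strings)
```

Is this the right direction for real pybloom objects, i.e. can per-process filters be combined afterwards (I vaguely recall pybloom filters with identical parameters can be merged), or is there a better way to share one bf between processes?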
Forgive my poor English, and please show me some code if you can. Thanks.