I have written a program to benchmark a MongoDB database under multithreaded bulk-write conditions.
The problem is that the program hangs and does not finish executing.
I am quite sure the cause is that I am writing 530838 records using 10 threads, each of which bulk writes 50 records at a time. That leaves a remainder of 38 records (530838 mod 50). The run method always fetches exactly 50 records from the queue, so the process hangs once 530800 records have been written: the final 38 records are never written, because the following loop blocks on the empty queue and never finishes:
for object in range(50):
    objects.append(self.queue.get())
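The remainder can be checked with a couple of lines. This is only the arithmetic for the figures quoted above; the constant names are my own and do not appear in the program.

```python
TOTAL_RECORDS = 530838  # records read from data.txt
BATCH_SIZE = 50         # records fetched per bulk write

# Number of complete batches, and the records left over after them.
full_batches, leftover = divmod(TOTAL_RECORDS, BATCH_SIZE)

# Records that can be written before the last, incomplete batch.
written_before_stall = full_batches * BATCH_SIZE

print(full_batches, leftover, written_before_stall)
```

Any worker that has collected fewer than 50 records when the queue runs dry will wait forever in `self.queue.get()`.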
I would like the program to write 50 records at a time until fewer than 50 remain, at which point it should write whatever is left in the queue and then exit the thread once the queue is empty.
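A minimal sketch of that behaviour, as I understand it, is below. It is not a tested fix for the full program: `drain_in_batches`, `run_workers`, `STOP` and `write_batch` are hypothetical names, and `write_batch` stands in for whatever performs the bulk write (for example `db.threads.insert_many`). The sketch assumes one sentinel is queued per worker after all the data, and it waits on the threads themselves rather than on `queue.join()`, so no `task_done()` bookkeeping is needed.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2, as used in the program below

BATCH_SIZE = 50
STOP = object()  # hypothetical sentinel: tells a worker that no more data follows


def drain_in_batches(work_queue, write_batch, batch_size=BATCH_SIZE):
    """Write full batches, then flush the short final batch when STOP arrives."""
    batch = []
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) == batch_size:
            write_batch(batch)
            batch = []
    if batch:
        # Fewer than batch_size records remain; write them and exit.
        write_batch(batch)


def run_workers(records, write_batch, n_threads=10):
    """Feed records to n_threads workers and wait for all of them to finish."""
    work_queue = queue.Queue()
    workers = [
        threading.Thread(target=drain_in_batches, args=(work_queue, write_batch))
        for _ in range(n_threads)
    ]
    for worker in workers:
        worker.start()
    for record in records:
        work_queue.put(record)
    for _ in workers:
        work_queue.put(STOP)  # one sentinel per worker
    for worker in workers:
        worker.join()
```

With several threads, each worker may end with its own short batch, so there can be up to one partial write per thread rather than a single write of 38 records. The `if batch:` guard keeps a worker from calling the writer with an empty list.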
Thanks in advance :)
import threading
import Queue
import json
from pymongo import MongoClient, InsertOne
import datetime
#Set the number of threads
n_thread = 10
#Create the queue
queue = Queue.Queue()
#Connect to the database
client = MongoClient("mongodb://mydatabase.com")
db = client.threads
class ThreadClass(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        #Assign thread working with queue
        self.queue = queue
    def run(self):
        while True:
            objects = []
            #Get next 50 objects from queue
            for object in range(50):
                objects.append(self.queue.get())
            #Insert the queued objects into the database
            db.threads.insert_many(objects)
            #signals to queue job is done
            self.queue.task_done()
#Create number of processes
threads = []
for i in range(n_thread):
    t = ThreadClass(queue)
    t.setDaemon(True)
    #Start thread
    t.start()
#Start timer
starttime = datetime.datetime.now()
#Read json object by object
content = json.load(open("data.txt","r"))
for jsonobj in content:
    #Put object into queue
    queue.put(jsonobj)
#wait on the queue until everything has been processed
queue.join()
for t in threads:
    t.join()
#Print the total execution time
endtime = datetime.datetime.now()
duration = endtime-starttime
print(divmod(duration.days * 86400 + duration.seconds, 60))