
I have two ONNX deep learning models and I want to run both of them in parallel. I am using threads from Python, but surprisingly it takes more time than running both models sequentially.

Tasks to be done:

  1. Make a class for the models.
  2. Load both models in the `__init__` of that class.
  3. Run both models in parallel for inference on the given input.

Is this normal behavior? Please suggest a workaround.

import os
import json
import queue
import threading

import onnxruntime
# Import paths assume TF 2.x Keras; adjust if standalone Keras is used.
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import tokenizer_from_json

import tokenization  # BERT-style tokenization module providing FullTokenizer
# `trim` is a project helper (not shown here) that truncates a token list
# to a maximum length.


class ModelImp:

    def __init__(self):
        print('loading model...')
        curr_dir = os.getcwd()

        # Load the hate model.
        model_path = os.path.join(curr_dir, "model", "hatev5.onnx")
        self.hate_sess = onnxruntime.InferenceSession(model_path)
        self.hate_input_name = self.hate_sess.get_inputs()[0].name
        self.hate_seq_len = 15
        self.corona_seq_len = 16
        print('Hate model loaded.')

        # Load the corona model.
        model_path = os.path.join(curr_dir, "model", "corona.onnx")
        self.corona_sess = onnxruntime.InferenceSession(model_path)
        self.corona_input_name = self.corona_sess.get_inputs()[0].name
        print('Corona model loaded.')

        # Load the shared WordPiece vocabulary and both Keras tokenizers.
        vocab_path = os.path.join(curr_dir, "model", "vocab.txt")
        self.wordpiece_tokenizer = tokenization.FullTokenizer(vocab_path, do_lower_case=True)

        tokenizer_path = os.path.join(curr_dir, "model", "hate_tokenizer.json")
        with open(tokenizer_path) as f:
            self.hate_tokenizer = tokenizer_from_json(json.load(f))
        print('Hate tokenizer loaded.')

        tokenizer_path = os.path.join(curr_dir, "model", "corona_tokenizer.json")
        with open(tokenizer_path) as f:
            self.corona_tokenizer = tokenizer_from_json(json.load(f))
        print('Corona tokenizer loaded.')

    def thread_eval(self, data, q):
        # Runs the corona model on a single input string and puts the
        # positive-class probability on the queue.
        corona_line = ' '.join(trim(self.wordpiece_tokenizer.tokenize(data.strip()), self.corona_seq_len))
        corona_line_1 = self.corona_tokenizer.texts_to_sequences([corona_line])
        corona_line_2 = sequence.pad_sequences(corona_line_1, padding='post', maxlen=self.corona_seq_len)
        corona_pred = self.corona_sess.run(None, {self.corona_input_name: corona_line_2})
        q.put(corona_pred[0][0][1])

    def Eval(self, data):
        try:
            d = json.loads(data)

            if not (("query" in d) or ("Query" in d)):
                score = -2 * 10000
                return json.dumps({"Output": [[score]]})
            if "query" in d:
                query = d["query"][0]
            elif "Query" in d:
                query = d["Query"][0]
            if len(query.strip()) == 0:
                query = "good"

            # Run the corona model in a worker thread while the main thread
            # preprocesses and runs the hate model.
            que = queue.Queue()
            x = threading.Thread(target=self.thread_eval, args=(query, que), daemon=True)
            x.start()

            hate_line = ' '.join(trim(self.wordpiece_tokenizer.tokenize(query.strip()), self.hate_seq_len))
            hate_line_1 = self.hate_tokenizer.texts_to_sequences([hate_line])
            hate_line_2 = sequence.pad_sequences(hate_line_1, padding='post', maxlen=self.hate_seq_len)
            hate_pred = self.hate_sess.run(None, {self.hate_input_name: hate_line_2})
            hate_prob = hate_pred[0][0][1]

            x.join()
            corona_prob = que.get()

            output_prob = max(corona_prob, hate_prob)
            output_score = int(output_prob * 10000)
            return json.dumps({"Output": [[output_score]]})

        except Exception as e:
            print("Exception: ", data)
            print(e)
            score = -3 * 10000
            return json.dumps({"Output": [[score]]})
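
For reference, a minimal harness like the one below (the test payload is made up) can quantify the slowdown: time `Eval` as written, then again with the thread removed, i.e. with `x.start()`/`x.join()` replaced by a direct `self.thread_eval(query, que)` call on the main thread.

```python
import time
import json

m = ModelImp()
payload = json.dumps({"query": ["this is a test sentence"]})
m.Eval(payload)  # warm-up call so one-time initialization doesn't skew the numbers

n = 100
start = time.time()
for _ in range(n):
    m.Eval(payload)
print("average per call: %.4f s" % ((time.time() - start) / n))
```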
  • Can't see any code. Despite that, Python multithreading is generally not useful for speeding up CPU-bound workloads. – couka Feb 19 '21 at 18:22
  • @couka Is there any way to use multiple cores of the CPU in python? – Abhishek Gangwar Feb 19 '21 at 21:01
  • @couka Yes, there are multiple options for using multiple cores with python. Since your code doesn't look amenable to NumPy, you might wish to start with the [multiprocessing module](https://docs.python.org/3/library/multiprocessing.html) in the standard library (see the sketch after these comments). – Galen Feb 19 '21 at 21:07
  • A more difficult option would be to write a C extension that uses OpenMP, and link that to your Python code. [Cython](https://cython.org/) supports [OpenMP](https://www.openmp.org/), and is one of the relatively easy ways to write C extensions. – Galen Feb 19 '21 at 21:08
  • If I may expand on the other comments, Python has a "global interpreter lock" that ensures that only one thread at a time is executing Python code. It works great if most of your threads are waiting on I/O or a socket, but the only way to get true simultaneous multithreading is to use a C module like numpy. – Tim Roberts Feb 19 '21 at 23:35
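
Following up on the multiprocessing suggestion above, here is a minimal sketch of running the two sessions in separate processes. This is not the question's code: the model file names are taken from the question, but the placeholder inputs (and their int32 dtype, which is what `pad_sequences` emits by default) stand in for the real preprocessed sequences, and each worker loads its own `InferenceSession` because session objects cannot be pickled across processes.

```python
import os
import json
import multiprocessing as mp

import numpy as np
import onnxruntime


def run_model(model_file, input_array, q):
    # InferenceSession objects are not picklable, so each worker process
    # loads its own session from the model path.
    sess = onnxruntime.InferenceSession(model_file)
    input_name = sess.get_inputs()[0].name
    pred = sess.run(None, {input_name: input_array})
    q.put(pred[0][0][1])  # positive-class probability, as in Eval()


if __name__ == "__main__":
    # Placeholder inputs standing in for the padded sequences produced by
    # the question's preprocessing; the dtype must match what each model expects.
    hate_input = np.zeros((1, 15), dtype=np.int32)
    corona_input = np.zeros((1, 16), dtype=np.int32)

    q = mp.Queue()
    procs = [
        mp.Process(target=run_model, args=(os.path.join("model", "hatev5.onnx"), hate_input, q)),
        mp.Process(target=run_model, args=(os.path.join("model", "corona.onnx"), corona_input, q)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    output_prob = max(q.get(), q.get())
    print(json.dumps({"Output": [[int(output_prob * 10000)]]}))
```

Note that spawning a process and reloading a model per request is expensive, so in a real service the worker processes would be kept alive (each holding its loaded session) and fed queries through queues rather than respawned for every call.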
