We tried to parallelize our program in Python using threads. The problem is that we don't get 100% CPU usage: all 8 cores are in use, but only at roughly 50-60% utilization, sometimes lower. Why doesn't the CPU run at 100% load during the calculation?
We are programming in Python on Windows.
Here is our multithreaded implementation:
```python
from threading import Thread
import hashlib


class CalculationThread(Thread):
    def __init__(self, target: str):
        # Thread.__init__ must run before the thread can be started
        super().__init__()
        self.target = target

    def run(self):
        # Hash the whole file 1000 times, reading it in 4 KiB chunks
        for _ in range(1000):
            hash_md5 = hashlib.md5()
            with open(self.target, "rb") as f:
                for chunk in iter(lambda: f.read(4096), b""):
                    hash_md5.update(chunk)
            digest = hash_md5.hexdigest()
        print(self.name + " finished")


threads = []
for i in range(20):
    t = CalculationThread(target="baden-wuerttemberg-latest.osm.pbf")
    print("Worker " + t.name + " started")
    t.start()
    threads.append(t)

for t in threads:
    t.join()
```
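For comparison, here is a minimal sketch that runs the same hashing loop in separate processes with `concurrent.futures.ProcessPoolExecutor` instead of threads. Each process has its own interpreter and its own GIL, so if CPU usage still stays around 50-60% with processes, the limit is more likely reading the file from disk than the GIL. That interpretation is an assumption to test, not a confirmed diagnosis. `md5_file` and `hash_in_processes` are hypothetical helper names.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def md5_file(path: str, repeats: int = 1000) -> str:
    """Hash the file `repeats` times in 4 KiB chunks, like CalculationThread.run."""
    digest = ""
    for _ in range(repeats):
        hash_md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_md5.update(chunk)
        digest = hash_md5.hexdigest()
    return digest


def hash_in_processes(path: str, jobs: int = 20, repeats: int = 1000) -> list[str]:
    # The pool defaults to one worker process per CPU; the 20 jobs are queued onto them.
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(md5_file, path, repeats) for _ in range(jobs)]
        return [fut.result() for fut in futures]


if __name__ == "__main__":
    # On Windows, worker processes are started with "spawn", so the pool
    # must only be created inside this guard.
    target = "baden-wuerttemberg-latest.osm.pbf"
    if os.path.exists(target):
        results = hash_in_processes(target)
        print(len(results), "jobs finished")
    else:
        print("File not found:", target)
```

Comparing Task Manager during this run with the threaded run shows which resource is saturated: full CPU with processes points at the GIL, unchanged CPU with high disk activity points at I/O.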
CPU workload while running the calculation: