I am running a CPU-intensive job on a GCP VM (c2-standard-16).
The batch job runs daily on a cron schedule. It runs a face-match algorithm using TensorFlow on many student folders in parallel, using multiprocessing to process the folders concurrently. The output is written to a CSV file and then loaded into BigQuery.
It runs for n topics, so n topics (say 10) x n student folders (say 200-3000+ per topic) need to be processed. For each topic, the relevant part of the code is roughly:

```python
# inside the loop over topics:
multiprocessing.set_start_method('spawn')
pool = Pool(multiprocessing.cpu_count())
result = pool.map(self.process_student, folders, chunksize=1)
df = pd.DataFrame(result)
df.to_csv(csv_report_name, index=False)
```
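For context, here is a stripped-down, self-contained version of the per-topic step (process_student, topics and the report file names are simplified stand-ins for my actual class methods and paths):

```python
import multiprocessing
from multiprocessing import Pool

import pandas as pd


def process_student(folder):
    # stand-in for the TensorFlow face-match work done per student folder
    return {'folder': folder}


def process_topic(folders, csv_report_name):
    # one pool per topic; 'spawn' workers start as fresh interpreters
    # instead of forking the parent's TensorFlow state
    with Pool(multiprocessing.cpu_count()) as pool:
        result = pool.map(process_student, folders, chunksize=1)
    df = pd.DataFrame(result)
    df.to_csv(csv_report_name, index=False)


if __name__ == '__main__':
    # set_start_method can only be called once per interpreter,
    # so it sits outside the per-topic loop here
    multiprocessing.set_start_method('spawn')
    topics = {}  # topic name -> list of student folders (filled elsewhere)
    for topic, folders in topics.items():
        process_topic(folders, topic + '_report.csv')
```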
The script works fine for 200 student folders. When it gets to 2,000 and above student folders, it stops after processing about 400 students (as I see in the log): the script shuts down abruptly and the VM becomes non-responsive, and SSH connections are broken.
Running top while the job is active shows CPU usage at 100% or above, with all cores utilised above 100%.
What I have tried so far:
- Divided the folders into chunks of 200 and introduced a sleep of 10 minutes between chunks to throttle CPU usage (a rough sketch follows this list). This works for 2-3 chunks, then it stops again.
- Added a delay of a few milliseconds inside the parallel-processed method self.process_student.
- Used only half of the available cores for the pool; the cores that are used still spike to 100% and above. The CPU usage quota is not editable, and the quota is already fully used.
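Roughly, the chunking/throttling attempt (points 1 and 3 above) looked like the sketch below; chunked and process_folders_throttled are simplified stand-in names, and the chunk size, sleep time and worker count are the values I tried:

```python
import time
import multiprocessing
from multiprocessing import Pool


def chunked(items, size):
    # split the folder list into fixed-size chunks
    for i in range(0, len(items), size):
        yield items[i:i + size]


def process_folders_throttled(folders, process_student):
    # use only half the cores, as in attempt 3
    workers = max(1, multiprocessing.cpu_count() // 2)
    results = []
    for chunk in chunked(folders, 200):  # chunks of 200, as in attempt 1
        with Pool(workers) as pool:
            results.extend(pool.map(process_student, chunk, chunksize=1))
        time.sleep(10 * 60)  # 10-minute pause between chunks
    return results
```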
I have also tried everything suggested in the guide Limit total CPU usage in python multiprocessing. None of it worked. Please help.