I am running a CPU-intensive job on a GCP VM (c2-standard-16).

The batch job runs daily on a cron schedule. It runs a face-match algorithm using TensorFlow over many student folders in parallel, using multiprocessing to process the requests. The output is written to a CSV file and loaded into BigQuery.

  loop each topic :
  multiprocessing.set_start_method('spawn')
  pool = Pool(multiprocessing.cpu_count())

This runs for n topics, so n topics (say 10) x n student folders (say 200 to 3000+) need to be processed.

result = pool.map(self.process_student, folders, chunksize=1)
df = pd.DataFrame(result)
df.to_csv(csv_report_name, index=False)

The script works fine for 200 student folders. With 2000 or more student folders, it stops after processing about 400 students (as I see in the log): the process shuts down abruptly and the VM becomes unresponsive. SSH connections are broken. Running top during the run shows CPU usage at 100% or above.

All cores are utilised >100%

Tried so far

  • Divided the folders into chunks of 200 and introduced a sleep of 10 minutes between chunks to throttle CPU usage. This works for 2-3 chunks and then stops again.
  • Added a delay of some milliseconds in the parallel worker method, self.process_student.
  • Used half of the available cores for multiprocessing. Those cores still run at 100% and above. The CPU usage quota is not editable, and the CPU usage quota is full.
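
For reference, the chunking attempts above look roughly like the sketch below. It is a simplified reconstruction, not the exact script: the helper names `chunked` and `process_in_chunks`, the fresh pool per chunk, and the use of `multiprocessing.get_context("spawn")` in place of `set_start_method` are assumptions made for illustration.

```python
import multiprocessing
import time


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_chunks(worker, folders, chunk_size=200, pause_seconds=600,
                      processes=None):
    """Map `worker` over `folders` one chunk at a time, sleeping between chunks.

    `worker` stands in for self.process_student (hypothetical signature:
    one folder in, one result out).
    """
    if processes is None:
        # Half of the available cores, as in the third attempt.
        processes = max(1, multiprocessing.cpu_count() // 2)
    # A context object selects the 'spawn' start method without touching
    # the global setting.
    ctx = multiprocessing.get_context("spawn")
    chunks = list(chunked(folders, chunk_size))
    results = []
    for index, chunk in enumerate(chunks):
        # Assumption: one pool per chunk, torn down when the chunk is done.
        with ctx.Pool(processes) as pool:
            results.extend(pool.map(worker, chunk, chunksize=1))
        if index < len(chunks) - 1:
            time.sleep(pause_seconds)  # the 10-minute pause between chunks
    return results
```

With the 'spawn' start method, `process_in_chunks` has to be called from under an `if __name__ == "__main__":` guard, and `worker` has to be picklable (for example a module-level function).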

I tried everything in the guide "Limit total CPU usage in python multiprocessing". None of it worked. Please help.

Amar Kumar
  • Review the stats for memory usage and disk swapping. If memory usage is too high, your option is to select a larger VM. If the system is swapping memory to disk, then you must select a larger instance size. A quick test is to resize larger and test again. You can resize smaller after the tests. – John Hanley Feb 27 '22 at 10:53
  • Any luck with resizing the VM? – Wojtek_B Feb 28 '22 at 08:14
  • I tried resizing the VM to 32 GB of RAM, with the same issue. I can't scale higher than that. – Amar Kumar Feb 28 '22 at 08:17
  • What's the [CPU utilisation](https://cloud.google.com/spanner/docs/cpu-utilization) in Cloud Monitoring? Can you provide some more logs? – Wojtek_B Feb 28 '22 at 11:23
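
One way to collect the memory and swap figures asked about in the comments is to log them from inside the job. The sketch below assumes a Linux VM where `/proc/meminfo` is readable; the function names `parse_meminfo` and `memory_summary` are made up for illustration and are not part of the original script.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; the unit suffix is dropped here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(path="/proc/meminfo"):
    """Return available memory and swap in use, both in kB."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    return {
        "available_kb": info.get("MemAvailable"),
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }
```

Logging `memory_summary()` after each batch of students would show whether available memory shrinks steadily before the VM stops responding.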
