
I have a Cloud Run job already deployed on GCP. It's basically a bunch of Python scripts that download ~140+ *.zip or *.rar files from the web into a ./tmp directory on the container's filesystem, extract them into the same directory using the Python zipfile or rarfile modules, remove the compressed files from ./tmp, upload the extracted files to a GCS bucket, and finally remove the extracted files from ./tmp. If the extracted files are *.dbf files, I read them with the Python dbfread module and write the output to the container's filesystem in ./tmp before uploading it to Cloud Storage. All files uploaded to the GCS bucket are uploaded as *.txt files, and every file is removed from ./tmp as soon as it has been uploaded.
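
For reference, here is a minimal sketch of the kind of per-file loop described above; the bucket name, URL list, and ./tmp paths are placeholders, and the *.dbf-to-*.txt conversion is omitted for brevity. Processing one archive, and one extracted member, at a time keeps disk and memory usage bounded:

```python
# Sketch only: BUCKET_NAME, ARCHIVE_URLS, and TMP_DIR are placeholders, and the
# dbfread conversion step is left out. One archive and one extracted member
# exist on disk at any moment; downloads and uploads stream in chunks.
import os
import zipfile

import rarfile
import requests
from google.cloud import storage

TMP_DIR = "./tmp"                                   # assumed scratch directory
BUCKET_NAME = "my-bucket"                           # placeholder bucket name
ARCHIVE_URLS = ["https://example.com/data1.zip"]    # placeholder URL list

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)


def download(url: str) -> str:
    """Stream the download to disk in chunks instead of reading it into RAM."""
    local_path = os.path.join(TMP_DIR, os.path.basename(url))
    with requests.get(url, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        with open(local_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                fh.write(chunk)
    return local_path


def process_archive(archive_path: str) -> None:
    """Extract members one at a time, upload each as *.txt, then delete it."""
    opener = zipfile.ZipFile if archive_path.endswith(".zip") else rarfile.RarFile
    with opener(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):                  # skip directory entries
                continue
            archive.extract(name, TMP_DIR)
            extracted = os.path.join(TMP_DIR, name)
            blob = bucket.blob(os.path.basename(extracted) + ".txt")
            blob.upload_from_filename(extracted)    # streams from disk
            os.remove(extracted)                    # free space before the next member
    os.remove(archive_path)                         # drop the archive itself


if __name__ == "__main__":
    os.makedirs(TMP_DIR, exist_ok=True)
    for url in ARCHIVE_URLS:
        process_archive(download(url))
```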

Still, I have problems with memory usage. My program starts running, and after a while memory consumption starts to grow. The Cloud Run job eventually returns an error because the container reaches its memory limit.

"Memory limit of 512M exceeded with 512M used. Consider increasing the memory limit, see https://cloud.google.com/run/docs/configuring/memory-limits" 

I've tried increasing the memory limit as the message suggests, but the Cloud Run job keeps failing with the same error.
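
For context, one way to see which stage the growth follows (before touching the limit again) is to log the process's resident memory around each step. A minimal sketch using psutil, which is an extra dependency and not part of the original scripts:

```python
# Minimal sketch: log the current process's RSS, tagged with a stage name.
# psutil is an assumed extra dependency; the stage names are placeholders.
import os

import psutil

_proc = psutil.Process(os.getpid())


def log_rss(stage: str) -> None:
    """Print the resident set size in MB so growth can be tracked per stage."""
    rss_mb = _proc.memory_info().rss / (1024 * 1024)
    print(f"[mem] {stage}: {rss_mb:.1f} MB")


log_rss("startup")
# ... call log_rss("after download"), log_rss("after extract"),
# log_rss("after upload") around the corresponding steps in the job.
```

Comparing the numbers across iterations shows whether the growth tracks downloads, extraction, or uploads.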

How can I debug this problem?

Nachengue
    I would run the python scripts outside of a container on a local machine. Add logging so that you can measure activity within the container and memory profiling to see which sections of code consume the most memory. Once you know where memory usage is high, use the `divide by two` debugging method to narrow the problem to a section of code. I would also use products such as PyCharm's environment to speed up debugging. – John Hanley May 05 '23 at 19:55
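
As a rough illustration of the memory profiling suggested in the comment above, the standard-library tracemalloc module can report per-iteration usage and the allocation sites holding the most memory; `download_extract_and_upload` and `ARCHIVE_URLS` below are placeholders for the real job's code:

```python
# Rough sketch of the "memory profiling" idea using the standard-library
# tracemalloc module. download_extract_and_upload and ARCHIVE_URLS are
# placeholders for whatever the real job does per archive.
import tracemalloc


def download_extract_and_upload(url: str) -> None:
    """Placeholder for one archive's worth of work in the real job."""


ARCHIVE_URLS: list[str] = []            # placeholder list of ~140 archive URLs

tracemalloc.start()
for url in ARCHIVE_URLS:
    download_extract_and_upload(url)
    current, peak = tracemalloc.get_traced_memory()
    print(f"{url}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")

# List the ten allocation sites holding the most memory at the end of the run.
for stat in tracemalloc.take_snapshot().statistics("lineno")[:10]:
    print(stat)
tracemalloc.stop()
```

If `current` keeps climbing between iterations, something allocated inside the loop is never being released, and the top statistics point at the lines responsible.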

0 Answers