
I have a recurring performance issue with my Cloud Datalab GCE instances, which over time seem to get swamped by root-level node and zip processes that are unrelated to my notebooks (see image).

I have 4 vCPUs and now 15 GB of RAM, but making the machines bigger does not solve the problem. I do have ~30 GB of images stored on the Datalab persistent disk, which may somehow be contributing to the problem.

Creating brand new CDL instances does help, but this is an inconvenient approach.

Suggestions for a resolution, or a starting point for diagnosing the problem, would be much appreciated.

(Image: top and console output)

  • The 'node' process is likely Datalab's node.js server (which serves the browser UX pages for datalab), so that is expected though not necessarily using that much CPU. Kill it and it should be automatically restarted. The zip processes don't appear to be Datalab itself. Do any of your notebooks use zip, perhaps to compress/uncompress the images you mentioned? – Chris Meyers Jan 08 '18 at 17:59
  • Chris thanks much. Yes: If I kill node, it comes right back and goes to 300% on 4vCPU. pkill on all the hungry zip processes (up to 20 sometimes) does end them. – 1000 Jan 09 '18 at 19:59
  • Where I get into trouble with zip is here: pip3 install --upgrade h5py tensorflow google-cloud-storage. – 1000 Jan 09 '18 at 20:00
  • Which adds a number of long-running, high-CPU zip processes. I'm moving work to the command line, but I have a feeling Datalab is not working as it should. – 1000 Jan 09 '18 at 20:01
  • Can you get the invoke path for the zip process? This might help point out the reason it was executed. – yelsayed Jan 12 '18 at 08:31
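A minimal sketch for answering the last comment, assuming a standard Linux environment inside the Datalab VM or container; the PID 1234 used below is a placeholder for one of the actual zip PIDs shown by top:

    # List every running zip process with PID, parent PID, elapsed time, CPU, and full command line
    ps -eo pid,ppid,etime,pcpu,cmd | grep '[z]ip'

    # For a given zip PID (placeholder 1234), show the parent process that invoked it
    ps -o pid,cmd -p "$(ps -o ppid= -p 1234)"

    # The full invocation and executable path are also visible under /proc
    cat /proc/1234/cmdline | tr '\0' ' '; echo
    readlink /proc/1234/exe

The parent command line usually makes it clear whether the zip processes were launched by a notebook, by pip, or by something Datalab runs in the background.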

1 Answer


Those processes seem to come from disk backups performed in the background with file listeners. I ran into this issue after adding a bunch of small datasets, and I was able to solve it by disabling backups when creating the Datalab instance.

https://cloud.google.com/datalab/docs/reference/command-line/create

--no-backups
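For example, a new instance can be created with backups disabled along these lines (the instance name and zone below are placeholders; see the linked reference for the full set of flags):

    # Create a Datalab instance with automatic backups disabled
    datalab create --no-backups --zone us-central1-a my-datalab-instance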