
I've built a trainer, and when I submit the job, it starts and the logs get populated. But none of my output to stdout ever appears in the log. I do get messages like "The TensorFlow library wasn't compiled to use AVX2 instructions..."

The entire job takes about 5 to 10 minutes on my laptop. I let it run for over an hour on the cloud server and still never saw any output, even though the first line of output appears almost immediately when I run it locally.

I can run my job locally by invoking it directly, but I haven't been able to run it with the "gcloud local" command. When I try, I get the error "No module named tensorflow".

Brian Hanechak
  • Hi Brian, can you share the project number and job ID with us via cloudml-feedback@google.com, please? Regarding "No module named tensorflow", did you install TensorFlow locally? – Guoqing Xu Aug 30 '17 at 16:30
  • Re: local run: @Brian, do you have tensorflow-gpu or vanilla tensorflow installed? – rhaertel80 Aug 31 '17 at 14:04

1 Answer

The log message "The TensorFlow library wasn't compiled to use AVX2 instructions" indicates that log messages are flowing from TensorFlow to Cloud Logging. So the most likely explanation is a problem with the way you have configured logging, which keeps your own messages from being written correctly to stderr/stdout.
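If the trainer (or a library it imports) installs its own logging handlers, one way to rule out a misconfiguration is to reset the root logger to a single handler that writes to stderr. The following is a minimal sketch, assuming plain Python `logging` with nothing else owning the root logger; `configure_root_logging` is a hypothetical helper for illustration, not part of any Cloud ML Engine API.

```python
import logging
import sys


def configure_root_logging(stream=sys.stderr, level=logging.INFO):
    """Hypothetical helper: route all root-logger records to one stream.

    Any handlers already attached to the root logger (for example, from an
    earlier basicConfig call) are removed first, so records go only to `stream`.
    """
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root.addHandler(handler)
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_root_logging()
    logging.info("trainer started")
```

Calling this once at the very start of the trainer makes the logging setup explicit, so if messages still don't show up, the handler configuration is less likely to be the cause.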

The easiest way to debug this is to create a simple example that tries to reproduce the problem.

I'd suggest creating a simple Python program that does nothing but log a message, and then submitting it to the service to see whether the message is printed.

Something like the following:

import logging
import time

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    # Output logs for 5 minutes (30 iterations x 10 seconds). We do this for
    # 5 minutes just to ensure the job doesn't terminate before logs can be flushed.
    for i in range(30):
        logging.info("This is an info message.")
        logging.error("This is an error message.")
        time.sleep(10)
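The question is specifically about stdout, while Python's `logging` module writes to stderr by default. A companion test that prints to stdout may therefore help narrow things down. This sketch assumes, without confirmation from the original, that buffering could be a factor: when stdout is a pipe rather than a terminal, Python block-buffers it, so output can sit unflushed. Flushing every line rules that out. `emit` and `run` are hypothetical names.

```python
import sys
import time


def emit(message, stream=None):
    # Resolve sys.stdout at call time so a redirected stdout is honored.
    out = sys.stdout if stream is None else stream
    # flush=True pushes the line out immediately instead of leaving it
    # in the buffer (stdout is block-buffered when it isn't a terminal).
    print(message, file=out, flush=True)


def run(iterations=30, interval_s=10, stream=None):
    # Same shape as the logging test above: 30 x 10 s is about 5 minutes.
    for i in range(iterations):
        emit("stdout message %d" % i, stream)
        time.sleep(interval_s)
```

Call `run()` from the trainer's entry point. Starting the interpreter with `python -u`, or setting `PYTHONUNBUFFERED=1`, disables stdout buffering for the whole process in a similar way.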

For the issue importing TensorFlow when running locally, please take a look at this SO question, which has some suggestions on how to check the Python path used by gcloud and verify that it includes TensorFlow.
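A quick way to check the interpreter side of this is to run a small script with the same Python that gcloud is configured to use and see where, if anywhere, `tensorflow` would be imported from. This is a minimal sketch using only the standard library; `tensorflow_location` is a hypothetical helper name.

```python
import importlib.util
import sys


def tensorflow_location():
    """Return the path tensorflow would be imported from, or None if missing."""
    spec = importlib.util.find_spec("tensorflow")
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("tensorflow:", tensorflow_location() or "NOT FOUND on sys.path")
```

If the script reports a different interpreter than the one where you installed TensorFlow, or reports NOT FOUND, that would explain the "No module named tensorflow" error.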

Jeremy Lewi