Why does the runtime keep crashing on Google Colab?

I have a simple MLP script that runs fine on my local machine. I tried running the same code on Colab, but it crashes immediately after loading the data files.

The data files are around 3 GB in total, and both the CPU RAM and the GPU memory of the Colab virtual machine are well above that.
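For reference, the RAM actually available on the VM can be verified directly from a notebook cell (a minimal sketch; this assumes psutil is available, which it is on Colab's default image as far as I know):

import psutil

# Report total and currently available system RAM on the Colab VM.
mem = psutil.virtual_memory()
print("total: %.1f GiB, available: %.1f GiB"
      % (mem.total / 2**30, mem.available / 2**30))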

Then why does my program crash before it can even start training?

My Code:

import numpy as np

def load_raw(name):
    # Load features and labels stored as .npy files; encoding='bytes' is
    # needed for arrays that were pickled under Python 2.
    return (np.load(name + '.npy', encoding='bytes'),
            np.load(name + '_labels.npy', encoding='bytes'))

class WSJ():
    def __init__(self):
        self.dev_set = None
        self.train_set = None
        self.test_set = None

    @property
    def dev(self):
        if self.dev_set is None:
            self.dev_set = load_raw('dev')
        return self.dev_set

    @property
    def train(self):
        if self.train_set is None:
            self.train_set = load_raw('train')
        return self.train_set

    @property
    def test(self):
        if self.test_set is None:
            self.test_set = (np.load('test.npy', encoding='bytes'), None)
        return self.test_set

    def preprocess_data(self, trainX, trainY, k):
        # some form of preprocessing that pads and flattens the data
        # into the format required
        return trainX_padded, trainY, y_to_x_map

def main():
    global index
    padding = 3
    epochs = 1
    batch_size = 512
    lr = 0.1
    momentum = 0.9

    # 40 features per frame, with `padding` context frames on either side.
    input_dim = 40 * ((2 * padding) + 1)
    output_dim = 138

    neural_net = MLP(input_dim, output_dim)
    !free -g

    print("Starting...")
    loader = WSJ()
    trainX, trainY = loader.train
    print("Training Data obtained...")
    !free -g

    trainX, trainY, y_to_x_map = loader.preprocess_data(trainX, trainY, k=padding)
    print("Training Data preprocessed...")
    !free -g

    devX, devY = loader.dev
    devX, devY, y_to_x_map_dev = loader.preprocess_data(devX, devY, k=padding)
    print("Development data preprocessed...")
    !free -g

    print("Scaling...")
    input_scaler = preprocessing.StandardScaler().fit(trainX)
    !free -g

    trainX = input_scaler.transform(trainX)
    devX = input_scaler.transform(devX)

It crashes immediately after printing Scaling...
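One possible culprit at exactly this point: as far as I understand, StandardScaler.fit can make a full-size float64 copy of trainX during validation, roughly doubling peak memory. A minimal sketch of a workaround, assuming incremental fitting is acceptable (fit_scaler_in_chunks and chunk_size are my own arbitrary choices, not part of the original code):

from sklearn import preprocessing

def fit_scaler_in_chunks(X, chunk_size=100000):
    # partial_fit updates the scaler's running mean/variance one chunk at
    # a time, so only chunk_size rows are ever copied at once.
    scaler = preprocessing.StandardScaler()
    for start in range(0, X.shape[0], chunk_size):
        scaler.partial_fit(X[start:start + chunk_size])
    return scaler

input_scaler = fit_scaler_in_chunks(trainX)
# Transform chunk by chunk and write back in place, to avoid another
# full-size temporary copy of trainX.
for start in range(0, trainX.shape[0], 100000):
    trainX[start:start + 100000] = input_scaler.transform(trainX[start:start + 100000])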

  • Can you post a minimal example that reproduces the error? – Jus Feb 16 '18 at 20:43
  • Added in the question – user3828311 Feb 16 '18 at 20:54
  • Please share an example notebook that can be executed. The example code above doesn't execute anything. – Bob Smith Feb 16 '18 at 21:39
  • https://drive.google.com/open?id=1Wnrsbg5DrJsub4-gWSbMWMoxbo4n0_2M – user3828311 Feb 16 '18 at 21:43
  • I suspect you're hitting memory limits -- https://github.com/scikit-learn/scikit-learn/issues/5651 seems to have some relevant info. – Craig Citro Feb 18 '18 at 02:00
  • Actually, looking at the output in the notebook, I think you've got an undefined reference at line 11 in https://colab.research.google.com/notebook#fileId=1Wnrsbg5DrJsub4-gWSbMWMoxbo4n0_2M&scrollTo=vivEoUkuGoi-. – Craig Citro Feb 18 '18 at 02:00
  • That’s what I’m unable to understand. My local machine has 8gb RAM and runs fine on it. 13gb ram on colab should not make it run out of memory. Does loading the data on the virtual machine also eat my ram for colab ?? – user3828311 Feb 18 '18 at 02:03
  • Thanks for the example. So the error occurs in `sklearn`'s preprocessing? I suggest updating the minimal example to strip out all subsequent and unnecessary code, and including the link in your question for visibility. It could also help to tag the question with `sklearn` instead of the (I think) unrelated tags. – Jus Feb 19 '18 at 04:45
  • Hi, I removed the sklearn preprocessing. Even then, the code runs for one epoch and then the runtime crashes. – user3828311 Feb 19 '18 at 17:35

0 Answers