
I'm writing a script that sometimes leaks tensors. This can happen in multiple cases, for example when I'm training a neural network and the training crashes. In that case, the training is interrupted and the tensors are not disposed correctly. This results in a memory leak, which I'm trying to clean up by disposing the unused tensors.

Example

In the snippet below, I'm training two (very simple) models. The first run works and leaks no tensors (the number of tensors before training equals the number after). The second time, I use an invalid reshape layer to force a crash during training: an error is thrown, and the tensors from the dataset (I guess?) are not disposed correctly. The code is only an example to show how tensors might be leaked.

async function train(shouldCrash) {
  console.log(`Training, shouldCrash=${shouldCrash}`);
  const dataset = tf.data.zip({ // setup data
    xs: tf.data.array([[1],[1]]),
    ys: tf.data.array([1]),
  }).batch(1);

  const model = tf.sequential({ // setup model
    layers: [
      tf.layers.dense({units: 1, inputShape: [1]}),
      tf.layers.reshape({targetShape: [(shouldCrash ? 2 : 1)]}), // use invalid shape when crashing
    ],
  });
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });
  console.log('  Tensors before:', tf.memory().numTensors);
  try {
    const history = await model.fitDataset(dataset, { epochs: 1 });
  } catch (err) {
    console.log(`    Error: ${err.message}`);
  }
  console.log('  Tensors after:', tf.memory().numTensors);
}

(async () => {
  await train(false); // normal training
  await train(true); // training with error
})();
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2/dist/tf.min.js"></script>

Question

There is tf.tidy, which helps me dispose unused tensors in some cases, but it can only be used for synchronous function calls. Therefore, it cannot be used when calling await model.fitDataset(...).

Is there a way to dispose any unused tensors? Alternatively, is there a way to dispose all existing tensors on the page (without reloading it)?

Thomas Dondorf

2 Answers


The way to clean up any unused tensors in async code is to wrap the code that creates them between a startScope() and an endScope() call.

tf.engine().startScope();
// code that creates tensors, e.g. await model.fitDataset(...)
tf.engine().endScope();
David
  • 1
    I figured it out looking at the [ts.tidy()](https://github.com/tensorflow/tfjs-core/blob/master/tfjs-core/src/engine.ts#L392) method. TFjs uses it in their tests as well https://github.com/tensorflow/tfjs-core/blob/master/tfjs-core/src/jasmine_util.ts#L185 – David Jan 28 '20 at 21:34
  • Yes, this is great! it solved the memory issues I was having. Thanks! – Juan Jan 29 '20 at 20:17
  • Thanks, great!! This did wonders! It cleans all the tensors between start and end scope :) – nk911 Jul 27 '20 at 09:26
  • Is there any documentation for this API? – jameshfisher Feb 15 '21 at 19:01
  • The maintainers of TensorFlow.js recommend using `.dispose()` instead of the above approach: https://github.com/tensorflow/tfjs/issues/4685. The above approach can be problematic if you have multiple promises running together, or if you create promises in promises (which is common in async/await code): this situation could result in interleaved microtasks from different promises, which can result in the following execution: `startScope`, `startScope`, `endScope`, `endScope`. The first `endScope` could dispose of tensors which were going to be used in another promise. – Maxime Kjaer Mar 25 '21 at 18:06
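The interleaving hazard described in the last comment can be sketched with a toy scope stack. This is a simplified model for illustration only, not the real tfjs engine: each startScope pushes a set, new tensors are tracked in the topmost scope, and endScope disposes everything in the scope it pops. With two interleaved async tasks, the first endScope pops the other task's scope and disposes a tensor that task still needs:

```javascript
// Toy model of engine scopes (illustration only -- not the real tfjs engine).
const scopes = [];
const startScope = () => scopes.push(new Set());
const track = (t) => { scopes[scopes.length - 1].add(t); return t; };
const endScope = () => { for (const t of scopes.pop()) t.disposed = true; };

// Interleaved execution order: startScope, startScope, endScope, endScope.
startScope();                 // task A opens a scope
const a = track({ name: 'a' });
startScope();                 // task B opens a scope before A has finished
const b = track({ name: 'b' });
endScope();                   // task A ends -- but it pops task B's scope!
console.log(b.disposed);      // true: b was disposed out from under task B
endScope();                   // task B ends, popping task A's scope
console.log(a.disposed);      // true
```

This is why sequential awaits (as in the question's IIFE) are safe with this pattern, while concurrently running promises are not.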

As per the documentation, the function provided to tf.tidy "must not return a Promise". Internally, the tf backend disposes all the tensors it uses when fitting a model. That is why model.fit should not be placed inside tf.tidy. To dispose of a crashed model, one can call tf.dispose on the model.

It is true that there currently seems to be a memory leak, but a model crashing during fitting because of its definition is a poor implementation. This should not happen in a proper scenario, as one can test whether the given parameters match the input expected by the layers. For instance, reshaping a tensor of size 1 to size 2 can be ruled out before constructing the model, preventing the memory leak.
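That pre-check can be sketched as follows (the helper canReshape is my own name for illustration, not a tfjs API): a reshape is valid only when both shapes hold the same total number of elements, so the invalid [1] → [2] case from the question can be rejected before the model is ever built.

```javascript
// Hypothetical pre-check (not a tfjs API): a reshape is valid only if
// both shapes contain the same total number of elements.
function canReshape(fromShape, toShape) {
  const count = (shape) => shape.reduce((a, b) => a * b, 1);
  return count(fromShape) === count(toShape);
}

console.log(canReshape([1], [1])); // true  -- safe to build the model
console.log(canReshape([1], [2])); // false -- would crash during fitting
```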

async function train(shouldCrash) {
  console.log(`Training, shouldCrash=${shouldCrash}`);
  const dataset = tf.data.zip({ // setup data
    xs: tf.data.array([[1],[1]]),
    ys: tf.data.array([1]),
  }).batch(1);

  const model = tf.sequential({ // setup model
    layers: [
      tf.layers.dense({units: 1, inputShape: [1]}),
      tf.layers.reshape({targetShape: [(shouldCrash ? 2 : 1)]}), // use invalid shape when crashing
    ],
  });
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });
  console.log('  Tensors before:', tf.memory().numTensors);
  try {
    const history = await model.fitDataset(dataset, { epochs: 1 });
  } catch (err) {
    console.log(`    Error: ${err.message}`);
  }
  
  console.log('  Tensors after:', tf.memory().numTensors);
  return model;
}

(async () => {
  const m1 = await train(false); // normal training
  tf.dispose(m1);
  const m2 = await train(true); // training with error
  tf.dispose(m2);
  tf.disposeVariables();
  console.log('Tensors after:', tf.memory().numTensors);
})();
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.1.2/dist/tf.min.js"></script>
edkeveked
  • I'm aware that the code I've shown is "poor implementation", but as I've said, it was just used to demonstrate the memory leak. The `tf.disposeVariables` function looks very useful and I didn't even know I could pass a model to `tf.dispose`. Thank you! :) – Thomas Dondorf Jun 10 '19 at 12:07