
I've been struggling to find where a memory leak occurs in this file. This file is exported as an event listener. For context, I have 92 shards (meaning 92 of these listeners) running. I import the model from outside of this file, so it's only loaded once per shard (a stable 75 tensors in memory). However, after a few minutes, all the RAM on my computer is consumed (the function inside this file is called a dozen or so times per second). Have I overlooked anything that may be causing this memory leak?

const use = require(`@tensorflow-models/universal-sentence-encoder`);
const tf = require(`@tensorflow/tfjs-node`);

const run = async (input, model) => {

    const useObj = await use.load();
    const encodings = [ await useObj.tokenizer.encode(input) ];

    const indicesArr = encodings.map((arr, i) => arr.map((d, index) => [i, index]));
    let flattenedIndicesArr = [];
    for (let i = 0; i < indicesArr.length; i++) {
      flattenedIndicesArr = flattenedIndicesArr.concat(indicesArr[i]);
    }

    const indices = tf.tensor2d(flattenedIndicesArr, [flattenedIndicesArr.length, 2], 'int32')
    const value = tf.tensor1d(tf.util.flatten([ encodings ]), 'int32')

    const prediction = await model.executeAsync({ Placeholder_1: indices, Placeholder: value });
    const classes = [ 'Identity Attack', 'Insult', 'Obscene', 'Severe Toxicity', 'Sexual Explicit', 'Threat', 'Toxicity' ]
    let finArr = [];
    let finMsg = `Input: ${input}, `;

    for (let i = 0; i < prediction.length; i++) {
      const sorted = tf.topk(prediction[i], 2);
      const predictions = [ sorted.values.arraySync(), sorted.indices.arraySync() ];

      const percentage = (predictions[0][0][0]*100).toFixed(2);
      if (predictions[1][0][0] == 1) {
        finArr.push(`${classes[i]} (${percentage}%)`);
      }
      tf.dispose([ sorted, predictions ]);
    }
    for (let i = 0; i < finArr.length; i++) {
      finMsg+=`${finArr[i]}, `;
    }

    tf.dispose([ prediction, indices, value, useObj ]);

    console.log(finMsg);
    console.log(tf.memory());
};

const main = async (message, client, Discord, model) => {
  if (message.author.bot) return;

  const input = message.content;
  await run(input, model);

};

module.exports = {
  event: 'messageCreate',
  run: async (message, client, Discord, model) => {

    await main(message, client, Discord, model);

  },
};
Nic Cheng

1 Answer


To start with, you say this runs multiple times, so why are you loading the model again and again? Disposing a model is also tricky; there's a big chance that's part of your memory leak.

Move `const useObj = await use.load()` outside of the run loop and don't dispose it until you're done with all of the runs.
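
A minimal sketch of that suggestion, using the same modules as the question; the `cachedUseObj` and `getUseObj` names are illustrative, not part of the original code:

const use = require('@tensorflow-models/universal-sentence-encoder');
const tf = require('@tensorflow/tfjs-node');

// Cache the encoder at module scope so use.load() runs only once per shard.
let cachedUseObj = null;

const getUseObj = async () => {
  if (!cachedUseObj) {
    cachedUseObj = await use.load();
  }
  return cachedUseObj;
};

const run = async (input, model) => {
  const useObj = await getUseObj(); // reused on every call, never disposed per call
  const encodings = [ await useObj.tokenizer.encode(input) ];
  // ... build the indices/value tensors and call model.executeAsync() exactly as before ...
  // Dispose only the per-call tensors (indices, value, prediction), not useObj.
};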

Vladimir Mandic
  • I need to load the model once for each shard, given the nature of Discord's Sharder. Even after moving `const useObj = await use.load()` outside of the run loop, I'm still seeing very high memory usage (30 GB+ of RAM), i.e. a memory leak. This run loop executes around 1-2 dozen times per second because of the volume of messages this program processes. – Nic Cheng Jun 01 '22 at 00:45
  • If the number of tensors is stable and doesn't increase on each run (can you confirm?), then it's most likely either tensorflow.so not deallocating unused memory (which is, by the way, the default on GPU runs, where memory only gets deallocated when CUDA is unloaded unless specifically overridden) or node's JS engine not performing garbage collection fast enough, which is quite possible. In that case you can expose node's GC (it's hidden by default) and call it explicitly; a sketch of this is shown after these comments. – Vladimir Mandic Jun 01 '22 at 16:25
  • I can confirm that the number of tensors stays stable across all executions; I believe those tensors are from the models that were preloaded. Is it possible that tensorflow.js or another of my dependencies being used constantly causes the problem? Is there any part of that code that indicates a memory leak, or is it all a matter of exposing node.js' GC and making more frequent, explicit calls to it? – Nic Cheng Jun 02 '22 at 04:43
  • The JS engine doesn't like to perform GC while it's busy; it does so on an interval basis and tends to skip it if the execution queue is deep, so manually exposing and triggering GC might help. But there is another possibility: that the excessive memory without deallocation happens inside tensorflow.so (which is used by tfjs-node), in which case the JS GC will have no effect at all. There are some environment variables that tune the memory allocator of tensorflow.so, but they primarily deal with GPU memory allocation, not the heap. – Vladimir Mandic Jun 03 '22 at 15:27
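
A minimal sketch of the GC suggestion from these comments, assuming the process is started with node's --expose-gc flag (for example `node --expose-gc index.js`); the 30-second interval is an illustrative value, not a recommendation:

if (typeof global.gc === 'function') {
  // --expose-gc makes global.gc available; call it periodically so the heap
  // gets collected even while the message handler keeps the event loop busy.
  setInterval(() => {
    global.gc();
    console.log('heap used after gc:', process.memoryUsage().heapUsed, 'bytes');
  }, 30 * 1000);
} else {
  console.warn('Start node with --expose-gc to enable explicit garbage collection.');
}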