I am loading a TensorFlow.js model and measuring how many milliseconds a single prediction takes. The first call takes about 300 milliseconds, but from the second call onward the time drops to roughly 13~20 milliseconds. I am not including the model-loading time in the measurement; I time only the predict call after the model has loaded.
Can anyone explain why the prediction time decreases after the first call?
// Calling TensorFlow.js model
const MODEL_URL = 'https://xxxx-xxxx-xxxx.xxx.xxx-xxxx-x.xxxxxx.com/model.json'
let model;
let prediction;
export async function getModel(input) {
  console.log("From helper function: Model is being retrieved from the server...");
  model = await tf.loadLayersModel(MODEL_URL);
  // Measure only the prediction time, after the model has finished loading
  const startTime = new Date().getTime();
  prediction = model.predict(input); // predict() returns a Tensor synchronously for a LayersModel
  const elapsed = new Date().getTime() - startTime;
  console.log("Prediction time for TensorFlow.js: " + elapsed + " ms");
  console.log(prediction.arraySync());
  ...
}
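For reference, the measurement pattern boils down to the sketch below. The real TensorFlow.js model is replaced by a hypothetical `fakePredict` stub (my own placeholder, not part of TF.js) so the timing logic can run standalone:

```javascript
// Stand-in for model.predict(input); hypothetical stub, not a real model.
async function fakePredict(input) {
  return input.map((x) => x * 2);
}

// Time a single prediction call, excluding any loading work.
async function timePrediction(input) {
  const start = Date.now();
  const prediction = await fakePredict(input);
  const elapsed = Date.now() - start;
  return { prediction, elapsed };
}

(async () => {
  const first = await timePrediction([1, 2, 3]);
  const second = await timePrediction([1, 2, 3]);
  console.log(`first: ${first.elapsed} ms, second: ${second.elapsed} ms`);
})();
```

With the real model, only the `fakePredict` call would differ; the stopwatch itself starts and stops around nothing but the predict call.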