I am loading a TensorFlow.js model and measuring how many milliseconds a single prediction takes. The first call takes about 300 milliseconds, but from the second call onward the time drops to roughly 13~20 milliseconds. I am not including the model-loading time in the measurement; I time only the predict call after the model has loaded.
Can anyone explain why the prediction time decreases after the first call?
// Calling TensorFlow.js model
const MODEL_URL = 'https://xxxx-xxxx-xxxx.xxx.xxx-xxxx-x.xxxxxx.com/model.json'
let model;
let prediction;
export async function getModel(input) {
  console.log("From helper function: Model is being retrieved from the server...");
  model = await tf.loadLayersModel(MODEL_URL);
  // Measure only the prediction time, after the model has finished loading
  const startTime = new Date().getTime();
  prediction = model.predict(input); // predict() returns a Tensor synchronously for a LayersModel
  const elapsed = new Date().getTime() - startTime;
  console.log("Prediction time for TensorFlow.js: " + elapsed + " ms");
  console.log(prediction.arraySync());
  ...
}
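For reference, the measurement pattern boils down to the sketch below. The real TensorFlow.js model is replaced by a hypothetical `fakePredict` stub (my own placeholder, not part of TF.js) so the timing logic can run standalone:

```javascript
// Stand-in for model.predict(input); hypothetical stub, not a real model.
async function fakePredict(input) {
  return input.map((x) => x * 2);
}

// Time a single prediction call, excluding any loading work.
async function timePrediction(input) {
  const start = Date.now();
  const prediction = await fakePredict(input);
  const elapsed = Date.now() - start;
  return { prediction, elapsed };
}

(async () => {
  const first = await timePrediction([1, 2, 3]);
  const second = await timePrediction([1, 2, 3]);
  console.log(`first: ${first.elapsed} ms, second: ${second.elapsed} ms`);
})();
```

With the real model, only the `fakePredict` call would differ; the stopwatch itself starts and stops around nothing but the predict call.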