The second call (predictor.predict) preprocesses the raw data (e.g., tokenization) before predicting, whereas the first call (learner.predict) makes predictions on data that has already been preprocessed. The time difference is, therefore, largely the cost of preprocessing: timing the preprocessing step by itself shows it takes roughly 5.6 seconds, which accounts for the gap between the two predict calls (16.1 s − 10.5 s ≈ 5.6 s):
%%time
tst = predictor.preproc.preprocess_test(x_test)
# Wall time: 5.65 s

%%time
preds = learner.predict(val)
# Wall time: 10.5 s

%%time
preds = predictor.predict(x_test)
# Wall time: 16.1 s
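One practical consequence: if you will be predicting on the same texts more than once, you can pay the tokenization cost a single time and reuse the result. Here is a minimal sketch, assuming the predictor and learner objects from the cells above; that learner.predict accepts the output of preprocess_test is an assumption suggested by (not confirmed by) the timings:
# Preprocess once, then reuse the preprocessed dataset on later calls.
# Assumes the ktrain objects from the cells above.
tst = predictor.preproc.preprocess_test(x_test)  # ~5.6 s, paid once

preds_first = learner.predict(tst)  # model inference only
preds_again = learner.predict(tst)  # repeated calls skip re-tokenization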
When supplying a list of texts to predict, you can also use a larger batch_size (the default is 32), which may further increase speed:
predictor.batch_size = 128  # predict on 128 texts at a time instead of 32
preds = predictor.predict(x_test)
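Larger batches generally trade memory for throughput, so the best value depends on your hardware. A quick sweep with Python's time module (a hypothetical helper loop, not a ktrain utility) can help you pick one:
import time

# Hypothetical batch-size sweep; larger values need more (GPU) memory.
for bs in (32, 64, 128, 256):
    predictor.batch_size = bs
    start = time.time()
    predictor.predict(x_test)
    print(f"batch_size={bs}: {time.time() - start:.1f} s")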
Finally, if you need even faster predictions in a deployment scenario, see the ktrain FAQ, which shows how to make quantized predictions and predictions with ONNX.
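For a rough idea of what that looks like, the sketch below shows generic ONNX Runtime inference with dynamic quantization. This is not ktrain-specific code: it assumes you have already exported the model to a model.onnx file (the FAQ covers the export step), the file names are hypothetical, and the Hugging Face checkpoint name is a placeholder for whichever tokenizer matches your model:
# Generic ONNX Runtime sketch -- NOT ktrain API. Assumes model.onnx was
# already exported as described in the ktrain FAQ.
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic
from transformers import AutoTokenizer

# Optionally create a dynamically quantized copy: smaller weights and
# often faster CPU inference (file names are hypothetical).
quantize_dynamic("model.onnx", "model-quantized.onnx")

sess = ort.InferenceSession("model-quantized.onnx")

# Tokenize the raw texts the same way the model was trained
# (placeholder checkpoint; use the one your model was trained from).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = tokenizer(list(x_test), padding=True, truncation=True, return_tensors="np")
feed = {i.name: enc[i.name] for i in sess.get_inputs()}

logits = sess.run(None, feed)[0]
preds = np.argmax(logits, axis=1)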