
I am trying out AllenNLP pre-trained models for Q&A.

The online demo is here: https://demo.allennlp.org/reading-comprehension

I have created a python script to try out various models.
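The script is essentially the following (a minimal sketch; the archive path is a placeholder, so substitute the URL of whichever pretrained model you want to benchmark):

```python
import time

from allennlp.predictors.predictor import Predictor
import allennlp_models.rc  # registers the reading-comprehension models and predictors

# Placeholder: point this at the pretrained archive you want to benchmark.
MODEL_ARCHIVE = "path-or-url-to-pretrained-model.tar.gz"

t0 = time.perf_counter()
predictor = Predictor.from_path(MODEL_ARCHIVE)
print(f"loading time: {time.perf_counter() - t0:.1f} s")

passage = "The Matrix is a 1999 science fiction film starring Keanu Reeves."
question = "Who stars in The Matrix?"

t0 = time.perf_counter()
result = predictor.predict(passage=passage, question=question)
print(f"prediction time: {(time.perf_counter() - t0) * 1000:.0f} ms")
print(result["best_span_str"])  # the predicted answer span
```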

Here is the benchmark summary on my laptop:

  • Macbook Pro (2017)
  • 2.9 GHz Intel i7 quad-core
  • 16 GB memory
Benchmark                               transformer-qa  bidaf-model  bidaf-elmo-model
loading time                            31.6 seconds    1.6 seconds  13.8 seconds
questions:
Who stars in The Matrix?                794 ms          62 ms        1,798 ms
where does polar bear live              2,211 ms        96 ms        7,125 ms
how much does a polar bear weigh        2,435 ms        98 ms        7,082 ms
what is lightning                       1,361 ms        69 ms        3,173 ms
How many lightning bolts strike earth   1,019 ms        47 ms        2,885 ms

Looking at the output, I can see that all three models provide good answers. I like the transformer-qa model best, but it takes a while (on the order of seconds) to predict.

Is there a way to speed up prediction times?

thanks!

Sujee Maniyam

1 Answer


The transformer-qa model contains more parameters, and as such is expected to take longer. One way to speed up inference is to use a GPU. The speedup may not be significant if you are running predict on one instance at a time, so running on batches should help there.
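For example, here is a sketch assuming the allennlp-models reading-comprehension predictor (the archive path is a placeholder): cuda_device=0 loads the model onto the first GPU, and predict_batch_json runs several inputs through the model in one batch.

```python
from allennlp.predictors.predictor import Predictor
import allennlp_models.rc  # registers the reading-comprehension predictors

# cuda_device=0 puts the model on the first GPU; the default (-1) is CPU.
predictor = Predictor.from_path("path-or-url-to-transformer-qa.tar.gz", cuda_device=0)

passage = "..."  # the paragraph you are asking questions about
questions = [
    "Who stars in The Matrix?",
    "Where does the polar bear live?",
]

# predict_batch_json runs all instances in a single forward pass, which
# amortizes per-call overhead compared to calling predict() in a loop.
batch = [{"passage": passage, "question": q} for q in questions]
results = predictor.predict_batch_json(batch)
for q, r in zip(questions, results):
    print(q, "->", r["best_span_str"])
```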

akshitab
  • Hello akshitab, can you give me a reference link or an example of how I can process a paragraph in batches? – Raj Gohel Jun 14 '21 at 11:31