
I am trying to generate 20-token texts using gpt-2-simple. It takes me around 15 seconds to generate a sentence, while AI Dungeon takes around 4 seconds to generate a sentence of the same size.

Is there a way to speed up/optimize GPT-2 text generation?

A-Tech

3 Answers


I think they get quicker results because their program is better optimized and they have more computing power; they pay a lot for servers. Also, AI Dungeon uses GPT-3, which might simply be faster. I'm struggling with the speed of GPT-2 as well. Let me know if you figure anything out. Cheers.


Text generation models like GPT-2 are slow, and it is of course even worse with bigger models like GPT-J and GPT-NeoX.

If you want to speed up your text generation you have a couple of options:

  • Use a GPU. GPT-2 doesn't require too much VRAM, so an entry-level GPU will do. On a GPU, generating 20 tokens with GPT-2 shouldn't take more than 1 second (see the sketch after this list).
  • Quantize your model and convert it to TensorRT. See this good tutorial: https://github.com/NVIDIA/TensorRT/tree/main/demo/HuggingFace/GPT2
  • Serve it through a dedicated inference server (like TorchServe or Triton Inference Server).
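For reference, here is a minimal sketch of GPU generation. It uses the Hugging Face transformers library rather than gpt-2-simple (an assumption on my part, since transformers exposes device placement directly); the prompt and sampling settings are just placeholders.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Fall back to CPU if no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=20,  # 20 tokens, as in the question
        do_sample=True,
        top_k=50,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```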

I actually wrote an article about how to speed up inference of transformer-based models. You might find it helpful: how to speed up deep learning inference

Julien Salinas

You can use the OpenVINO-optimized version of the GPT-2 model. The demo can be found here. It should be much faster, as it's heavily optimized.
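For what it's worth, here is a minimal sketch of one common way to run GPT-2 through OpenVINO, using the optimum-intel package (an assumption on my part; the linked demo may take a different route):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```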

dragon7