GPT-2

Generative Pre-trained Transformer 2 (GPT-2)
	GPT-2 completion using the Hugging Face Write With Transformer website, prompted with text from this Wikipedia article (All highlighted text after the initial prompt is machine-generated from the first suggested completion, without further editing.)
Original author(s)	OpenAI
Initial release	14 February 2019
Repository	https://github.com/openai/gpt-2
Predecessor	GPT-1
Successor	GPT-3
Type	Large language model; Generative pre-trained transformer;
License	MIT
Website	openai.com/blog/gpt-2-1-5b-release/

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on BookCorpus, a dataset of over 7,000 self-published fiction books from various genres, and trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.

GPT-2 was created as a "direct scale-up" of GPT-1 with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner and its ability to perform the various tasks was a consequence of its general ability to accurately predict the next item in a sequence, which enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text, and generate text output on a level sometimes indistinguishable from that of humans, however it could become repetitive or nonsensical when generating long passages. It was superseded by GPT-3 and GPT-4 models, which are not open source anymore.

GPT-2 has, like its predecessor GPT-1 and its successors GPT-3 and GPT-4, a generative pre-trained transformer architecture, implementing a deep neural network, specifically a transformer model, which uses attention instead of older recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on segments of input text it predicts to be the most relevant. This model allows for greatly increased parallelization, and outperforms previous benchmarks for RNN/CNN/LSTM-based models.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.