
I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard? My idea is to feed either GPT-2 or GPT-3 (though I don't have API access to GPT-3) a textbook, train it on that text, and then be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters; I just need to get more data into the model.

Is it possible??

Thanks a lot for any (also conceptual) help!

Quantizer
  • I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut May 28 '21 at 08:42
  • Please notice that SO is a site for *specific programming* questions, and not a discussion forum. – desertnaut May 28 '21 at 08:43

2 Answers


At present GPT-3 cannot be fine-tuned the way GPT-2 or GPT-Neo / NeoX can, because the model is kept on OpenAI's servers and requests have to be made via the API. A Hacker News post says that fine-tuning for GPT-3 is planned or under development.

Having said that, OpenAI's GPT-3 provides an Answers API to which you can supply context documents (up to 200 files / 1 GB). That API can then be used as a way to "discuss" the documents with the model.
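For illustration only, here is roughly how the Answers API was called from the `openai` Python package during the beta. The parameter names, model choices, and example documents below are assumptions based on the beta docs, so treat this as a sketch rather than a definitive reference:

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Ask a question against a small set of inline context documents.
# For a whole textbook you would instead upload a file and pass file=<file id>.
response = openai.Answer.create(
    search_model="ada",    # model used to rank the documents
    model="curie",         # model used to compose the answer
    question="What does the book say about entropy?",
    documents=["Chapter 3 defines entropy as ...",
               "Chapter 4 applies entropy to ..."],
    examples_context="The sky is blue because of Rayleigh scattering.",
    examples=[["Why is the sky blue?",
               "Because short wavelengths scatter more."]],
    max_tokens=100,
)
print(response["answers"][0])
```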

EDIT: OpenAI has recently introduced a fine-tuning beta: https://beta.openai.com/docs/guides/fine-tuning. The best answer to this question is now to follow the guide at that link.
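As a minimal sketch of that beta flow with the `openai` Python package (the file name `book_qa.jsonl`, the JSONL prompt/completion format, and the `curie` base model are assumptions here; the linked guide is the authoritative reference):

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Upload a JSONL file of {"prompt": ..., "completion": ...} training examples.
training_file = openai.File.create(
    file=open("book_qa.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tune job on a base model.
job = openai.FineTune.create(
    training_file=training_file["id"],
    model="curie",
)
print(job["id"], job["status"])
```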

Agung Dewandaru
  • Thanks, I will check 'answerAPI' out! – Quantizer May 30 '21 at 05:43
  • Hello, I am looking for fine-tuning the GPT-2 model for the question answering, or say "generative question answering". Meaning, I train the GPT-2 with a large corpus of data for some specific industry (say medical) and then I start asking questions. If possible, will you please direct me toward that? Thanks – Aayush Shah Mar 06 '23 at 12:18

You can definitely retrain GPT-2. Are you only looking to train it for language generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?

Both of these tasks are possible and not too difficult. If you want to train the model for language generation, i.e., have it generate text on a particular topic, you can train it exactly as it was trained during the pre-training phase: on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement.
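As a rough sketch of that setup (not part of the original answer), this is how a pre-training-style fine-tune of GPT-2 might look with the Hugging Face `transformers` and `datasets` libraries; the file name `book.txt`, the block size, and all hyperparameters are placeholder assumptions:

```python
# Fine-tune GPT-2 on a plain-text file with the causal (next-token) LM objective.
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token      # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the textbook as one text file (one example per line) and tokenize it.
raw = load_dataset("text", data_files={"train": "book.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# mlm=False gives next-token prediction: labels are the inputs shifted by one,
# and the loss is cross-entropy over the vocabulary.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-textbook",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

Trainer(model=model, args=args,
        train_dataset=tokenized["train"],
        data_collator=collator).train()
```

After training, the saved checkpoint can be loaded the usual way and sampled with `model.generate` to produce text in the style of the book.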

When you say 'discuss' the content of the book, it sounds like you are looking for a dialogue model/chatbot. Chatbots are trained in a different way, and if you are indeed after a dialogue model, you can look at DialoGPT and similar models. They can be trained to become task-oriented dialogue agents.
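For reference, this is roughly how a pretrained DialoGPT checkpoint can be queried for a single dialogue turn through `transformers` (the model size and the prompt are arbitrary choices, and a checkpoint fine-tuned on your own domain would be loaded the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Encode one user turn, terminated by the end-of-sequence token.
user_ids = tokenizer.encode("What is entropy?" + tokenizer.eos_token,
                            return_tensors="pt")

# Generate the bot's reply and decode only the newly generated tokens.
reply_ids = model.generate(user_ids, max_length=200,
                           pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[0, user_ids.shape[-1]:],
                       skip_special_tokens=True))
```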

Zain Sarwar
  • Hello, I am looking for fine-tuning the GPT-2 model for the question answering, or say "generative question answering". Meaning, I train the GPT-2 with a large corpus of data for some specific industry (say medical) and then I start asking questions. If possible, will you please direct me toward that? Thanks. – Aayush Shah Mar 06 '23 at 12:17