
I'm trying to wrap my head around training OpenAI's language models on new data sets. Is there anyone here with experience in that regard? My idea is to feed either GPT-2 or GPT-3 (though I don't have API access to GPT-3) a textbook, train it on that text, and then be able to "discuss" the content of the book with the language model afterwards. I don't think I'd have to change any of the hyperparameters; I just need to get more data into the model.

Is it possible??

Thanks a lot for any (also conceptual) help!

Quantizer
  • I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut May 28 '21 at 08:42
  • Please notice that SO is a site for *specific programming* questions, and not a discussion forum. – desertnaut May 28 '21 at 08:43

2 Answers


At present GPT-3 cannot be fine-tuned the way GPT-2 or GPT-Neo / NeoX can, because the model is kept on OpenAI's servers and requests have to be made via the API. A Hacker News post says that fine-tuning for GPT-3 is planned or under development.

Having said that, OpenAI's GPT-3 provides an Answers API to which you can supply context documents (up to 200 files / 1 GB). That API can then be used as a way to "discuss" the documents with the model.
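For illustration only, here is roughly how the Answers API was called from the `openai` Python package during the beta. The parameter names, model choices, and example documents below are assumptions based on the beta docs, so treat this as a sketch rather than a definitive reference:

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Ask a question against a small set of inline context documents.
# For a whole textbook you would instead upload a file and pass file=<file id>.
response = openai.Answer.create(
    search_model="ada",    # model used to rank the documents
    model="curie",         # model used to compose the answer
    question="What does the book say about entropy?",
    documents=["Chapter 3 defines entropy as ...",
               "Chapter 4 applies entropy to ..."],
    examples_context="The sky is blue because of Rayleigh scattering.",
    examples=[["Why is the sky blue?",
               "Because short wavelengths scatter more."]],
    max_tokens=100,
)
print(response["answers"][0])
```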

EDIT: OpenAI has recently introduced a fine-tuning beta: https://beta.openai.com/docs/guides/fine-tuning. The best answer to this question is now to follow the guide at that link.
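As a minimal sketch of that beta flow with the `openai` Python package (the file name `book_qa.jsonl`, the JSONL prompt/completion format, and the `curie` base model are assumptions here; the linked guide is the authoritative reference):

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Upload a JSONL file of {"prompt": ..., "completion": ...} training examples.
training_file = openai.File.create(
    file=open("book_qa.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tune job on a base model.
job = openai.FineTune.create(
    training_file=training_file["id"],
    model="curie",
)
print(job["id"], job["status"])
```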

Agung Dewandaru
  • Thanks, I will check 'answerAPI' out! – Quantizer May 30 '21 at 05:43
  • Hello, I am looking for fine-tuning the GPT-2 model for the question answering, or say "generative question answering". Meaning, I train the GPT-2 with a large corpus of data for some specific industry (say medical) and then I start asking questions. If possible, will you please direct me toward that? Thanks – Aayush Shah Mar 06 '23 at 12:18

You can definitely retrain GPT-2. Are you only looking to train it for language generation purposes, or do you have a specific downstream task you would like to adapt GPT-2 to?

Both of these tasks are possible and not too difficult. If you want to train the model for language generation, i.e., have it generate text on a particular topic, you can train it exactly as it was trained during the pre-training phase: on a next-token prediction task with a cross-entropy loss function. As long as you have a dataset and decent compute power, this is not too hard to implement.
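As a rough sketch of that setup (not part of the original answer), this is how a pre-training-style fine-tune of GPT-2 might look with the Hugging Face `transformers` and `datasets` libraries; the file name `book.txt`, the block size, and all hyperparameters are placeholder assumptions:

```python
# Fine-tune GPT-2 on a plain-text file with the causal (next-token) LM objective.
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token      # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the textbook as one text file (one example per line) and tokenize it.
raw = load_dataset("text", data_files={"train": "book.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# mlm=False gives next-token prediction: labels are the inputs shifted by one,
# and the loss is cross-entropy over the vocabulary.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-textbook",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

Trainer(model=model, args=args,
        train_dataset=tokenized["train"],
        data_collator=collator).train()
```

After training, the saved checkpoint can be loaded the usual way and sampled with `model.generate` to produce text in the style of the book.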

When you say 'discuss' the content of the book, it sounds like you are looking for a dialogue model/chatbot. Chatbots are trained in a different way, and if you are indeed after a dialogue model, you can look at DialoGPT and similar models. They can be trained to become task-oriented dialogue agents.
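For reference, this is roughly how a pretrained DialoGPT checkpoint can be queried for a single dialogue turn through `transformers` (the model size and the prompt are arbitrary choices, and a checkpoint fine-tuned on your own domain would be loaded the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Encode one user turn, terminated by the end-of-sequence token.
user_ids = tokenizer.encode("What is entropy?" + tokenizer.eos_token,
                            return_tensors="pt")

# Generate the bot's reply and decode only the newly generated tokens.
reply_ids = model.generate(user_ids, max_length=200,
                           pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[0, user_ids.shape[-1]:],
                       skip_special_tokens=True))
```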

Zain Sarwar
  • Hello, I am looking for fine-tuning the GPT-2 model for the question answering, or say "generative question answering". Meaning, I train the GPT-2 with a large corpus of data for some specific industry (say medical) and then I start asking questions. If possible, will you please direct me toward that? Thanks. – Aayush Shah Mar 06 '23 at 12:17