I have a few-shot GPT-3 text-davinci-003 prompt that produces "pretty good" results, but I quickly run out of tokens per request for the interesting use cases. I have a data set (n~20) that I'd like to further train the model on, but there is no way to fine-tune these InstructGPT models, only the base GPT models.
As I understand it, I can either:
- A: Find a way to harvest 10x more data (I don't see an easy option here)
- or B: Find a way to fine-tune Davinci into something capable of simpler InstructGPT behaviours
(Please let me know if there's a third option I'm missing. I've tried increasing the number of epochs from 4 to 10 — rough setup shown below — but the quality is still nowhere near the few-shot text-davinci-003 results.)
Is there any way to fine-tune Davinci to the point where it can model some of what the Instruct models do? I don't need the full capabilities; even a version narrowed down to my use case would be ideal.
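For reference, here is roughly my current setup using the legacy fine-tunes CLI (the file name, prompt template, and epoch count are just illustrative; my real data set is the ~20 examples mentioned above):

```
# Training data in the fine-tunes format: one prompt/completion pair per line, e.g.
# {"prompt": "<instruction + input>\n\n###\n\n", "completion": " <desired output> END"}

# Validate and reformat the data with the CLI helper
openai tools fine_tunes.prepare_data -f my_data.jsonl

# Fine-tune the base davinci model; --n_epochs is what I bumped from 4 to 10
openai api fine_tunes.create -t my_data_prepared.jsonl -m davinci --n_epochs 10
```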
--
By the way, there is a common misconception that fine-tuning on a base GPT-3 model (davinci, ada, babbage, etc.) will train the latest instruct model, e.g. text-davinci-003. That is not how GPT-3 fine-tuning works, and it is explained in OpenAI's blog posts and support articles: https://help.openai.com/en/articles/6819989-can-i-fine-tune-on-text-davinci-003
Please don't claim that `openai api fine_tunes.create -t "model_prepared.jsonl" -m "davinci"` will create a model based on text-davinci-003; it won't. It fine-tunes the base davinci model.
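You can check this yourself (the fine-tune ID below is a placeholder): the resulting model name is prefixed with the base model, e.g. davinci:ft-..., never text-davinci-003:ft-....

```
# Inspect an existing fine-tune job (ft-abc123 is a placeholder ID)
openai api fine_tunes.get -i ft-abc123
# In the response, "model" is the base that was actually trained, e.g.
#   "model": "davinci",
#   "fine_tuned_model": "davinci:ft-your-org-2023-01-01-00-00-00"
```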