Fine-tune an LLM NOT on a question/answer dataset

Question

Most of the material out there on fine-tuning LLMs uses a question/answer dataset. The problem is that this is not my use case. I would like to fine-tune an LLM on domain knowledge that exists as a set of documents, and that set can't really be reformulated as a list of questions and answers.

How do I do this? Is it even possible? If so, where do I start?

score 0 · Answer 1 · answered Aug 29 '23 at 19:38

Fine-tuning an LLM, in the sense of permanently changing the weights of its layers, is usually done with an instruction dataset (prompt -> response). Alternatively, you can train an autoregressive LLM from scratch using self-supervised techniques, as Bloomberg did, if you have $2 million to spend.
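To make "self-supervised" concrete: the training labels come from the raw text itself, because every token is the prediction target for the tokens that precede it, so plain documents are enough and no Q/A pairs are needed. The following is a toy illustration only, not how Bloomberg or anyone else trains a real LLM: whitespace tokenization and a counting bigram model are hypothetical stand-ins for a subword tokenizer and a neural network, and all function names are made up for the example.

```python
import math
from collections import Counter, defaultdict

def next_token_pairs(documents, context_size=3):
    """Turn raw documents into (context, next_token) pairs: the label is simply the next word."""
    pairs = []
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption for the sketch)
        for i in range(1, len(tokens)):
            context = tuple(tokens[max(0, i - context_size):i])
            pairs.append((context, tokens[i]))
    return pairs

def train_bigram(pairs):
    """Toy 'model': count which token follows the last token of each context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context[-1]][target] += 1
    return counts

def avg_nll(counts, pairs, vocab_size):
    """Average negative log-likelihood of the true next tokens, with add-one smoothing."""
    total = 0.0
    for context, target in pairs:
        row = counts.get(context[-1], Counter())
        prob = (row[target] + 1) / (sum(row.values()) + vocab_size)
        total -= math.log(prob)
    return total / len(pairs)

# Hypothetical "domain documents"; no questions or answers anywhere.
docs = ["The bond yield rose", "The bond price fell"]
pairs = next_token_pairs(docs, context_size=2)
vocab_size = len({t for d in docs for t in d.lower().split()})

untrained = avg_nll({}, pairs, vocab_size)  # uniform guess over the vocabulary
trained = avg_nll(train_bigram(pairs), pairs, vocab_size)
print(f"untrained loss {untrained:.3f}, trained loss {trained:.3f}")
```

Training lowers the next-token loss on the documents, which is the same objective a real model minimizes, just with a far more capable predictor and far more text.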

Your best option is retrieval augmentation: you add context to your model's prompt using a retrieval module (e.g. DPR, Dense Passage Retrieval), in the hope that it generates a more coherent response.
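A minimal sketch of the retrieval-augmentation loop, assuming your documents are plain strings. DPR encodes queries and passages with trained neural encoders; here a bag-of-words vector with cosine similarity stands in for those dense embeddings so the example runs with only the standard library. The names `embed`, `chunk`, `retrieve`, and `build_prompt` are hypothetical, and the actual call to an LLM is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in for a dense encoder such as DPR: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(document, max_words=50):
    """Split a long document into fixed-size word windows so each can be retrieved."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, passages):
    """Prepend the retrieved passages as context; the LLM then answers from them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical domain documents.
docs = [
    "Our warranty covers parts for two years.",
    "Shipping to Canada takes five business days.",
    "Refunds are issued within 14 days of return.",
]
chunks = [c for d in docs for c in chunk(d)]
query = "How long does shipping to Canada take?"
prompt = build_prompt(query, retrieve(query, chunks, k=1))
print(prompt)  # send this prompt to whichever LLM you use
```

In practice you would replace `embed` with a real dense encoder and keep the chunk vectors in an index instead of re-embedding every chunk per query; the chunk, retrieve, and prompt structure stays the same, and the model's weights are never changed.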

