
Most of the material out there on fine-tuning LLMs uses a question/answer dataset. The problem is, that's not my use case. I would like to fine-tune an LLM on domain knowledge that exists as a set of documents, and that set can't really be reformulated as a list of questions and answers.

How do I do this? Is it even possible? If so, where do I start?

markalex
Demiurg

1 Answer


Fine-tuning an LLM, where you permanently change the weights of one or more layers, is typically done with an instruction dataset (prompt -> response). You can also train an autoregressive LLM from scratch using self-supervised techniques, as Bloomberg did, if you have $2 million to spend.
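To make the self-supervised part concrete: an autoregressive objective needs no question/answer pairs, because the training target is simply the next token of the raw text. Here is a rough sketch of how plain documents could be cut into such examples. The whitespace tokenizer, the function name `make_next_token_examples`, and the `block_size` parameter are all made up for illustration; a real pipeline would use the model's own tokenizer and much longer blocks.

```python
def make_next_token_examples(text, block_size=8):
    """Cut raw text into (inputs, targets) pairs for next-token prediction."""
    # Toy whitespace tokenizer: an assumption, not what a real LLM uses.
    tokens = text.split()
    examples = []
    # Step through the token stream in strides of block_size,
    # taking block_size + 1 tokens per window.
    for start in range(0, len(tokens) - block_size, block_size):
        window = tokens[start:start + block_size + 1]
        inputs = window[:-1]   # what the model sees
        targets = window[1:]   # the same tokens shifted left by one
        examples.append((inputs, targets))
    return examples


document = (
    "the pump must be primed before start and "
    "the valve must be closed after the pump stops"
)
examples = make_next_token_examples(document, block_size=4)
for inputs, targets in examples:
    print(inputs, "->", targets)
```

Each pair comes straight from the document text, so no reformulation into questions and answers is needed for this kind of objective.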

Your best option is retrieval augmentation, where you add context to the model's prompt using a retrieval module (such as DPR), hoping that it generates a more coherent response.
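A minimal sketch of the retrieval augmentation idea, under loud assumptions: a real setup would use a dense retriever such as DPR, while this toy version swaps in bag-of-words cosine similarity so it runs with the standard library only. The function names (`retrieve`, `build_prompt`), the sample documents, and the prompt layout are hypothetical, and the final call to an LLM is left out.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Toy representation: lowercase word counts (stand-in for dense embeddings).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    # Put the retrieved passages in front of the question as context.
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "The pump must be primed before start.",
    "Invoices are archived after seven years.",
    "The valve is closed after the pump stops.",
]
prompt = build_prompt("when are invoices archived", docs, k=1)
print(prompt)
```

The resulting prompt string is what you would send to the unmodified LLM, so the domain knowledge stays in your documents and the model weights are not touched.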