

Hey everyone, I'm working on building a closed-domain language model for 'legal services'—essentially a standalone lawyer. My plan is to pretrain the LLaMA 2 7B model (with or without quantization), and you might be wondering why I'm pretraining instead of fine-tuning: I want the model to have knowledge of the legal domain only, and not of other areas. After that, I plan to fine-tune it on Q&A or summarization datasets. I have access to 4 x A100 80G GPUs and a $20,000 credit limit. Do you have any thoughts on this approach, or suggestions for optimizing the training for speed? I'm quite skeptical about whether this can even be accomplished.
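To make this concrete, here is a rough sketch of the kind of continued-pretraining run I'm imagining, using Hugging Face transformers/peft/datasets with 4-bit loading and LoRA adapters (a QLoRA-style setup). The corpus path, dataset format, and hyperparameters below are placeholders, not a settled plan:

```python
# Sketch of domain-adaptive ("continued") pretraining of LLaMA 2 7B on a legal corpus.
# "legal_corpus.txt", output_dir, and all hyperparameters are illustrative placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# 4-bit loading keeps the 7B base model comfortably inside a single A100 80G.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)

# Train only small LoRA adapters on top of the frozen, quantized base weights.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Plain-text legal corpus, tokenized for causal language modeling.
dataset = load_dataset("text", data_files={"train": "legal_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-legal",
                           per_device_train_batch_size=4,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           bf16=True,
                           logging_steps=50),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```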

1 Answer


The approach you're taking with pretraining and then fine-tuning seems sound. To optimize your training, I suggest using JAX. It's a robust library that can significantly speed up your training process thanks to XLA compilation, JIT, automatic differentiation, and other powerful features. JAX might feel a little unusual at first, but it will supercharge your training if used correctly and, of course, where it's actually needed. Give it a try! Good luck!
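For example, here is a toy illustration of the pattern: `jax.jit` compiles an entire training step (forward pass, gradients, and update) through XLA. The model below is just a small linear layer, not LLaMA; the names and hyperparameters are made up purely to show the idea:

```python
# Minimal, hypothetical sketch of a JIT-compiled training step in JAX.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Mean-squared-error loss for a toy linear model.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA-compiles the whole step (forward, backward, update) into fused kernels
def train_step(params, x, y, lr=1e-3):
    grads = jax.grad(loss_fn)(params, x, y)
    # Plain SGD update; a real run would use an optimizer library such as optax.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (16, 1)), "b": jnp.zeros((1,))}
x = jax.random.normal(key, (32, 16))
y = jnp.ones((32, 1))
params = train_step(params, x, y)  # first call compiles, later calls are fast
```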

Ivarr