I'm building a repository of QLoRA adapters that change a model's personality. The end goal is a hub of ready-to-use personality adapters.
I'm hitting a snag training a QLoRA adapter for Paul Graham's personality on top of 4-bit quantized StableBeluga-7B. The model just doesn't seem to pick up the style.
Any thoughts on how I can improve this?
Below are the details of my best training run so far (lowest eval loss, but still no sign of the PG personality):
Data
- 3340 examples of PG passages, formatted as {"text": "### User:\n{generic instruction}\n\n### Assistant:\n{PG-style response}"}.
- Each example is about 5 sentences taken from one of PG's essays (see the formatting sketch below).
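Each record is assembled roughly like this (a simplified sketch; the instruction/passage pair and the output filename are just placeholders for my actual data):

```python
import json

PROMPT_TEMPLATE = "### User:\n{instruction}\n\n### Assistant:\n{response}"

# Placeholder pair standing in for my real (generic instruction, PG passage) data.
pairs = [
    ("Give some advice on starting a startup.",
     "The way to get startup ideas is not to try to think of startup ideas. ..."),
]

# Write one JSON object per line, each with a single "text" field in the
# ### User / ### Assistant format.
with open("pg_style_train.jsonl", "w") as f:
    for instruction, passage in pairs:
        record = {"text": PROMPT_TEMPLATE.format(instruction=instruction, response=passage)}
        f.write(json.dumps(record) + "\n")
```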
Training
- optim="paged_adamw_8bit"
- learning_rate=2e-4
- per_device_train_batch_size=4
- gradient_accumulation_steps=4
- num_train_epochs=4
- fp16=True
- group_by_length=True
- load_best_model_at_end=True
- max_seq_length=512
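For context, these settings slot into a standard trl SFTTrainer QLoRA run, roughly like the sketch below. The LoRA rank/alpha/target modules and the eval-split handling shown here are illustrative placeholders, not necessarily the exact values from the run:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_id = "stabilityai/StableBeluga-7B"

# 4-bit quantized base model for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Illustrative LoRA config (rank, alpha, and target modules are placeholders).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Load the JSONL built above and hold out a small eval split.
dataset = load_dataset("json", data_files="pg_style_train.jsonl", split="train")
split = dataset.train_test_split(test_size=0.1)

training_args = TrainingArguments(
    output_dir="pg-qlora",
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=4,
    fp16=True,
    group_by_length=True,
    load_best_model_at_end=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
)
trainer.train()
```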
Hardware
- 1x V100 via Google Colab Pro.
My min eval loss so far is 1.916546.