I'm building a repository of QLoRA adapters that change a model's personality. The end vision is a hub of ready-to-use personality adapters.

I'm hitting a snag while training a QLoRA adapter for Paul Graham's (PG's) personality on top of a 4-bit quantized StableBeluga-7B. The model just doesn't seem to learn the style.
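
For context, the base model and adapter are set up with the standard bitsandbytes/PEFT QLoRA recipe, roughly as in the sketch below (the LoRA hyperparameters and paths shown are illustrative rather than the point of the question):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "stabilityai/StableBeluga-7B"

    # 4-bit NF4 quantization with fp16 compute (matches the fp16 training setting below)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # Representative LoRA settings; r, alpha, and target_modules are placeholders
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"],
    )
    model = get_peft_model(model, lora_config)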

Any thoughts on how I can improve this?

Below are the details for the best training run so far (lowest eval loss, but still no sign of the PG personality):

Data

  • 3,340 examples of PG passages, formatted as {"text": "### User:\n{generic instruction}\n\n### Assistant:\n{PG-style response}"} (see the sketch after this list).
  • Each example is about 5 sentences taken from one of PG's essays.
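
A minimal sketch of how each record is assembled (the instruction and passage text below are illustrative placeholders):

    from datasets import Dataset

    PROMPT_TEMPLATE = "### User:\n{instruction}\n\n### Assistant:\n{response}"

    def build_example(instruction: str, pg_passage: str) -> dict:
        # One record: a generic instruction paired with a ~5-sentence PG passage
        return {"text": PROMPT_TEMPLATE.format(instruction=instruction, response=pg_passage)}

    # Illustrative record; the real dataset has ~3,340 of these
    records = [
        build_example(
            "Share your thoughts on startups and growth.",
            "Startups are companies designed to grow fast. ...",
        ),
    ]
    dataset = Dataset.from_list(records)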

Training

  • optim="paged_adamw_8bit"
  • learning_rate=2e-4
  • per_device_train_batch_size=4
  • gradient_accumulation_steps=4
  • num_train_epochs=4
  • fp16=True
  • group_by_length=True
  • load_best_model_at_end=True
  • max_seq_length=512 (full trainer setup sketched below)
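
Putting those settings together, the run is configured roughly like this with TRL's SFTTrainer (output_dir, logging, and the eval/save strategies are simplified here; train_dataset and eval_dataset are splits of the data described above, and model/tokenizer come from the loading sketch earlier):

    from transformers import TrainingArguments
    from trl import SFTTrainer

    training_args = TrainingArguments(
        output_dir="pg-qlora",            # placeholder path
        optim="paged_adamw_8bit",
        learning_rate=2e-4,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=4,
        fp16=True,
        group_by_length=True,
        load_best_model_at_end=True,
        # load_best_model_at_end needs matching eval/save strategies
        evaluation_strategy="epoch",
        save_strategy="epoch",
        logging_steps=50,
    )

    trainer = SFTTrainer(
        model=model,                      # 4-bit base with the LoRA adapter attached
        tokenizer=tokenizer,
        args=training_args,
        train_dataset=train_dataset,      # {"text": ...} records as described above
        eval_dataset=eval_dataset,
        dataset_text_field="text",
        max_seq_length=512,
    )
    trainer.train()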

Hardware

  • 1x V100 via Google Colab Pro.

My lowest eval loss so far is 1.916546.
