When I load the model I pass device_map so that it is placed on cuda:1, but it still seems that the model and the training run end up on different devices. How should I do this properly?
Code running on a Tesla T4 below:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

# base_model_name, dataset and training_args are defined earlier (omitted here)

# load the base model in 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map={"": 1},
    trust_remote_code=True,
    use_auth_token=True,
)
base_model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(base_model_name, use_auth_token=True)

# add LoRA layers on top of the quantized base model
peft_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=base_model,
    train_dataset=dataset,
    peft_config=peft_config,
    packing=True,
    max_seq_length=None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,  # HF Trainer arguments
)
trainer.train()
Gives error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()} you're training on.
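For reference, this is how I am checking which device the loaded weights and the current CUDA device actually are (just a quick diagnostic I added, not from the guide):

# devices of the loaded base model's parameters
print({p.device for p in base_model.parameters()})
# device index the training process defaults to
print(torch.cuda.current_device())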
I am following this guide: https://huggingface.co/blog/dpo-trl
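From the error message, I am guessing the intended fix is something like the sketch below, i.e. make GPU 1 the current device first and then let device_map point at torch.cuda.current_device() so the model and the training end up on the same GPU. Is that the right approach? (torch.cuda.set_device and torch.cuda.current_device are standard PyTorch calls; the rest is unchanged from my code above.)

# make GPU 1 the current CUDA device before loading the model
torch.cuda.set_device(1)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    # match the device the trainer will run on, as the error suggests
    device_map={"": torch.cuda.current_device()},
    trust_remote_code=True,
    use_auth_token=True,
)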