When I load the model I pass device_map so that it is placed on cuda:1, but it still seems that the model and the training run end up on different devices. How should I do this properly?
Code running on a Tesla T4 below:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

# base_model_name, dataset and training_args are defined earlier (omitted here)

# load the base model in 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map={"": 1},
    trust_remote_code=True,
    use_auth_token=True,
)
base_model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(base_model_name, use_auth_token=True)

# add LoRA layers on top of the quantized base model
peft_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=base_model,
    train_dataset=dataset,
    peft_config=peft_config,
    packing=True,
    max_seq_length=None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,  # HF Trainer arguments
)
trainer.train()
Gives error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example device_map={'':torch.cuda.current_device()} you're training on.
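For reference, this is how I am checking which device the loaded weights and the current CUDA device actually are (just a quick diagnostic I added, not from the guide):

# devices of the loaded base model's parameters
print({p.device for p in base_model.parameters()})
# device index the training process defaults to
print(torch.cuda.current_device())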
I am following this guide: https://huggingface.co/blog/dpo-trl
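From the error message, I am guessing the intended fix is something like the sketch below, i.e. make GPU 1 the current device first and then let device_map point at torch.cuda.current_device() so the model and the training end up on the same GPU. Is that the right approach? (torch.cuda.set_device and torch.cuda.current_device are standard PyTorch calls; the rest is unchanged from my code above.)

# make GPU 1 the current CUDA device before loading the model
torch.cuda.set_device(1)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    # match the device the trainer will run on, as the error suggests
    device_map={"": torch.cuda.current_device()},
    trust_remote_code=True,
    use_auth_token=True,
)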