Question
According to the official documentation, the Trainer class "provides an API for feature-complete training in PyTorch for most standard use cases". However, when I actually use Trainer in practice, I get the following message, which seems to suggest that TensorFlow is being used under the hood:
tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
So which one is it? Does the HuggingFace transformers library use PyTorch or TensorFlow for its internal implementation of Trainer? And is it possible to switch to using only PyTorch? I can't find a relevant parameter in TrainingArguments.

Why does my script keep printing TensorFlow-related messages? Shouldn't Trainer be using PyTorch only?
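For context, here is a quick sketch of how one could check which backends transformers reports as available (I am assuming is_torch_available and is_tf_available reflect which frameworks the library has detected; that is not something I have confirmed from the docs):

from transformers import is_tf_available, is_torch_available

# Report which deep learning backends transformers has detected in this environment
print('PyTorch available:', is_torch_available())
print('TensorFlow available:', is_tf_available())

If TensorFlow shows up as available here simply because it is installed in the same environment, I assume that is why it gets imported at all.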
Source code
from transformers import (
    GPT2Tokenizer,
    GPT2LMHeadModel,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
import torch
# Load the GPT-2 tokenizer and LM head model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
lmhead_model = GPT2LMHeadModel.from_pretrained('gpt2')
# Load the training dataset and split it into fixed-size blocks
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path='./datasets/tinyshakespeare.txt',
    block_size=64
)

# Create a data collator for preprocessing batches
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
# Define the training arguments
training_args = TrainingArguments(
    output_dir='./models/tinyshakespeare',  # output directory for checkpoints
    overwrite_output_dir=True,              # overwrite any existing content
    per_device_train_batch_size=4,          # training batch size per device
    dataloader_num_workers=1,               # number of workers for the dataloader
    max_steps=100,                          # maximum number of training steps
    save_steps=50,                          # save a checkpoint every 50 steps
    save_total_limit=5,                     # maximum number of checkpoints to keep
    prediction_loss_only=True,              # only compute the loss during prediction
    learning_rate=3e-4,                     # learning rate
    fp16=False,                             # use 16-bit (mixed) precision
    optim='adamw_torch',                    # optimizer used for training
    lr_scheduler_type='linear',             # learning rate scheduler
    logging_steps=5,                        # print logs every 5 steps
    report_to='none',                       # report to wandb, tensorboard, etc.
)
if __name__ == '__main__':
    # Support for frozen Windows executables that use multiprocessing (no-op otherwise)
    torch.multiprocessing.freeze_support()

    trainer = Trainer(
        model=lmhead_model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
    )
    trainer.train()
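The only workaround I can think of is to steer the library away from TensorFlow with environment variables before anything is imported. This is just a sketch of that idea; USE_TF and TF_CPP_MIN_LOG_LEVEL are assumptions on my part, not parameters I found in TrainingArguments, so I'd appreciate confirmation that this is the intended way:

import os

# Assumption: transformers honours USE_TF and skips the TensorFlow backend entirely
os.environ['USE_TF'] = '0'
# Assumption: TensorFlow honours TF_CPP_MIN_LOG_LEVEL and suppresses its C++ log output
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Both variables must be set before transformers (and thus TensorFlow) is imported
from transformers import Trainer, TrainingArguments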