Here is all the "documentation" I could find: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification
I honestly don't see how you're supposed to know how to use this script from the resources I found online, unless you wrote it yourself.
I want to fine-tune a RoBERTa model on a classification task with my own (.json) dataset and my own checkpoint. This is the notebook cell I have so far:
DATA_SET = 'books'
MODEL = 'RoBERTa_small_fr_huggingface'  # I have sentencepiece.bpe.model and pytorch_model.bin what do I use?
MAX_SENTENCES = '8'  # batch size
LR = '1e-5'
MAX_EPOCH = '5'
NUM_CLASSES = '2'
SEEDS = 3
CUDA_VISIBLE_DEVICES = 0
TASK = 'sst2'  #??
DATA_PATH = 'data/cls-books-json/'  # with test.json, train.json, valid.json
for SEED in range(SEEDS):
    SAVE_DIR = 'checkpoints/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
    !(python3 libs/transformers/examples/pytorch/text-classification/run_glue.py \
      --model_name_or_path $MODEL \
      --task_name $TASK_NAME \
      --do_train \
      --do_eval \
      --output_dir /tmp/hf)
So far I get this error:
run_glue.py: error: argument --model_name_or_path: expected one argument
But I'm sure it's not the only problem.
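For reference, here is a rough sketch of what I think the invocation should look like if I build the argument list in Python instead of relying on `$VARIABLE` expansion in the `!` command (note that my cell references `$TASK_NAME`, but I only ever define `TASK`). It is only a sketch under several assumptions: that run_glue.py accepts `--train_file` and `--validation_file` for local JSON files in place of `--task_name`, that `RoBERTa_small_fr_huggingface` is a local directory that also contains a config.json next to the weights and tokenizer file, and that each JSON record has a `label` field plus one or two text fields. The helper name `build_command` and the shortened output directory are made up for illustration.

```python
import os
import sys

# Path to the example script, taken from the notebook cell above.
RUN_GLUE = "libs/transformers/examples/pytorch/text-classification/run_glue.py"


def build_command(model_dir, data_path, save_dir, seed,
                  batch_size=8, lr=1e-5, epochs=5):
    """Return the run_glue.py command as a list of strings (hypothetical helper).

    Passing a list avoids any shell or notebook variable substitution,
    so every flag is guaranteed to receive a non-empty value.
    """
    return [
        sys.executable, RUN_GLUE,
        "--model_name_or_path", model_dir,  # assumed: local dir with config + weights
        "--train_file", os.path.join(data_path, "train.json"),
        "--validation_file", os.path.join(data_path, "valid.json"),
        "--do_train",
        "--do_eval",
        "--per_device_train_batch_size", str(batch_size),
        "--learning_rate", str(lr),
        "--num_train_epochs", str(epochs),
        "--seed", str(seed),
        "--output_dir", save_dir,
    ]


# One command per seed, mirroring the loop in the notebook cell.
commands = [
    build_command(
        model_dir="RoBERTa_small_fr_huggingface",
        data_path="data/cls-books-json/",
        save_dir=f"checkpoints/books/{seed}",  # simplified output path
        seed=seed,
    )
    for seed in range(3)
]

for cmd in commands:
    # Print only; to launch training, replace this with
    # subprocess.run(cmd, check=True) after `import subprocess`.
    print(" ".join(cmd))
```

I have not confirmed that this gets past the argument parser with my checkpoint; it only guarantees that `--model_name_or_path` is no longer handed an empty string.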