I want to download a pretrained model and fine-tune it with my own data. I have downloaded the bert-large-NER model artifacts from Hugging Face and listed the contents below. Being new to this, I want to know which files or artifacts I need. From the looks of it, pytorch_model.bin is the trained model, but what are the other files, such as the tokenizer files and vocab.txt, and what is their purpose?

config.json
pytorch_model.bin
special_tokens_map.json
tokenizer_config.json
vocab.txt
kyagu

1 Answer

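These files are what Hugging Face writes when a model and its tokenizer are serialized. pytorch_model.bin holds the trained weights and config.json describes the model architecture; vocab.txt, tokenizer_config.json, and special_tokens_map.json belong to the tokenizer. Keep all of them together so the model is reloaded with the same tokenization it was trained with. To fine-tune a pretrained model from the HF Hub, you can write your own training loop in PyTorch or TensorFlow, or use the Trainer class, which saves you from writing custom training code. For example: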

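from transformers import Trainer

# model, training_args, the two datasets, and compute_metrics
# are assumed to be defined earlier in your script.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)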
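
The model given to Trainer can be loaded straight from the folder you downloaded. Here is a minimal sketch, assuming the five files sit together in one local directory and that transformers and torch are installed. The function name load_local_ner_model and the path in the usage note are hypothetical, chosen only for illustration.

```python
def load_local_ner_model(model_dir):
    """Load the tokenizer and token-classification model from a local folder.

    model_dir is assumed to contain config.json, pytorch_model.bin,
    vocab.txt, tokenizer_config.json and special_tokens_map.json.
    """
    # Imported here so this snippet can be pasted into a file that is
    # imported before transformers is installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Reads vocab.txt, tokenizer_config.json and special_tokens_map.json.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # Reads config.json for the architecture, then loads the weights
    # from pytorch_model.bin.
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    return tokenizer, model
```

You would call it as tokenizer, model = load_local_ner_model("./bert-large-NER"), with the path replaced by wherever you saved the files, and then pass model to Trainer.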

See the official docs as well for an end-to-end walkthrough of fine-tuning a pretrained model: https://huggingface.co/docs/transformers/training
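
To see for yourself what each file is for, you can inspect the folder with nothing but the standard library. This is only a sketch: the role descriptions are my own summary of the usual layout of a BERT checkpoint, the helper names describe_checkpoint and read_labels are hypothetical, and it assumes config.json stores the NER labels under the id2label key, as token-classification configs normally do.

```python
import json
from pathlib import Path

# What each file in a serialized BERT checkpoint folder is used for.
FILE_ROLES = {
    "config.json": "model architecture settings and label mapping",
    "pytorch_model.bin": "trained weights",
    "vocab.txt": "WordPiece vocabulary, one token per line",
    "tokenizer_config.json": "tokenizer settings",
    "special_tokens_map.json": "special tokens such as [CLS], [SEP], [PAD]",
}


def describe_checkpoint(model_dir):
    """Return {filename: (is_present, role)} for the expected files."""
    root = Path(model_dir)
    return {
        name: ((root / name).is_file(), role)
        for name, role in FILE_ROLES.items()
    }


def read_labels(model_dir):
    """Return the id -> label mapping from config.json, or {} if missing."""
    text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    return json.loads(text).get("id2label", {})
```

Running describe_checkpoint on your download folder shows which of the five files are present, and read_labels shows the entity tags the model was trained to predict, which matters if your own data uses a different label set.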

Ram Vegiraju