I have downloaded the model from Hugging Face using snapshot_download, e.g.,

from huggingface_hub import snapshot_download

snapshot_download(repo_id="facebook/nllb-200-distilled-600M", cache_dir="./")

And when I list the directory, I see:

ls ./models--facebook--nllb-200-distilled-600M/snapshots/bf317ec0a4a31fc9fa3da2ce08e86d3b6e4b18f1/

Output:

config.json@             README.md@                tokenizer_config.json@
generation_config.json@  sentencepiece.bpe.model@  tokenizer.json@
pytorch_model.bin@       special_tokens_map.json@

I can load the model locally, but I'll have to guess the snapshot hash, e.g.,

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "./models--facebook--nllb-200-distilled-600M/snapshots/bf317ec0a4a31fc9fa3da2ce08e86d3b6e4b18f1/",
    local_files_only=True
)

That works, but how do I load the Hugging Face model without guessing the hash?

Peter Mortensen
alvas

1 Answer

You can manage the files more cleanly by downloading the snapshot into a dedicated cache directory instead of the current directory, e.g.,

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="facebook/nllb-200-distilled-600M", 
    cache_dir="./huggingface_mirror"
)

Then you can load the model using the cache_dir keyword argument:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",  
    cache_dir="huggingface_mirror",
    local_files_only=True
)
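
If you still need the snapshot folder itself (for example, to inspect the files), the hash does not have to be guessed. snapshot_download returns the local folder path, and the cache also records the commit hash for each revision in a small refs/<revision> file next to the snapshots directory. Below is a minimal sketch of reading that file with the standard library only. The helper name resolve_snapshot_dir is hypothetical (it is not part of huggingface_hub), and it assumes the models--<org>--<name> cache layout shown in the question:

```python
from pathlib import Path


def resolve_snapshot_dir(cache_dir, repo_id, revision="main"):
    """Return the snapshot folder for repo_id inside a Hugging Face cache directory.

    Hypothetical helper, not part of huggingface_hub. Assumes the layout
    <cache_dir>/models--<org>--<name>/{refs,snapshots}/ seen in the question.
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))

    # refs/<revision> is a small text file holding the commit hash of that revision.
    commit_hash = (repo_dir / "refs" / revision).read_text().strip()

    snapshot_dir = repo_dir / "snapshots" / commit_hash
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"No snapshot found at {snapshot_dir}")
    return snapshot_dir
```

The returned path can then be passed on, e.g., AutoModelForSeq2SeqLM.from_pretrained(str(resolve_snapshot_dir("./huggingface_mirror", "facebook/nllb-200-distilled-600M")), local_files_only=True).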
alvas