64

Running the code below downloads a model - does anyone know which folder it downloads it to?

!pip install -q transformers
from transformers import pipeline
model = pipeline('fill-mask')
user3472360

6 Answers

84

Update 2023-05-02: The cache location has changed again and is now ~/.cache/huggingface/hub/, as reported by @Victor Yan. Notably, the subfolders in the hub/ directory are now named after the model path (e.g. models--distilroberta-base) instead of a SHA hash, as in previous versions.


Update 2021-03-11: The cache location has changed and is now ~/.cache/huggingface/transformers, as also detailed in the answer by @victorx.


This post should shed some light on it (plus some investigation of my own, since the post is already a bit older).

As mentioned there, the default location on a Linux system is ~/.cache/torch/transformers/ (I'm currently using transformers v2.7, but this is unlikely to change anytime soon). The cryptic folder names in this directory seemingly correspond to the Amazon S3 hashes.

Also note that the pipeline tasks are just a "rerouting" to other models. To know which one you are currently loading, see here. For your specific case, pipeline('fill-mask') actually loads a distilroberta-base model.
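If you want to double-check which checkpoint a pipeline resolved to at runtime, you can inspect the loaded model directly. A minimal sketch (name_or_path is a property on recent transformers versions, so this may differ on older installs):

from transformers import pipeline

pipe = pipeline('fill-mask')
# Print the identifier of the checkpoint the task was routed to;
# for 'fill-mask' this is expected to be 'distilroberta-base'.
print(pipe.model.name_or_path)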

dennlinger
  • How would I get the `vocab.txt` from this location? It doesn't seem to be a directory: ```-rw------- 1 root root 435778770 Jan 27 05:30 794538e7c825dc7be96d9fc3c73b79a9736da5f699fc50d31513dbca0740b349.f0d8b668347b3048f5b88e273fde3c3412366726bc99aa5935b7990944092fb1 ``` – information_interchange Feb 02 '21 at 19:33
  • This file is exactly the vocabulary in the form of a dictionary map (if you view it with something like `less` or `nano`, you can see it). – dennlinger Feb 02 '21 at 19:56
  • 4
    How do we save the model in a custom path? Say we want to dockerise the implementation - it would be nice to have everything in the same directory. Any idea how this can be done? – hkh Mar 15 '21 at 16:02
  • I think there are several resources. Firstly, Hugging Face indeed provides pre-built Docker images [here](https://hub.docker.com/r/huggingface/transformers-pytorch-gpu), where you could check how they do it. – dennlinger Mar 15 '21 at 18:36
  • 5
    @hkh I found the parameter: you can pass in `cache_dir`, like `model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", cache_dir="~/mycoolfolder")` (see the fuller sketch after these comments). I had to figure this out to use a fast external NVMe because I was running out of space. – james-see Jul 03 '22 at 03:03
  • When using cache_dir=./disk2/myfolder, it still uses the default location to store the model, and during the download my disk runs out of space. What should I do? – user3043636 Aug 02 '23 at 11:26
  • Please post this as a separate question if you have any issues unrelated to the original question. – dennlinger Aug 02 '23 at 12:17
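To expand on the cache_dir comments above, here is a hedged sketch of both ways to relocate downloads (the path /disk2/hf-cache is just an example; HF_HOME is the documented environment variable for moving the whole cache):

import os

# Option 1: redirect the entire Hugging Face cache. HF_HOME is read when
# transformers is imported, so set it before the import.
os.environ["HF_HOME"] = "/disk2/hf-cache"

from transformers import AutoModelForMaskedLM, AutoTokenizer

# Option 2: override the cache for a single download only.
model = AutoModelForMaskedLM.from_pretrained(
    "distilroberta-base", cache_dir="/disk2/hf-cache"
)
tokenizer = AutoTokenizer.from_pretrained(
    "distilroberta-base", cache_dir="/disk2/hf-cache"
)

Note that cache_dir only applies to the call it is passed to; any other from_pretrained in the same program still uses the default location, which may explain the out-of-space report above. The environment variable applies process-wide.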
18

As of Transformers version 4.3, the cache location has been changed.

The exact place is defined in this code section: https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py#L181-L187

On Linux, it is at ~/.cache/huggingface/transformers.

The file names there are basically SHA hashes of the original URLs from which the files are downloaded. The corresponding .json files can help you figure out what the original file names are.
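If you need to map those hashes back to human-readable names, the sidecar files can be read directly. A minimal sketch for the pre-4.22 layout described above (the url key is what these metadata files contained; adjust the path for your system):

import json
import pathlib

cache = pathlib.Path.home() / ".cache" / "huggingface" / "transformers"
# Each cached blob <hash> has a companion <hash>.json recording the
# original download URL, which reveals the original file name.
for meta in cache.glob("*.json"):
    info = json.loads(meta.read_text())
    print(meta.stem[:12], "->", info.get("url"))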

dataista
victorx
18

On Windows 10, replace ~ with C:\Users\username, or in cmd run cd /d "%HOMEDRIVE%%HOMEPATH%".

So the full path will be: C:\Users\username\.cache\huggingface\transformers
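A quick way to print the same location from Python on any OS (a sketch assuming the default cache settings; pathlib.Path.home() resolves to C:\Users\username on Windows and to $HOME elsewhere):

import pathlib

# Default transformers cache location in the pre-4.22 layout.
print(pathlib.Path.home() / ".cache" / "huggingface" / "transformers")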

Maverick Meerkat
15

As of transformers 4.22, the path appears to be (tested on CentOS):

~/.cache/huggingface/hub/
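If you have huggingface_hub installed, its cache scanner enumerates everything under this directory, which saves manual digging. A minimal sketch (scan_cache_dir() defaults to the hub/ path above):

from huggingface_hub import scan_cache_dir

# Walk ~/.cache/huggingface/hub/ and report each cached repo and its size.
info = scan_cache_dir()
for repo in info.repos:
    print(repo.repo_id, repo.size_on_disk_str)

The same report is available on the command line via huggingface-cli scan-cache.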
Victor Yan
2

On Windows, it is C:\Users\USER\.cache\huggingface\hub

Murilo Maciel Curti
1
Downloading a single file with hf_hub_download shows where things end up; note that the files under snapshots/ are symlinks into the shared blobs/ store:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="sentence-transformers/all-MiniLM-L6-v2", filename="config.json")

$ ls -lrth ~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/
total 4.0K
lrwxrwxrwx 1 alex alex 52 Jan 25 12:15 config.json -> ../../blobs/72b987fd805cfa2b58c4c8c952b274a11bfd5a00
lrwxrwxrwx 1 alex alex 76 Jan 25 12:15 pytorch_model.bin -> ../../blobs/c3a85f238711653950f6a79ece63eb0ea93d76f6a6284be04019c53733baf256
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 vocab.txt -> ../../blobs/fb140275c155a9c7c5a3b3e0e77a9e839594a938
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 special_tokens_map.json -> ../../blobs/e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 tokenizer_config.json -> ../../blobs/c79f2b6a0cea6f4b564fed1938984bace9d30ff0
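Since hf_hub_download returns the resolved local path, you can also just print it instead of inspecting the cache by hand (a small sketch using the same repo as above):

from huggingface_hub import hf_hub_download

# The return value is the absolute path of the cached file on disk.
path = hf_hub_download(
    repo_id="sentence-transformers/all-MiniLM-L6-v2", filename="config.json"
)
print(path)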
Alex Punnen