Questions tagged [huggingface-transformers]

Transformers is a Python library that implements various transformer NLP models in PyTorch and TensorFlow.

transformers is a natural language processing (NLP) library that implements many state-of-the-art transformer models in Python using PyTorch and TensorFlow. It was created and is maintained by Hugging Face. The library is available through package managers and is open-sourced on GitHub. It was formerly known as pytorch-transformers and, before that, as pytorch-pretrained-bert.

2878 questions
64 votes • 6 answers

Where does hugging face's transformers save models?

Running the code below downloads a model. Does anyone know what folder it downloads it to?
!pip install -q transformers
from transformers import pipeline
model = pipeline('fill-mask')
user3472360 • 1,337
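By default the files land in a cache directory under the user's home folder (typically ~/.cache/huggingface). A minimal sketch for printing the location, assuming a transformers 4.x install where the constant is exposed under transformers.utils:
import os
from transformers.utils import TRANSFORMERS_CACHE

print(TRANSFORMERS_CACHE)              # resolved cache directory for downloaded models
print(os.listdir(TRANSFORMERS_CACHE))  # the cached files/snapshots live here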
59 votes • 4 answers

How to change huggingface transformers default cache directory

The default cache directory lacks disk capacity, so I need to change where the cache directory is configured.
Ivan Lee • 3,420
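Two common ways to redirect the cache, sketched under the assumption of a transformers 4.x setup: set an environment variable before transformers is imported, or pass cache_dir per call (the /mnt/big_disk path is only illustrative):
import os
os.environ["TRANSFORMERS_CACHE"] = "/mnt/big_disk/hf_cache"  # must be set before importing transformers

from transformers import AutoModel

# alternatively, override the location for a single download
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="/mnt/big_disk/hf_cache")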
52 votes • 6 answers

Load a pre-trained model from disk with Huggingface Transformers

From the documentation for from_pretrained, I understand I don't have to download the pretrained vectors every time; I can save them and load them from disk with this syntax: - a path to a `directory` containing vocabulary files required by the…
Mittenchops • 18,633
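A minimal sketch of the save-once, load-locally pattern (the ./local-bert directory name is illustrative):
from transformers import AutoModel, AutoTokenizer

# first run: download from the hub and write everything to a local directory
AutoModel.from_pretrained("bert-base-uncased").save_pretrained("./local-bert")
AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./local-bert")

# later runs: point from_pretrained at that directory instead of a model id
model = AutoModel.from_pretrained("./local-bert")
tokenizer = AutoTokenizer.from_pretrained("./local-bert")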
41 votes • 5 answers

How to disable TOKENIZERS_PARALLELISM=(true | false) warning?

I use PyTorch to train a huggingface-transformers model, but every epoch it outputs the warning: The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this warning, please explicitly set…
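The warning comes from the Rust tokenizers library when a process that already used a fast tokenizer is forked (e.g. by DataLoader workers). A minimal sketch of silencing it by setting the variable the message mentions, before the fork happens:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # set early, before tokenizers and DataLoader workers start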
38 votes • 5 answers

ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error

def split_data(path):
    df = pd.read_csv(path)
    return train_test_split(df, test_size=0.1, random_state=100)
train, test = split_data(DATA_DIR)
train_texts, train_labels = train['text'].to_list(), train['sentiment'].to_list()
test_texts,…
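Fast tokenizers raise this error when the input list contains anything that is not a plain string (for example NaN from pandas, or None). A hedged sketch of the usual cleanup step, assuming a 'text' column as in the question:
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

df = pd.read_csv("data.csv")              # illustrative path
df = df.dropna(subset=["text"])           # NaN entries trigger the TextEncodeInput error
texts = df["text"].astype(str).to_list()  # make sure every item is a str

encodings = tokenizer(texts, truncation=True, padding=True)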
37 votes • 2 answers

What's the difference between tokenizer.encode and tokenizer.encode_plus in Hugging Face?

Here is an example of doing sequence classification using a model to determine if two sequences are paraphrases of each other. The two examples give two different results. Can you help me explain why tokenizer.encode and tokenizer.encode_plus give…
andy • 1,951
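In short, encode returns only the list of token ids, while encode_plus returns a dict that also contains token_type_ids and attention_mask; for a sentence pair those extra fields matter to models like BERT. A small illustration:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

ids = tokenizer.encode("The cat sat.", "A cat was sitting.")
# -> flat list of input ids for the pair, with [CLS]/[SEP] added

enc = tokenizer.encode_plus("The cat sat.", "A cat was sitting.")
# -> {'input_ids': [...], 'token_type_ids': [...], 'attention_mask': [...]}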
31 votes • 4 answers

Transformers v4.x: Convert slow tokenizer to fast tokenizer

I'm following the transformers pretrained-model example for xlm-roberta-large-xnli:
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")
and I get the…
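The conversion error for this checkpoint usually means the sentencepiece/protobuf dependencies needed to build the fast tokenizer are missing; installing them, or explicitly falling back to the slow tokenizer, are two common workarounds (a sketch, not an accepted answer):
# pip install sentencepiece protobuf   # lets transformers convert the slow tokenizer

from transformers import pipeline, AutoTokenizer

# or avoid the conversion entirely by requesting the slow tokenizer
tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli", use_fast=False)
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli",
                      tokenizer=tokenizer)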
31 votes • 1 answer

How to use 'collate_fn' with dataloaders?

I am trying to train a pretrained roberta model using 3 inputs, 3 input_masks and a label as tensors of my training dataset. I do this using the following code:
from torch.utils.data import TensorDataset, DataLoader, RandomSampler,…
Sam V • 479
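A self-contained sketch of a custom collate_fn that turns a list of (input_ids, attention_mask, label) tuples into batched tensors; the dummy data only stands in for the question's dataset:
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy tensors standing in for the real inputs, masks and labels
input_ids = torch.randint(0, 1000, (8, 16))
attention_mask = torch.ones(8, 16, dtype=torch.long)
labels = torch.randint(0, 2, (8,))
dataset = TensorDataset(input_ids, attention_mask, labels)

def collate_fn(batch):
    # `batch` is a list of (input_ids, attention_mask, label) tuples, one per sample
    ids = torch.stack([item[0] for item in batch])
    masks = torch.stack([item[1] for item in batch])
    labs = torch.stack([item[2] for item in batch])
    return {"input_ids": ids, "attention_mask": masks, "labels": labs}

loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)
batch = next(iter(loader))  # dict of batched tensors ready for a model's forward()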
28 votes • 7 answers

How to download model from huggingface?

https://huggingface.co/models For example, I want to download 'bert-base-uncased', but can't find a 'Download' link. Please help. Or is it not downloadable?
marlon • 6,029
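Besides grabbing individual files from the model page, the weights can be fetched programmatically. Two hedged options, assuming the huggingface_hub package is installed for the first one:
# option 1: download a full snapshot of the repository
from huggingface_hub import snapshot_download
local_dir = snapshot_download("bert-base-uncased")
print(local_dir)  # folder containing config, vocab and weights

# option 2: let transformers download, then save a local copy
from transformers import AutoModel, AutoTokenizer
AutoModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")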
27 votes • 3 answers

How to build semantic search for a given domain

There is a problem we are trying to solve where we want to do semantic search on our set of data, i.e. we have domain-specific data (example: sentences talking about automobiles). Our data is just a bunch of sentences, and what we want is to give a…
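One common recipe is to embed the corpus sentences once and compare a query embedding against them with cosine similarity. A minimal sketch using the sentence-transformers package (an assumption; any encoder that produces sentence vectors would do):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "The engine overheats at high RPM.",
    "Brake pads wear out quickly on this model.",
    "The infotainment screen freezes sometimes.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode("engine temperature problem", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)  # indices + cosine scores
print(hits)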
26 votes • 5 answers

How to compare sentence similarities using embeddings from BERT

I am using the HuggingFace Transformers package to access pretrained models. As my use case needs functionality for both English and Arabic, I am using the bert-base-multilingual-cased pretrained model. I need to be able to compare the similarity of…
KOB • 4,084
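A minimal sketch that mean-pools the last hidden states from bert-base-multilingual-cased into sentence vectors and compares them with cosine similarity; note that vanilla BERT was not trained for sentence similarity, so this is only a rough baseline:
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    return output.last_hidden_state.mean(dim=1).squeeze(0)  # mean-pooled sentence vector

a = embed("I like fast cars.")
b = embed("أحب السيارات السريعة")
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())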
25 votes • 3 answers

Huggingface saving tokenizer

I am trying to save the tokenizer in huggingface so that I can load it later from a container where I don't need access to the internet.
BASE_MODEL = "distilbert-base-multilingual-cased"
tokenizer =…
sachinruk • 9,571
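A minimal sketch of saving the tokenizer to a directory that is baked into (or mounted in) the container, then loading it without any network access; the ./offline-tokenizer path is illustrative:
from transformers import AutoTokenizer

BASE_MODEL = "distilbert-base-multilingual-cased"

# on a machine with internet access
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.save_pretrained("./offline-tokenizer")

# inside the offline container, load from the directory instead of the hub
tokenizer = AutoTokenizer.from_pretrained("./offline-tokenizer")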
24 votes • 2 answers

Saving and reloading a huggingface fine-tuned transformer

I am trying to reload a fine-tuned DistilBertForTokenClassification model. I am using transformers 3.4.0 and pytorch version 1.6.0+cu101. After using the Trainer to train the downloaded model, I save the model with trainer.save_model() and in my…
Nate • 241
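The usual pattern, sketched under the assumption that trainer and tokenizer already exist from the fine-tuning run: trainer.save_model() writes the model and config to a directory, the tokenizer is saved alongside it, and both are reloaded with from_pretrained:
from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

save_dir = "./finetuned-distilbert-ner"  # illustrative path

# after training (trainer and tokenizer come from the fine-tuning script):
# trainer.save_model(save_dir)
# tokenizer.save_pretrained(save_dir)

# in the new process, reload both from that directory
model = DistilBertForTokenClassification.from_pretrained(save_dir)
tokenizer = DistilBertTokenizerFast.from_pretrained(save_dir)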
23 votes • 3 answers

Add dense layer on top of Huggingface BERT model

I want to add a dense layer on top of the bare BERT Model transformer outputting raw hidden-states, and then fine-tune the resulting model. Specifically, I am using this base model. This is what the model should do: Encode the sentence (a vector…
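A minimal sketch of wrapping the bare model in an nn.Module with a dense head on the [CLS] representation; the base checkpoint and output dimension are assumptions:
import torch.nn as nn
from transformers import AutoModel

class BertWithDenseHead(nn.Module):
    def __init__(self, base="bert-base-uncased", out_dim=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base)  # bare model, raw hidden states
        self.dense = nn.Linear(self.bert.config.hidden_size, out_dim)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = outputs.last_hidden_state[:, 0]   # hidden state of the [CLS] token
        return self.dense(cls_repr)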
22 votes • 2 answers

How to free GPU memory in PyTorch

I have a list of sentences I'm trying to calculate perplexity for, using several models, with this code:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch
import numpy as np
model_name = 'cointegrated/rubert-tiny'
model =…
Penguin • 1,923
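When looping over several models, the memory of the previous one is only released after the last Python reference to it is gone. A minimal sketch of the usual cleanup between models (assuming a CUDA device is available):
import gc
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("cointegrated/rubert-tiny").cuda()
# ... compute perplexities with this model ...

del model                 # drop the reference so the tensors can be freed
gc.collect()              # collect any lingering Python objects
torch.cuda.empty_cache()  # return cached blocks so the memory shows up as free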