Highest Voted 'huggingface-tokenizers' Questions

2 votes · 1 answer

pip on Docker image cannot find Rust - even though Rust is installed

I'm trying to install some Python packages, namely tokenizers from huggingface transformers, which apparently needs Rust. So I am installing Rust on my Docker build: FROM nikolaik/python-nodejs USER pn WORKDIR /home/pn/app COPY . /home/pn/app/ RUN…

asked Apr 16 '22 at 17:15 by lte__ (7,175 reputation; badges: 25, 74, 131)

2 votes · 1 answer

405 : Client Error: Not Allowed for huggingface url

I am trying to follow the huggingface tutorial on finetuning models for summarization. All I'm trying is to load the t5 tokenizer. from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("t5-small") And I get the following…

huggingface-transformers http-status-code-405 huggingface-tokenizers

asked Mar 29 '22 at 02:17 by Kiera.K (317 reputation; badges: 1, 13)

2 votes · 0 answers

How to get tokens to words in BERT tokenizer

I have a list, and using the huggingface BERT tokenizer I can get the corresponding numerical representation. X = ['[CLS]', '[MASK]', 'love', 'this', '[SEP]'] tokens = tokenizer.convert_tokens_to_ids(X) tokens: [101, 103, 2293, 2023, 102] Is there any function…

nlp huggingface-transformers bert-language-model transformer-model huggingface-tokenizers

asked Mar 21 '22 at 04:08 by kowser66 (125 reputation; badges: 1, 8)
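In transformers, the inverse of `convert_tokens_to_ids` is `tokenizer.convert_ids_to_tokens`, and `tokenizer.decode` turns ids back into a string. Conceptually it is a lookup in the inverted vocabulary. The minimal sketch below shows only that idea, using a hypothetical five-entry vocabulary built from the ids quoted in the question (not the real BERT vocab file); `tokens_to_ids` and `ids_to_tokens` are illustrative helpers, not library functions.

```python
# Hypothetical toy vocabulary: only the five entries quoted in the question,
# not the full bert-base-uncased vocab.txt.
VOCAB = {"[CLS]": 101, "[MASK]": 103, "love": 2293, "this": 2023, "[SEP]": 102}

# Invert token -> id into id -> token (ids are unique, so nothing is lost).
ID_TO_TOKEN = {idx: tok for tok, idx in VOCAB.items()}


def tokens_to_ids(tokens):
    """Map each token string to its id."""
    return [VOCAB[t] for t in tokens]


def ids_to_tokens(ids):
    """Map each id back to its token string (inverse of tokens_to_ids)."""
    return [ID_TO_TOKEN[i] for i in ids]


if __name__ == "__main__":
    ids = tokens_to_ids(["[CLS]", "[MASK]", "love", "this", "[SEP]"])
    print(ids)                 # [101, 103, 2293, 2023, 102]
    print(ids_to_tokens(ids))  # ['[CLS]', '[MASK]', 'love', 'this', '[SEP]']
```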

2 votes · 2 answers

Getting an error even after using truncation for tokenizer while predicting (MLM) on bert using huggingface

I am using truncation=True in the tokenizer self.tokenizer = AutoTokenizer.from_pretrained(bert_model_str, truncation=True) self.pipeline = pipeline("fill-mask", model=self.model, tokenizer=self.tokenizer) however I am still getting multiple…

python python-3.x huggingface-transformers huggingface-tokenizers

asked Feb 23 '22 at 16:04 by Coddy (549 reputation; badges: 4, 18)
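One possible cause: in transformers, `truncation` and `max_length` are normally arguments of the tokenizer call (e.g. `tokenizer(text, truncation=True, max_length=512)`), so passing `truncation=True` to `from_pretrained` may not switch truncation on when the pipeline tokenizes its input. The sketch below only illustrates what right-side truncation does to one BERT-style sequence. `truncate_with_specials` is a hypothetical helper, and the example also shows an MLM-specific pitfall: if the `[MASK]` token sits past the limit, truncation removes it and a fill-mask pipeline has nothing left to predict.

```python
MASK_ID = 103  # [MASK] id from the bert-base-uncased-style ids quoted earlier (assumption)


def truncate_with_specials(ids, max_length):
    """Truncate a single [CLS] ... [SEP] sequence to at most max_length ids.

    Keeps the first id ([CLS]) and the last id ([SEP]) and drops content ids
    from the right end, i.e. right-side truncation of one sequence.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    if len(ids) <= max_length:
        return list(ids)
    body = ids[1:-1]
    return [ids[0]] + body[: max_length - 2] + [ids[-1]]


if __name__ == "__main__":
    # Hypothetical input: 600 filler tokens, then [MASK], then [SEP].
    seq = [101] + [2023] * 600 + [MASK_ID] + [102]
    short = truncate_with_specials(seq, 512)
    print(len(short))        # 512
    print(MASK_ID in short)  # False: the fill-mask target was cut off
```

Truncating from the left, or windowing around the mask position, would keep the target token; which is appropriate depends on the task.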

2 votes · 0 answers

How can I combine a Huggingface tokenizer and a BERT-based model in onnx?

Problem description: I have a model based on BERT, with a classifier layer on top. I want to export it to ONNX, but to avoid issues on the side of the 'user' of the onnx model, I want to export the entire pipeline, including tokenization, as an ONNX…

python pytorch huggingface-transformers onnx huggingface-tokenizers

asked Feb 08 '22 at 14:27 by Kroshtan (637 reputation; badges: 5, 17)

2 votes · 1 answer

Hugging Face - Efficient tokenization of unknown tokens in GPT2

I am trying to train a dialog system using GPT2. For tokenization, I am using the following configuration for adding the special tokens. from transformers import ( AdamW, AutoConfig, AutoTokenizer, PreTrainedModel, …

python nlp huggingface-transformers huggingface-tokenizers gpt-2

asked Jan 11 '22 at 19:35 by Soumya Ranjan Sahoo (133 reputation; badges: 2, 9)
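GPT-2's byte-level BPE can encode any string as byte-level pieces, so it rarely needs an unknown token. The bookkeeping that matters when adding dialog tokens is that each new special token gets a fresh id at the end of the vocabulary, so the model's embedding matrix must grow to match (in transformers, `tokenizer.add_special_tokens({...})` followed by `model.resize_token_embeddings(len(tokenizer))`). The sketch below models that with a plain dict and a list of rows. `add_special_tokens` and `resize_embeddings` here are hypothetical stand-ins, and the token names `<speaker1>`/`<speaker2>` are only examples.

```python
import random


def add_special_tokens(vocab, new_tokens):
    """Add tokens missing from vocab and return how many were added.

    Assumes ids are contiguous (0 .. len(vocab) - 1), so each new token
    gets the next id at the end of the vocabulary.
    """
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added


def resize_embeddings(embeddings, new_size, dim, seed=0):
    """Append small randomly initialised rows until the table has new_size rows."""
    rng = random.Random(seed)
    while len(embeddings) < new_size:
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return embeddings


if __name__ == "__main__":
    vocab = {"hello": 0, "world": 1, "!": 2}       # toy vocabulary
    embeddings = [[0.0] * 4 for _ in vocab]         # one 4-dim row per id
    n = add_special_tokens(vocab, ["<speaker1>", "<speaker2>", "hello"])
    resize_embeddings(embeddings, len(vocab), dim=4)
    print(n, vocab["<speaker2>"], len(embeddings))  # 2 4 5
```

Tokens already present (like "hello") are skipped, so the embedding table only grows by the number of genuinely new tokens.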

2 votes · 1 answer

How to get a probability distribution over tokens in a huggingface model?

I'm following this tutorial on getting predictions over masked words. I'm using this one because it seems to work with several masked words simultaneously, while other approaches I tried could only take 1 masked word at a time. The…

python pytorch huggingface-transformers huggingface-tokenizers

asked Dec 10 '21 at 03:32 by Penguin (1,923 reputation; badges: 3, 21, 51)
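A masked-LM head returns one logit per vocabulary entry at every position, and a probability distribution for a masked position is the softmax of that position's logit vector (with PyTorch, something like `torch.softmax(outputs.logits[0, mask_index], dim=-1)`, where `mask_index` is a placeholder for the masked position). With several masks, each masked position gets its own softmax. The pure-Python sketch below works on a hypothetical 5-entry vocabulary and logit vector and adds a top-k selection.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, vocab, k):
    """Return the k (token, probability) pairs with the highest probability."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    vocab = ["cat", "dog", "car", "tree", "sky"]  # hypothetical vocabulary
    logits = [2.0, 1.0, 0.1, -1.0, 0.5]           # hypothetical logits for one [MASK]
    probs = softmax(logits)
    print(top_k(probs, vocab, 2))
```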

2 votes · 1 answer

Using huggingface library gives an error: KeyError: 'logits'

I'm new to the huggingface library and trying to run a model to do masked language ("fill-mask" task): from transformers import BertTokenizer, BertForMaskedLM import torch from transformers import pipeline, AutoTokenizer, AutoModel # Initialize MLM…

python pytorch huggingface-transformers huggingface-tokenizers

asked Dec 08 '21 at 16:19 by Penguin (1,923 reputation; badges: 3, 21, 51)

2 votes · 1 answer

Mapping huggingface tokens to original input text

How can I map the tokens I get from huggingface DistilBertTokenizer to the positions of the input text? e.g. I have a new GPU -> ["i", "have", "a", "new", "gp", "##u"] -> [(0, 1), (2, 6), ...] I'm interested in this because suppose that I have some…

tokenize huggingface-transformers huggingface-tokenizers

asked Nov 25 '21 at 08:34 by Hardian Lawi (588 reputation; badges: 5, 22)
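With a fast tokenizer, transformers can return these spans directly via `return_offsets_mapping=True` (e.g. `tokenizer("I have a new GPU", return_offsets_mapping=True)["offset_mapping"]`). The sketch below reconstructs the same idea by hand for WordPiece output, assuming a lower-casing tokenizer, `##` continuation prefixes, ASCII text, and no `[UNK]` or accent stripping. `wordpiece_offsets` is a hypothetical helper, not a library function.

```python
def wordpiece_offsets(text, tokens):
    """Return (start, end) character spans in text for each WordPiece token.

    Assumes tokens came from a lower-casing tokenizer that uses the "##"
    prefix for word continuations and never produced [UNK].
    """
    lowered = text.lower()
    offsets = []
    pos = 0
    for tok in tokens:
        # Continuation pieces carry a "##" marker that is not in the text.
        piece = tok[2:] if tok.startswith("##") else tok
        start = lowered.find(piece, pos)
        if start == -1:
            raise ValueError(f"could not locate token {tok!r} after position {pos}")
        end = start + len(piece)
        offsets.append((start, end))
        pos = end  # search for the next token only after this one
    return offsets


if __name__ == "__main__":
    tokens = ["i", "have", "a", "new", "gp", "##u"]
    print(wordpiece_offsets("I have a new GPU", tokens))
    # [(0, 1), (2, 6), (7, 8), (9, 12), (13, 15), (15, 16)]
```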

2 votes · 0 answers

Adding 'decoder_start_token_id' with SimpleTransformers

Training MBART in Seq2Seq with SimpleTransformers but getting an error I am not seeing with BART: TypeError: shift_tokens_right() missing 1 required positional argument: 'decoder_start_token_id' So far I've tried various combinations…

python huggingface-transformers seq2seq huggingface-tokenizers simpletransformers

asked Nov 04 '21 at 20:44 by LeOverflow (301 reputation; badges: 1, 2, 16)
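The error message indicates that a `shift_tokens_right` expecting a `decoder_start_token_id` argument was called without it. Such a function builds decoder inputs from labels for teacher forcing: it prepends a start token and shifts everything one position right. The plain-list sketch below shows that behaviour under stated assumptions (hypothetical ids; `shift_right` is an illustrative name, not the library's code). The correct start id for mBART depends on the model, so `model.config.decoder_start_token_id` is the value to check rather than anything in this example.

```python
def shift_right(input_ids, pad_token_id, decoder_start_token_id):
    """Shift each sequence one position to the right for teacher forcing.

    Position 0 becomes decoder_start_token_id, the last token is dropped,
    and any -100 label padding is replaced by pad_token_id.
    """
    shifted = []
    for seq in input_ids:
        row = [decoder_start_token_id] + list(seq[:-1])
        shifted.append([pad_token_id if t == -100 else t for t in row])
    return shifted


if __name__ == "__main__":
    labels = [[10, 11, 12, 2], [10, 2, -100, -100]]  # hypothetical label ids
    print(shift_right(labels, pad_token_id=1, decoder_start_token_id=0))
    # [[0, 10, 11, 12], [0, 10, 2, 1]]
```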

2 votes · 1 answer

Huggingface Tokenizer object is not callable

I am writing deep learning code that embeds text into BERT-based embeddings. I am seeing unexpected issues in code that was working fine before. Below is the snippet: sentences = ["person in red riding a motorcycle", "lady cutting cheese with…

huggingface-tokenizers

asked Nov 02 '21 at 23:30 by amitgh (61 reputation; badges: 1, 6)
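A "not callable" error means the object has no `__call__` method. Calling a transformers tokenizer directly (`tokenizer(sentences, padding=True)`) was, as far as I know, introduced in v3.0, with `tokenizer.batch_encode_plus(...)` as the older equivalent; so an environment that ended up with an older transformers version, or the name `tokenizer` being rebound to some other object, are plausible causes. The toy classes below are hypothetical (whitespace splitting only) and just show how `__call__` makes an instance callable by delegating to an encode method.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer used only to illustrate __call__."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.unk_id = len(vocab)  # id reserved for unknown words (assumption)

    def encode(self, text):
        return [self.vocab.get(w, self.unk_id) for w in text.lower().split()]

    def __call__(self, texts):
        # Accept a single string or a list of strings.
        if isinstance(texts, str):
            return {"input_ids": self.encode(texts)}
        return {"input_ids": [self.encode(t) for t in texts]}


class OldStyleTokenizer:
    """Same kind of encode method but no __call__, so instances are not callable."""

    def encode(self, text):
        return text.split()


if __name__ == "__main__":
    tok = ToyTokenizer({"person": 0, "in": 1, "red": 2})
    print(tok(["person in red", "person in blue"]))
    # {'input_ids': [[0, 1, 2], [0, 1, 3]]}
```

Calling `OldStyleTokenizer()("a b")` raises `TypeError: 'OldStyleTokenizer' object is not callable`, the same kind of message as in the question.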

2 votes · 0 answers

Is there a tokenizer that can find sentence boundaries and apply BPE at the same time?

There seem to be lots and lots of libraries out there that can find sentence boundaries. The reason I need to find these is to chunk up longer texts so I can send them to language models. This means once I have my chunks made up of complete…

nlp tokenize huggingface-transformers sentence huggingface-tokenizers

asked Oct 27 '21 at 11:50 by rudolfovic (3,163 reputation; badges: 2, 14, 38)
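One way to get chunks made of complete sentences is to split into sentences first and then greedily pack them while counting tokens with whatever BPE tokenizer the model uses. The sketch below uses a naive regex splitter and takes the token counter as a parameter. `count_tokens` defaults to whitespace splitting purely as a stand-in; with transformers you could pass something like `lambda s: len(tokenizer.tokenize(s))`. `split_sentences` and `chunk_sentences` are hypothetical helpers.

```python
import re

# Naive splitter: break after ., ! or ? followed by whitespace. A dedicated
# sentence segmenter handles abbreviations, quotes, etc. far better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def chunk_sentences(text, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own (oversized)
    chunk rather than being cut mid-sentence.
    """
    chunks, current, current_len = [], [], 0
    for sent in split_sentences(text):
        n = count_tokens(sent)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    text = "One two three. Four five. Six seven eight nine. Ten."
    print(chunk_sentences(text, max_tokens=5))
    # ['One two three. Four five.', 'Six seven eight nine. Ten.']
```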

2 votes · 1 answer

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'to_tensor'

I'm fine-tuning a BERT model using Hugging Face, Keras, Tensorflow libraries. Since yesterday I'm getting this error running my code in Google Colab. The odd thing is that the code used to run without any problem and suddenly started to throw this…

python tensorflow google-colaboratory huggingface-transformers huggingface-tokenizers

asked Oct 14 '21 at 22:20 by ipietri (21 reputation; badges: 1, 3)

2 votes · 0 answers

How long does load_dataset take in huggingface?

I want to pre-train a T5 model using huggingface. The first step is training the tokenizer with this code: import datasets from t5_tokenizer_model import SentencePieceUnigramTokenizer vocab_size = 32_000 input_sentence_size = None # Initialize a…

python-3.x load google-colaboratory huggingface-tokenizers huggingface-datasets

asked Oct 02 '21 at 08:50 by Ahmad (8,811 reputation; badges: 11, 76, 141)

2 votes · 1 answer

Which loss function to use for training sparse multi-label text classification problem and class skewness/imbalance

I am tackling a sparse multi-label text classification problem with Hugging Face models; it is one part of a SMART REPLY system. The task is described below: I classify Customer Utterances as input to the model and classify to…

pytorch loss-function huggingface-transformers multilabel-classification huggingface-tokenizers

asked Sep 08 '21 at 07:22 by MAC (1,345 reputation; badges: 2, 30, 60)
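A common starting point for multi-label classification is a per-label sigmoid with binary cross-entropy. For skewed labels, PyTorch's `torch.nn.BCEWithLogitsLoss` accepts a `pos_weight` tensor that scales the positive term, often set per label to negatives / positives; whether that suits a given dataset is an empirical question. The pure-Python sketch below spells out that formula for one example; `weighted_bce_with_logits` and `pos_weights_from_counts` are hypothetical helpers and the label counts are made up.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logits, targets, pos_weight):
    """Mean over labels of the pos_weight-scaled binary cross-entropy.

    Per label: -(w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))),
    using log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x).
    """
    total = 0.0
    for x, y, w in zip(logits, targets, pos_weight):
        total += w * y * softplus(-x) + (1.0 - y) * softplus(x)
    return total / len(logits)


def pos_weights_from_counts(positives, n_examples):
    """Per-label weight = negatives / positives (guarding against zero positives)."""
    return [(n_examples - p) / max(p, 1) for p in positives]


if __name__ == "__main__":
    weights = pos_weights_from_counts([10, 1], n_examples=101)  # hypothetical counts
    print(weights)  # [9.1, 100.0]
    print(weighted_bce_with_logits([0.3, -2.0], [1.0, 1.0], weights))
```

Because only the positive term is scaled, a rare label's missed positives cost more, while confident negatives on that label are unaffected.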
