Questions tagged [huggingface-tokenizers]
Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers
451 questions
0 votes · 0 answers
Importing Simple Transformer
I am facing this error when trying to import simpletransformers:
from simpletransformers.classification import ClassificationModel, ClassificationArgs
Error:
cannot import name 'Unigram' from 'tokenizers.models'…

SK Singh · 153
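This import error typically means the installed tokenizers package predates the Unigram model class that the transformers release bundled with simpletransformers expects. A minimal check-and-fix sketch, assuming a notebook environment (the upgrade approach is an assumption, not taken from the question):

import tokenizers
print(tokenizers.__version__)  # check what is actually installed

# Reinstall a matched set so simpletransformers, transformers and tokenizers agree;
# run in a shell or notebook cell:
#   pip install --upgrade simpletransformers transformers tokenizers

from tokenizers.models import Unigram  # should now import cleanly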
0 votes · 1 answer
ValueError: logits and labels must have the same shape ((1, 21) vs (21, 1))
I am trying to reproduce this example using the huggingface TFBertModel to do a classification task.
My model is almost the same as the example's, but I'm performing multilabel classification. For this reason, I've performed the binarization of my labels…

revy · 647
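A mismatch of (1, 21) against (21, 1) usually means the labels arrive as a column vector while the model emits one row of 21 logits per example; reshaping the binarized labels to (batch_size, num_labels) before fitting normally resolves it. A minimal sketch with illustrative shapes:

import numpy as np

num_labels = 21
labels = np.zeros((num_labels, 1))       # column vector, as in the error message
labels = labels.reshape(-1, num_labels)  # now (1, 21), matching the logits
print(labels.shape)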
0 votes · 0 answers
How to get a single-text prediction from a customised BERT classification + PyTorch NLP model, with or without a DataLoader
I have used BERT with HuggingFace and PyTorch, with a DataLoader and serializer for training and evaluation. Below is the code for that:
! pip install transformers==3.5.1
from transformers import AutoModel, BertTokenizerFast
bert =…

Deshwal · 3,436
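For a single text the DataLoader can be skipped entirely: tokenize one string, run the model in eval mode with gradients disabled, and take the argmax. A minimal sketch, assuming a fine-tuned sequence-classification checkpoint (the model name is a stand-in):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # replace with the fine-tuned model's path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()

inputs = tokenizer("an example sentence", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs)[0]       # indexing works for tuple and ModelOutput returns
print(logits.argmax(dim=-1).item())   # predicted class id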
0 votes · 0 answers
Increase speed of Huggingface tokenizer output
I need to get the last layer of embeddings from a BERT model using HuggingFace. The following code works, but it is extremely slow; how can I increase the speed?
This is a toy example, my real data is made of thousands of examples with long…

Ushuaia81 · 495
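Two common speedups are using the fast (Rust) tokenizer on whole batches and running the encoder once per batch on GPU with gradients disabled, instead of looping text by text. A minimal sketch (model name and batch contents are illustrative):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
model = AutoModel.from_pretrained("bert-base-uncased")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

texts = ["first example", "second example"]  # thousands of texts in practice
with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    last_hidden = model(**batch)[0]          # (batch, seq_len, hidden_size)
print(last_hidden.shape)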
0 votes · 0 answers
PyTorch + BERT + batch_encode_plus(): code running fine in Colab but producing mismatched input shapes on Kaggle
I used a notebook initialised on Google Colab in Kaggle and found strange behaviour, as it gave me something like:
16 # text2tensor
---> 17 train_seq,train_mask,train_y = textToTensor(train_text,train_labels,pad_len)
18 …

Deshwal · 3,436
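Colab and Kaggle often pin different transformers versions, and the padding/truncation defaults of batch_encode_plus changed between releases, so the same call can yield different shapes. Passing both arguments explicitly makes the output shape deterministic on either platform. A sketch reusing the question's pad_len (other values are illustrative):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
pad_len = 32
enc = tokenizer.batch_encode_plus(
    ["a short text", "another text"],
    max_length=pad_len,
    padding="max_length",  # pad every sequence to exactly pad_len
    truncation=True,       # and cut longer ones down to it
    return_tensors="pt",
)
print(enc["input_ids"].shape, enc["attention_mask"].shape)  # both (2, pad_len)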
0 votes · 0 answers
BERT zero-layer fixed word embeddings
I want to run an experiment with BERT zero-layer vectors (input vectors), which I understand are of dimension 128.
I cannot find where to get a file with the tokens and their vectors.
Is there such a thing?
Is there a file in the GloVe/word2vec…

CSBS · 1
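There is no separate GloVe-style file, but the zero-layer (input) vectors can be read straight out of any checkpoint: the word-embedding matrix maps every vocabulary token to its fixed vector. (A 128-dimensional embedding suggests an ALBERT-style model; bert-base uses 768.) A minimal sketch with an illustrative model name:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

emb = model.get_input_embeddings().weight  # (vocab_size, hidden_size)
print(emb.shape)

token_id = tokenizer.convert_tokens_to_ids("dog")
vector = emb[token_id]                     # that token's static input vector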
0 votes · 0 answers
How to download the pretrained dataset of huggingface RagRetriever to a custom directory
I'm playing with a RAG example from Facebook (huggingface): https://huggingface.co/facebook/rag-token-nq#usage.
Here is a very nice explanation of it:…

JoseM LM · 373
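The retrieval dataset behind RagRetriever is fetched through the datasets library, so that library's cache location, not transformers' cache, decides where the large download lands; pointing HF_DATASETS_CACHE at a custom directory is one way to relocate it. A sketch (the path is illustrative, this is an assumption about the download mechanism, and the variable must be set before the datasets library loads):

import os
os.environ["HF_DATASETS_CACHE"] = "/data/hf_datasets"  # assumed custom download directory

from transformers import RagRetriever

retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq",
    index_name="exact",
    use_dummy_dataset=True,  # small stand-in; drop this for the full wiki index
)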
0 votes · 1 answer
How to make a byte-level tokenizer not split the token?
I have text with custom tokens, like <adjective>, and I am trying to prepare a byte-level tokenizer that won't split them:
tokenizer.pre_tokenizer = ByteLevel()
tokenizer.pre_tokenizer.pre_tokenize("<adjective>")
[('Ġ<', (0, 2)), ('adjective',…

artona · 1,086
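Tokens registered as special tokens are matched before the byte-level pre-tokenizer runs, so they are never split. A sketch using the fast GPT-2 tokenizer from transformers, which is byte-level (the placeholder token mirrors the question; the model id is illustrative):

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<adjective>"]})
print(tokenizer.tokenize("a <adjective> day"))  # '<adjective>' stays one token

If the tokenizer feeds a model, remember to call model.resize_token_embeddings(len(tokenizer)) after adding tokens.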
0 votes · 0 answers
"ValueError: You have to specify either input_ids or inputs_embeds" when using Trainer
I am getting "ValueError: You have to specify either input_ids or inputs_embeds" from a seemingly straightforward training example:
Iteration: 0%| …

Yevgeniy · 1,313
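Trainer raises this when the batches it builds contain no input_ids column, which usually means the raw text dataset was passed in untokenized; mapping the tokenizer over the dataset first gives Trainer the tensors it expects. A minimal sketch with the datasets library (column names and data are illustrative):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
raw = Dataset.from_dict({"text": ["good", "bad"], "label": [1, 0]})

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

train_dataset = raw.map(tokenize, batched=True)  # adds input_ids and attention_mask
# Trainer(model=..., train_dataset=train_dataset, ...) now finds input_ids.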
0 votes · 1 answer
HuggingFace Transformers: BertTokenizer changing characters
I have downloaded the Norwegian BERT model from https://github.com/botxo/nordic_bert and loaded it using:
import transformers as t
model_class = t.BertModel
tokenizer_class = t.BertTokenizer
tokenizer =…

Christian Vennerød · 21
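BertTokenizer lowercases and can strip accents during basic tokenization, which alters Norwegian characters if the vocabulary was built cased; loading with those switches off usually preserves them. A sketch (the local path is illustrative, and the strip_accents argument assumes a reasonably recent transformers release):

import transformers as t

tokenizer = t.BertTokenizer.from_pretrained(
    "path/to/nordic_bert",
    do_lower_case=False,  # keep casing as the vocabulary expects
    strip_accents=False,  # do not normalise characters such as å away
)
print(tokenizer.tokenize("blåbærsyltetøy"))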
0 votes · 1 answer
Getting started: Huggingface Model Cards
I just recently started looking into the huggingface transformers library.
When I tried to get started using the model-card code of, e.g., a community model:
from transformers import AutoTokenizer, AutoModel
tokenizer =…

Lukas · 61
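Model-card snippets generally continue like this: both Auto classes take the model id shown on the card and download the weights on first use. A sketch with an illustrative model id:

from transformers import AutoTokenizer, AutoModel

model_id = "bert-base-uncased"  # replace with the id from the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)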
0 votes · 1 answer
attention_mask is missing in the returned dict from tokenizer.encode_plus
I have a codebase which was working fine, but when I tried to run it today, I observed that tokenizer.encode_plus stopped returning attention_mask. Was it removed in the latest release? Or do I need to do something else?
The following piece of code…

Wasi Ahmad · 35,739
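Whether encode_plus returns attention_mask depends on the return_attention_mask flag and on per-model defaults that have shifted between releases, so requesting the mask explicitly makes the code robust across versions. A sketch:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer.encode_plus(
    "some text",
    max_length=16,
    padding="max_length",
    truncation=True,
    return_attention_mask=True,  # ask for the mask explicitly
)
print(enc.keys())                # now includes 'attention_mask'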
-1 votes · 1 answer
How to fine-tune a model from Hugging Face?
I want to download a pretrained model and fine-tune it with my own data. I have downloaded the bert-large-NER model artifacts from Hugging Face and listed the contents below. Being new to this, I want to know what files or artifacts I…

kyagu · 155
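The downloaded folder (config.json, the weights file, and the vocab/tokenizer files) is everything from_pretrained needs; point the token-classification classes at it and fine-tune with Trainer. A compressed sketch (paths and training arguments are illustrative, and the tokenized NER dataset is left out):

from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          Trainer, TrainingArguments)

model_dir = "./bert-large-NER"  # folder holding the downloaded artifacts
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)

args = TrainingArguments(output_dir="./finetuned", num_train_epochs=3,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=None)  # supply a tokenized NER dataset here
# trainer.train()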
-1 votes · 3 answers
Hugging Face: NameError: name 'sentences' is not defined
I am following this tutorial: https://huggingface.co/transformers/training.html. However, I am coming across an error, and I think the tutorial is missing an import, but I do not know which.
These are my current imports:
# Transformers…
user16098918
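The tutorial snippet assumes a variable holding the training texts, and the NameError just means it was never defined; defining it before the tokenizer call fixes the example (the contents are illustrative):

from transformers import BertTokenizer

sentences = ["This restaurant was great.",  # any list of training texts
             "The service was slow."]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")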
-1 votes · 2 answers
How do I prevent a lack of VRAM halfway through training a Huggingface Transformers (Pegasus) model?
I'm taking a pre-trained Pegasus model through Huggingface Transformers (specifically google/pegasus-cnn_dailymail, using Huggingface Transformers through PyTorch), and I want to fine-tune it on my own data. This is however quite a large…

Lara · 2,594
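Running out of VRAM partway through training usually means the effective batch, or one long sequence, no longer fits; shrinking the per-device batch and compensating with gradient accumulation, plus fp16, are the usual levers. A sketch of the relevant TrainingArguments (values are illustrative):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./pegasus-finetuned",
    per_device_train_batch_size=1,   # smallest step that fits in memory
    gradient_accumulation_steps=16,  # still an effective batch of 16
    fp16=True,                       # roughly halves activation memory on most GPUs
)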