Questions tagged [huggingface-tokenizers]

Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers

451 questions
0 votes, 0 answers

Importing Simple Transformer

I am facing this error when trying to import simpletransformers. from simpletransformers.classification import ClassificationModel, ClassificationArgs Error: cannot import name 'Unigram' from 'tokenizers.models'…
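A minimal check along these lines usually pins the cause down: the import only fails on a tokenizers build that predates the Unigram model, so verifying and upgrading the installed version (the exact compatible versions are an assumption) is the first step.

```python
# Hedged sketch: confirm which tokenizers version is installed; Unigram only
# exists in newer releases, so an old pinned build triggers this ImportError.
import tokenizers
print(tokenizers.__version__)

# If the next line fails, upgrading usually resolves it (run in a shell/notebook):
#   pip install --upgrade tokenizers transformers simpletransformers
from tokenizers.models import Unigram
```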
0 votes, 1 answer

ValueError: logits and labels must have the same shape ((1, 21) vs (21, 1))

I am trying to reproduce this example using the huggingface TFBertModel to do a classification task. My model is almost the same as the example's, but I'm performing multilabel classification. For this reason, I've performed the binarization of my labels…
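A small shape sketch of what the error usually means here, assuming a Keras multilabel head on top of TFBertModel; the variable names are illustrative, not from the post.

```python
import numpy as np

num_labels = 21
# For multilabel binary cross-entropy, Keras expects labels shaped
# (batch_size, num_labels), matching the model's (batch_size, num_labels) logits.
y = np.zeros((1, num_labels), dtype="float32")
y[0, [2, 5, 7]] = 1.0                    # illustrative positive classes

# A (num_labels, 1) array, as in the error message, has the axes swapped;
# reshaping (or transposing) fixes the mismatch:
y_bad = y.reshape(num_labels, 1)         # shape (21, 1) -> triggers the error
y_fixed = y_bad.reshape(1, num_labels)   # shape (1, 21) -> matches the logits
```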
0 votes, 0 answers

How to get a single-text prediction from a customised BERT classification + PyTorch NLP model with/without DataLoader

I have used BERT with HuggingFace and PyTorch, with a DataLoader and Serializer for training & evaluation. Below is the code for that: ! pip install transformers==3.5.1 from transformers import AutoModel, BertTokenizerFast bert =…
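A minimal single-text inference sketch, assuming the fine-tuned classifier (`model`) and the BertTokenizerFast instance (`tokenizer`) from the post already exist; the 512 max length, the label handling, and the assumption that the custom model's forward takes (input_ids, attention_mask) are all guesses about the post's setup.

```python
import torch

def predict_single(text, model, tokenizer, device="cpu"):
    """Classify one raw string without building a DataLoader."""
    model.eval()
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    padding="max_length", max_length=512)
    with torch.no_grad():
        # assumed custom forward signature: forward(input_ids, attention_mask)
        logits = model(enc["input_ids"].to(device),
                       enc["attention_mask"].to(device))
    return torch.argmax(logits, dim=-1).item()
```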
0 votes, 0 answers

Increase speed of Huggingface tokenizer output

I need to get the last layer of embeddings from a BERT model using HuggingFace. The following code works, but it is extremely slow; how can I increase the speed? This is a toy example; my real data is made of thousands of examples with long…
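A rough speed-up sketch under common assumptions: encode texts in batches with a fast tokenizer and run the model under no_grad. The model name, batch size, and max length are placeholders, not the post's actual values.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

texts = ["first example", "second example"]   # thousands of long texts in practice
embeddings = []
with torch.no_grad():                          # inference only, skip gradient bookkeeping
    for i in range(0, len(texts), 32):         # batch instead of encoding one text at a time
        batch = tokenizer(texts[i:i + 32], padding=True, truncation=True,
                          max_length=512, return_tensors="pt")
        out = model(**batch)
        embeddings.append(out[0])              # last hidden layer: (batch, seq_len, hidden)
```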
0 votes, 0 answers

PyTorch + BERT + batch_encode_plus(): code runs fine in Colab but produces mismatched input shapes on Kaggle

I tried to use a notebook initialised in Google Colab on Kaggle and found strange behaviour, as it gave me something like: 16 # text2tensor ---> 17 train_seq,train_mask,train_y = textToTensor(train_text,train_labels,pad_len) 18 …
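One common cause of this kind of environment-dependent shape mismatch is padding behaviour differing between transformers versions. A hedged sketch of making the tensor shapes deterministic; the texts and pad_len are placeholders, not the post's textToTensor code.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
texts = ["a short text", "a somewhat longer example text"]
pad_len = 32  # one fixed length for both train and test tensors

tokens = tokenizer.batch_encode_plus(
    texts,
    max_length=pad_len,
    padding="max_length",   # newer transformers; older releases use pad_to_max_length=True
    truncation=True,        # always cut longer texts to pad_len
    return_tensors="pt",
)
input_ids, attention_mask = tokens["input_ids"], tokens["attention_mask"]
```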
0 votes, 0 answers

BERT zero layer fixed word embeddings

I want to do an experiment with BERT zero-layer vectors (input vectors), which I understand are of dimension 128. I cannot find where I can get a file with the tokens and their vectors. Is there such a thing? Is there a file in the Glove/word2vec…
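A sketch of dumping BERT's input (layer-zero) token embeddings to a GloVe/word2vec-style text file; "bert-base-uncased" is a placeholder model, and note its input embeddings are 768-dimensional rather than 128.

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# The embedding lookup table applied before any transformer layer.
emb = model.get_input_embeddings().weight.detach()   # (vocab_size, hidden_size)

with open("bert_input_embeddings.txt", "w", encoding="utf-8") as f:
    for token, idx in tokenizer.get_vocab().items():
        vec = " ".join(f"{x:.6f}" for x in emb[idx].tolist())
        f.write(f"{token} {vec}\n")                    # one "token v1 v2 ..." line each
```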
0 votes, 0 answers

How to download the pretrained dataset of huggingface RagRetriever to a custom directory

I'm playing with a RAG example from Facebook (huggingface) https://huggingface.co/facebook/rag-token-nq#usage. Here is a very nice explanation of it:…
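A hedged sketch of one way to redirect the downloads: point the Hugging Face caches at a custom directory via environment variables before importing the libraries. The paths are placeholders, and whether RagRetriever also honours a cache_dir keyword directly may depend on the library version.

```python
import os
os.environ["TRANSFORMERS_CACHE"] = "/data/hf/models"    # model weights cache
os.environ["HF_DATASETS_CACHE"] = "/data/hf/datasets"   # wiki_dpr index/dataset cache

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq",
                                              retriever=retriever)
```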
0 votes, 1 answer

How to make a byte-level tokenizer not split the token?

I have text with custom tokens, like: and I am trying to prepare a byte-level tokenizer that won't split them: tokenizer.pre_tokenizer = ByteLevel() tokenizer.pre_tokenizer.pre_tokenize("") [('Ġ<', (0, 2)), ('adjective',…
artona • 1,086 • 8 • 13
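A hedged sketch with the tokenizers library: registering the custom markers as special tokens so they survive pre-tokenization unsplit. The `<adjective>` marker is a guess at the kind of token the post's stripped example used, and the BPE model is only a placeholder.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = ByteLevel()

# Special tokens are never broken apart by the pre-tokenizer or the model.
tokenizer.add_special_tokens(["<adjective>", "<noun>"])

# After training/encoding, the markers come back whole, e.g.:
# tokenizer.encode("a <adjective> day").tokens
```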
0 votes, 0 answers

"ValueError: You have to specify either input_ids or inputs_embeds" when using Trainer

I am getting "ValueError: You have to specify either input_ids or inputs_embeds" from a seemingly straightforward training example: Iteration: 0%| …
Yevgeniy • 1,313 • 2 • 13 • 26
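A hedged sketch of the usual cause: the dataset handed to Trainer still holds raw text, so the model never receives input_ids. Mapping the tokenizer over the dataset first avoids the error; the dataset, model, and column names here are placeholders, not the post's actual setup.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("imdb", split="train[:1%]")   # placeholder dataset

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

dataset = dataset.map(tokenize, batched=True)
dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
# Trainer(..., train_dataset=dataset) now receives input_ids instead of raw strings.
```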
0 votes, 1 answer

HuggingFace Transformers: BertTokenizer changing characters

I have downloaded the Norwegian BERT-model from https://github.com/botxo/nordic_bert, and loaded it in using: import transformers as t model_class = t.BertModel tokenizer_class = t.BertTokenizer tokenizer =…
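A hedged sketch of the usual fix: the default BertTokenizer lower-cases and strips accents, which mangles Norwegian characters, so disabling that behaviour preserves them. The local path is a placeholder, and the strip_accents argument requires a reasonably recent transformers release.

```python
import transformers as t

tokenizer = t.BertTokenizer.from_pretrained(
    "/path/to/nordic_bert",      # placeholder for the downloaded model folder
    do_lower_case=False,         # keep original casing
    strip_accents=False,         # keep å, æ, ø and other accented characters
)
print(tokenizer.tokenize("blåbærsyltetøy"))  # characters should come through unchanged
```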
0 votes, 1 answer

Getting started: Huggingface Model Cards

I just recently started looking into the huggingface transformer library. When I tried to get started using the model card code of e.g. a community model: from transformers import AutoTokenizer, AutoModel tokenizer =…
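A minimal "getting started" sketch along the lines of a model-card snippet; the model id is a placeholder for whichever community model the card names.

```python
from transformers import AutoTokenizer, AutoModel

model_id = "bert-base-uncased"            # replace with the model id from the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)                 # hidden states for the encoded text
```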
0 votes, 1 answer

attention_mask is missing in the returned dict from tokenizer.encode_plus

I have a codebase which was working fine, but today, when I tried to run it, I observed that tokenizer.encode_plus stopped returning attention_mask. Was it removed in the latest release? Or do I need to do something else? The following piece of code…
Wasi Ahmad • 35,739 • 32 • 114 • 161
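A hedged sketch of the version-proof way to call it: request the mask explicitly, since default return values have shifted between transformers releases. The model name and lengths are placeholders.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer.encode_plus(
    "an example sentence",
    max_length=16,
    padding="max_length",          # older releases: pad_to_max_length=True
    truncation=True,
    return_attention_mask=True,    # ask for the mask explicitly
)
print(enc.keys())  # expect input_ids, token_type_ids, attention_mask
```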
-1 votes, 1 answer

How to fine-tune a model from Hugging Face?

I want to download a pretrained model and fine-tune it with my own data. I have downloaded the bert-large-NER model artifacts from Hugging Face; I have listed the contents below. Being new to this, I want to know what files or artifacts do I…
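A hedged sketch: the downloaded folder (config, vocab, weights) can be loaded with from_pretrained and then fine-tuned. The local path, training arguments, and dataset wiring are placeholders, not the post's actual setup.

```python
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          Trainer, TrainingArguments)

model_dir = "./bert-large-NER"       # folder containing the downloaded artifacts
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)

args = TrainingArguments(output_dir="./finetuned",
                         num_train_epochs=3,
                         per_device_train_batch_size=8)

# trainer = Trainer(model=model, args=args, train_dataset=my_tokenized_dataset)
# trainer.train()   # my_tokenized_dataset is a placeholder for your own labelled data
```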
-1 votes, 3 answers

Hugging Face: NameError: name 'sentences' is not defined

I am following this tutorial here: https://huggingface.co/transformers/training.html - however, I am coming across an error, and I think the tutorial is missing an import, but I do not know which. These are my current imports: # Transformers…
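A hedged sketch of the likely gap: the tutorial tokenizes a variable named `sentences` that it never defines, so any list of strings defined beforehand removes the NameError. The model name and example sentences are placeholders.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

# Define the variable the tutorial assumes already exists.
sentences = ["Hello, this is the first sentence.",
             "And this is the second one."]

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
```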
-1 votes, 2 answers

How do I prevent a lack of VRAM halfway through training a Huggingface Transformers (Pegasus) model?

I'm taking a pre-trained Pegasus model through Huggingface transformers (specifically, google/pegasus-cnn_dailymail, and I'm using Huggingface transformers through PyTorch) and I want to fine-tune it on my own data. This is however quite a large…
Lara • 2,594 • 4 • 24 • 36
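A hedged sketch of the usual memory levers when fine-tuning a large model like google/pegasus-cnn_dailymail with the Trainer API: a small per-device batch, gradient accumulation to keep the effective batch size, and fp16 on a CUDA GPU. The exact values are placeholders to tune against the available VRAM.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./pegasus-finetuned",
    per_device_train_batch_size=1,    # smallest real batch that fits in VRAM
    gradient_accumulation_steps=16,   # effective batch of 16 without the extra memory
    fp16=True,                        # roughly halves activation memory on most GPUs
)
```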