Questions tagged [huggingface-tokenizers]

Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers

451 questions
2
votes
0 answers

Your fast tokenizer does not have the necessary information to save the vocabulary for a slow tokenizer

I'm trying to fine-tune a T5 model for paraphrasing Farsi sentences. I'm using this model as my base. My dataset is a paired-sentence dataset in which each row is a pair of paraphrased sentences. I want to fine-tune the model on this dataset. The…
2
votes
1 answer

KeyError: 'eval_loss' in Huggingface Trainer

I am trying to build a Question Answering pipeline with the Huggingface framework but am facing the KeyError: 'eval_loss' error. My goal is to train, save the best model at the end, and evaluate the validation set on the loaded model. My trainer…
2
votes
1 answer

How to know if HuggingFace's pipeline text input exceeds 512 tokens

I've finetuned a Huggingface BERT model for Named Entity Recognition based on 'bert-base-uncased'. I perform inference like this: from transformers import pipeline ner_pipeline = pipeline('token-classification', model=model_folder,…
ClaudiaR
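For the token-limit question above, a minimal sketch of the usual check: count the tokens the tokenizer would produce and compare against the model's limit before calling the pipeline. `toy_tokenize` below is a whitespace stand-in for a real subword tokenizer (an assumption, to keep the sketch self-contained); with transformers one would use `len(tokenizer(text)["input_ids"])` instead.

```python
# Sketch: detect inputs that would exceed a 512-token model limit
# before handing them to a pipeline.

MAX_TOKENS = 512

def toy_tokenize(text):
    # Stand-in for a real tokenizer's encode step. A BERT tokenizer
    # produces subword tokens, so the real count is usually higher
    # than a whitespace split.
    return text.split()

def exceeds_limit(text, max_tokens=MAX_TOKENS):
    # True when the tokenized input is longer than the model accepts.
    return len(toy_tokenize(text)) > max_tokens

print(exceeds_limit("word " * 600))  # True
```

The same check with a real tokenizer lets you decide whether to truncate, chunk, or reject the input up front rather than relying on the pipeline's silent truncation.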
2
votes
1 answer

How to pass arguments to HuggingFace TokenClassificationPipeline's tokenizer

I've finetuned a Huggingface BERT model for Named Entity Recognition. Everything is working as it should. Now I've set up a pipeline for token classification in order to predict entities out of the text I provide. Even this is working fine. I know that…
2
votes
0 answers

Huggingface pre-trained model

I'm trying to use the code below: from transformers import AutoTokenizer, AutoModel t = "ProsusAI/finbert" tokenizer = AutoTokenizer.from_pretrained(t) model = AutoModel.from_pretrained(t) The error: I think this error is due to an old version of…
2
votes
1 answer

How to customize the positional embedding?

I am using the Transformer model from Hugging Face for machine translation. However, my input data has relational information, and I want to craft a graph like the following (the ASCII diagram is flattened in this excerpt): He ended his meeting…
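For context on the positional-embedding question above, here is the standard sinusoidal scheme (Vaswani et al.) in plain Python; a custom or relation-aware scheme would replace this function. This is a sketch of the baseline being customized, not any library's internal code.

```python
import math

def sinusoidal_pe(seq_len, d_model):
    """Build the standard sinusoidal positional-embedding table:
    even dimensions use sin, odd dimensions use cos, with
    frequencies decreasing geometrically across dimension pairs."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Customizing positions for graph-structured input typically means replacing `pos` with some relation-derived index (or adding a learned relative-position term), while keeping the table shape `(seq_len, d_model)` the same.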
2
votes
1 answer

Getting error while extracting key value pair using LayoutLMV2 model

I am trying to extract key-value pairs from scanned invoice documents using the LayoutLMV2 model, but I am getting an error. Installation guide. I am just trying to check how the model predicts the key-value pairs from the document, or whether I need to fine…
2
votes
1 answer

Do weights of the [PAD] token have a function?

When looking at the weights of a transformer model, I noticed that the embedding weights for the padding token [PAD] are nonzero. I was wondering whether these weights have a function, since they are ignored in the multi-head attention layers. Would…
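The [PAD] question above hinges on how attention masking works: masked positions receive a score of negative infinity before the softmax, so their attention weight is exactly zero no matter what the [PAD] embedding contains. A minimal sketch of that mechanism (an illustration, not any model's actual code):

```python
import math

def masked_softmax(scores, mask):
    """Attention-style softmax where masked (pad) positions are set
    to -inf, so they contribute exactly zero weight regardless of
    the [PAD] token's embedding values."""
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    mx = max(masked)                      # subtract max for stability
    exps = [math.exp(s - mx) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the mask zeroes out padded positions here, the nonzero [PAD] weights never influence attention outputs; they only matter in places that bypass the mask.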
2
votes
1 answer

How to go around truncating long sentences with Huggingface Tokenizers?

I am new to tokenizers. My understanding is that the truncate attribute just cuts the sentences, but I need the whole sentence for context. For example, my sentence is: "Ali bin Abbas'ın Kitab Kamilü-s Sina adlı eseri daha sonra 980 yılında nasıl…
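The usual answer to the truncation question above is to split long inputs into overlapping windows instead of cutting them, which is what transformers' `return_overflowing_tokens=True` with a `stride` does. A stdlib-only sketch of that windowing logic (parameter names chosen to mirror the library; the function itself is illustrative):

```python
def chunk_with_stride(token_ids, max_len=512, stride=128):
    """Split a long token sequence into overlapping windows so no
    context is lost to truncation. Consecutive windows overlap by
    `stride` tokens, mirroring the overflowing-tokens behaviour."""
    step = max_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks
```

Each window is run through the model separately and the predictions are merged, so the sentence keeps its full context across window boundaries.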
2
votes
1 answer

HuggingFace - Why does the T5 model shorten sentences?

I wanted to train the model for spell correction. I trained two models: allegro/plt5-base with Polish sentences and google/t5-v1_1-base with English sentences. Unfortunately, I don't know why, but both models shorten the…
2
votes
0 answers

How do I use ByteLevelBPETokenizer with UTF-8?

I am trying to apply BPE to a piece of text that is UTF-8 encoded. Here is the code: import io from tokenizers import ByteLevelBPETokenizer from tokenizers.decoders import ByteLevel # list of the paths of your txt files decoder = ByteLevel() paths…
kloop
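On the UTF-8 question above: byte-level BPE operates on raw bytes, so any UTF-8 string is representable without unknown tokens; multi-byte characters simply become several byte-level symbols. A stdlib-only sketch of that base encoding (an illustration of the principle, not the tokenizers library's implementation):

```python
def to_byte_tokens(text):
    """Byte-level tokenization stand-in: map any UTF-8 string to its
    byte values (0-255). Every character is representable, so there
    is never an 'unknown token'; BPE merges are then learned on top
    of these byte symbols."""
    return list(text.encode("utf-8"))
```

Decoding reverses the mapping (`bytes(ids).decode("utf-8")`), which is essentially what the `ByteLevel` decoder does after undoing the merges.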
2
votes
1 answer

Huggingface transformers padding vs pad_to_max_length

I'm running code using pad_to_max_length = True and everything works fine. I only get a warning, as follows: FutureWarning: The pad_to_max_length argument is deprecated and will be removed in a future version, use padding=True…
Peyman
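For the deprecation warning above, the mapping is: `pad_to_max_length=True` behaves like `padding="max_length"`, while `padding=True` pads to the longest sequence in the batch. A sketch of that resolution logic (our own illustration, not the library's actual shim):

```python
def resolve_padding(padding=False, pad_to_max_length=None):
    """Translate the legacy pad_to_max_length flag into the newer
    padding strategies: 'max_length', 'longest', or 'do_not_pad'."""
    if pad_to_max_length:
        # Legacy flag: always pad out to the model's max_length.
        return "max_length"
    if padding is True:
        # padding=True pads to the longest sequence in the batch.
        return "longest"
    # padding may also be an explicit strategy string, or falsy.
    return padding or "do_not_pad"
```

So the two arguments are not equivalent: switching `pad_to_max_length=True` to `padding=True` changes the behaviour from fixed-length padding to batch-longest padding; `padding="max_length"` preserves the old behaviour.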
2
votes
5 answers

Unable to install tokenizers in Mac M1

I installed transformers on a MacBook Pro M1 Max. Following this, I installed the tokenizers with pip install tokenizers. It showed: Collecting tokenizers Using cached tokenizers-0.12.1-cp39-cp39-macosx_12_0_arm64.whl Successfully installed…
trialcritic
2
votes
1 answer

Calculate precision, recall, and F1 score for a custom dataset for multiclass classification with the Huggingface library

I am trying to do multiclass classification for the sentence-pair task. I uploaded my custom train and test datasets separately to the Hugging Face dataset hub, trained my model, tested it, and was trying to see the F1 score and accuracy. I…
2
votes
1 answer

Huggingface Load_dataset() function throws "ValueError: Couldn't cast"

My goal is to train a classifier able to do sentiment analysis in the Slovak language using a loaded SlovakBert model and the HuggingFace library. The code is executed on Google Colaboratory. My test dataset is read from this CSV…