Questions tagged [huggingface-tokenizers]

Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers

451 questions
5 votes · 2 answers

How to untokenize BERT tokens?

I have a sentence and I need to return the text corresponding to N BERT tokens to the left and right of a specific word. from transformers import BertTokenizer tz = BertTokenizer.from_pretrained("bert-base-cased") sentence = "The Natural Science…
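In transformers, `tokenizer.convert_tokens_to_string(tokens)` does this directly. For intuition, a minimal pure-Python sketch of the merge step, assuming WordPiece-style tokens where continuation pieces carry a leading `##`:

```python
def wordpiece_to_text(tokens):
    """Merge WordPiece tokens back into a rough surface string.

    Continuation pieces start with '##'; this mirrors roughly what
    BertTokenizer.convert_tokens_to_string does, minus special-token
    handling and detokenization of punctuation spacing.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # glue the continuation onto the previous piece
        else:
            words.append(tok)
    return " ".join(words)

print(wordpiece_to_text(["The", "Natural", "Scie", "##nce", "Museum"]))
# → The Natural Science Museum
```

To recover the *original* text span rather than a normalized one, a fast tokenizer's `return_offsets_mapping=True` gives character offsets to slice the source sentence with.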
5 votes · 1 answer

On-the-fly tokenization with datasets, tokenizers, and torch Datasets and Dataloaders

I have a question regarding "on-the-fly" tokenization. It was prompted by reading "How to train a new language model from scratch using Transformers and Tokenizers" here. Towards the end there is this sentence: "If your dataset is…
Pietro · 415
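The idea of on-the-fly tokenization is to keep raw strings in the dataset and tokenize per sample inside `__getitem__`, so DataLoader workers do the work during training. A dependency-free sketch (with PyTorch you would subclass `torch.utils.data.Dataset`; the `tokenize_fn` here is a stand-in for a HuggingFace fast tokenizer):

```python
class LazyTextDataset:
    """Tokenize on access instead of up front.

    Stores raw strings and applies tokenize_fn lazily; with PyTorch this
    class would subclass torch.utils.data.Dataset and tokenize_fn would
    wrap a HuggingFace tokenizer call. Both names are illustrative.
    """
    def __init__(self, texts, tokenize_fn):
        self.texts = texts            # raw strings, kept untokenized
        self.tokenize_fn = tokenize_fn

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Tokenization happens here, per sample, "on the fly".
        return self.tokenize_fn(self.texts[idx])

ds = LazyTextDataset(["hello world", "foo"], lambda s: s.split())
print(ds[0])  # → ['hello', 'world']
```

The alternative (and what `datasets.Dataset.map` with a tokenize function gives you) is eager pre-tokenization, which trades startup time and disk cache for faster epochs.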
5 votes · 3 answers

Huggingface BERT Tokenizer add new token

I am using Huggingface BERT for an NLP task. My texts contain names of companies which are split up into subwords. tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased') tokenizer.encode_plus("Somespecialcompany") output: {'input_ids':…
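The usual fix in transformers is `tokenizer.add_tokens(["Somespecialcompany"])` followed by `model.resize_token_embeddings(len(tokenizer))`. To see *why* adding the token helps, here is a toy longest-match-first tokenizer (pure Python; the vocab and matching logic are a simplified illustration of WordPiece, not BERT's actual implementation):

```python
def greedy_wordpiece(word, vocab):
    """Toy longest-match-first WordPiece split, showing why an
    out-of-vocabulary company name fragments into subwords."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        else:
            return ["[UNK]"]  # no piece matched at this position
        pieces.append(piece)
        start = end
    return pieces

vocab = {"some", "##special", "##company"}
print(greedy_wordpiece("somespecialcompany", vocab))
# → ['some', '##special', '##company']

vocab.add("somespecialcompany")  # roughly what tokenizer.add_tokens() achieves
print(greedy_wordpiece("somespecialcompany", vocab))
# → ['somespecialcompany']
```

After `add_tokens`, resizing the embeddings is mandatory, since the new id would otherwise index past the embedding matrix.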
5 votes · 2 answers

How do I translate using HuggingFace from Chinese to English?

I want to translate from Chinese to English using HuggingFace's transformers using a pretrained "xlm-mlm-xnli15-1024" model. This tutorial shows how to do it from English to German. I tried following the tutorial but it doesn't detail how to…
5 votes · 3 answers

Hugging-Face Transformers: Loading model from path error

I am pretty new to Hugging-Face transformers. I am facing the following issue when I try to load the xlm-roberta-base model from a given path: >> tokenizer = AutoTokenizer.from_pretrained(model_path) >> Traceback (most recent call last): File…
Spartan · 51
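A common cause of this traceback is pointing `from_pretrained` at a directory that is missing the config or tokenizer files. A hypothetical pre-flight helper (`check_model_dir` and its expected-file list are assumptions based on what `save_pretrained()` typically writes for xlm-roberta-base; adjust for your checkpoint):

```python
import os
import pathlib
import tempfile

def check_model_dir(model_path):
    """Report files missing from a local checkpoint directory before
    calling AutoTokenizer.from_pretrained(model_path).

    The expected list below is an assumption: a config plus the
    SentencePiece model that xlm-roberta-base tokenizers ship with.
    """
    expected = ["config.json", "sentencepiece.bpe.model"]
    return [f for f in expected
            if not os.path.exists(os.path.join(model_path, f))]

# Usage: a directory holding only a config fails the tokenizer check.
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "config.json").touch()
    print(check_model_dir(d))  # → ['sentencepiece.bpe.model']
```

If files are missing, re-exporting with `tokenizer.save_pretrained(model_path)` alongside `model.save_pretrained(model_path)` usually resolves it.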
5 votes · 3 answers

Huggingface Summarization

I am practicing with Transformers to summarize text. Following the tutorial at : https://huggingface.co/transformers/usage.html#summarization from transformers import pipeline summarizer = pipeline("summarization") ARTICLE = """ New York (CNN)When…
xamlova · 51
4 votes · 1 answer

How to fine tune a Huggingface Seq2Seq model with a dataset from the hub?

I want to train the "flax-community/t5-large-wikisplit" model with the "dxiao/requirements-ner-id" dataset. (Just for some experiments) I think my general procedure is not correct, but I don't know how to go further. My Code: Load tokenizer and…
4 votes · 2 answers

How to handle sequences longer than 512 tokens in layoutLMV3?

How can I work with sequences longer than 512 tokens? I don't want to use truncation=True; I actually want to handle the longer sequences.
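The standard alternative to truncation is a sliding window: split the sequence into overlapping chunks, run the model on each, and merge predictions afterwards (fast tokenizers can produce these chunks for you via `return_overflowing_tokens=True` with a `stride`). A minimal sketch of the windowing itself:

```python
def sliding_windows(token_ids, max_len=512, stride=128):
    """Split a long token sequence into overlapping windows.

    Each window holds at most max_len tokens; consecutive windows overlap
    by `stride` tokens so an entity near a boundary appears whole in at
    least one window. 512/128 are illustrative defaults.
    """
    if len(token_ids) <= max_len:
        return [token_ids]
    step = max_len - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return windows

chunks = sliding_windows(list(range(1000)), max_len=512, stride=128)
print([len(c) for c in chunks])  # → [512, 512, 232]
```

For token classification on the chunks, predictions in overlapped regions are typically resolved by keeping the window where the token sits farther from the edge.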
4 votes · 1 answer

HuggingFace Tokenizer.from_file(): Exception: data did not match any variant of untagged enum ModelWrapper

I am having an issue loading a BPE tokenizer with Tokenizer.from_file(). When I try, I encounter this error, where line 11743 is the last one: Exception: data did not match any variant of untagged enum ModelWrapper at line 11743 column 3 I have…
4 votes · 1 answer

How to drop sentences that are too long in Huggingface?

I'm going through the Huggingface tutorial, and it appears that the library has automatic truncation to cut sentences that are too long, based on a max value or other criteria. How can I instead drop sentences entirely for the same reason (sentences are too long,…
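With a `datasets.Dataset` this is what `dataset.filter(...)` is for: keep only examples whose tokenized length fits the limit. The predicate itself, sketched dependency-free (`count_tokens` stands in for e.g. `len(tokenizer(text)["input_ids"])`):

```python
def drop_too_long(examples, count_tokens, max_tokens=512):
    """Keep only examples whose token count fits the model's limit.

    count_tokens is a stand-in for a real tokenizer length check; with a
    datasets.Dataset the same predicate would go into dataset.filter().
    """
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]

texts = ["short one", "word " * 600]
kept = drop_too_long(texts, count_tokens=lambda t: len(t.split()), max_tokens=512)
print(kept)  # → ['short one']
```

Note the difference from truncation: filtering discards the whole example, so the dataset shrinks rather than losing tail tokens.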
4 votes · 3 answers

How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?

In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (if truncation=True) by cutting the excess tokens from the right. For the purposes of…
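Recent transformers versions expose this directly: set `tokenizer.truncation_side = "left"` before encoding. The effect, sketched in pure Python (the 101/102 ids match bert-base-cased's [CLS]/[SEP] but are only illustrative defaults here):

```python
def truncate_left(token_ids, max_length, cls_id=101, sep_id=102):
    """Keep the *last* tokens instead of the first.

    Mirrors tokenizer.truncation_side = "left": two slots are reserved
    for the [CLS]/[SEP] special tokens and the excess is cut from the
    start of the sequence rather than the end.
    """
    body = token_ids[-(max_length - 2):]  # drop overflow from the left
    return [cls_id] + body + [sep_id]

print(truncate_left(list(range(1, 11)), max_length=6))
# → [101, 7, 8, 9, 10, 102]
```

Left truncation is the natural choice when the signal sits at the end of the text, e.g. dialogue history where the latest turns matter most.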
4 votes · 2 answers

SpeechBrain: Cannot Load Pretrained Model from Local Path

I'm trying to load a pretrained SpeechBrain HuggingFace model from local files; I don't want it to call out to HuggingFace to download. However, unless I change the pretrained_path in hyperparams.yaml, it is still calling out to HuggingFace and…
4 votes · 0 answers

Internal RuntimeError when using a custom fine-tuned model

I tried to fine-tune this model I found on huggingface (https://github.com/flexudy-pipe/sentence-doctor) in order to make it perform better in French; however, I have a problem. I used the train_any_t5_task.py file the author gave…
4 votes · 1 answer

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation

I'm learning NLP following this sequence classification tutorial from HuggingFace https://huggingface.co/transformers/custom_datasets.html#sequence-classification-with-imdb-reviews The original code runs without problem. But when I tried to load a…
Rafael · 1,761
4 votes · 1 answer

How to download hugging face sentiment-analysis pipeline to use it offline?

I'm unable to use the hugging face sentiment analysis pipeline without internet. How do I download that pipeline for offline use? The basic code for sentiment analysis using hugging face is from…