Questions tagged [huggingface-tokenizers]
Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers
451 questions
1 vote, 2 answers
AttributeError: 'BloomForCausalLM' object has no attribute 'encode'
I'm trying to do some basic text inference using the BLOOM model:
from transformers import AutoModelForCausalLM, AutoModel
# checkpoint = "bigscience/bloomz-7b1-mt"
checkpoint = "bigscience/bloom-1b7"
tokenizer =…

Tobi Akinyemi
- 804
- 1
- 8
- 24
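A likely fix, sketched minimally: encode belongs to the tokenizer, not the model, so load both halves of the checkpoint and tokenize before generating (the prompt string is illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-1b7"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # encode() lives here
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello, I am", return_tensors="pt")  # tokenize the prompt
outputs = model.generate(**inputs, max_new_tokens=20)   # then generate
print(tokenizer.decode(outputs[0], skip_special_tokens=True))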
1 vote, 1 answer
Do huggingface translation models support separate vocabulary for source and target?
Every example I've looked at so far seems to use a shared vocabulary between source and target languages, and I'm wondering if that is a hard-coded constraint of the Huggingface models, or my misunderstanding, or I've just not looked in the right…

Darren Cook
- 27,837
- 13
- 117
- 217
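One way to check empirically, as a minimal sketch (the Helsinki-NLP checkpoint and a recent transformers release with text_target support are assumptions): Marian translation checkpoints ship separate source and target SentencePiece models but, in most released checkpoints, a single joint vocabulary.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
enc_src = tok("the cat sat")                       # tokenized with the source SPM
enc_tgt = tok(text_target="le chat s'est assis")   # tokenized with the target SPM
print(enc_src["input_ids"], enc_tgt["input_ids"])
print(tok.vocab_size)  # one shared id space in this checkpoint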
1 vote, 0 answers
Training a Hugging Face model without n_epochs
I would like to train a RobertaForMaskedLM from scratch in Hugging Face.
However, I would like to not specify any stopping time, but to stop only when there is no more improvement in training. Is there a way to do that? I know that the n_epochs…

Chiara
- 372
- 5
- 17
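One common approach, sketched minimally with the Trainer API (model and dataset names are placeholders): set a very large num_train_epochs and let EarlyStoppingCallback end the run once the metric stops improving.

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1000,            # effectively unbounded
    evaluation_strategy="steps",
    eval_steps=500,
    load_best_model_at_end=True,      # required by early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                      # your RobertaForMaskedLM
    args=args,
    train_dataset=train_ds,           # placeholder datasets
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()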
1 vote, 1 answer
How do I know which parameters to use with a pretrained Tokenizer?
I must be missing something ...
I want to use a pretrained model with HuggingFace:
transformer_name = "Geotrend/distilbert-base-fr-cased" # Or whatever model
model = AutoModelForSequenceClassification.from_pretrained(transformer_name,…

Alexandre GAREL
- 53
- 6
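The usual rule of thumb, as a minimal sketch: load the tokenizer from the same checkpoint as the model so the defaults saved with it apply, and set only per-call options such as padding and truncation yourself (the example sentences are illustrative).

from transformers import AutoModelForSequenceClassification, AutoTokenizer

transformer_name = "Geotrend/distilbert-base-fr-cased"
tokenizer = AutoTokenizer.from_pretrained(transformer_name)  # picks up saved defaults
model = AutoModelForSequenceClassification.from_pretrained(transformer_name, num_labels=2)

batch = tokenizer(["une phrase", "une autre"], padding=True, truncation=True,
                  return_tensors="pt")
outputs = model(**batch)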
1 vote, 1 answer
ValueError: bytes must be in range(0, 256) while decoding input tensor using transformer AutoTokenizer (MT5ForConditionalGeneration Model)
Relevant Code :
from transformers import (
    AdamW,
    MT5ForConditionalGeneration,
    AutoTokenizer,
    get_linear_schedule_with_warmup
)
tokenizer = AutoTokenizer.from_pretrained('google/byt5-small',…

iamabhaykmr
- 1,803
- 3
- 24
- 49
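A frequent cause, sketched minimally under the assumption that labels were padded with -100 for the loss: ByT5 maps ids back to raw bytes, so ids outside the byte range (such as -100) raise this ValueError; replace them with the pad id before decoding.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")

labels = torch.tensor([107, 104, 111, 111, 114, 1, -100, -100])  # "hello" + eos + padding
labels = torch.where(labels == -100, torch.tensor(tokenizer.pad_token_id), labels)
print(tokenizer.decode(labels, skip_special_tokens=True))  # -> "hello"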
1 vote, 0 answers
Python - Docker socket hang up after first successful API call, Docker exits midway through second call
Trying a Python program using Hugging Face transformers & FAISS. I was able to use the API successfully while testing locally, but when testing the same inside Docker, the API executes successfully the first time & then I get an Error: Socket hang…

Megha John
- 153
- 1
- 12
1 vote, 1 answer
Huggingface tokenizer not able to load model after upgrading Python to 3.10
I just updated Python to version 3.10.8. Note that I use JupyterLab.
I had to re-install a lot of packages, but now I get an error when I try to load the tokenizer of a HuggingFace model.
This is my code:
# Import libraries
from transformers import…

SilentCloud
- 1,677
- 3
- 9
- 28
1 vote, 0 answers
Does checkpointing with torch.save fail with hugging face -- if not what is the right way to checkpoint and load a hugging face (HF) model?
Does torch.save work on Hugging Face models (I am using ViT)? I assumed yes.
My error:
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/serialization.py", line 379, in save
_save(obj, opened_zipfile,…

Charlie Parker
- 5,884
- 57
- 198
- 323
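The idiomatic checkpointing for transformers models is save_pretrained/from_pretrained rather than torch.save on the wrapper object; a minimal sketch (the ViT checkpoint name is just an example):

from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.save_pretrained("my_vit_checkpoint")  # writes config.json plus the weights
model = ViTForImageClassification.from_pretrained("my_vit_checkpoint")  # restore later

# torch.save also works if applied to the state_dict, not the whole object:
# torch.save(model.state_dict(), "vit.pt"); model.load_state_dict(torch.load("vit.pt"))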
1 vote, 1 answer
Building wheel for tokenizers (pyproject.toml) did not run successfully - Python 3.9.9 - Windows 10
Yes, there are several other questions like this, but no solution was provided.
I am trying to install and run this project
https://github.com/xashru/punctuation-restoration
I have cloned the GitHub repository
Installed Rust from here, downloading x64:…

Furkan Gözükara
- 22,964
- 77
- 205
- 342
1 vote, 0 answers
Why am I getting a tensor of NaN values in PyTorch Huggingface inference?
I am fine-tuning a DistilBERT model for 200k iterations. Once it saves the checkpoint file, I run inference. However, my inference vector for any random text is NaN. An example output is below. Does anyone have any idea?
tensor([[[nan, nan, nan,…

Ramraj Chandradevan
- 141
- 2
- 10
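A quick diagnostic, sketched minimally (the checkpoint path is a placeholder): check whether the saved weights themselves already contain NaNs, which would point at a diverged training run rather than the inference code.

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/checkpoint")  # placeholder path
bad = [n for n, p in model.named_parameters() if torch.isnan(p).any()]
print("parameters containing NaN:", bad or "none")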
1 vote, 1 answer
Getting an error installing a package in the terminal to use Hugging Face in VS Code
I am following the steps from the Hugging Face website (https://huggingface.co/docs/transformers/installation) in order to start using Hugging Face in Visual Studio Code and install all the transformers.
I was on the last process, where I had to type…

waleeed
- 35
- 7
1 vote, 1 answer
How to get a loss from Huggingface's pipeline method in order to finetune a model?
I'm trying to use this model on huggingface for QA. The code for it is in the link:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
model_name = "deepset/roberta-base-squad2"
# a) Get predictions
nlp =…

Penguin
- 1,923
- 3
- 21
- 51
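pipeline() is inference-only and does not expose a loss; for fine-tuning you call the model directly with labels, which for QA are start/end token positions. A minimal sketch (the span indices are illustrative):

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

inputs = tokenizer("Why is model conversion important?",
                   "Conversion gives freedom to the user.",
                   return_tensors="pt")
# Supplying the gold answer span makes the forward pass return a loss
outputs = model(**inputs,
                start_positions=torch.tensor([11]),
                end_positions=torch.tensor([15]))
print(outputs.loss)  # backpropagate this to fine-tune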
1 vote, 1 answer
from transformers import BertTokenizer
I am trying to implement the following model from Hugging Face but am not entirely sure how to feed the model the texts that I need to pass to do the classification. The documentation (https://huggingface.co/DaNLP/da-bert-tone-subjective-objective)…

Bemz
- 129
- 1
- 16
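The usual pattern, sketched minimally for the DaNLP checkpoint named in the question (assuming it loads as a sequence-classification head): tokenize the texts with return_tensors and pass the batch straight to the model.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

name = "DaNLP/da-bert-tone-subjective-objective"
tokenizer = BertTokenizer.from_pretrained(name)
model = BertForSequenceClassification.from_pretrained(name)

texts = ["Jeg tror det bliver regnvejr i morgen."]  # illustrative input
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))  # predicted class ids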
1 vote, 0 answers
Unexpected keyword argument 'unk_token'
When trying to load this tokenizer I am getting this error, but I don't know why it won't accept the unk_token. Any ideas?
tokenizer = tokenizers.SentencePieceUnigramTokenizer(unk_token="", eos_token="", pad_token="")
----> 1 tokenizer =…

Antoine23
- 79
- 1
- 5
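In the tokenizers library, SentencePieceUnigramTokenizer's constructor does not accept unk_token/eos_token/pad_token; special tokens are supplied at training time instead. A minimal sketch under that assumption (the corpus and token strings are illustrative):

from tokenizers import SentencePieceUnigramTokenizer

tokenizer = SentencePieceUnigramTokenizer()  # constructor takes no special-token kwargs
tokenizer.train_from_iterator(
    ["some training text", "more training text"],  # illustrative corpus
    vocab_size=100,
    special_tokens=["<unk>", "</s>", "<pad>"],
    unk_token="<unk>",
)
print(tokenizer.encode("some text").tokens)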
1 vote, 0 answers
How does Byte-pair Encoding handle equally frequent pairs?
Let's say we train BPE tokenizer on this string:
D C B B A B C D C B A B C D
As I understand it, BPE merges the most frequent pairs, but what will the algorithm merge first here?
DC, BC, CD, BA, or AB? All occur 2 times in this dummy corpus.
Seems like…

Nikolay Klimenko
- 11
- 1
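You can observe the tie-breaking empirically by training a tiny BPE model on the string, as a minimal sketch with the tokenizers library (spaces removed so pairs form between symbols; other BPE implementations may break ties differently):

from tokenizers import Tokenizer, models, trainers

corpus = ["DCBBABCDCBABCD"]  # the question's string with spaces removed
tokenizer = Tokenizer(models.BPE())
trainer = trainers.BpeTrainer(vocab_size=8, show_progress=False)
tokenizer.train_from_iterator(corpus, trainer)

# Tokens in id order: the base alphabet first, then merged tokens in the
# order they were learned, which exposes the tie-breaking rule.
for token, idx in sorted(tokenizer.get_vocab().items(), key=lambda kv: kv[1]):
    print(idx, token)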