Questions tagged [huggingface-tokenizers]
Use this tag for questions related to the tokenizers project from Hugging Face. GitHub: https://github.com/huggingface/tokenizers
451 questions
0 votes
2 answers
How to convert number words to numerals using Hugging Face, spaCy, or any Python-based workflow
I have a lot of text that contains numbers written out as words, in different languages (different datasets, but each dataset is in a single language, so there is no mixing of languages). For example:
I have one apple
I have two kids
and
I want it converted to:
I have 1 apple
I have…

ML85
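For the English case, a small lookup table of number words plus a whole-word regex substitution is often enough (the `word2number` package or spaCy's `like_num` token attribute are fuller options). A minimal pure-Python sketch; the per-language word tables are assumed to be supplied by you:

```python
import re

# Minimal English word-to-digit table; other languages need their own tables
NUM_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}

def words_to_digits(text: str) -> str:
    # \b ensures whole-word matches only, so "someone" is left alone
    pattern = re.compile(r"\b(" + "|".join(NUM_WORDS) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: NUM_WORDS[m.group(0).lower()], text)

print(words_to_digits("I have one apple"))  # I have 1 apple
print(words_to_digits("I have two kids"))   # I have 2 kids
```

This handles single number words; compound numbers ("twenty one") need a proper parser such as `word2number`.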
0 votes
1 answer
Is it possible to see all the token rankings for masked language modelling?
I was wondering whether it is possible to see all the predicted tokens for masked language modelling, specifically all the tokens with a low probability.
For example, consider this masked language model:
unmasker("I am feeling …
user14946125
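In recent transformers versions the fill-mask pipeline accepts a `top_k` argument (older releases spelled it `topk`), so passing a value up to the vocabulary size returns every token's score. Under the hood this is just a softmax over the logits at the masked position followed by a sort; a pure-Python sketch of that ranking step, using a toy vocabulary and invented logits in place of real model output:

```python
import math

# Toy vocabulary and made-up logits standing in for the model's
# output at the [MASK] position
vocab = ["happy", "sad", "tired", "great", "banana"]
logits = [3.2, 1.1, 0.7, 2.9, -4.0]

# Softmax turns logits into probabilities
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Rank every token, including the low-probability ones
ranking = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
for token, prob in ranking:
    print(f"{token}: {prob:.4f}")
```

With a real model, the same effect comes from taking the logits at the mask index and sorting the full vocabulary by probability.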
0 votes
1 answer
Unable to push pre-trained model from Google Colab to Hugging Face for hosting a bot
I have trained a chatbot model in Google Colab, and when pushing to Hugging Face the push never completes: the notebook cell keeps executing. The model size is around 500 MB.
!sudo apt-get install git-lfs
!git config --global user.email "MY…
0 votes
2 answers
Problem with batch_encode_plus method of tokenizer
I am encountering a strange issue with the batch_encode_plus method of the tokenizer. I recently switched from transformers version 3.3.0 to 4.5.1 (I am creating my databunch for NER).
I have two sentences that I need to encode, and I have a case…

Anurag Sharma
0 votes
1 answer
AttributeError: type object 'Wav2Vec2ForCTC' has no attribute 'from_pretrained'
I am trying to fine-tune a Wav2Vec2 model for medical vocabulary. When I run the following code in my VS Code Jupyter notebook I get an error, but when I run the same thing on Google Colab it works fine.
from transformers import…

Ayush Mehta
0 votes
1 answer
DataCollatorForMultipleChoice gives KeyError: 'labels' in trainer.train
I am working on multiple-choice QA, using the official huggingface/transformers notebook implemented for the SWAG dataset.
I want to use it for other multiple-choice datasets, so I have added some modifications related to the dataset. all…

programming123
0 votes
1 answer
Hugging face tokenizer cannot load files properly
I am trying to train a translation model from scratch using HuggingFace's BartModel architecture, with a ByteLevelBPETokenizer for tokenization.
The issue I am facing is that when I save the tokenizer after training, it is not loaded…

Vaibhav Agrawal
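A common cause of this kind of reload failure (an assumption about the asker's setup, since the excerpt is truncated) is saving only part of the tokenizer's state: a ByteLevelBPETokenizer needs both its vocab.json and merges.txt to be reconstructed. A minimal round-trip sketch, assuming the tokenizers package and a tiny in-memory corpus:

```python
import os
import tempfile

from tokenizers import ByteLevelBPETokenizer

# Train a tiny tokenizer on an in-memory corpus
tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(["hello world", "hello there"] * 50,
                              vocab_size=300, min_frequency=1)

# save_model writes vocab.json and merges.txt; both are needed to reload
out_dir = tempfile.mkdtemp()
tokenizer.save_model(out_dir)

reloaded = ByteLevelBPETokenizer(os.path.join(out_dir, "vocab.json"),
                                 os.path.join(out_dir, "merges.txt"))
print(reloaded.encode("hello world").tokens)
```

The same round trip also works through a single JSON file via `tokenizer.save(path)` and `Tokenizer.from_file(path)`.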
0 votes
0 answers
I'm facing a BrokenPipeError when trying to run sentiment analysis with Hugging Face
I'm facing a BrokenPipeError when trying to run sentiment analysis with Hugging Face; it returns [Errno 32] Broken pipe.
Link with the full code: https://colab.research.google.com/drive/1wBXKa-gkbSPPk-o7XdwixcGk7gSHRMas?usp=sharing
The code…

Nithin Reddy
0 votes
1 answer
TypeError: Can't convert re.compile('[A-Z]+') (re.Pattern) to Union[str, tokenizers.Regex]
I'm having issues applying a regex pattern to the Split() operation in the HuggingFace tokenizers library. The library documents the following input for Split():
pattern (str or Regex) – A pattern used to split the string. Usually a
string or a…

Jamie Dimon
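The error message itself points at the fix: Split() accepts a plain string or a tokenizers.Regex, not a compiled re.Pattern, so the pattern has to be wrapped in the library's own Regex type. A short sketch, assuming the tokenizers package is installed:

```python
from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

# re.compile("[A-Z]+") is rejected by Split(); wrap the pattern
# string in tokenizers.Regex instead
splitter = Split(Regex("[A-Z]+"), behavior="isolated")

# behavior="isolated" keeps each regex match as its own piece
print(splitter.pre_tokenize_str("fooBARbaz"))
```

`pre_tokenize_str` returns (piece, offsets) pairs; other `behavior` values ("removed", "merged_with_previous", …) control what happens to the matched delimiters.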
0 votes
2 answers
In HuggingFace tokenizers: how can I split a sequence simply on spaces?
I am using the DistilBertTokenizer tokenizer from HuggingFace.
I would like to tokenize my text by simply splitting it on spaces:
["Don't", "you", "love", "", "Transformers?", "We", "sure", "do."]
instead of the default behavior, which is like…

Taras Kucherenko
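The desired list, including the empty string produced by the double space, is exactly what Python's `str.split(" ")` gives; inside the tokenizers library, the WhitespaceSplit pre-tokenizer behaves similarly, though it collapses runs of whitespace. A plain-Python sketch:

```python
text = "Don't you love  Transformers? We sure do."

# split(" ") keeps empty strings for consecutive spaces,
# unlike split() with no argument, which collapses them
tokens = text.split(" ")
print(tokens)
# ["Don't", 'you', 'love', '', 'Transformers?', 'We', 'sure', 'do.']
```

Note that a plain split only pre-tokenizes; DistilBERT's WordPiece vocabulary still has to be applied afterwards if the IDs are meant to feed the model.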
0 votes
1 answer
Encoding error: Train BERT from scratch in Vietnamese language
I followed the tutorial How to train a new language model from scratch using Transformers and Tokenizers.
In Section 2, Train a tokenizer, after training on my own Vietnamese text data, I looked at the generated .vocab file: all the tokens have become like…

save_ole
0 votes
2 answers
TFGPT2LMHeadModel unknown location
I have been playing around with TensorFlow (CPU) and some language modelling, and it has been a blast so far; everything works great.
But after watching my old CPU slowly getting killed by all the model training, I decided it was time to …

Magnus V.
0 votes
0 answers
Is there a way to use GPU instead of CPU for BERT tokenization?
I'm using a BERT tokenizer over a large dataset of sentences (2.3M lines, 6.53bn words):
# creating a BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',
…

Vincent Teyssier
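Tokenization runs on the CPU; BertTokenizer has no GPU path. The usual speedups are switching to the Rust-backed BertTokenizerFast, feeding batches of lines instead of single lines, and parallelizing, since the Rust code releases the GIL. A sketch of the batching-plus-threading pattern with a stand-in tokenize function (the real tokenizer needs a downloaded vocabulary):

```python
from concurrent.futures import ThreadPoolExecutor

def tokenize_batch(lines):
    # Stand-in for tokenizer(lines, truncation=True, padding=True);
    # a fast tokenizer releases the GIL, so threads give real parallelism
    return [line.lower().split() for line in lines]

lines = ["Hello world", "BERT tokenization is CPU bound"] * 1000
batches = [lines[i:i + 256] for i in range(0, len(lines), 256)]

with ThreadPoolExecutor(max_workers=4) as pool:
    tokenized = [tokens for batch in pool.map(tokenize_batch, batches)
                 for tokens in batch]
print(len(tokenized))  # 2000
```

For a 2.3M-line corpus this batched, parallel approach with a fast tokenizer is typically far quicker than looping line by line.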
0 votes
1 answer
Translating using pre-trained Hugging Face transformers not working
I am trying to use the pre-trained Hugging Face models to translate a pandas column of text from Dutch to English. My input is simple:
Dutch_text
Hallo, het gaat goed
Hallo, ik ben niet in orde
Stackoverflow…

Django0602
0 votes
1 answer
I want to use "grouped_entities" in the huggingface pipeline for the NER task; how can I do that?
I want to use "grouped_entities" in the huggingface pipeline for the NER task, but I am having issues doing that.
I looked at the following pull request on GitHub, but it did not help:
https://github.com/huggingface/transformers/pull/4987

Abhishek Bisht
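The flag is passed when the pipeline is constructed, e.g. `pipeline("ner", grouped_entities=True)` (newer transformers versions rename it `aggregation_strategy`). What it does is merge consecutive B-/I- tagged pieces of the same entity type into one span; a pure-Python sketch of that grouping logic, with invented tags for illustration:

```python
def group_entities(tagged_tokens):
    """Merge consecutive (word, tag) pairs that continue the same entity."""
    groups = []
    for word, tag in tagged_tokens:
        entity_type = tag.split("-", 1)[-1]
        if groups and groups[-1][1] == entity_type and tag.startswith("I-"):
            groups[-1][0].append(word)  # continuation of the current entity
        else:
            groups.append([[word], entity_type])  # start a new entity
    return [(" ".join(words), etype) for words, etype in groups]

tags = [("Hugging", "B-ORG"), ("Face", "I-ORG"), ("Inc", "I-ORG"),
        ("New", "B-LOC"), ("York", "I-LOC")]
print(group_entities(tags))
# [('Hugging Face Inc', 'ORG'), ('New York', 'LOC')]
```

The real pipeline additionally merges WordPiece sub-tokens and averages their scores, but the B-/I- merging above is the core of what `grouped_entities=True` changes in the output.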