Questions tagged [huggingface-datasets]

Use this tag for questions related to the datasets project from Hugging Face. Project on GitHub: https://github.com/huggingface/datasets

221 questions
0
votes
1 answer

ValueError: Please pass `features` or at least one example when writing data

I'm new to huggingface and am working on a movie generation script. So far my code looks like this: from transformers import GPT2Tokenizer, GPTNeoModel from datasets import load_dataset dataset =…
Ulto 4
  • 368
  • 4
  • 16
0
votes
1 answer

Key error when feeding the training corpus to the train_new_from_iterator method

I am following this tutorial here: https://github.com/huggingface/notebooks/blob/master/examples/tokenizer_training.ipynb So, using this code, I add my custom dataset: from datasets import load_dataset dataset = load_dataset('csv',…
0
votes
1 answer

Setting `remove_unused_columns=False` causes error in HuggingFace Trainer class

I am training a model using HuggingFace Trainer class. The following code does a decent job: !pip install datasets !pip install transformers from datasets import load_dataset from transformers import AutoModelForSequenceClassification,…
0
votes
2 answers

how to use deberta model from hugging face and use .compile() and .summary() with it

I used this code to load weights: from transformers import DebertaTokenizer, DebertaModel import torch tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base') model = DebertaModel.from_pretrained('microsoft/deberta-base') after that…
0
votes
2 answers

Problem with batch_encode_plus method of tokenizer

I am encountering a strange issue in the batch_encode_plus method of the tokenizers. I have recently switched from transformers version 3.3.0 to 4.5.1. (I am creating my databunch for NER.) I have 2 sentences which I need to encode, and I have a case…
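One breaking change between transformers 3.x and 4.x that often bites NER pipelines is that the `is_pretokenized` argument of `batch_encode_plus` was renamed to `is_split_into_words`. A minimal offline sketch, using a tokenizer built from a tiny throw-away vocab so nothing is downloaded (the vocab itself is a toy assumption):

```python
import os
import tempfile
from transformers import BertTokenizerFast

# write a tiny vocab file so no model download is needed
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
with tempfile.TemporaryDirectory() as d:
    vocab_path = os.path.join(d, "vocab.txt")
    with open(vocab_path, "w") as f:
        f.write("\n".join(vocab))
    tok = BertTokenizerFast(vocab_file=vocab_path)

    # pre-tokenized input: in 4.x pass is_split_into_words=True
    # (this replaced is_pretokenized from the 3.x API)
    enc = tok.batch_encode_plus([["hello", "world"]], is_split_into_words=True)

tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
assert tokens == ["[CLS]", "hello", "world", "[SEP]"]
```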
0
votes
0 answers

ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers

from os import listdir from os.path import isfile, join from datasets import load_dataset from transformers import BertTokenizer test_files = [join('./test/', f) for f in listdir('./test') if isfile(join('./test', f))] dataset =…
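That ValueError usually means the tokenizer received a non-string value (e.g. None or a number) from one of the files. A minimal stdlib-only sketch of the usual cleanup (the rows and the `text` field name are assumptions):

```python
# hypothetical rows, standing in for records read from the test files
rows = [{"text": "a valid sentence"}, {"text": None}, {"text": 42}]

# coerce every value to a string (None becomes "") before tokenizing
cleaned = ["" if r["text"] is None else str(r["text"]) for r in rows]

assert cleaned == ["a valid sentence", "", "42"]
# cleaned can now be passed to the tokenizer safely
```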
0
votes
1 answer

KeyError: "None of ['index'] are in the columns"

Here is a json file : { "id": "68af48116a252820a1e103727003d1087cb21a32", "article": [ "by mark duell .", "published : .", "05:58 est , 10 september 2012 .", "| .", "updated : .", "07:38 est ,…
Michael
  • 19
  • 6
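That KeyError is pandas' complaint from `set_index("index")` (or a conversion that calls it) when no column named `index` exists. A minimal sketch of a workaround, assuming the JSON has been loaded into a pandas DataFrame (the toy record is from the excerpt above):

```python
import pandas as pd

# stand-in for the loaded JSON record
df = pd.DataFrame({
    "id": ["68af48116a252820a1e103727003d1087cb21a32"],
    "article": [["by mark duell .", "published : ."]],
})

# df.set_index("index") would raise KeyError because no such column exists;
# reset_index() materializes the default RangeIndex as an "index" column first
df = df.reset_index()

assert "index" in df.columns
```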
0
votes
2 answers

File name too long

In a local repository, I have several json files. When I run the command from datasets import load_dataset dataset = load_dataset('json', data_files=['./100009.json']) I got the following error: OSError: [Errno 36] File name too long:…
Michael
  • 19
  • 6
-1
votes
1 answer

Create DataFrame from Object HuggingFace

I recently downloaded a dataset from HuggingFace. I've used datasets.Dataset.load_dataset() and it gives me a Dataset backed by an Apache Arrow table, so I am having trouble exporting the data into a DataFrame to work with pandas. The…
-1
votes
1 answer

Creating a function on Digital Ocean for hugging face

Hugging Face provides transformers and models that allow AI/ML processing offline - https://huggingface.co/ We currently use Digital Ocean and I would like to offload our ML onto DO functions. I know AWS does this already with a few AWS…
-1
votes
3 answers

Hugging Face: NameError: name 'sentences' is not defined

I am following this tutorial here: https://huggingface.co/transformers/training.html - though, I am coming across an error, and I think the tutorial is missing an import, but I do not know which. These are my current imports: # Transformers…
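That tutorial snippet tokenizes a `sentences` list it never defines, so you need to supply one yourself before the tokenizer call. A minimal stdlib sketch (the example sentences are assumptions):

```python
# define the list the tutorial code expects before it is used
sentences = [
    "We are very happy to show you the Transformers library.",
    "We hope you don't hate it.",
]

# then, with a tokenizer loaded, e.g.:
# batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

assert all(isinstance(s, str) for s in sentences)
```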