Questions tagged [huggingface-datasets]
Use this tag for questions related to the datasets project from Hugging Face. [Project on GitHub][1]
[1]: https://github.com/huggingface/datasets
221 questions
0
votes
1 answer
ValueError: Please pass `features` or at least one example when writing data
I'm new to Hugging Face and am working on a movie generation script. So far, my code looks like this:
from transformers import GPT2Tokenizer, GPTNeoModel
from datasets import load_dataset
dataset =…

Ulto 4
- 368
- 4
- 16
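This ValueError is raised when datasets has to write examples but cannot infer a schema. A minimal sketch (the column names are made up for illustration) of sidestepping it by declaring the features explicitly:

```python
from datasets import Dataset, Features, Value

# Hypothetical schema: declaring the features explicitly means the Arrow
# writer does not have to infer them from a (possibly empty) first example.
features = Features({"title": Value("string"), "plot": Value("string")})
dataset = Dataset.from_dict({"title": [], "plot": []}, features=features)
print(dataset)
```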
0
votes
1 answer
KeyError when feeding the training corpus to the train_new_from_iterator method
I am following this tutorial: https://github.com/huggingface/notebooks/blob/master/examples/tokenizer_training.ipynb
So, using this code, I add my custom dataset:
from datasets import load_dataset
dataset = load_dataset('csv',…
user16098918
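For reference, a sketch of how that tutorial iterates over a CSV-backed dataset when training a new tokenizer; the file name and the "text" column are assumptions, and a KeyError here usually means the indexed column does not exist in the loaded CSV:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical CSV file; check dataset.column_names and use the real column
# name below, otherwise indexing it raises a KeyError.
dataset = load_dataset("csv", data_files="corpus.csv", split="train")

def batch_iterator(batch_size=1000):
    # Yield batches of raw strings, one column slice at a time.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i : i + batch_size]["text"]

old_tokenizer = AutoTokenizer.from_pretrained("gpt2")
new_tokenizer = old_tokenizer.train_new_from_iterator(batch_iterator(), vocab_size=32000)
```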
0
votes
1 answer
Setting `remove_unused_columns=False` causes error in HuggingFace Trainer class
I am training a model using the HuggingFace Trainer class. The following code does a decent job:
!pip install datasets
!pip install transformers
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification,…

Hossein
- 2,041
- 1
- 16
- 29
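A sketch of one way this combination is usually made to work (the IMDb dataset and BERT checkpoint are stand-ins for whatever the question actually used): with remove_unused_columns=False the Trainer forwards every dataset column to the model, so raw string columns have to be dropped by hand:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
raw = load_dataset("imdb", split="train[:1%]")

# Tokenize, then remove the raw "text" column ourselves, since the Trainer
# will no longer strip unused columns for us.
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)
tokenized = tokenized.remove_columns(["text"])

args = TrainingArguments(output_dir="out", remove_unused_columns=False)
```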
0
votes
2 answers
How to use the DeBERTa model from Hugging Face with .compile() and .summary()
I used this code to load the weights:
from transformers import DebertaTokenizer, DebertaModel
import torch
tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')
model = DebertaModel.from_pretrained('microsoft/deberta-base')
after that…

Shorouk Adel
- 127
- 3
- 20
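.compile() and .summary() are Keras methods, so they only exist on the TensorFlow classes, not on the PyTorch DebertaModel. A sketch, assuming a transformers release that ships the TensorFlow DeBERTa port:

```python
from transformers import DebertaTokenizer, TFDebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")

# from_pt=True converts the published PyTorch checkpoint to TensorFlow
# (requires torch to be installed) in case no TF weights are hosted.
model = TFDebertaModel.from_pretrained("microsoft/deberta-base", from_pt=True)

# Keras methods are now available on the TF model.
model.compile(optimizer="adam")
model.summary()
```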
0
votes
2 answers
Problem with batch_encode_plus method of tokenizer
I am encountering a strange issue with the batch_encode_plus method of the tokenizer. I recently switched from transformers version 3.3.0 to 4.5.1 (I am creating my databunch for NER).
I have 2 sentences that I need to encode, and I have a case…

Anurag Sharma
- 4,839
- 13
- 59
- 101
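For context, a sketch of batch_encode_plus on pre-split NER-style input (the sentences and checkpoint are made up); one change worth checking after a 3.x to 4.x upgrade is that the pre-tokenized flag is now called is_split_into_words:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

# Two hypothetical sentences, already split into words, as in a NER databunch.
sentences = [["John", "lives", "in", "Berlin"], ["Mary", "works", "at", "Google"]]

enc = tokenizer.batch_encode_plus(
    sentences,
    is_split_into_words=True,  # older releases used `is_pretokenized`
    padding=True,
    truncation=True,
    return_tensors="pt",
)
print(enc["input_ids"].shape)
```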
0
votes
0 answers
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers
from os import listdir
from os.path import isfile, join
from datasets import load_dataset
from transformers import BertTokenizer
test_files = [join('./test/', f) for f in listdir('./test') if isfile(join('./test', f))]
dataset =…

Michael
- 19
- 6
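That ValueError comes from handing the tokenizer something other than strings. A sketch of one way to feed local text files through datasets and then tokenize only the string column (the plain-text loader and the "text" column name are assumptions about the data):

```python
from os import listdir
from os.path import isfile, join
from datasets import load_dataset
from transformers import BertTokenizer

test_files = [join("./test/", f) for f in listdir("./test") if isfile(join("./test", f))]

# Assumption: the files are plain text, so the "text" loader yields one string
# per line in a column called "text".
dataset = load_dataset("text", data_files={"test": test_files})
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The tokenizer only accepts strings (or lists of strings), so pass the string
# column rather than the whole example dict, and drop empty/None rows first.
dataset = dataset.filter(lambda ex: ex["text"] is not None and ex["text"].strip() != "")
encoded = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)
```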
0
votes
1 answer
KeyError: "None of ['index'] are in the columns"
Here is a JSON file:
{
"id": "68af48116a252820a1e103727003d1087cb21a32",
"article": [
"by mark duell .",
"published : .",
"05:58 est , 10 september 2012 .",
"| .",
"updated : .",
"07:38 est ,…

Michael
- 19
- 6
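One workaround sketch for a single nested JSON document like the one above (the file name is hypothetical, and this is not necessarily the accepted answer): flatten it with plain json and pandas and build the Dataset from a DataFrame, so the json loader never has to guess the record layout:

```python
import json
import pandas as pd
from datasets import Dataset

# Hypothetical file name for the document shown in the question.
with open("article.json") as f:
    record = json.load(f)

# Keep the fields visible in the excerpt: "id" and the list of "article" lines.
df = pd.DataFrame([{"id": record["id"], "article": " ".join(record["article"])}])
dataset = Dataset.from_pandas(df)
print(dataset)
```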
0
votes
2 answers
File name too long
In a local repository, I have several JSON files. When I run the command:
from datasets import load_dataset
dataset = load_dataset('json', data_files=['./100009.json'])
I got the following error:
OSError: [Errno 36] File name too long:…

Michael
- 19
- 6
-1
votes
1 answer
Create a DataFrame from a Hugging Face Dataset object
I recently downloaded a dataset from Hugging Face.
I've used datasets.load_dataset(), and it gives me a Dataset backed by an Apache Arrow table.
So I am having trouble exporting the data into a DataFrame to work with pandas.
The…

M.og.op.gpt
- 1
- 1
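For the question above, a minimal sketch (with a small public dataset as a stand-in): a Dataset exposes to_pandas(), and set_format("pandas") makes indexing return DataFrames without materialising the whole table at once:

```python
from datasets import load_dataset

dataset = load_dataset("imdb", split="train[:100]")

# Convert the whole Arrow table into a pandas DataFrame.
df = dataset.to_pandas()
print(df.head())

# Or keep the data in Arrow and get DataFrames back when indexing.
dataset.set_format("pandas")
print(dataset[:5])
```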
-1
votes
1 answer
Creating a function on Digital Ocean for Hugging Face
Hugging Face provides transformers and models that allow AI/ML processing offline - https://huggingface.co/
We currently use Digital Ocean, and I would like to offload our ML onto DO Functions. I know AWS already does this with a few AWS…

RodgerThat
- 19
- 1
- 4
-1
votes
3 answers
Hugging Face: NameError: name 'sentences' is not defined
I am following this tutorial: https://huggingface.co/transformers/training.html - however, I am running into an error, and I think the tutorial is missing an import, but I do not know which one.
These are my current imports:
# Transformers…
user16098918
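The NameError simply means the tutorial's later snippets assume a `sentences` variable that was defined in an earlier cell. A sketch with made-up sentences:

```python
from transformers import AutoTokenizer

# Define the variable the tutorial snippet refers to before tokenizing.
sentences = ["Hello, this is the first sentence.", "And here is a second one."]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)
```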