
I'm new to Hugging Face and am working on a movie generation script. So far my code looks like this:

from transformers import GPT2Tokenizer, GPTNeoModel
from datasets import load_dataset
dataset = load_dataset('text',data_files={'train':['youtube_3/script.txt']})
tokenizer = GPT2Tokenizer.from_pretrained('EleutherAI/gpt-neo-1.3B')
model = GPTNeoModel.from_pretrained('EleutherAI/gpt-neo-1.3B')

However, I keep getting this error:

ValueError: Please pass `features` or at least one example when writing data

Does this have anything to do with the way I define my tokenizer and model? How would I fix this? Any help would be appreciated.

Ulto 4
  • I'm having the same issue; did you get around it? – jbm Oct 28 '21 at 23:40
  • Okay, it turns out that a problem in a VS Code SSH session had left one of my data files empty. That's what was triggering the error. – jbm Oct 29 '21 at 01:19
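
As the comment above found, an empty data file is enough to trigger this error: the 'text' loader then has no example to write, and no 'features' were passed. A quick pre-check before calling 'load_dataset' can catch that case. The sketch below is a minimal, standard-library-only helper; 'find_empty_files' is a hypothetical name, not part of the 'datasets' library. It also flags missing files, which is an extra assumption on top of the comment.

```python
import os
from typing import Iterable, List


def find_empty_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that are missing or contain zero bytes.

    Hypothetical helper: an empty text file contributes no examples, which is
    one way to hit "Please pass `features` or at least one example".
    """
    bad = []
    for p in paths:
        # A missing file is reported as well, since it cannot contribute
        # examples either (load_dataset would fail on it differently).
        if not os.path.isfile(p) or os.path.getsize(p) == 0:
            bad.append(p)
    return bad
```

For the question's setup, you would call find_empty_files(['youtube_3/script.txt']) before load_dataset and fix or regenerate any file it returns.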

1 Answer


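The error message is telling you that 'load_dataset' needs either a 'features' argument or at least one example. You can satisfy the first condition by passing 'features' explicitly:

from datasets import load_dataset, Features, Value

# Hypothetical placeholder: the directory that holds your data file.
data_dir = "path/to/data"

# Declare the schema up front: a single string column named "text",
# which is the column the "text" loader produces.
context_feat = Features({'text': Value(dtype='string', id=None)})

dataset = load_dataset(
    path="text",
    data_dir=data_dir,
    data_files="input.fm.plus.fc.txt",
    split="train",
    features=context_feat,
)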
Herb