I have the following simple code copied from Huggingface examples:

from datasets import load_dataset
from transformers import AutoTokenizer

model_checkpoint = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

def tokenize_function(examples):
    # Tokenize the "text" column of each example (or batch of examples)
    return tokenizer(examples["text"])

datasets = load_dataset('wikitext', 'wikitext-2-raw-v1')
# Toggling batched between False and True changes the progress bar color
tokenized_datasets = datasets.map(tokenize_function, batched=False, num_proc=4, remove_columns=["text"])

When I set batched=False, the progress bar turns green, which I take to indicate success. When I set batched=True, the progress bar turns red and does not reach 100%. Does that mean my map function failed, or does it mean something else?

DiveIntoML

1 Answer

This is likely a bug in the progress bar's display logic, not in the processing itself. There are relevant discussions on discuss.huggingface.co and on GitHub.

wschella