
I have a tokenized dataset called tokenized_datasets, as follows:

[screenshot of the tokenized_datasets structure]

I want to add a column titled ['labels'] that is a copy of ['input_ids'] within the features. I'm aware of the following method from this post, Add new column to a HuggingFace dataset:

new_dataset = dataset.add_column("labels", tokenized_datasets['input_ids'].copy())
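
As I understand it, add_column works on a single Dataset rather than a DatasetDict, so it would have to be applied to one split at a time, something like the sketch below (the "train" split name is just an assumption):

# add_column operates on a plain Dataset, so index into a split first;
# "train" is an assumed split name here
train_split = tokenized_datasets["train"]
train_with_labels = train_split.add_column("labels", train_split["input_ids"].copy())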

But I first need to access the DatasetDict. This is what I have so far, but it doesn't seem to do the trick:

def new_column(example):
    example["labels"] = example["input_ids"].copy()
    return example

dataset_new = tokenized_datasets.map(new_column)

KeyError: 'input_ids'

1 Answer


Try one of the two options below:

# first option
def new_column(example):
    return {"labels": example["input_ids"]}

# second option
def new_column(example):
    example["labels"] = example["input_ids"]
    return example

dataset_new = tokenized_datasets.map(new_column)
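
Either way, map returns a new DatasetDict, so you can spot-check that the column was added in every split; a minimal check, assuming a "train" split (adjust the split name to yours):

# each split of the mapped DatasetDict should now contain "labels";
# "train" is an assumed split name here
print(dataset_new["train"].column_names)
assert dataset_new["train"][0]["labels"] == dataset_new["train"][0]["input_ids"]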
So apparently the issue was that I was calling `dataset_new = dataset.map(new_column)`, which is why `'input_ids'` was not recognized. I changed it to `tokenized_datasets.map(new_column)` in my post and code, which worked, but I did not get rid of `copy()` – ablam Jul 13 '22 at 17:06