I am trying to load a dataset from a Hugging Face organization, but I get the following error:

ValueError: Couldn't cast string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 686
to
{'text': Value(dtype='string', id=None)}
because column names don't match

I used the following lines of code to load the dataset:

from datasets import load_dataset
dataset = load_dataset("datasetFile", use_auth_token=True)

Please note that I was using datasets version 2.0.0. I downgraded to 1.18.2, but that did not help.

Is there any way to fix this error?

TMN
  • Please provide a [mcve], which includes all relevant code (plus samples) to reproduce this issue. In the current state, the question is not answerable. – dennlinger Mar 28 '22 at 12:38

2 Answers

According to https://github.com/huggingface/datasets/issues/3700#issuecomment-1035400186, you actually want to use load_from_disk (the counterpart of save_to_disk) instead of load_dataset:

from datasets import load_from_disk
dataset = load_from_disk("datasetFile")
Barbara Gendron
Serge Chastel

I solved this error by streaming the dataset.

from datasets import load_dataset
dataset = load_dataset("datasetFile", use_auth_token=True, streaming=True)

TMN