I'm trying to read a bunch of JSON files stored on S3, but Dask raises a "list index out of range" error when I compute the DataFrame.
My call to open the JSON files looks like this:

import dask.dataframe as dd

# blocksize=None keeps each JSON file as a single partition
pets_data = dd.read_json(
    "s3://my-bucket/pets/*.json",
    meta=meta,
    blocksize=None,
    orient="records",
    lines=False,
)
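For context, meta is just a pandas DataFrame that spells out the expected columns and dtypes, defined before the read_json call; it looks roughly like this (the column names below are only illustrative, not the real schema):

import pandas as pd

# Illustrative schema only; the real meta lists the actual columns of the JSON files
meta = pd.DataFrame({
    "pet_id": pd.Series(dtype="int64"),
    "name": pd.Series(dtype="object"),
})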
It fails when I call to_csv (writing to S3 and writing locally both fail):
# saving locally fails
pets_data.to_csv(
    "pets-full-data.csv",
    single_file=True,
    index=False,
)

# saving to S3 fails as well
pets_data.to_csv(
    "s3://my-bucket/pets-full-data.csv",
    single_file=True,
    index=False,
)
Stack trace:

  File "main.py", line 89, in <module>
    pets_data.to_csv(
  File "/usr/local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1423, in to_csv
    return to_csv(self, filename, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 808, in to_csv
    value = to_csv_chunk(dfs[0], first_file, **kwargs)
IndexError: list index out of range
NOTE: This only happens when I open the files from S3; when I open the same files from local storage, everything works fine.
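For reference, the local version I tested is roughly the following, and it computes and writes the CSV without any error (the local path here is just illustrative):

# Same call, but reading from a local copy of the files; this one works (path is illustrative)
pets_data_local = dd.read_json(
    "./pets/*.json",
    meta=meta,
    blocksize=None,
    orient="records",
    lines=False,
)
pets_data_local.to_csv("pets-full-data.csv", single_file=True, index=False)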