This mapping works when calling head() on the first 100 rows:
ddf['val'] = ddf['myid'].map(val['val'], meta=pd.Series(float))
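Concretely, this is the sanity check that succeeds (a sketch; 100 is just the sample size I used):

ddf['val'].head(100)  # materialises the mapped column for the first 100 rows without error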
But when I try to save to parquet:
ddf.to_parquet(
    'myfile.parquet',
    compression='snappy',
    write_index=False,
    compute_kwargs={'scheduler': 'threads'},
)
I am getting an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects.
But checking my index (after converting it to a pandas Series), it is unique: val.index.duplicated().any() is False. The index is also the same set of values as the dataframe column it is being mapped onto, myid. There are no nulls, NaNs, or Nones in the index, and its dtype is int64.
Update: curiously, if I load each parquet file for the original ddf one at a time, this does not error. If I load more than one at a time, it errors out.
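To illustrate the update, the two cases look roughly like this (a sketch; the glob pattern and output paths are placeholders for my actual layout, and val is the same mapping frame as above):

import glob
import pandas as pd
import dask.dataframe as dd

# One underlying parquet file at a time: the map and the write both complete.
for i, path in enumerate(sorted(glob.glob('original_data/*.parquet'))):
    part = dd.read_parquet(path)
    part['val'] = part['myid'].map(val['val'], meta=pd.Series(float))
    part.to_parquet(f'out_single_{i}.parquet', write_index=False)

# More than one file at once: the same map + write raises InvalidIndexError.
ddf = dd.read_parquet('original_data/*.parquet')
ddf['val'] = ddf['myid'].map(val['val'], meta=pd.Series(float))
ddf.to_parquet('out_all.parquet', write_index=False)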