As the title suggests, I have a dataframe with two columns (the column names are 0 and 1). For example, this is the dataframe content:
A 8
B 6
C 9
I also have a dictionary that maps the old values to new ones:
aliases = {'A': 'P', 'B': 'E', 'C': 'Q'}
and I want to apply this dictionary to the first column, so the expected output would be:
P 8
E 6
Q 9
In pandas I used to do it with df = df.replace({0: aliases})
But it won't work with dask.
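For reference, here is a minimal sketch of the pandas version that gives the expected output, using the sample data above (building the frame inline is only for illustration; my real data is loaded differently):

```python
import pandas as pd

# Sample frame from above; the column names are the integers 0 and 1.
df = pd.DataFrame({0: ["A", "B", "C"], 1: [8, 6, 9]})
aliases = {"A": "P", "B": "E", "C": "Q"}

# Nested dict: the outer key selects the column,
# the inner dict maps old values to new values.
df = df.replace({0: aliases})
print(df)
```

The second column is left untouched because only column 0 appears as an outer key.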
I also came across this SO question and tried to use mask in the following manner:
df = df.mask(df[0], aliases)
but I got TypeError("bad operand type for unary ~: 'str'").
EDIT:
I tried to implement it as suggested in the linked post and ran into an error with the metadata.
The code right now is:
new_columns = ['identifier', 'position', 'a', 'b', 'c', 'd']
pileup_df = pileup_df.rename(columns=dict(zip(pileup_df.columns, new_columns)))
pileup_df['identifier'] = pileup_df['identifier'].map(lambda x: alias_dict[x], meta=('identifier', pd.Series))
pileup_df.compute()
and I get the following traceback:
File "filter_pileup_from_lists_with_coordinate_name_conversion.py", line 72, in apply_conversion
pileup_df['identifier'] = pileup_df['identifier'].map(lambda x: alias_dict[x], meta=('identifier', pd.Series))
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/dask/dataframe/core.py", line 3055, in map
meta = make_meta(meta, index=getattr(make_meta(self), "index", None))
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/dask/utils.py", line 505, in __call__
return meth(arg, *args, **kwargs)
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/dask/dataframe/utils.py", line 339, in make_meta_object
return _empty_series(x[0], x[1], index=index)
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/dask/dataframe/utils.py", line 283, in _empty_series
return pd.Series([], dtype=dtype, name=name, index=index)
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/pandas/core/series.py", line 249, in __init__
dtype = self._validate_dtype(dtype)
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/pandas/core/generic.py", line 253, in _validate_dtype
dtype = pandas_dtype(dtype)
File "/home/eliran/miniconda/envs/newenv/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 1778, in pandas_dtype
raise TypeError(f"dtype '{dtype}' not understood")
TypeError: dtype '<class 'pandas.core.series.Series'>' not understood
I have also tried changing `pd.Series` to `pd.DataFrame` and to `dict`, and both result in a similar traceback.
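
Reading the last frames of the traceback, the second element of the meta tuple seems to be passed straight through as the dtype argument of pd.Series. Below is a minimal sketch that reproduces just that step without dask, assuming pandas behaves as in the traceback above (the helper try_empty_series is hypothetical and only there for illustration):

```python
import pandas as pd

def try_empty_series(dtype):
    """Build an empty Series the way the traceback's _empty_series frame does.

    Returns None on success, or the TypeError that pandas raised.
    """
    try:
        pd.Series([], dtype=dtype, name="identifier")
        return None
    except TypeError as exc:
        return exc

# Passing a container class where a dtype is expected, as in my meta tuple.
error_for_class = try_empty_series(pd.Series)

# Passing an actual dtype; "object" is an assumption for a string column.
error_for_object = try_empty_series("object")

print("pd.Series as dtype:", error_for_class)
print("'object' as dtype:", error_for_object)
```

If that reading is right, the second element of meta probably has to be a dtype rather than a container class, but I have not confirmed this against the dask documentation.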