
I need to turn a datetime index into an int column while keeping the same index with the same name. However, when I perform an operation on the index and assign the result as a column, the index loses its name. This only happens with the distributed scheduler. It is not specific to datetime conversion either, as the example below shows.

To restore the index name I have to do something like the suggestion in this StackOverflow answer, or do the assign inside map_partitions instead. Can I compute a new column from the index without losing its name, perhaps by adding the column and passing a meta somewhere? Is map_partitions the ideal (or only) approach?
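For reference, this is roughly the per-partition function I have in mind for the map_partitions route. It is only a sketch: the function name `add_index_as_column` and the `column` parameter are made up for illustration, and the small pandas frame below just stands in for a single Dask partition.

```python
import pandas as pd


def add_index_as_column(pdf: pd.DataFrame, column: str = "C") -> pd.DataFrame:
    """Return a copy of `pdf` with its index values added as a string column."""
    # DataFrame.assign returns a new frame and leaves the index
    # (including its name) untouched.
    return pdf.assign(**{column: pdf.index.astype(str)})


# A small pandas frame standing in for one Dask partition (illustrative data).
part = pd.DataFrame({"A": range(1, 6)})
part.index.name = "named"

out = add_index_as_column(part)
print(out.index.name)
```

On the Dask frame this would be applied as `df.map_partitions(add_index_as_column)`, optionally with an explicit `meta=` argument. Because the assign happens inside pandas on each partition, I would expect the index name to survive, but I have not confirmed that this is the recommended approach.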

Reproducible Example

import pandas as pd
import numpy as np
import dask.dataframe as dd
from dask.distributed import Client
client = Client()

df = pd.DataFrame({'A': range(1, 1001), 'B': np.random.randn(1000)})
print(type(df.index.name), df.index.name)
df.index.name = 'named'
print(type(df.index.name), df.index.name)
df = dd.from_pandas(df, npartitions=8)
print(type(df.index.name), df.index.name)
df = df.assign(**{'C': df.index.astype('str')})
print(type(df.index.name), df.index.name)

Output

<class 'NoneType'> None
<class 'str'> named
<class 'str'> named
<class 'NoneType'> None

Versions

pandas==0.24.1
distributed==1.25.3
dask==1.1.1
numpy==1.15.4
