6

How would I go about renaming the index on a dask dataframe? I tried it like so:

    df.index.name = 'foo'

but rechecking df.index.name shows it is still whatever it was previously.

Samantha Hughes

2 Answers

7

This does not seem like an efficient way to do it, so I wouldn't be surprised if there is something more direct.

Suppose d.index.name starts off as 'foo':

    def f(df, name):
        # Called once per pandas partition: set the index name, return the partition.
        df.index.name = name
        return df

    d.map_partitions(f, 'pow')

The output now has index name of 'pow'. If this is done with the threaded scheduler, I think you also change the index name of d in-place (in which case you don't really need the output of map_partitions).

mdurant
  • Adding: this strategy can also be applied to rename a Dask Series, just by removing the `.index` from `f` function. – paulochf Nov 24 '17 at 17:53
  • This seems off to me. This generates dask delayed tasks for something that should obviously be immediate. https://github.com/dask/dask/issues/4950 – stav Jun 17 '19 at 12:59
  • In dask-world, when to use `compute()` is up to the user. It may be best to combine with other operations. – mdurant Jun 17 '19 at 13:03
3

A bit late, but the following works:

    import dask.dataframe as dd
    import pandas as pd
    # set_index needs an existing column; "s" is used here since no "si" column exists
    df = pd.DataFrame().assign(s=[1, 2], o=[3, 4], p=[5, 6]).set_index("s")
    ddf = dd.from_pandas(df, npartitions=2)
    ddf.index = ddf.index.rename("si2")

I hope this can help someone else out!