How to create a new 'index' column after using pandas DataFrame.explode()?

Question

I'm using DataFrame.explode() to unnest a column of lists such that each element gets its own row. What I'm wondering is how to create a new 'index' column that will correspond to the index of the element in the original list. In the example I'm calling this column 'cycle'.

I've hopefully attached an image below of what I'm trying to achieve. If there is already a page on this please share. Thank you!

Henry Ecker · Answer 1 · 2021-06-03T19:45:05.193

Use groupby cumcount after explode:

df = df.explode('value').reset_index(drop=True)
df['cycle'] = df.groupby('sample').cumcount()  # Enumerate Groups
df = df[['sample', 'cycle', 'value']]  # Reorder columns

or with insert:

df = df.explode('value').reset_index(drop=True)
df.insert(1, 'cycle', df.groupby('sample').cumcount())  # Insert New Column

or with assign:

df = (
    df.explode('value')
        .assign(cycle=lambda x: x.groupby(level=0).cumcount())
        .reset_index(drop=True)[['sample', 'cycle', 'value']]
)

df:

   sample  cycle value
0       A      0     2
1       A      1     3
2       A      2     5
3       A      3     2
4       A      4     3
5       A      5     6
6       A      6     8
7       B      0    97
8       B      1    83
9       B      2     5
10      C      0   2.5
11      C      1     6
12      C      2     7
13      C      3     9

Thank you so much! The first method worked for me. My data was a little more complicated than the example I provided but essentially all I had to do to make it work was add the other column: I.e. `df.groupby(['col1', 'col2']).cumcount()` — ZachGutz, Jun 03 '21 at 22:22

score 0 · Answer 2 · answered Jul 21 '22 at 13:34

First, create your cycle column:

import pandas as pd

df = pd.DataFrame([
                    ["A",[2,3,5,2,3,6,8]],
                    ["B",[97,83,5]],
                    ["C",[2.5,6,7,9]]
                  ],
                   columns=["sample", "value"])


df["cycle"]=df["value"].str.len().apply(lambda x: list(range(0,x)))

Then you can simply explode both of the columns simultaneously. This option is available since the latest pandas version. (check the documentation here)

df.explode(["value","cycle"])

    sample  value   cycle
0        A      2       0
0        A      3       1
0        A      5       2
0        A      2       3
0        A      3       4
0        A      6       5
0        A      8       6
1        B     97       0
1        B     83       1
1        B      5       2
2        C    2.5       0
2        C      6       1
2        C      7       2
2        C      9       3

How to create a new 'index' column after using pandas DataFrame.explode()?

2 Answers2