4

I'm using DataFrame.explode() to unnest a column of lists such that each element gets its own row. What I'm wondering is how to create a new 'index' column that will correspond to the index of the element in the original list. In the example I'm calling this column 'cycle'.

I've hopefully attached an image below of what I'm trying to achieve. If there is already a page on this please share. Thank you!

enter image description here

ZachGutz
  • 51
  • 4

2 Answers2

4

Use groupby cumcount after explode:

df = df.explode('value').reset_index(drop=True)
df['cycle'] = df.groupby('sample').cumcount()  # Enumerate Groups
df = df[['sample', 'cycle', 'value']]  # Reorder columns

or with insert:

df = df.explode('value').reset_index(drop=True)
df.insert(1, 'cycle', df.groupby('sample').cumcount())  # Insert New Column

or with assign:

df = (
    df.explode('value')
        .assign(cycle=lambda x: x.groupby(level=0).cumcount())
        .reset_index(drop=True)[['sample', 'cycle', 'value']]
)

df:

   sample  cycle value
0       A      0     2
1       A      1     3
2       A      2     5
3       A      3     2
4       A      4     3
5       A      5     6
6       A      6     8
7       B      0    97
8       B      1    83
9       B      2     5
10      C      0   2.5
11      C      1     6
12      C      2     7
13      C      3     9
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • 1
    Thank you so much! The first method worked for me. My data was a little more complicated than the example I provided but essentially all I had to do to make it work was add the other column: I.e. `df.groupby(['col1', 'col2']).cumcount()` – ZachGutz Jun 03 '21 at 22:22
0

First, create your cycle column:

import pandas as pd

df = pd.DataFrame([
                    ["A",[2,3,5,2,3,6,8]],
                    ["B",[97,83,5]],
                    ["C",[2.5,6,7,9]]
                  ],
                   columns=["sample", "value"])


df["cycle"]=df["value"].str.len().apply(lambda x: list(range(0,x)))

Then you can simply explode both of the columns simultaneously. This option is available since the latest pandas version. (check the documentation here)

df.explode(["value","cycle"])

    sample  value   cycle
0        A      2       0
0        A      3       1
0        A      5       2
0        A      2       3
0        A      3       4
0        A      6       5
0        A      8       6
1        B     97       0
1        B     83       1
1        B      5       2
2        C    2.5       0
2        C      6       1
2        C      7       2
2        C      9       3
rsrjohnson
  • 100
  • 1
  • 8