
I am trying to combine the strings in the text_info column into a single row, but only for rows whose index values are consecutive. My data looks very similar to the table below:

index text_info
0.0 word 1
NAN NAN
3.0 word2
0.0 word3
1.0 word4
2.0 word5
4.0 word6

I would like to combine the text of the consecutive rows with index 0.0, 1.0 and 2.0 into one row, so that it looks like this:

index text_info
0.0 word 1
NAN NAN
3.0 word2
0.0 word3, word4, word5
4.0 word6

The data contains other runs of consecutive index values like this one.

I tried several solutions, including the answer in this question, but none of them worked. I also tried multi = df.groupby('index', dropna=False)["text_info"].sum(), but it combined all the indexes in consecutive order.

Is there a way to do this?

J.Doe

1 Answer


Try this. pd.to_numeric() with errors='coerce' is used in case your NAN values are strings rather than np.nan values.

# A new group starts wherever the index does not increase by exactly 1.
groups = pd.to_numeric(df['index'], errors='coerce').diff().ne(1).cumsum()
(df.astype(str)
   .groupby(groups, as_index=False)
   .agg({'index': 'first', 'text_info': ', '.join}))

Output:

  index            text_info
0   0.0               word 1
1   nan                  nan
2   3.0                word2
3   0.0  word3, word4, word5
4   4.0                word6
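
To see why this works, it can help to build the grouping key one step at a time. Below is a minimal, self-contained sketch using the sample data from the question. It assumes the missing row holds the literal string "NAN" in both columns (the real data may hold np.nan instead), and the names idx, step, new_group and group_id are only illustrative. It also uses reset_index(drop=True) in place of as_index=False, which should give the same rows.

```python
import pandas as pd

# Sample data from the question. Assumption: the missing row is the
# literal string "NAN", not a real NaN value.
df = pd.DataFrame({
    "index": [0.0, "NAN", 3.0, 0.0, 1.0, 2.0, 4.0],
    "text_info": ["word 1", "NAN", "word2", "word3", "word4", "word5", "word6"],
})

# Step 1: coerce to numbers; anything non-numeric becomes NaN.
idx = pd.to_numeric(df["index"], errors="coerce")

# Step 2: difference from the previous row (NaN for the first row and
# around missing values).
step = idx.diff()

# Step 3: True wherever the value is NOT "previous + 1", i.e. a run breaks.
new_group = step.ne(1)

# Step 4: a running count of breaks gives one label per consecutive run.
group_id = new_group.cumsum().rename("run")
# group_id is [1, 2, 3, 4, 4, 4, 5]

# Step 5: aggregate each run, keeping the first index and joining the text.
result = (
    df.astype(str)
      .groupby(group_id)
      .agg({"index": "first", "text_info": ", ".join})
      .reset_index(drop=True)  # discard the run labels
)

print(group_id.tolist())
print(result)
```

Rows 3, 4 and 5 share the label 4 because each one is exactly 1 greater than the row before it, so only those rows are joined.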
rhug123