Concatenate rows in pandas with conditions and calculations

Question

If I have a dataframe:

import pandas as pd

myData = {'start': [1, 2, 3, 4, 5],
          'end': [2, 3, 5, 7, 6],
          'number': [1, 2, 7, 9, 7]
          }
df = pd.DataFrame(myData, columns=['start', 'end', 'number'])
df

And I need to do something like:

result = {'start': [1, 4, 5],
          'end': [7, 7, 6],
          'number': [10, 9, 7]
          }
# kept in a separate variable so the input df above is not overwritten
expected = pd.DataFrame(result, columns=['start', 'end', 'number'])
expected

If the difference is < 1: start = start (previous row), end = end (current row), then delete the previous row.

That is, two rows should be merged when the difference between the end of the first row and the start of the second is less than 1: keep the start of the first row, take the end of the second, sum the number values, and delete the first row.

Can I do it without iteration?


please double check the output; the difference between 4 and 6 is ≤ 2, so the last row should combine with the previous — mozway, Nov 24 '22 at 13:36

score 2 · Answer 1 · answered Nov 24 '22 at 13:35

You can use:

# identify when end - previous_start > 2
# and create a new group
group = df['end'].sub(df['start'].shift()).gt(2).cumsum()

# aggregate
out = df.groupby(group).agg({'start': 'first', 'end': 'last', 'number': 'sum'})

Output:

   start  end  number
0      1    3       3
1      3    5       7
2      4    6      16
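
To see why the rows fall into these three groups, it can help to look at the intermediate values. The snippet below is only a sketch of the same idea split into steps; the names `diff`, `new_group` and `steps` are introduced here for illustration and are not part of the original answer, and the threshold of 2 is the one used above, not necessarily the rule the question intends.

```python
import pandas as pd

df = pd.DataFrame({'start': [1, 2, 3, 4, 5],
                   'end': [2, 3, 5, 7, 6],
                   'number': [1, 2, 7, 9, 7]})

# current row's end minus the previous row's start (NaN for the first row)
diff = df['end'].sub(df['start'].shift())

# True where a new group begins; the NaN in the first row compares as False
new_group = diff.gt(2)

# running count of group starts gives one label per row
group = new_group.cumsum().rename('group')

# side-by-side view of the intermediate columns
steps = df.assign(diff=diff, new_group=new_group, group=group)
print(steps)

# same aggregation as in the answer
out = df.groupby(group).agg({'start': 'first', 'end': 'last', 'number': 'sum'})
print(out)
```

Rows 0 and 1 share label 0, row 2 gets label 1, and rows 3 and 4 share label 2, which is what produces the three output rows. Changing the comparison (for example to a different threshold, or to `start` minus the previous `end`) only changes how `new_group` is computed; the `cumsum` and `groupby` steps stay the same.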
