
If I have a dataframe:

import pandas as pd

myData = {'start': [1, 2, 3, 4, 5],
          'end': [2, 3, 5, 7, 6],
          'number': [1, 2, 7, 9, 7]
          }
df = pd.DataFrame(myData, columns=['start', 'end', 'number'])
df

And I need to turn it into something like this:

result = {'start': [1, 4, 5],
          'end': [7, 7, 6],
          'number': [10, 9, 7]
          }
expected = pd.DataFrame(result, columns=['start', 'end', 'number'])
expected

If the difference is < 1, then start = start(previous row), end = end(current row), and the previous rows are deleted.

That is, merge consecutive rows whenever the difference between the end of the first row and the start of the second is less than 1: the merged row keeps the first row's start, takes the second row's end and the sum of both numbers, and the first row is deleted.

Can I do it without iteration?


marc_s
Nik
Comment: Please double-check the output; the difference between 4 and 6 is ≤ 2, so the last row should be combined with the previous one. – mozway Nov 24 '22 at 13:36
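A minimal vectorized sketch of the rule stated in the question, assuming "the difference" means the previous row's end minus the current row's start (the wording could also be read the other way round). `merge_close_rows` and its `threshold` parameter are hypothetical names introduced for illustration. With `threshold=1` on the sample data, this reproduces the starts (1, 4, 5) and numbers (10, 9, 7) of the expected result, but it gives an end of 5 rather than 7 for the first merged row, because 5 is the last end in that block.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"start": [1, 2, 3, 4, 5],
                   "end": [2, 3, 5, 7, 6],
                   "number": [1, 2, 7, 9, 7]})


def merge_close_rows(frame: pd.DataFrame, threshold: float = 1) -> pd.DataFrame:
    """Hypothetical helper: merge a row into the previous one when
    (previous end - current start) < threshold, without a Python loop."""
    # Previous row's end minus this row's start (NaN for the first row)
    gap = frame["end"].shift() - frame["start"]
    # NaN < threshold is False, so the first row always opens a new group
    opens_group = ~gap.lt(threshold)
    # Running count of group openings gives one label per merged block
    group = opens_group.cumsum()
    return (
        frame.groupby(group)
        .agg(start=("start", "first"),   # keep the first row's start
             end=("end", "last"),        # take the last row's end
             number=("number", "sum"))   # add up the numbers
        .reset_index(drop=True)
    )


out = merge_close_rows(df)
print(out)
```

This uses the same shift / compare / cumsum / groupby pattern as the answer below; only the compared columns and the threshold differ, so the grouping rule can be changed by editing the `gap` line alone.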

1 Answer


You can use:

# identify when end - previous_start > 2
# and create a new group
group = df['end'].sub(df['start'].shift()).gt(2).cumsum()

# aggregate
out = df.groupby(group).agg({'start': 'first', 'end': 'last', 'number': 'sum'})

Output:

   start  end  number
0      1    3       3
1      3    5       7
2      4    6      16
mozway
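To see how the answer's `group` labels come about, here is a sketch that keeps the intermediate Series of that one-liner on the question's sample data. The names `diff` and `breaks` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"start": [1, 2, 3, 4, 5],
                   "end": [2, 3, 5, 7, 6],
                   "number": [1, 2, 7, 9, 7]})

# End of each row minus start of the previous row (NaN for the first row)
diff = df["end"].sub(df["start"].shift())
# True where that span exceeds 2, i.e. where a new group begins
# (NaN > 2 is False, so the first row stays in group 0)
breaks = diff.gt(2)
# Running count of the breaks labels each row with its group
group = breaks.cumsum()

print(pd.DataFrame({"diff": diff, "new_group": breaks, "group": group}))
```

The spans are 2, 3, 4 and 2 for rows 1 to 4, so only rows 2 and 3 open a new group. Rows 0 and 1 form group 0, row 2 forms group 1, and rows 3 and 4 form group 2, which gives the three rows of the answer's output.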