I have a dataset with conversation data between a User and a Customer Service. I need to find the response time (time difference) between after a user sent a message and first time the Customer Service responded to the message
What I struggle with is a user can send multiple consecutive message before the Customer Service can reply. If a user sent multiple consecutive message, I need to take the time of the last row, but if the customer service sent multiple consecutive message, I need to take the time of the first row
My expected result is as below:
So, from the 10 message_id in the base table, there will be 3 rows:
- time difference between message_id 3 and 4
- time difference between message_id 7 and 8
- time difference between message_id 9 and 10
I've tried using lead function as follows but the result is not what I expected
select sender,
start_message,
start_reply,
date_diff('second',start_message,start_reply)/60.0 as response_time
from (
select message_id,
created_at as start_message,
lead(created_at) over(partition by sender order by created_at) as start_reply,
sender_type
from base_table
) t
order by 2 asc
I'm using athena, any help is greatly appreciated, thank you