I have a table that looks like so (1 example - total of 2 million rows):
tweet_id | id | group | created_at | tweet | response_tweet_id
1 sprintcare Support Tue Oct 31 22:10:47 +0000 2017 @115712 I understand. I would like to assist you. 2
2 115712 Customer Tue Oct 31 22:11:45 +0000 2017 @sprintcare and how do you propose we do that? NA
3 115712 Customer Tue Oct 31 22:08:27 +0000 2017 @sprintcare I have sent several private messages. 1
4 sprintcare Support Tue Oct 31 21:54:49 +0000 2017 @115712 Please send us a Private Message. 3
5 115712 Customer Tue Oct 31 21:49:35 +0000 2017 @sprintcare I did. 4
6 sprintcare Support Tue Oct 31 21:46:24 +0000 2017 @115712 Can you please send us a private message. 5,7
8 115712 Customer Tue Oct 31 21:45:10 +0000 2017 @sprintcare is the worst customer service 9,6,10
11 apple Support Tue Oct 31 22:10:35 +0000 2017 @115713 This is saddening to hear. NA
12 115713 Customer Tue Oct 31 22:04:47 +0000 2017 @apple My phone is not working. Help! 11
When referencing the first row, we can see the time the first support tweet response came in when responding to tweet_id 1 (this is labeled as 2).
Ideally - I would like to calculate how long it takes for each response tweet to come in - ordering from support to customer.
I would like to calculate two values:
the response time between the first tweet made to Support (tweet_id, response_tweet_id, and created_at). For the first row - the time difference would be: Tue Oct 31 22:10:47 +0000 2017 - Tue Oct 31 22:08:27 +0000 2017 = 00:02:20.
Total response time between first tweet from support to last tweet given each customer. In the example below - it would essentially be the first tweet made relative to time to support and the last tweet made before next unique id comes into play.
Desired output would look like so based on each group (100+ unique companies):
id | group | Avg_response_time_per_tweet (in minutes) | Total_avg_response_time (in minutes)
sprintcare Support #### ####
apple Support #### ####