How do I average the difference between specific values in TSQL?

Question

Hey folks, sorry this is a bit of a longer question...

I have a table with the following columns:

[ChatID] [User] [LogID] [CreatedOn] [Text]

What I need to find is the average response time for a given user id, to another specific user id. So, if my data looks like:

[1] [john] [20] [1/1/11 3:00:00] [Hello]
[1] [john] [21] [1/1/11 3:00:23] [Anyone there?]
[1] [susan] [22] [1/1/11 3:00:43] [Hello!]
[1] [susan] [23] [1/1/11 3:00:53] [What's up?]
[1] [john] [24] [1/1/11 3:01:02] [Not much]
[1] [susan] [25] [1/1/11 3:01:08] [Cool]

...then I need to see that Susan has an average response time of (20 + 6) / 2 => 13 seconds to John, and John has an average of (9 / 1) => 9 seconds to Susan.

I'm not even sure this can be done in set-based logic, but if anyone has any ideas, they'd be much appreciated!

Much easier to do set based with `lag` and `lead` I think. [Vote for that here](https://connect.microsoft.com/SQLServer/feedback/details/254388/over-clause-enhancement-request-lag-and-lead-functions) — Martin Smith, Jan 04 '11 at 22:00
Is there a limit to the number of users within a ChatID context? i.e. For a single ChatID, will there only be two users (like your example) or can there be an unlimited number of users? — chezy525, Jan 04 '11 at 22:48
Can you supply a less simplified example then? If you have 3 messages in order from alice,bob and charles how do you know whether charles was responding to bob or alice? — Martin Smith, Jan 05 '11 at 00:21
You can assume that a message from one user is in response to the most recent message from another user, so the simplified use case holds. — Jerod Venema, Jan 05 '11 at 14:26
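Martin Smith's `lag` suggestion can be sketched as follows. `LAG` did eventually arrive in SQL Server 2012. To keep the sketch runnable anywhere, it uses SQLite through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later). It follows Jerod's assumption that each message answers the most recent message from the other user. The table name `chat_log`, the `user_name` column, and `strftime('%s', ...)` in place of `DATEDIFF(second, ...)` are all stand-ins, not the original schema.

```python
import sqlite3

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]

# LAG() pulls the previous message of the same chat onto the current row;
# strftime('%s', ...) gives epoch seconds, standing in for DATEDIFF(second, ...).
LAG_QUERY = """
WITH with_prev AS (
    SELECT user_name,
           created_on,
           LAG(user_name)  OVER (PARTITION BY chat_id ORDER BY created_on, log_id) AS prev_user,
           LAG(created_on) OVER (PARTITION BY chat_id ORDER BY created_on, log_id) AS prev_created_on
    FROM chat_log
)
SELECT user_name AS responder,
       prev_user AS responded_to,
       AVG(CAST(strftime('%s', created_on) AS INTEGER)
           - CAST(strftime('%s', prev_created_on) AS INTEGER)) AS avg_seconds
FROM with_prev
WHERE prev_user <> user_name  -- a change of speaker counts as a response
GROUP BY user_name, prev_user
"""


def average_response_times(rows):
    """Return {(responder, responded_to): average seconds} for the given rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE chat_log (chat_id INTEGER, user_name TEXT,"
            " log_id INTEGER, created_on TEXT)"
        )
        con.executemany("INSERT INTO chat_log VALUES (?, ?, ?, ?)", rows)
        return {(who, to): avg for who, to, avg in con.execute(LAG_QUERY)}
    finally:
        con.close()


print(average_response_times(ROWS))
```

The first row of each chat has a NULL `prev_user`, so the `<>` filter drops it along with same-user follow-ups. The remaining pairs give Susan (20 + 6) / 2 = 13 seconds and John 9.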

Chris Shaffer · Accepted Answer · score 8 · 2011-01-04T21:38:53.493

I don't have a PC to verify syntax or anything, but I think this should give you a starting place:

WITH ChatWithRownum AS (
    -- USER is a reserved word in T-SQL, so the column needs brackets
    SELECT ChatID, [User], LogID, CreatedOn,
           ROW_NUMBER() OVER (ORDER BY ChatID, CreatedOn) AS rownum
    FROM ChatLog
)
-- pair each message with the one immediately after it in the same chat
SELECT First.ChatID, Second.[User],
    AVG(DATEDIFF(second, First.CreatedOn, Second.CreatedOn)) AS AvgElapsedTime
FROM ChatWithRownum First
    JOIN ChatWithRownum Second ON First.ChatID = Second.ChatID
        AND First.rownum = Second.rownum - 1
WHERE First.[User] != Second.[User]  -- only a change of speaker counts as a response
GROUP BY First.ChatID, Second.[User]

Essentially, the idea is to add row numbers to the data so you can join each row to the next one (so you have a statement followed by its immediate response). Once the rows are joined, you can get the time elapsed between the two entries and then group the data by ChatID (I'm assuming that times between separate chats aren't relevant) and the responding user. Like I said, though, this is just a starting place; I'm sure there are some additional criteria and/or bugs in my query :)
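Here is a sketch of the same row-number self-join, transplanted to SQLite through Python's `sqlite3` module so it can be run without SQL Server. It assumes SQLite 3.25+ for `ROW_NUMBER`. The aliases `prev_msg`/`next_msg`, the `chat_log` table and the `user_name` column are hypothetical renames, and `strftime('%s', ...)` stands in for `DATEDIFF`. Like the query above, it groups by chat and responder only.

```python
import sqlite3

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]

ROWNUM_QUERY = """
WITH chat_with_rownum AS (
    SELECT chat_id, user_name, created_on,
           ROW_NUMBER() OVER (ORDER BY chat_id, created_on) AS rownum
    FROM chat_log
)
SELECT prev_msg.chat_id,
       next_msg.user_name,
       AVG(CAST(strftime('%s', next_msg.created_on) AS INTEGER)
           - CAST(strftime('%s', prev_msg.created_on) AS INTEGER)) AS avg_elapsed
FROM chat_with_rownum AS prev_msg
JOIN chat_with_rownum AS next_msg
  ON prev_msg.chat_id = next_msg.chat_id
 AND prev_msg.rownum = next_msg.rownum - 1   -- row n joined to row n + 1
WHERE prev_msg.user_name <> next_msg.user_name
GROUP BY prev_msg.chat_id, next_msg.user_name
"""


def avg_elapsed_by_responder(rows):
    """Return {(chat_id, responder): average seconds to respond}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE chat_log (chat_id INTEGER, user_name TEXT,"
            " log_id INTEGER, created_on TEXT)"
        )
        con.executemany("INSERT INTO chat_log VALUES (?, ?, ?, ?)", rows)
        return {(chat, user): avg for chat, user, avg in con.execute(ROWNUM_QUERY)}
    finally:
        con.close()


print(avg_elapsed_by_responder(ROWS))
```

With the question's sample rows this gives 13 seconds for Susan and 9 for John, matching the figures in the question.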

edited Jan 04 '11 at 21:38

answered Jan 04 '11 at 21:26

Chris Shaffer



Oops, I just edited my SQL and the original may be better; I'll leave as is and you can view the edit history to see the other query. Current query will tell you that john averages xx seconds to respond to anyone, where the previous version would say john averages xx seconds to respond to susan and susan averages yy seconds to respond to john, etc (which sounds more like what you originally requested). – Chris Shaffer Jan 04 '11 at 21:42
Turns out my question was inaccurate (I meant to have the response time for John be relative to the *first* message from Susan, so 43 / 2 seconds), but since the original answer here did actually answer the question (quite neatly, I might add), you get the points :). – Jerod Venema Jan 05 '11 at 01:24
+1 \*slap-on-the-brow\* Chris Shaffer has the best answer; I don't know why I was so hell-bent on applying the logic of my other answer http://stackoverflow.com/questions/1610599/how-can-i-do-a-contiguous-group-by-in-mysql/1611246#1611246 here – Michael Buen Jan 05 '11 at 02:31
Regarding Susan's `43 / 2` (or is it `49 / 2`? 49 being 43 + 6): the only redeeming feature of my solution is that it can easily be adapted to jvenema's real problem (response time relative to the **first** message, though that was not stated in the original question): just change the MAX to MIN. Anyway, that's left as an exercise for the SOers :-) – Michael Buen Jan 05 '11 at 03:12
@Michael - it's 43 - the time we're trying to measure is how long it takes to reply, so the *first* response to the *first* message from a given user is actually what we're looking for (although in my original question, I asked for the *first* response to the *last* message). – Jerod Venema Jan 05 '11 at 14:28
3:00:43 (response) - 3:00:00 (first message) == 43, and 3:01:08 (response) - 3:01:02 == 6; that's where I got the 49. Anyway, the real problem isn't well-defined, hence my 49: that's just my way of deducing the real problem (an average time, based on first responses to first messages), since it's no longer really an *average* if no other numbers are involved. – Michael Buen Jan 05 '11 at 15:12
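Michael Buen's "first message" reading (measure from the first message of the other user's run, i.e. MIN instead of MAX) can be sketched in plain Python. The row-tuple layout and function names below are illustrative assumptions. The comments above disagree on whether 43 or 49 / 2 is the intended figure, so treat this as one interpretation only.

```python
from datetime import datetime
from itertools import groupby
from statistics import mean

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def first_message_response_times(rows):
    """{(responder, responded_to): mean seconds}, measured from the FIRST
    message of the other user's preceding run to the responder's first reply."""
    samples = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by chat_id, log_id
    for _, chat_rows in groupby(ordered, key=lambda r: r[0]):
        # Consecutive messages from one user form a run.
        runs = [(user, list(run)) for user, run in groupby(chat_rows, key=lambda r: r[1])]
        for (prev_user, prev_run), (user, run) in zip(runs, runs[1:]):
            seconds = (parse(run[0][3]) - parse(prev_run[0][3])).total_seconds()
            samples.setdefault((user, prev_user), []).append(seconds)
    return {key: mean(values) for key, values in samples.items()}


print(first_message_response_times(ROWS))
```

On the sample data this gives Susan (43 + 6) / 2 = 24.5 seconds and John 19 seconds (3:01:02 minus 3:00:43).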

Chris Gessler · Answer 2 · 2011-01-04T23:20:05.577

Try something simple like the following before moving into cursors.

select ChatId, [User], datediff(second, min(CreatedOn), max(CreatedOn)) / count(*) as AvgSeconds
from ChatLog
group by ChatId, [User]

This one works and doesn't involve cursors. If I had more time, I could probably even eliminate the table variable, but hey... it works.

declare @operator varchar(50)
set @operator = 'john'
declare @customer varchar(50)
set @customer = 'susan'
declare @chatid int
set @chatid = 1

declare @t table (chatid int, username varchar(50), responsetime int)

-- seconds from each operator message to the customer's next message
insert @t (chatid, username, responsetime)
select ChatId,
    Username,
    datediff(second,
        CreatedOn,
        (
            select min(createdon)
            from chatlog
            where createdon > cl.createdon
                and username = @customer
                and chatid = @chatid
        ))
from ChatLog cl
where chatid = @chatid and username = @operator

-- seconds from each customer message to the operator's next message
insert @t (chatid, username, responsetime)
select ChatId,
    Username,
    datediff(second,
        CreatedOn,
        (
            select min(createdon)
            from chatlog
            where createdon > cl.createdon
                and username = @operator
                and chatid = @chatid
        ))
from ChatLog cl
where chatid = @chatid and username = @customer

select chatid, username, avg(responsetime) as avgresponsetime 
from @t
group by chatid, username
order by username
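To see what this answer measures, here is a plain-Python sketch of the correlated `min(createdon)` subquery. For every message from one user, it finds the other user's next message, whether or not something came in between. Function names and the tuple layout are hypothetical.

```python
from datetime import datetime

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def next_reply_times(rows, speaker, responder, chat_id):
    """For each message from `speaker`, seconds until the next message from
    `responder` in the same chat, or None when there is none (the correlated
    subquery returns NULL in that case)."""
    chat = [(user, parse(ts)) for cid, user, _, ts in rows if cid == chat_id]
    times = []
    for user, sent in chat:
        if user != speaker:
            continue
        later = [t for u, t in chat if u == responder and t > sent]
        times.append((min(later) - sent).total_seconds() if later else None)
    return times


def sql_avg(values):
    """Average that skips None, the way SQL's AVG skips NULL."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


print(sql_avg(next_reply_times(ROWS, "john", "susan", 1)))
print(sql_avg(next_reply_times(ROWS, "susan", "john", 1)))
```

On the question's data John's messages average (43 + 20 + 6) / 3 = 23 seconds and Susan's (19 + 9) / 2 = 14, with Susan's final message contributing NULL. Every message of a run is timed, and each row is labeled with the speaker rather than the responder. That is a different definition from the question's 13 and 9.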

Don't think this will yield the data set the OP is looking for. — suhprano, Jan 04 '11 at 22:50
I sort of knew the first one wouldn't work; I was merely suggesting that simple select statements should be considered before using cursors. I probably should have added that as a comment instead of a solution, so I went ahead and implemented a rough solution. — Chris Gessler, Jan 04 '11 at 23:25

suhprano · Answer 3 · score 2 · 2011-01-04T22:11:52.207

Seems like you need a cursor to step through each row, check for a change of user between records, compute the time difference, store it somewhere (a temp table, maybe), and aggregate it later.

I believe it can be done in T-SQL; the logic would be something like:


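DECLARE @chatid int, @username varchar(50), @time datetime,
        @prev_chatid int, @prev_username varchar(50), @prev_time datetime;

CREATE TABLE #average (username varchar(50), responsetime int);

DECLARE delta CURSOR LOCAL FAST_FORWARD FOR
    SELECT ChatID, [User], CreatedOn
    FROM ChatLog
    ORDER BY ChatID, CreatedOn; --or LogID

OPEN delta;
FETCH NEXT FROM delta INTO @chatid, @username, @time;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- a change of user within the same chat is a response to the previous row
    IF @prev_chatid = @chatid AND @prev_username <> @username
        INSERT INTO #average (username, responsetime)
        VALUES (@username, DATEDIFF(second, @prev_time, @time));

    -- remember this row instead of using FETCH PRIOR (which needs a SCROLL cursor)
    SELECT @prev_chatid = @chatid, @prev_username = @username, @prev_time = @time;

    FETCH NEXT FROM delta INTO @chatid, @username, @time;
END

CLOSE delta;
DEALLOCATE delta;

SELECT username, AVG(responsetime) AS avgresponsetime
FROM #average
GROUP BY username;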

I'm sure you can figure out how to declare all the parameters.
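The cursor's core idea is one ordered pass that remembers the previous row and records a delta whenever the user changes. It is sketched below in plain Python so the logic can be checked without a database. The dictionaries play the role of the `#average` temp table, and all names are illustrative.

```python
from datetime import datetime

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def cursor_style_averages(rows):
    """Single ordered pass: when the user changes within a chat, the current
    user is responding to the previous row's user."""
    totals, counts = {}, {}
    prev_chat = prev_user = prev_time = None
    for chat_id, user, _, ts in sorted(rows, key=lambda r: (r[0], r[3], r[2])):
        sent = parse(ts)
        if chat_id == prev_chat and user != prev_user:
            delta = (sent - prev_time).total_seconds()
            totals[user] = totals.get(user, 0.0) + delta
            counts[user] = counts.get(user, 0) + 1
        prev_chat, prev_user, prev_time = chat_id, user, sent
    return {user: totals[user] / counts[user] for user in totals}


print(cursor_style_averages(ROWS))
```

On the sample data this yields 13 seconds for Susan and 9 for John, the same as the set-based answers, after a single scan of the ordered rows.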

edited Jan 04 '11 at 22:11

answered Jan 04 '11 at 21:51

suhprano


+1 This does have the advantage of one scan through the data. I think the syntax is a bit off though. Definitely needs an `Order By` on the select. – Martin Smith Jan 04 '11 at 22:04
Cursors in SQL are not recommended. – Chris Gessler Jan 04 '11 at 22:36
@Chris - Generally not but sometimes they are the best solution unless or until Microsoft fully implement the `OVER` clause. Running totals being a classic example. [Some performance comparisons here](http://tsql.solidq.com/OVER_Clause_and_Ordered_Calculations.doc) No idea which would be the most efficient in this particular case (cursors vs joining on rownumber = rownumber+1) – Martin Smith Jan 04 '11 at 22:47
Agreed. In very rare cases, a cursor is warranted. – Chris Gessler Jan 04 '11 at 23:29

score 1 · Answer 4 · answered Jan 04 '11 at 21:30

This can be done with ROW_NUMBER() and DATEDIFF():

WITH TableWithOrderings AS (
    SELECT DateTime, ROW_NUMBER() OVER (ORDER BY DateTime) AS Ordering
    FROM myTable
),
-- a statement takes a single WITH; further CTEs are comma-separated
Intervals AS (
    SELECT DATEDIFF(second, A.DateTime, B.DateTime) AS IntervalSeconds
    FROM TableWithOrderings A
        INNER JOIN TableWithOrderings B ON B.Ordering = A.Ordering + 1
)

SELECT AVG(IntervalSeconds) FROM Intervals
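Note that this query has no user column, so it averages every gap between consecutive messages, including same-user gaps such as John's 23-second follow-up. A quick Python check on the question's timestamps (the list and function name are illustrative):

```python
from datetime import datetime

# CreatedOn values from the question's sample data.
TIMES = [
    "2011-01-01 03:00:00",
    "2011-01-01 03:00:23",
    "2011-01-01 03:00:43",
    "2011-01-01 03:00:53",
    "2011-01-01 03:01:02",
    "2011-01-01 03:01:08",
]


def consecutive_intervals(timestamps):
    """Seconds between each message and the next one, ignoring who sent them."""
    ordered = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


intervals = consecutive_intervals(TIMES)
print(intervals)                               # every gap, same-user ones included
print(sum(intervals) / len(intervals))         # floating-point average
print(int(sum(intervals)) // len(intervals))   # integer average, like AVG over int
```

Here the five gaps are 23, 20, 10, 9 and 6 seconds, averaging 13.6. SQL Server's `AVG` over an `int` expression such as `DATEDIFF` returns an `int`, so the query would report 13.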

Michael Buen · Answer 5 · 2011-01-05T01:53:22.373

Try this:

create table chats
(
chat_id int not null,
user_name text not null,
log_id int not null primary key,
created_on timestamp not null,
message text not null
);


insert into chats(chat_id, user_name, log_id, created_on, message)
values(1, 'john', 20, '1/1/11 3:00:00', 'Hello'),
(1, 'john',21, '1/1/11 3:00:23', 'Anyone there?'),
(1, 'susan',22, '1/1/11 3:00:43', 'Hello!'),
(1, 'susan', 23, '1/1/11 3:00:53', 'What''s up?'),
(1, 'john', 24, '1/1/11 3:01:02', 'Not much'),
(1, 'susan', 25, '1/1/11 3:01:08', 'Cool');

Sample data, with each message paired with the next one:

select c.*, 'x', next.*
from chats c
left join chats next on next.log_id = c.log_id + 1 
order by c.log_id

Output:

 chat_id | user_name | log_id |     created_on      |    message    | ?column? | chat_id | user_name | log_id |     created_on      |    message    
---------+-----------+--------+---------------------+---------------+----------+---------+-----------+--------+---------------------+---------------
       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         | x        |       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there?
       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? | x        |       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!
       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        | x        |       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?
       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    | x        |       1 | john      |     24 | 2011-01-01 03:01:02 | Not much
       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      | x        |       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool
       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool          | x        |         |           |        |                     |

The grouping:

select c.*, 'x', prev.*, count(case when prev.user_name is null or prev.user_name <> c.user_name then 1 end) over(order by c.log_id)
from chats c
left join chats prev on prev.log_id + 1 = c.log_id 
order by c.log_id

Output:

 chat_id | user_name | log_id |     created_on      |    message    | ?column? | chat_id | user_name | log_id |     created_on      |    message    | count 
---------+-----------+--------+---------------------+---------------+----------+---------+-----------+--------+---------------------+---------------+-------
       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         | x        |         |           |        |                     |               |     1
       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? | x        |       1 | john      |     20 | 2011-01-01 03:00:00 | Hello         |     1
       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        | x        |       1 | john      |     21 | 2011-01-01 03:00:23 | Anyone there? |     2
       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    | x        |       1 | susan     |     22 | 2011-01-01 03:00:43 | Hello!        |     2
       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      | x        |       1 | susan     |     23 | 2011-01-01 03:00:53 | What's up?    |     3
       1 | susan     |     25 | 2011-01-01 03:01:08 | Cool          | x        |       1 | john      |     24 | 2011-01-01 03:01:02 | Not much      |     4
(6 rows)
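The running `count(...) over(order by c.log_id)` is a gaps-and-islands trick: each change of speaker bumps the counter, so consecutive messages from the same user share a group number. Below is a plain-Python sketch of that numbering and of the `max(log_id)` per group used in the next step (function names are illustrative).

```python
# (chat_id, user_name, log_id) from the sample data.
ROWS = [
    (1, "john", 20),
    (1, "john", 21),
    (1, "susan", 22),
    (1, "susan", 23),
    (1, "john", 24),
    (1, "susan", 25),
]


def run_numbers(users):
    """Running count of 'user changed' flags, like the count(...) over() column."""
    numbers, count, prev = [], 0, None
    for user in users:
        if prev is None or user != prev:  # previous row missing or by someone else
            count += 1
        numbers.append(count)
        prev = user
    return numbers


def last_log_per_run(rows):
    """(user, max log_id) for each run, in order, like the grouped result."""
    ordered = sorted(rows, key=lambda r: r[2])
    last = {}
    for group, (_, user, log_id) in zip(run_numbers([r[1] for r in ordered]), ordered):
        last[group] = (user, log_id)  # rows are in log_id order, so the last one is the MAX
    return list(last.values())


print(run_numbers([r[1] for r in ROWS]))
print(last_log_per_run(ROWS))
```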

The grouped result:

with grouped_result as
(
select c.log_id, c.user_name, count(case when prev.user_name is null or prev.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats prev on prev.log_id + 1 = c.log_id 
order by c.log_id
)
select user_name, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,user_name
order by last_chat_of_each_user

Output:

 user_name | last_chat_of_each_user 
-----------+------------------------
 john      |                     21
 susan     |                     23
 john      |                     24
 susan     |                     25
(4 rows)

Chat and responses:

with grouped_result as
(
select c.log_id, c.user_name, count(case when prev.user_name is null or prev.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats prev on prev.log_id + 1 = c.log_id 
order by c.log_id
),
last_chats as
(
select user_name as responded_to, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,responded_to
)
select lc.responded_to, lc.last_chat_of_each_user as responded_to_log_id, lc_the_chat.created_on as responded_to_timestamp, 'x',  answered_by.user_name as responded_by, answered_by.created_on as response_created_on
from last_chats lc
join chats lc_the_chat on lc_the_chat.log_id = lc.last_chat_of_each_user
join chats answered_by on answered_by.log_id = lc.last_chat_of_each_user + 1
order by lc.last_chat_of_each_user

Output:

 responded_to | responded_to_log_id | responded_to_timestamp | ?column? | responded_by | response_created_on 
--------------+---------------------+------------------------+----------+--------------+---------------------
 john         |                  21 | 2011-01-01 03:00:23    | x        | susan        | 2011-01-01 03:00:43
 susan        |                  23 | 2011-01-01 03:00:53    | x        | john         | 2011-01-01 03:01:02
 john         |                  24 | 2011-01-01 03:01:02    | x        | susan        | 2011-01-01 03:01:08
(3 rows)

Chat's response average time:

with grouped_result as
(
select c.log_id, c.user_name, count(case when prev.user_name is null or prev.user_name <> c.user_name then 1 end) over(order by c.log_id) as the_grouping
from chats c
left join chats prev on prev.log_id + 1 = c.log_id 
order by c.log_id
),
last_chats as
(
select user_name as responded_to, max(log_id) as last_chat_of_each_user
from grouped_result
group by the_grouping
    ,responded_to
),
responses as
(
select lc.responded_to, lc.last_chat_of_each_user as responded_to_log_id, lc_the_chat.created_on as responded_to_timestamp,  answered_by.user_name as responded_by, answered_by.created_on as response_created_on
from last_chats lc
join chats lc_the_chat on lc_the_chat.log_id = lc.last_chat_of_each_user
join chats answered_by on answered_by.log_id = lc.last_chat_of_each_user + 1
order by lc.last_chat_of_each_user
)
select responded_by, responded_to, sum(response_created_on - responded_to_timestamp), count(*), avg(response_created_on - responded_to_timestamp) as average_response_to_person
from responses
group by responded_by, responded_to

Output:

 responded_by | responded_to |   sum    | count | average_response_to_person 
--------------+--------------+----------+-------+----------------------------
 susan        | john         | 00:00:26 |     2 | 00:00:13
 john         | susan        | 00:00:09 |     1 | 00:00:09
(2 rows)

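This works out of the box on PostgreSQL. To make it work on SQL Server, change `response_created_on - responded_to_timestamp` to `DATEDIFF(second, responded_to_timestamp, response_created_on)`. Note also that the running `count(...) over(order by ...)` needs SQL Server 2012 or later, and that SQL Server's `timestamp` type is a row version, so `created_on` should be `datetime` there.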

score 1 · Answer 6 · answered Jan 06 '11 at 00:42

This will get the job done, but I'm not sure how it will scale:

select spoke, responded, count(*) responses, avg(time_diff) avg_seconds from (
select a.user_name spoke, b.user_name responded, a.created_on spoke_at, min(b.created_on) responded_at, datediff(ss, a.created_on, min(b.created_on)) time_diff
from chats a, chats b
where a.chat_id = b.chat_id
 and a.log_id < b.log_id
 and not exists (select 1 from chats c where c.chat_id = a.chat_id and c.log_id < b.log_id and c.log_id > a.log_id)
group by a.user_name, b.user_name, a.created_on
) users group by spoke, responded

 spoke     responded     responses     avg_seconds    
 --------  ------------  ------------  -------------- 
 john      john          1             23             
 susan     john          1             9              
 john      susan         2             13             
 susan     susan         1             10

4 record(s) selected

It should be okay with an index on (chat_id, log_id).
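The query above can be tried on SQLite through Python's `sqlite3` module. `datediff(ss, ...)` is replaced by `strftime('%s', ...)` arithmetic (an assumption, since SQLite has no `DATEDIFF`), and the suggested `(chat_id, log_id)` index is created up front. This is a sketch for checking results, not a performance test.

```python
import sqlite3

# Sample rows from the question: (chat_id, user_name, log_id, created_on).
ROWS = [
    (1, "john", 20, "2011-01-01 03:00:00"),
    (1, "john", 21, "2011-01-01 03:00:23"),
    (1, "susan", 22, "2011-01-01 03:00:43"),
    (1, "susan", 23, "2011-01-01 03:00:53"),
    (1, "john", 24, "2011-01-01 03:01:02"),
    (1, "susan", 25, "2011-01-01 03:01:08"),
]

QUERY = """
SELECT spoke, responded, COUNT(*) AS responses, AVG(time_diff) AS avg_seconds
FROM (
    SELECT a.user_name AS spoke,
           b.user_name AS responded,
           CAST(strftime('%s', MIN(b.created_on)) AS INTEGER)
             - CAST(strftime('%s', a.created_on) AS INTEGER) AS time_diff
    FROM chats a, chats b
    WHERE a.chat_id = b.chat_id
      AND a.log_id < b.log_id
      -- b must be the very next message after a: nothing logged in between
      AND NOT EXISTS (SELECT 1 FROM chats c
                      WHERE c.chat_id = a.chat_id
                        AND c.log_id < b.log_id
                        AND c.log_id > a.log_id)
    GROUP BY a.user_name, b.user_name, a.created_on
) AS users
GROUP BY spoke, responded
"""


def response_stats(rows):
    """Return {(spoke, responded): (responses, avg_seconds)}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE chats (chat_id INTEGER, user_name TEXT,"
            " log_id INTEGER PRIMARY KEY, created_on TEXT)"
        )
        # The index suggested in the answer, serving the NOT EXISTS probe.
        con.execute("CREATE INDEX ix_chats_chat_log ON chats (chat_id, log_id)")
        con.executemany("INSERT INTO chats VALUES (?, ?, ?, ?)", rows)
        return {(s, r): (n, avg) for s, r, n, avg in con.execute(QUERY)}
    finally:
        con.close()


print(response_stats(ROWS))
```

It returns the same four rows as the output shown above.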

If you'd like to eliminate same-user pairs (someone "responding" to themselves), all you need is a != in the outer WHERE clause:

select spoke, responded, count(*) responses, avg(time_diff) avg_seconds from (
select a.user_name spoke, b.user_name responded, a.created_on spoke_at, min(b.created_on) responded_at, datediff(ss, a.created_on, min(b.created_on)) time_diff
from chats a, chats b
where a.chat_id = b.chat_id
 and a.log_id < b.log_id
 and not exists (select 1 from chats c where c.chat_id = a.chat_id and c.log_id < b.log_id and c.log_id > a.log_id)
group by a.user_name, b.user_name, a.created_on
) users 
where spoke != responded
group by spoke, responded
