0

In Microsoft SQL server, how to count in-between rows based on the values from the same column.

Example

/-----------------------------------------\
| ID ---------- Event -------- UserID ----|
| 1 ----------- START -------- 000001 ----|
| 2 ----------- START -------- 000002 ----|
| 3 ----------- END   -------- 000001 ----|
| 4 ----------- PL    -------- 000002 ----|
| 5 ----------- END   -------- 000002 ----|
\-----------------------------------------/

Consider for UserID 000002, which has 3 rows present and the ID are 2, 4 & 5. Based on the query from this link I am able to get the START and END time, but how to get the count of rows between START and END for each USERID

Expected Result

/-------------------------------------------------------\
| UserID ------------------------- Row Count -----------|
| 000001 ------------------------- 2         -----------|
| 000002 ------------------------- 3         -----------|
\-------------------------------------------------------/
Community
  • 1
  • 1
aioracle
  • 371
  • 2
  • 7
  • 20
  • Not simple, please read – Hemal Feb 03 '17 at 07:46
  • @aioracle:- your question is not clear... because you told that "how to get the count of rows between START and END for each USERID" & also your Expected Result also differ..please clear what you want..? – Darshak Feb 03 '17 at 07:48
  • Between start and end means that there should be some way to order the rows by. What version of sql server are you working with? – Zohar Peled Feb 03 '17 at 08:11
  • Can there be more than one start for a user? More than one end? Records before start? Records after end? If so, what to do in these cases? – Thorsten Kettner Feb 03 '17 at 08:39
  • As you are talking of rows *between* others, I suppose that ID is not just some technical ID here to identify records, but you also consider a record with a small ID *previous* to a record with a higher ID. Is this correct? – Thorsten Kettner Feb 03 '17 at 08:44
  • The scenario is not clear. **(1)** Is there only a single START-END per UserId? **(2)** Does a START always have an END? **(3)** Is START always the first EVENT and END is the last event per UserID? – David דודו Markovitz Feb 03 '17 at 12:14
  • Also - what is your SQL Server version? – David דודו Markovitz Feb 03 '17 at 12:15

2 Answers2

4

SELECT USERID , sum(CASE WHEN MainT.ID BETWEEN StartT.ID AND EndT.ID THEN 1 ELSE 0 END) AS RowCount FROM Table1 AS MainT INNER JOIN Table1 AS StartT ON MainT.USERID = StartT.USERID AND StartT.event = 'START' INNER JOIN Table1 AS EndT ON MainT.USERID = EndT.USERID AND EndT.event = 'END' GROUP BY USERID

This should produce the desired output, with the following assumptions:

  • ID is (auto) incrementing, so a user's START event has the lowest ID. Hopefully a timestamp is available in the real data.
  • Each relevant user has a one START and zero or one END events, but users that start or end outside the dataset are not included.

In response to the comment:

Normally, you cannot show "the" value for a field that's not being grouped by, as there might be multiple values in play. That means you need to tell SQL server how to handle the values with some kind of aggregate function.

If you are certain there's only one possible value per group, you can cheat a bit and take min(event) or max(event). That gives you the first or last value alphabetically that appears in the group. This is risky though and likely to cause issues if you later include more events in that table.

You can split the count by event type if there are only a few:

sum(CASE WHEN **MainT.event ='EventA' AND** MainT.ID BETWEEN StartT.ID AND EndT.ID THEN 1 ELSE 0 END) AS EventACount, sum(CASE WHEN **MainT.event ='EventB' AND** MainT.ID BETWEEN StartT.ID AND EndT.ID THEN 1 ELSE 0 END) AS EventBCount,

Finally, you could use some kind of list aggregation, concatenating all event types appearing in the group. This question goes into more detail on that.

Community
  • 1
  • 1
Cyrus
  • 2,135
  • 2
  • 11
  • 14
  • Second join should be a left join under these assumptions. – ATC Feb 03 '17 at 09:07
  • I'm excluding the users that don't have an END record yet, hence the inner join. If a "count up to now" is desired, it should be a left join and the case should read "CASE WHEN MainT.ID BETWEEN StartT.ID AND ISNULL(EndT.ID,MaintT.ID) THEN 1 ELSE 0 END – Cyrus Feb 03 '17 at 09:21
  • I see. I think this question needs some more explanations. Your answer seems good to me. +1 – ATC Feb 03 '17 at 09:25
  • @Cyrus the solution works as expected. Is there any possibility to display the event which is between START and END in another column? – aioracle Feb 03 '17 at 12:44
  • @Cyrus, I cannot mark 2 answers. Your solution also works for me. – aioracle Feb 03 '17 at 13:14
  • @aioracle No problem, but thanks for the comment, that way other readers know this works too. – Cyrus Feb 03 '17 at 13:17
1

The scenario is not well defined and you can see it in the varying complexity of the solutions.

This will handle the simple use-case.

select      UserID
           ,count(*)    as cnt
from        mytable

This will handle the complex use-case.

select      UserID
           ,min(ID)     as from_ID
           ,max(ID)     as to_ID
           ,count(*)    as events

from       (select      UserID,ID,Event
                       ,    count(case when Event in ('START','END') then 1 end) over 
                            (
                                partition by    UserID 
                                order by        Id 
                                rows            unbounded preceding
                            )   
                        -   case when Event = 'END' then 1 else 0 end   as group_seq

            from        mytable
            ) t

group by    UserID
           ,group_seq

having      min(case when Event = 'START' then 1 end) = 1

order by    UserID
           ,from_id
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • the solution for complex use-case has provided me with solution, however, in case I want to know what is the event values between START and END, how can I do it? – aioracle Feb 03 '17 at 12:55
  • How does the result you are expecting to see looks like? – David דודו Markovitz Feb 03 '17 at 12:57
  • P.s. If you'll run only the inner query you'll see that each group of events (START to END) has its own `group_seq` – David דודו Markovitz Feb 03 '17 at 13:05
  • In case I want to display like /--------------------------------------------------------------------------\ | UserID -------- Row Count ------- valuebetweenS_E ----- | | 000001 -------- 2 ----------- NULL --------------------- | | 000002 -------- 3 ----------- PL --------------------------| \--------------------------------------------------------------------------/ – aioracle Feb 03 '17 at 13:07
  • It depends on how my customer wants to view the data. I am just preparing to "In case" if they want. – aioracle Feb 03 '17 at 13:13