1

The query below works, but I would like to know if there is a better way to do it.

There is one subquery that uses two subqueries. The two subqueries are identical but return two different fields. Is there a way to use only one subquery that returns two fields?

I checked similar questions (this, this and this), but I don't think they apply in this case.

Here is the query:

SELECT *,
       time(strftime('%s', EndTime) - strftime('%s', StartTime), 'unixepoch') AS Duration
  FROM (
           SELECT (
                      SELECT Time
                        FROM Log AS LogStart
                       WHERE LogStart.User = Log.User AND 
                             LogStart.Time <= Log.Time AND 
                             LogStart.[Action] != 'done'
                       ORDER BY LogStart.Time DESC
                       LIMIT 1
                  )
                  AS StartTime,
                  Time AS EndTime,
                  User,
                  (
                      SELECT [Action]
                        FROM Log AS LogStart
                       WHERE LogStart.User = Log.User AND 
                             LogStart.Time <= Log.Time AND 
                             LogStart.[Action] != 'done'
                       ORDER BY LogStart.Time DESC
                       LIMIT 1
                  )
                  AS [Action]
             FROM Log
            WHERE [Action] = 'done'
       )
 ORDER BY duration DESC;

Here is some test data:

CREATE TABLE Log (
    Time     DATETIME,
    User     CHAR,
    [Action] CHAR
);

insert into Log values('2017-01-01 10:00:00', 'Joe', 'Play');
insert into Log values('2017-01-01 10:01:00', 'Joe', 'done');
insert into Log values('2017-01-01 10:02:00', 'Joe', 'Sing');
insert into Log values('2017-01-01 10:03:00', 'Joe', 'done');
insert into Log values('2017-01-01 10:04:00', 'Ann', 'Play');
insert into Log values('2017-01-01 10:04:30', 'Bob', 'Action without corresponding "done" which must be ignored');
insert into Log values('2017-01-01 10:05:00', 'Joe', 'Play');
insert into Log values('2017-01-01 10:06:00', 'Ann', 'done');
insert into Log values('2017-01-01 10:07:00', 'Joe', 'done');
insert into Log values('2017-01-01 10:08:00', 'Ann', 'Play');
insert into Log values('2017-01-01 10:09:00', 'Ann', 'done');
Community
  • 1
  • 1
stenci
  • 8,290
  • 14
  • 64
  • 104
  • 2
    Do one JOIN instead. – jarlh Mar 13 '17 at 14:53
  • Why do you go from the end record to the start record, instead of the other direction? – CL. Mar 13 '17 at 14:55
  • @CL.Because some Actions start, but never end, and I want to find the duration of the Actions that end and ignore the ones without a matching "done". So I start from "done" and go back – stenci Mar 13 '17 at 14:57

2 Answers2

1

Use a self join... I don't have SQLLite, so the syntax may be off here, but you should get the idea...

Select e.*, time(strftime('%s', e.[Time]) - 
           strftime('%s', s.[Time]), 'unixepoch') AS Duration
From log e join log s  -- s is for the startevent; e for end event
    on s.[User] = e.[User]
       and s.[Action] != 'done'
       and e.[Action] = 'done'
       and s.[Time] = 
          (Select Max([time] from log
           where [User] = e.[User]
                   and [time] <= e.[Time]
                   and [Action] != 'done')
Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
  • Thanks, I had never used join with a subquery! Why did you use `left join` instead of `inner join`? – stenci Mar 13 '17 at 15:31
  • In case there was a end event without an start event, but reviewing the comments, I see you only want the ones with both a start and end, so you should make it an inner join. I edited my answer. – Charles Bretana Mar 13 '17 at 18:38
1

I made this: is what you were looking for? (MSSQL, but I think should work in SQLLite as there are not "non standard" SQL commands; I should write USER inside []).

SELECT STARTTIME, MIN(ENDTIME) AS ENDTIME, [USER], ACTION 
FROM (
SELECT B.TIME AS STARTTIME, A.TIME AS ENDTIME, A.[USER], B.ACTION
FROM LOG A
INNER JOIN (SELECT * FROM LOG) B ON A.[USER]= B.[USER] AND B.ACTION<>'done' AND B.TIME< A.TIME
WHERE A.Action='done'
) X 
GROUP BY X.STARTTIME, [USER], ACTION ;

Output:

STARTTIME               ENDTIME                 USER       ACTION
----------------------- ----------------------- ---------- ---------------------
2017-01-01 10:00:00.000 2017-01-01 10:01:00.000 Joe        Play
2017-01-01 10:02:00.000 2017-01-01 10:03:00.000 Joe        Sing
2017-01-01 10:04:00.000 2017-01-01 10:06:00.000 Ann        Play
2017-01-01 10:05:00.000 2017-01-01 10:07:00.000 Joe        Play
2017-01-01 10:08:00.000 2017-01-01 10:09:00.000 Ann        Play

With only your data, compared execution plan in MSSQL showed that your query "costs" 87% while this latter costs 13% (their sum is - of course - 100%)

etsa
  • 5,020
  • 1
  • 7
  • 18