4

I need help with a oracle query.

Here is my setup:

I have 2 tables called respectively "tasks" and "timesheets". The "tasks" table is a recursive one, that way each task can have multiple subtasks. Each timesheet is associated with a task (not necessarily the "root" task) and contains the number of hours worked on it.

Example:

Tasks

id:1 | name: Task A | parent_id: NULL

id:2 | name: Task A1 | parent_id: 1

id:3 | name: Task A1.1 | parent_id: 2

id:4 | name: Task B | parent_id: NULL

id:5 | name: Task B1 | parent_id: 4

Timesheets

id:1 | task_id: 1 | hours: 1

id:2 | task_id: 2 | hours: 3

id:3 | task_id:3 | hours: 1

id:5 | task_id:5 | hours:1 ...

What I want to do:

I want a query that will return the sum of all the hours worked on a "task hierarchy". If we take a look at the previous example, It means I would like to have the following results:

task A - 5 hour(s) | task B - 1 hour(s)

At first I tried this

SELECT TaskName, Sum(Hours) "TotalHours" 
FROM (
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName, 
    ts.hours as hours
    FROM tasks t INNER JOIN timesheets ts ON t.id=ts.task_id
    START WITH PARENTOID=-1
    CONNECT BY PRIOR t.id = t.parent_id
    )
GROUP BY TaskName Having Sum(Hours) > 0 ORDER BY TaskName

And it almost work. THe only problem is that if there are no timesheet for a root task, it will skip the whole hieararchy... but there might be timesheets for the child rows and it is exactly what happens with Task B1. I know it is the "inner join" part that is causing my problem but I'm not sure how can I get rid of it.

Any idea how to solve this problem?

Thank you

François
  • 987
  • 1
  • 8
  • 13
  • I think if you reverse the order of the tables and use a left outer join, you'll get tasks even when there's no time sheet so that *might* work... maybe? – FrustratedWithFormsDesigner May 27 '10 at 15:04
  • Thanks for your help. Doing a left join in my case is causing some major performance issues... even though the tables are well indexed and all. – François May 27 '10 at 15:30

3 Answers3

4

Would something like this work? I've had cases similar to yours, and I simply removed the join from the hierarchical query and applied it only afterward to avoid losing rows.

SELECT TaskName, Sum(ts.hours) "TotalHours" 
FROM (
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName, t.id
    FROM tasks t
    START WITH PARENTOID=-1
    CONNECT BY PRIOR t.id = t.parent_id
    ) tasks
INNER JOIN timesheets ts ON tasks.id=ts.task_id
GROUP BY TaskName Having Sum(ts.hours) > 0 ORDER BY TaskName
aw crud
  • 8,791
  • 19
  • 71
  • 115
  • Thanks a million! I think this is doing the trick. Some more testing to do and I'll flag this as the official answer. – François May 27 '10 at 15:29
1

Have you tried this?

SELECT TaskName, Sum(Hours) "TotalHours" 
FROM (
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName, 
    ts.hours as hours
    FROM timesheets ts  LEFT OUTER JOIN tasks t  ON t.id=ts.task_id
    START WITH PARENTOID=-1
    CONNECT BY PRIOR t.id = t.parent_id
    )
GROUP BY TaskName Having Sum(Hours) > 0 ORDER BY TaskName
FrustratedWithFormsDesigner
  • 26,726
  • 31
  • 139
  • 202
  • Thanks for your help! I already tried the left outer join option but it slows down the query tremendously. Perhaps I should take another look at the indexes in those tables... maybe there is one missing somewhere. But for the moment the solution proposed by RenderLn is what I am looking for. Again, thanks for your help. – François May 27 '10 at 15:33
  • The index that I would think of (just by looking at this query) is on `timesheets`, index of `task_id` (Assuming that tasks.id is already an index). But yeah, in general, outer joins are slower than inner. – FrustratedWithFormsDesigner May 27 '10 at 15:48
0

If you use left outer join instead of normal join, you may get the output.

SELECT TaskName, Sum(Hours) "TotalHours"  
FROM ( 
    SELECT replace(sys_connect_by_path(decode(level, 1, t.name), '~'), '~') As TaskName,  
    ts.hours as hours 
    FROM tasks t,timesheets ts where t.id=ts.task_id(+) 
    START WITH PARENTOID=-1 
    CONNECT BY PRIOR t.id = t.parent_id 
    ) 
GROUP BY TaskName Having Sum(Hours) > 0 ORDER BY TaskName 
Bharat
  • 6,828
  • 5
  • 35
  • 56
  • I think he would have to reverse the orders of the tables too, because he wants tasks which don't have corresponding timesheets. Currently, if he changes to left outer, he'd get timesheets which have no corresponding tasks. – FrustratedWithFormsDesigner May 27 '10 at 15:08
  • FrustratedWithFormsDesigner: You are correct or he can also use right outer join – Bharat May 27 '10 at 15:13