0

I have a query that currently works but based is inefficient. I'm basically trying to group both labor and sales data by the hour.

And I want to be able to do it by giving the query a day.

I'm using PostgreSQL.

I have a bunch of time punch records that have the employee_id, job_id and location_id but if the employee has clocked in and hasn't clocked out I have to check the clock_out_time field and set it to now() to properly do the hourly calculation.

Planning time: 0.509 ms

Execution time: 0.498 ms

I'm doing this with 30-50 records so this will not scale.

What can I do to improve this?

SELECT
  date_trunc('hour', tp.clock_in_time) AS hour,
  SUM(
    (
      EXTRACT (DAY FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*24*60*60+
      EXTRACT (HOUR FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60*60+
      EXTRACT (MINUTE FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60+
      EXTRACT (SECOND FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))
    ) / 60 / 60.00 * (job.rate / 100.00)
  ) AS labor_costs,
  (
  SELECT 
    SUM(total) / 100.00
    FROM 
        ticket
    WHERE 
        open=false 
    AND 
        DATE_TRUNC('day', opened_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date) 
    AND
      DATE_TRUNC('day', closed_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date) 
    GROUP BY date_trunc('hour', opened_at) 
    ORDER BY date_trunc('hour', opened_at)
    ) AS hourly_sales
FROM 
  employee_time_punch as tp
INNER JOIN
  employee
ON 
  employee.id = tp.employee_id
INNER JOIN
  employee_job as job
ON
  job.id = tp.job_id
WHERE
  DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
AND
    DATE_TRUNC('day', CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
GROUP BY 1
ORDER BY 1;
Community
  • 1
  • 1
Sean Parsons
  • 724
  • 4
  • 12

2 Answers2

1

WHERE DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)

This single filtering condition is hurting your query. It suffers from the "Left Side Expression in Equality" syndrome, that defeats the use of any index. At this point PostgreSQL is probably performing a Full Table Scan on the table.

You can make your query much faster if you rephrase the condition as in:

WHERE tp.clock_in_time BETWEEN ...begin_of_day... AND ...end_of_day...

You can precomute those values in a CTE is you wish.

And--of course--you'll need to have an index on the column, as in:

create index ix1 on employee_time_punch (clock_in_time);

With this change PostgreSQL will execute a Index Range Scan instead, something much faster.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
0

As answered by @TheImpaler, the way you compare dates must be improved, and you can use a CTE to precompute the analyzis window.

Here are other simplifications of the query that should help make it faster and more readable :

  • express the WHERE clause as a CROSS JOIN on the analyzis window ; the COALESCE function can be used to default clock_in_time to NOW
  • to compute the hourly_sales, use a JOIN instead of a subquery
  • use a single EXTRACT(EPOCH FROM...) to compute the duration of the employee shift instead of repeated EXTRACT(HOUR/MINUTE/SECOND...)
  • move fixed arithmetic operations of the labor_costs outside of the SUM function

Query :

WITH dates AS ( 
    SELECT 
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) AS wstart, 
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) + interval '1' day AS wend
)
SELECT
  date_trunc('hour', tp.clock_in_time) AS hour,
  SUM(
      EXTRACT(EPOCH FROM COALESCE(tp.clock_out_time, NOW()) - tp.clock_in_time) 
      * job.rate
   ) / 60 / 60 / 100.00 AS labor_costs,
  SUM(ticket.total)/100.00 AS hourly_sales
FROM 
    dates
    INNER JOIN employee_time_punch AS tp
        ON  tp.clock_in_time BETWEEN dates.wstart AND dates.wend
        AND COALESCE(tp.clock_out_time, NOW()) BETWEEN dates.wstart AND dates.wend
    INNER JOIN employee
        ON  employee.id = tp.employee_id
    INNER JOIN employee_job AS job
        ON  job.id = tp.job_id
    INNER JOIN ticket
        ON  ticket.open = false 
        AND ticket.opened_at BETWEEN dates.wstart AND dates.wend
        AND ticket.closed_at BETWEEN dates.wstart AND dates.wend
GROUP BY 1;

For more optimization, you may create indexes on all date columns being involved (one composite index per table may perform well) :

  • in table employee_time_punch : clock_in_time and clock_out_time
  • in table ticket : opened_at and closed_at
GMB
  • 216,147
  • 25
  • 84
  • 135
  • I tried running this query and it didn't work. Was this written to work with Postgresql? – Sean Parsons Dec 23 '18 at 00:42
  • I'm still getting a issue on starting at the CROSS JOIN first ON: `ON tp.clock_in_time BETWEEN dates.wstart AND dates`. I'm trying to lookup CROSS JOIN documentation to figure it out myself but this is a feature I haven't used yet with SQL or Postgres so i'm a little out of my element on how to correct this SQL statement. – Sean Parsons Dec 23 '18 at 00:58
  • Looks like Cross Joins don't allow using ON: https://stackoverflow.com/questions/44437397/why-do-cross-join-conditions-not-work-in-the-on-clause-only-the-where-clause – Sean Parsons Dec 23 '18 at 01:00
  • That's anoter typo from my side ; should be `INNER JOIN` – GMB Dec 23 '18 at 01:02