0

I have a query which LEFT JOINing two tables with timestamptz columns and grouping result by

(date_trunc(
    'DAY',
    "table_one"."ttz" AT TIME ZONE
    'America/Los_Angeles'
    )
    -
date_trunc(
    'DAY',
    "table_two"."ttz" AT TIME ZONE
    'America/Los_Angeles')) as period

With such groping query performance drops from 1s (when grouping by other column) to 40-60s. Is this a known issue and are there any workarounds? This behaviour doesn't depend on hardware configuration (tested on server machine with optimized Postgres configuration). I also use Citus extension, an table are partitioned by date range, but this is not related (tested).

Table DLL

CREATE TABLE table_one
(
    user_id VARCHAR,
    ttz     timestamptz
);

Query

SELECT date_trunc(
               'DAY',
               table_one."ttz" AT TIME ZONE
               'America/Los_Angeles'
           ) AT TIME ZONE 'America/Los_Angeles' table_one_day,
       (date_trunc(
                'DAY',
                "table_one"."ttz" AT TIME ZONE
                'America/Los_Angeles'
            )
           -
        date_trunc(
                'DAY',
                "table_two"."ttz" AT TIME ZONE
                'America/Los_Angeles'))         period,
       count(DISTINCT table_two.user_id)
FROM table_one
         LEFT JOIN table_two ON table_one.user_id = table_two.user_id
GROUP BY table_one_day, period;

Plan when grouping only by table_one_day

GroupAggregate  (cost=0.00..0.00 rows=0 width=0) (actual time=760.606..760.606 rows=1 loops=1)
  Output: remote_scan.first_ev_day_trunc, count(DISTINCT remote_scan.count)
  Group Key: remote_scan.first_ev_day_trunc
  ->  Sort  (cost=0.00..0.00 rows=0 width=0) (actual time=760.585..760.585 rows=6 loops=1)
        Output: remote_scan.first_ev_day_trunc, remote_scan.count
        Sort Key: remote_scan.first_ev_day_trunc
        Sort Method: quicksort  Memory: 25kB
        ->  Custom Scan (Citus Real-Time)  (cost=0.00..0.00 rows=0 width=0) (actual time=760.577..760.578 rows=6 loops=1)
              Output: remote_scan.first_ev_day_trunc, remote_scan.count
              Task Count: 32
              Tasks Shown: One of 32
              ->  Task
                    Node: host=94.130.157.249 port=5432 dbname=klonemobile
                    ->  Group  (cost=89.13..89.25 rows=8 width=40) (actual time=0.339..0.343 rows=1 loops=1)
                          Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
                          Group Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
                          Buffers: shared hit=9
                          ->  Sort  (cost=89.13..89.15 rows=8 width=40) (actual time=0.337..0.338 rows=24 loops=1)
                                Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
                                Sort Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
                                Sort Method: quicksort  Memory: 26kB
                                Buffers: shared hit=9
                                ->  Hash Left Join  (cost=44.44..89.01 rows=8 width=40) (actual time=0.281..0.307 rows=24 loops=1)
                                      Output: timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))), table_two.user_id
                                      Hash Cond: ((table_one.user_id)::text = (table_two.user_id)::text)
                                      Join Filter: ((table_one."time" < table_two."time") AND ((table_one."time" + '2 days'::interval day to second) >= table_two."time"))
                                      Rows Removed by Join Filter: 1
                                      Buffers: shared hit=3
                                      ->  Append  (cost=0.00..44.34 rows=8 width=40) (actual time=0.024..0.027 rows=1 loops=1)
                                            Buffers: shared hit=1
                                            ->  Seq Scan on table_one_17955_2004312" table_one  (cost=0.00..22.15 rows=4 width=40) (actual time=0.024..0.024 rows=1 loops=1)
                                                  Output: table_one."time", table_one.user_id
                                                  Filter: ((table_one."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
                                                  Buffers: shared hit=1
                                            ->  Seq Scan on table_one_17956_2005560" table_one_1  (cost=0.00..22.15 rows=4 width=40) (actual time=0.002..0.002 rows=0 loops=1)
                                                  Output: table_one_1."time", table_one_1.user_id
                                                  Filter: ((table_one_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one_1."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
                                      ->  Hash  (cost=44.34..44.34 rows=8 width=40) (actual time=0.044..0.044 rows=25 loops=1)
                                            Output: table_two.user_id, table_two."time"
                                            Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                            Buffers: shared hit=2
                                            ->  Append  (cost=0.00..44.34 rows=8 width=40) (actual time=0.018..0.030 rows=25 loops=1)
                                                  Buffers: shared hit=2
                                                  ->  Seq Scan on table_two_17955_2003480" table_two  (cost=0.00..22.15 rows=4 width=40) (actual time=0.018..0.023 rows=24 loops=1)
                                                        Output: table_two.user_id, table_two."time"
                                                        Filter: ((table_two."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
                                                        Buffers: shared hit=1
                                                  ->  Seq Scan on table_two_17956_2005304" table_two_1  (cost=0.00..22.15 rows=4 width=40) (actual time=0.004..0.004 rows=1 loops=1)
                                                        Output: table_two_1.user_id, table_two_1."time"
                                                        Filter: ((table_two_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two_1."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
                                                        Buffers: shared hit=1
                        Planning Time: 41.035 ms
                        Execution Time: 0.448 ms
Planning Time: 1.846 ms
Execution Time: 760.663 ms

Plan when grouping by table_one_day and period

GroupAggregate  (cost=0.00..0.00 rows=0 width=0) (actual time=46028.822..46028.825 rows=3 loops=1)
  Output: remote_scan.first_ev_day_trunc, remote_scan.period, count(DISTINCT remote_scan.count)
  Group Key: remote_scan.first_ev_day_trunc, remote_scan.period
  Buffers: shared hit=3
  ->  Sort  (cost=0.00..0.00 rows=0 width=0) (actual time=46028.804..46028.804 rows=7 loops=1)
        Output: remote_scan.first_ev_day_trunc, remote_scan.period, remote_scan.count
        Sort Key: remote_scan.first_ev_day_trunc, remote_scan.period
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=3
        ->  Custom Scan (Citus Real-Time)  (cost=0.00..0.00 rows=0 width=0) (actual time=46028.786..46028.788 rows=7 loops=1)
              Output: remote_scan.first_ev_day_trunc, remote_scan.period, remote_scan.count
              Task Count: 32
              Tasks Shown: One of 32
              ->  Task
                    Node: host=94.130.157.249 port=5432 dbname=klonemobile
                    ->  Group  (cost=89.29..89.59 rows=8 width=48) (actual time=0.379..0.384 rows=2 loops=1)
                          Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
                          Group Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
                          Buffers: shared hit=12
                          ->  Sort  (cost=89.29..89.31 rows=8 width=48) (actual time=0.378..0.379 rows=24 loops=1)
                                Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
                                Sort Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
                                Sort Method: quicksort  Memory: 26kB
                                Buffers: shared hit=12
                                ->  Hash Left Join  (cost=44.44..89.17 rows=8 width=48) (actual time=0.284..0.337 rows=24 loops=1)
                                      Output: timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))), date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))))), table_two.user_id
                                      Hash Cond: ((table_one.user_id)::text = (table_two.user_id)::text)
                                      Join Filter: ((table_one."time" < table_two."time") AND ((table_one."time" + '2 days'::interval day to second) >= table_two."time"))
                                      Rows Removed by Join Filter: 1
                                      Buffers: shared hit=3
                                      ->  Append  (cost=0.00..44.34 rows=8 width=40) (actual time=0.026..0.029 rows=1 loops=1)
                                            Buffers: shared hit=1
                                            ->  Seq Scan on table_one_17955_2004312 table_one  (cost=0.00..22.15 rows=4 width=40) (actual time=0.025..0.026 rows=1 loops=1)
                                                  Output: table_one."time", table_one.user_id
                                                  Filter: ((table_one."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
                                                  Buffers: shared hit=1
                                            ->  Seq Scan on table_one_17956_2005560 table_one_1  (cost=0.00..22.15 rows=4 width=40) (actual time=0.002..0.002 rows=0 loops=1)
                                                  Output: table_one_1."time", table_one_1.user_id
                                                  Filter: ((table_one_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one_1."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
                                      ->  Hash  (cost=44.34..44.34 rows=8 width=40) (actual time=0.026..0.026 rows=25 loops=1)
                                            Output: table_two."time", table_two.user_id
                                            Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                            Buffers: shared hit=2
                                            ->  Append  (cost=0.00..44.34 rows=8 width=40) (actual time=0.011..0.019 rows=25 loops=1)
                                                  Buffers: shared hit=2
                                                  ->  Seq Scan on "table_two_17955_2003480" table_two  (cost=0.00..22.15 rows=4 width=40) (actual time=0.011..0.014 rows=24 loops=1)
                                                        Output: table_two."time", table_two.user_id
                                                        Filter: ((table_two."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
                                                        Buffers: shared hit=1
                                                  ->  Seq Scan on table_two_17956_2005304 table_two_1  (cost=0.00..22.15 rows=4 width=40) (actual time=0.003..0.003 rows=1 loops=1)
                                                        Output: table_two_1."time", table_two_1.user_id
                                                        Filter: ((table_two_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two_1."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
                                                        Buffers: shared hit=1
                        Planning Time: 5899.378 ms
                        Execution Time: 0.531 ms
Planning Time: 2.757 ms
Execution Time: 46028.896 ms

  • 1
    @GordonLinoff > Joining on expressions Join is actually pretty straightforward and uses only user_ids, which are indexed. It is `GROUP BY period` part of the query performs bad. – Ivan Vyazmitinov Apr 01 '19 at 16:43
  • 1
    Could you share `EXPLAIN (ANALYZE, BUFFERS)` output for the query? Does increasing `work_mem` help? – Laurenz Albe Apr 01 '19 at 17:06
  • 1
    Have you tried an index on `(user_id, date_trunc('DAY', "ttz" AT TIME ZONE 'America/Los_Angeles'))` on at least `table_one` or maybe both? – sticky bit Apr 01 '19 at 18:47
  • @LaurenzAlbe Added plans and more context Info. As you can see from plans, memory is not the issue. – Ivan Vyazmitinov Apr 02 '19 at 09:24

1 Answers1

-1

How many columns are there in table_one in reality? i.e. are there really only two columns? If it is a wide table then you can create an index on that table on user_id, ttz. This will make it so that database will be able to scan a smaller data structure, i.e. an index vs a larger structure i.e. a table.

If it is still the case that it is slow, then some databases like oracle allow for an expression when creating an index. Mysql allows similar functionality as virtual columns that are not stored, i.e. http://mysqlserverteam.com/generated-columns-in-mysql-5-7-5/ And see https://mysqlserverteam.com/virtual-columns-and-effective-functional-indexes-in-innodb/

Saad Ahmad
  • 393
  • 1
  • 7