1

I have to fetch record from two tables, there is one table is hyper table another table is normal table.

Hyper table primary key (a UUID, not a timestampz column) is used as foreign key in 2nd normal table.

The hyper table has one to many relationship with the normal table.

Will I get all benefits of hyper table here if I select record after joining this table?

I am using postgresql database for timescale.

Below are create table queries for same. The demography_person is the hypertable and the emotions_person is the normal table

CREATE TABLE public.demography_person
(
  start_timestamp timestamp with time zone NOT NULL,
  end_timestamp timestamp with time zone,
  demography_person_id character varying NOT NULL,
  device_id bigint,
  age_actual numeric,
  age_band integer,
  gender integer,
  dwell_time_in_millis bigint,
  customer_id bigint NOT NULL
);

SELECT create_hypertable('demography_person', 'start_timestamp');

CREATE TABLE public.emotions_person
(
  emotion_start_timestamp timestamp with time zone NOT NULL,
  demography_person_id character varying NOT NULL,
  count integer,
  emotion integer,
  emotion_percentage numeric
);

select sql Query is like:-

SELECT * FROM crosstab
             (
               $$
                   SELECT * FROM  ( select  to_char(dur,'HH24') as duration , dur as time_for_sorting from 
                generate_series(
            timestamp '2019-04-01 00:00:00',
            timestamp '2020-03-09 23:59:59' ,
            interval  '1 hour'
                ) as dur   ) d
                   LEFT JOIN (  
                   select to_char(
                                    start_timestamp ,
                                   'HH24'
                                   )
                   as duration,
                   emotion,count(*) as count from demography_person dp INNER JOIN (
            select  distinct ON (demography_person_id)  demography_person_id, emotion_start_timestamp,count,emotion,emotion_percentage,
            (CASE emotion when 4 THEN 1 when 6  THEN 2 when 1 THEN 3  WHEN 3 THEN 4 WHEN 2 THEN 5  when 7 THEN 6  when 5 THEN 7  ELSE 8 END )  
             as emotion_key_for_sorting from emotions_person  where    demography_person_id in (select demography_person_id from demography_person where start_timestamp >= '2019-04-01 00:00:00'
            AND start_timestamp <= '2020-03-09 23:59:59' AND device_id IN ( 2052,2692,1797,2695,1928,2697,2698,1931,2574,2575,2706,1942,1944,2713,1821,2719,2720,2721,2722,2723,2596,2725,2217,2603,1852,2750,1726,1727,2754,2757,1990,2759,2760,2376,2761,2762,2257,2777,2394,2651,2652,1761,2658,1762,2659,2788,2022,2791,2666,1770,2026,2028,2797,2675,1780,2549 ))   
               order by demography_person_id asc,emotion_percentage desc, emotion_key_for_sorting asc 
                   ) ep ON
                   ep.demography_person_id = dp.demography_person_id
                   WHERE start_timestamp >= '2019-04-01 00:00:00'
AND start_timestamp <= '2020-03-09 23:59:59' AND device_id IN ( 2052,2692,1797,2695,1928,2697,2698,1931,2574,2575,2706,1942,1944,2713,1821,2719,2720,2721,2722,2723,2596,2725,2217,2603,1852,2750,1726,1727,2754,2757,1990,2759,2760,2376,2761,2762,2257,2777,2394,2651,2652,1761,2658,1762,2659,2788,2022,2791,2666,1770,2026,2028,2797,2675,1780,2549 ) AND gender IN ( 1,2 )
                   group by 1,2 ORDER  BY 1,2 ASC
                   ) t USING (duration) GROUP  BY 1,2,3,4 ORDER  BY time_for_sorting;           
               $$ ,
               $$
                 select emotion from (
                                                          values ('1'), ('2'), ('3'),('4'), ('5'), ('6'),('7'), ('8')
                                                    ) t(emotion)
              $$ 
         ) AS ct 
              (
                   duration text,
                   time_for_sorting  timestamp,
                  ANGER bigInt,
                  DISGUSTING bigInt,
                  FEAR bigInt,
                  HAPPY bigInt,
                  NEUTRAL bigInt,
                  SAD bigInt,
                  SURPRISE bigInt,
                  NO_DETECTION bigInt
             ); 
  • I don't get what you mean with benefits on join result. However, you cannot reference hypertable from other tables FKs. So the proposed design will not work. – k_rus Mar 11 '20 at 08:23
  • Actually, since you are using primary key of hypertable, it might work to create FK to it. However, the primary key is the time dimension column. Can you improve the description of the question and describe the essential part of the hypertable schema? – k_rus Mar 11 '20 at 08:31
  • Note you don't need to define FK to be able to join two tables. – k_rus Mar 11 '20 at 08:32
  • i have updated description for better understanding. – Sanjeev Jha Mar 11 '20 at 09:58
  • Thank you for adding the schemas to the description. It helps. Can you clean up your description, so it matches the schema? You mentioned hypertable primary key, which UUID, but I think you don't plan for this any more, right? – k_rus Mar 11 '20 at 12:14
  • Can you write SQL example of your query that you want to do? – k_rus Mar 11 '20 at 12:15
  • Thanks for your comment. i have added select query. – Sanjeev Jha Mar 11 '20 at 13:12
  • @k_rus hi please have a look. – Sanjeev Jha Mar 24 '20 at 11:11
  • Thank you for reminding me. I will try to look tomorrow. – k_rus Mar 24 '20 at 16:25

1 Answers1

1

Will i get benefits of hyper table if I have a query in which I join a hyper table with a normal (non-hyper) table in timescaledb

I don't fully understand the question and see 2 interpretations for it:

  1. Will I benefit from using TimescaleDB and hypertable just for improving this query?
  2. Can I join a hypertable and a normal table and how to make the above query to perform better?

If you just need to execute a complex query over large dataset, PostgreSQL can do good job if you provide indexes. TimescaleDB provides benefits for Timeseries workflows especially when a workflow includes data in-order ingesting, time-related queries, timeseries operators and/or usage TimescaleDB specific functionality such as continuous aggregates and compression, i.e., not just a query. TimescaleDB is designed for large volumes of timeseries data. I hope it clarifies the first question.

In TimescaleDB it is very common to join hypertable, which stores timeseries data, and a normal table, which contains metadata on timerseries data. TimescaleDB implements constraint exclusion to improve query performance. However, it might not be applied in some cases due to uncommon query expressions or too complex queries.

The query in the question is very complex. So I suggest to use ANALYZE on the query to see if the query planner misses some optimisations.

I see that the query generates data and I doubt it can be done much to produce good query plan. So this is my biggest concern for getting good performance. It would be great if you can explain motivation around the generating data inside the query.

Another issue, which I see, is a nested query demography_person_id in (select demography_person_id from demography_person ... in a where condition. And the outer query is a part in a inner join with the same table as in the nested query. I expect it can be rewritten without nested subquery utilising inner join.

I doubt that TimescaleDB or PostgreSQL can do much to execute query efficiently. The query requires manual human rewriting.

k_rus
  • 2,959
  • 1
  • 19
  • 31