-1

I am optimizing a query using materialized view in PostgreSQL but query logic is not working in mat view

I want to optimize the query which involves multiple joins and its execution time also higher, so I tried the same query in Materialized view using PostgreSQL but query logic is going wrong when it comes to mat view.

I have tried this mat view creation in PostgreSQL 11.

In the below code, three key tables

1.posts 
2.Topics 
3.Post_topics.

Post_topics tables is holding post_id and topic_id.

Topics tables is list of topics (each topic is having multiple values;

example if topic is egg, values associated with egg were 'breakfast','dinner','cheese' etc)

and posts tables is holding posts relating the topics. Each topic may have multiple posts.

I want to get the count from topic table which holds values. So how many counts for breakfast, dinner, cheese when topic id is egg. Topics table is holding 8000 where breakfast,dinner, cheese were also in the value list. If I enter cheese the count for egg, breakfast, cheese should come. In the Normal query I have done this but in mat view, I am struggling to get this logic.

Original Query:

SELECT t.id as topic_id, t.value as value, t.topic as topic, COUNT(c.topic_id) as count 
FROM post_topics a 
JOIN posts b ON a.post_id = b.id 
JOIN post_topics c on a.post_id = c.post_id 
--JOIN post_locations pl ON pl.post_id = c.post_id 
JOIN topics t on c.topic_id = t.id --AND t.topic = 'cuisine' 
WHERE a.topic_id = '1234547hnunsdrinfs' 
AND t.id != '1234547hnunsdrinfs' 
AND b.date_posted BETWEEN ('2019-06-17'::date - interval '6 month') AND '2019-06-17'::date 
     GROUP BY t.id, c.topic_id 
     ORDER BY count DESC 
     LIMIT 20

I have edited the mat view query with topic id list (Topic table is having 8000 values). In the original query I have mentioned just 1 topic id but I need result for entire 8000. For the below mat view I am getting top topics for each id but counts were not matched.

**Edited Mat View
      Create materialized view top_associations_mv AS
        SELECT a.topic_id, t.id as topic_id, t.value as value, t.topic as topic, COUNT(c.topic_id) as count 
        FROM post_topics a 
        JOIN posts b ON a.post_id = b.id 
        JOIN post_topics c on a.post_id = c.post_id 
        JOIN topics t on c.topic_id = t.id 
        WHERE a.topic_id in ('c108200f-e4dc-415e-9150-3f6c74b879e2', '107f8cad-75b3-43fb-9b2f-f7914bf45155') -- here all 8000 topic id should be placed
        --AND t.id <>  ( 'c108200f-e4dc-415e-9150-3f6c74b879e2', '107f8cad-75b3-43fb-9b2f-f7914bf45155')--,'c348af9d-dd98-49f1-b6c2-8ea36b404ffa')
         and (b.date_posted > (('now'::text)::date - '6 mons'::interval))
        GROUP BY t.id, c.topic_id,a.topic_id 
        ORDER BY count DESC limit 10 ;

Materialized View:

Create MATERIALIZED VIEW top_associations_mv as 
SELECT t.id as topic_id, t.value as value, t.topic as topic, COUNT(c.topic_id) as count 
FROM post_topics a 
JOIN posts b ON a.post_id = b.id 
JOIN post_topics c on a.post_id = c.post_id
JOIN topics t on c.topic_id = t.id 
WHERE t.id != c.post_id and (b.date_posted > (('now'::text)::date - '6mons'::interval)) 
GROUP BY t.id, c.topic_id 
ORDER BY count DESC

My expected result is:

I want to get the counts from topic table which holds values. So how many counts for breakfast, dinner, cheese when topic id is egg. But in actual result count is wrong.

I really need a help please!!!

prabhu
  • 103
  • 1
  • 3
  • 15

2 Answers2

0

You should use subqueries on join and also subqueries should contain distinct on clause. You can use it:

create materialized view top_associations_mv as select
    t.id as topic_id,
    t.value as value,
    t.topic as topic,
    count(c.topic_id) as count
from
    post_topics a
join (
    select
        distinct on
        (id) id,
        date_posted
    from
        posts ) as b on
    a.post_id = b.id
and
    (b.date_posted > (('now'::text)::date - '6mons'::interval)) (
    select
        distinct on
        (post_id) topic_id
    from
        post_topics ) as c on
    a.post_id = c.post_id (
    select
        distinct on
        (id) id
    from
        topics ) as t on
    c.topic_id = t.id
    and t.id != c.post_id
group by
    t.id,
    c.topic_id
order by
    count desc
  • Hi, that's where I lost, can you please help me on this. when it comes to mat view - the counts going wrong and I applied same join without specific topic id here from base query. – prabhu Jul 26 '19 at 14:48
  • Your original query is not the same with mat view. You identify an id in your original query but in mat view, you join a table. So, there may be duplicates values in your view. – Mustafa BYKSY Jul 29 '19 at 08:10
  • Yes, I have to create mat view for all topic id instead of one. the original query execution time is very high so I am going for mat view to create the counts for each topic based on original query but mat view gives me the over all count. I dont know how to create that such big table. – prabhu Jul 29 '19 at 08:19
  • I edit your query in my answer. You can use it. Your performance is bad, please check your indexes. – Mustafa BYKSY Jul 29 '19 at 08:32
  • That's right. My original query performance is too bad, the execution time is 8 sec. Thats why I am trying mat view to create single entry table where I can query for particular topic id – prabhu Jul 29 '19 at 08:57
  • I edit my answer again for performance optimization. Is it working? – Mustafa BYKSY Jul 29 '19 at 09:02
  • Actually query throwing error in Where clause (first select statement) – prabhu Jul 29 '19 at 09:16
  • Remove where clause and add and clause. Try again please. – Mustafa BYKSY Jul 29 '19 at 10:32
  • Its still the same error. Is there any other way to optimize the original query because thats my goal. I am using multiple joins so query taking 10s to execute. – prabhu Jul 29 '19 at 10:38
  • You may remove `between` and use `>` an `<` on date parameters. Increase your `work_mem` to 100MB or maybe more than it, in PostgreSQL configuration. Create indexes on **topic_id, t.id and date_posted**. – Mustafa BYKSY Jul 29 '19 at 10:44
  • Yes I did that already but still I am facing performance issues. – prabhu Jul 29 '19 at 10:53
0

I fixed the query. Its just an another join with topics table.

SELECT a.topic_id as main_topic, t.id as topicid, t.value as value, t.topic as topic, COUNT(c.topic_id) as count
FROM post_topics a
JOIN posts b ON a.post_id = b.id
JOIN post_topics c on a.post_id = c.post_id
JOIN topics t on c.topic_id = t.id
**JOIN topics t2 on t2.id = a.topic_id and (t.id) <> (t2.id)**
WHERE (b.date_posted > (('now'::text)::date - '6 mons'::interval))
AND LOWER(b.source) = 'instagram'
GROUP BY t.id, c.topic_id,a.topic_id
ORDER BY c.topic_id,count DESC;
prabhu
  • 103
  • 1
  • 3
  • 15