-1

I have query that is becoming very huge due to multiple UNION ALL can anyone help me reduce it. There are more than 300 UNION ALL actually.

SELECT  keywords, 
        COUNT(i.postId) as Posts, 
        SUM(i.interactions) as Interactions, 
        GROUP_CONCAT(DISTINCT i.users) as Users 
FROM (
    SELECT 'keyword1' as keywords, 
            `postId`, 
            `interactions`, 
            ( SELECT displayName FROM profile WHERE id=userID LIMIT 1 ) as users
    FROM `posts` 
    WHERE `content` REGEXP 'keyword1' 
        AND created BETWEEN '2021-01-01' AND '2021-02-19' 
        AND userID IN (147483,166451,166467)                                                                                    
    UNION ALL 
        SELECT 'keyword2' as keywords, 
                `postId`, 
                `interactions`, 
                ( SELECT displayName FROM profile WHERE id=userID LIMIT 1 ) as users 
        FROM `posts` 
        WHERE `content` REGEXP 'keyword2' 
            AND created BETWEEN '2021-01-01' AND '2021-02-19' 
            AND userID IN (147483,166451,166467)                                                                                    
    UNION ALL 
        SELECT 'keyword3' as keywords, 
                `postId`, 
                `interactions`, 
                ( SELECT displayName FROM profile WHERE id=userID LIMIT 1 ) as users 
        FROM `posts` 
        WHERE `content` REGEXP 'keyword3' 
            AND created BETWEEN '2021-01-01' AND '2021-02-19' 
            AND userID IN (147483,166451,166467)                                                                                    
    ) i 
GROUP BY keywords
O. Jones
  • 103,626
  • 17
  • 118
  • 172
Rajnish
  • 13
  • 2
  • 2
    Fix your data model, so you are storing the values in rows rather than clumns. – Gordon Linoff Feb 22 '21 at 16:59
  • 2
    A perfect example of what a bad data structure can do to your code. – Your Common Sense Feb 22 '21 at 17:00
  • what does `select version();` show? – ysth Feb 22 '21 at 17:04
  • 1
    I'm not understanding the criticisms; this doesn't seem to be a case of rows rather than columns; this is just trying to match multiple keywords and report which matched and how often. @GordonLinoff can you explain? maybe I'm just overlooking whatever you are seeing – ysth Feb 22 '21 at 17:06
  • Can you show us your data model and explain what you want to do with this query? – Francisco Cabral Feb 22 '21 at 17:14
  • Please [edit] your question to show us the definition of your tables, and some samples of the values of your `content` columns. Is `id` the primary key of the `profile` table? And you *must* get rid of all these UNION ALL operations unless you want to measure your throughput in queries per week. – O. Jones Feb 22 '21 at 17:38

1 Answers1

2

This is a situation where it helps to remember that SQL is a declarative, not a procedural, language. You describe what you want.

It seems you have a list of 300 keywords you wish to use to summarize your content column.

Let's say you have those keywords stored in a table called keywords.

And, let's say your profile table's primary key is id, matching posts.userId. That means we can retrieve users' displayName values with a JOIN rather than a mess of subqueries.

Then you can write your query like this. We'll start with this subquery to retrieve the details of the rows matching your keywords.

SELECT keywords.keyword, 
       posts.content, posts.postId, 
       posts.interactions, 
       profile.displayName
  FROM posts
  JOIN profile ON posts.userId = profile.id
  JOIN keywords ON post.content RLIKE keywords.keyword 
 WHERE posts.created BETWEEN '2021-01-01' AND '2021-02-19'
   AND posts.userID in (147483,166451,166467)

This generates a virtual table containing the posts you want to summarize. You should test this and convince yourself it's correct.

Then you summarize them by changing around the query's SELECT and adding a GROUP BY, Like this.

SELECT keywords.keyword, 
       COUNT(*) count,
       SUM(posts.interactions) interactions,
       GROUP_CONCAT(DISTINCT profile.displayName ORDER BY profile.displayName) users
  FROM posts
  JOIN profile ON posts.userId = profile.id
  JOIN keywords ON post.content RLIKE keywords.keyword 
 WHERE posts.created BETWEEN '2021-01-01' AND '2021-02-19'
   AND posts.userID in (147483,166451,166467)
 GROUP BY keywords.keyword;

And you're done. Moving the list of keywords to their own table is the secret to getting rid of the huge UNION ALL cascade.

You may find this line to be slightly faster than using a regular expression.

  JOIN keywords ON post.content LIKE CONCAT('%', keywords.keyword, '%') 

Finally, if posts.created is a DATETIME or TIMESTAMP, you want this instead of created BETWEEN so you get all items for the last day of your range. Note the < for the end of the range.

WHERE posts.created >= '2021-01-01' 
  AND posts.created < '2021-02-19' + INTERVAL 1 DAY

You need to use this criterion for the end of a date range because the date constant 2021-02-19 actually means 2021-02-19 00:00:00, or midnight at the beginning of that date. A timestamp of, for example, 2021-02-19 10:22:00 is after that midnight, so BETWEEN won't do what you want.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Hi O. Jones. I really like the details you've given in your answer and gave a +1. However, I was always under the impression that between was inclusive of both the beginning and ending values. Is that not correct? – Doyle B Feb 23 '21 at 00:57
  • 2
    Please read the paragraph I added at the end of my answer. – O. Jones Feb 23 '21 at 02:27
  • Thanks! I've never caught that before and always assumed it was more like left(date_field,10) for the ending date. – Doyle B Feb 23 '21 at 03:39