1

I have a query I'm using in my CodeIgniter model to fetch the count of listings of products between particular days. This works fine when there are less items in my table, but there are more than 100,000 entries in my table and to just get the output of 2 days it takes around 3-4 minutes. The longer the from and to days are apart, the more time it takes.

Here is the query: (Dbfiddle:https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e7a99f08ecd217cbeb09fe6676cfe645)

with Y as (
  with recursive D (n, day) as (
    select 1 as n, '2021-09-25' my_date
    union
    select n+1, day + interval 1 day from D
      where day + interval 1 day < '2021-10-15'
  ) select * from D
), X as (
  select Y.day,
         l.*,
         (select status_from from logs
            where logs.refno = l.refno
              and logs.logtime >= Y.day
            order by logs.logtime
            limit 1) logstat
    from listings l, Y
    where l.added_date <= Y.day
), Z as (
  select X.day, ifnull(X.logstat,X.status) stat_day, count(*) cnt
    from X
    group by X.day, stat_day
)
select Z.day,
  sum(case when Z.stat_day = 'D' then Z.cnt else 0 end ) Draft,
  sum(case when Z.stat_day = 'A' then Z.cnt else 0 end ) Action,
  sum(case when Z.stat_day = 'Y' then Z.cnt else 0 end ) Publish,
  sum(case when Z.stat_day = 'S' then Z.cnt else 0 end ) Sold,
  sum(case when Z.stat_day = 'L' then Z.cnt else 0 end ) Let
  from Z
  group by Z.day
  order by Z.day;

Basically what this query is doing is status_from from logs where the date is during and after the selected date range and taking added_date from listings where the date falls before the from date range picked by the user and calculates it. Once it has retrieved those records, it checks the table for what variable that status holds and does a sum(case when else 0) to get the total count.

One reason I'm thinking the query is slow is because it has to compute the sum for the statuses in the query itself, so maybe it would be faster to do the count part in the php side? If so then how can I create a statement for it to iterate the count in my view class.

Current View Class:

<?php
            foreach($data_total as $row ){
               $draft = $row->draft ? $row->draft : 0;
               $publish = $row->publish ? $row->publish : 0;
               $action = $row->action ? $row->action : 0;
               $sold = $row->sold ? $row->sold : 0;
               $let = $row->let ? $row->let : 0;                              
          ?>
              <tr>
                    <td><?= $row->day?></td>
                    <td><?= $draft ?></td>
                    <td><?= $publish ?></td>
                    <td><?= $action ?></td>
                    <td><?= $sold ?></td>
                    <td><?= $let ?></td>
              </tr>
          <?php }  ?>

Or if possible if would there be any way to get the same output of this query but in a faster way.

Rick James
  • 135,179
  • 13
  • 127
  • 222
JJM50
  • 467
  • 1
  • 7
  • 20
  • 1
    You need to include the results of explain for your query, table definitions - including indexes - for all affected tables in the query. Without these, it is pretty difficult to answer your question! – Shadow Oct 22 '21 at 17:07
  • 1
    My first guess is to throw indexes on the two TIMESTAMP columns that are filtered and/or sorted: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=fd0aa64d56f72b5592826ee4f9da0292 – Chris Haas Oct 22 '21 at 17:10
  • I have updated my answer – Peter Trcka Oct 22 '21 at 19:09
  • 1
    **You need to show us the table and index definitions**, as well as row counts for each of the tables. Maybe your tables are defined poorly. Maybe the indexes aren't created correctly. Maybe you don't have an index on that column you thought you did. Without seeing the table and index definitions, we can't tell. We need row counts because that can affect query planning. If you know how to do an `EXPLAIN` or get an execution plan, put the results in the question as well. If you have no indexes, visit http://use-the-index-luke.com. – Andy Lester Oct 22 '21 at 19:56
  • Please provide `EXPLAIN` for it. – Rick James Oct 22 '21 at 22:50
  • Don't fetch `l.*` unless you really need all the columns. It seems you need only a couple of columns. – Rick James Oct 22 '21 at 22:51
  • `logstat` smells like a "groupwise max" problem -- but inefficiently written. – Rick James Oct 22 '21 at 22:52
  • 1
    Start with `ALTER TABLE listings ADD INDEX added_date (added_date)`. Then do EXPLAIN or EXPLAIN ANALYZE on your system with your real data (not the tiny subset on the dbfiddle) and show us the output. – O. Jones Oct 23 '21 at 13:57
  • Does my new code improve your performance or not? Have you already tried it? – Peter Trcka Oct 23 '21 at 19:10
  • 1
    @PeterTrcka yes I did try it and it still takes 3-4 minutes to return 2 entries – JJM50 Oct 24 '21 at 04:11
  • @JJM50 Does my query improve your time execution? could you comment something about it? – nachospiu Oct 27 '21 at 12:10

3 Answers3

1

If your final output is going to be on a website, a snapshot of data is normally better practice than a live feed for Past Activities. I have used in the past a stored procedures to update a table daily with Past Activites, then use a view to Select Past_Activities unioned to Current_Activities to decrease load time for my viewers.

  • So as of now I can't change it since it already has stored the data for the previous years, so I need a way to extract that data in a faster way. Moving forward I can use this suggestion, but for now I need a way to make this query faster – JJM50 Oct 22 '21 at 17:21
1

Is this faster? If you are calling query more frequently you can consider save ROW_NUMBER to logs table

with calendar as (
with recursive cal (n, day) as (
    select 1 as n, '2021-09-25' my_date
    union
    select n+1, day + interval 1 day from cal
    where day + interval 1 day < '2021-10-15'
    )select * from cal
), loggs as (
    select
         ROW_NUMBER() OVER (partition by refno order by logtime) as RN
        ,status_from as logstat
        ,refno
        ,logtime
    from logs
),X as (
  select cal.day,
         l.*,
         logs.logstat,
         RN,
         min(RN) over (partition by l.refno, cal.day) as RN_MIN
    from listings l
    join calendar as cal on l.added_date <= cal.day
    left join loggs as logs on logs.refno = l.refno and logs.logtime >= cal.day
), Z as (
  select X.day, ifnull(X.logstat,X.status) stat_day, count(*) cnt
    from X
    where ifnull(RN, 0) = ifnull(RN_min, 0)
    group by X.day, stat_day
)
select Z.day,
  sum(case when Z.stat_day = 'D' then Z.cnt else 0 end ) Draft,
  sum(case when Z.stat_day = 'A' then Z.cnt else 0 end ) Action,
  sum(case when Z.stat_day = 'Y' then Z.cnt else 0 end ) Publish,
  sum(case when Z.stat_day = 'S' then Z.cnt else 0 end ) Sold,
  sum(case when Z.stat_day = 'L' then Z.cnt else 0 end ) Let
  from Z
  group by Z.day
  order by Z.day;
Peter Trcka
  • 1,279
  • 1
  • 16
  • 21
  • Hey would it be possible if you could show this to me in the dbfiddle provided, since I tried it in the fiddle and it gave some errors https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c8d968e5aad2002a03e5457034f809f1 – JJM50 Oct 22 '21 at 17:36
  • Works for me. [fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=685a9fb51c5d104b086dfadcb19b066d). – O. Jones Oct 23 '21 at 14:08
0

I simplified your query, but I'm not sure you'll get a significant improvement in time execution. You must define suitable indexes.

Please check it carefully and make sure its output is right.

WITH RECURSIVE 
  cal AS (SELECT '2021-09-25' AS day
    
          UNION ALL
    
          SELECT day + interval 1 day 
          FROM cal
          WHERE day + interval 1 day < '2021-10-15'),
  
  X AS (SELECT DISTINCT
                  cal.day,
                  l.id,
                  l.status,
                  FIRST_VALUE(status_from) OVER (PARTITION BY logs.refno, cal.day ORDER BY logs.logtime) AS logstat
        FROM listings l
        INNER JOIN cal ON l.added_date <= cal.day
        LEFT JOIN logs ON logs.refno = l.refno AND logs.logtime >= cal.day)

SELECT X.day,
       COUNT(CASE WHEN IFNULL(X.logstat, X.status) = 'D' THEN 1 END) Draft,
       COUNT(CASE WHEN IFNULL(X.logstat, X.status) = 'A' THEN 1 END) Action,
       COUNT(CASE WHEN IFNULL(X.logstat, X.status) = 'Y' THEN 1 END) Publish,
       COUNT(CASE WHEN IFNULL(X.logstat, X.status) = 'S' THEN 1 END) Sold,
       COUNT(CASE WHEN IFNULL(X.logstat, X.status) = 'L' THEN 1 END) Let
FROM X
GROUP BY X.day
ORDER BY X.day;
nachospiu
  • 2,009
  • 2
  • 8
  • 12
  • This did not improve the execution time, but I believe the problem is with me not creating indexes for listings table. – JJM50 Oct 27 '21 at 12:33
  • Thank you @JJM50!!! I think indexes can help you. The query generates many rows, maybe you can restrict them in some way. – nachospiu Oct 27 '21 at 12:51
  • Okay so now suppose I make a index with added_date by using `ALTER TABLE listings ADD INDEX added_date (added_date)`, how can I access that index in this query? – JJM50 Oct 27 '21 at 13:06
  • 1
    `MySql` will use it when it deems appropriate, but you can force the use of an index too. You can reed [index](https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html) and [index-hints](https://dev.mysql.com/doc/refman/5.6/en/index-hints.html). – nachospiu Oct 27 '21 at 13:18