0

I'm currently having more than 400k records in my mysql table. The structure is the following:

table structure

The function i am using:

function cron_hour_counts()
{
    $subids = get_subids();
    array_push($subids, '');
    $from = '2011-10-20';//last_updated_date('tb_hour_counts');
    $to = '2011-10-20';//last_date();
    $days = days_interval($from, $to);
    $result_array = array();

    foreach ($subids as $subid)
    {
        for ($i = 0; $i < $days; $i++)
        {
            $hour = '00:00';
            for ($t = 0; $t <= 23; $t++)
            {
                if ($t == 0)
                {
                    $chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
                    $phour = date('H:i', strtotime('23:59'));

                    $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
                    $date_prev = date('Y-m-d', strtotime($date . '- 1 day'));
                }
                else
                {
                    $chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
                    $phour = date('H:i', strtotime($chour . '-1 hour'));

                    $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
                    $date_prev = $date;
                }

                $unique_id_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date_prev) . "') OR (`date` = '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ids`");
                $unique_id_result = mysql_fetch_assoc($unique_id_query);

                $total_id_query = mysql_query("SELECT COUNT(DISTINCT `id`,`subid`) AS `total_ids` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $total_id_result = mysql_fetch_assoc($total_id_query);

                $unique_ip_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE `date` <= '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ips`");
                $unique_ip_result = mysql_fetch_assoc($unique_ip_query);

                $total_ip_query = mysql_query("SELECT COUNT(DISTINCT `ip`,`subid`) AS `total_ips` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $total_ip_result = mysql_fetch_assoc($total_ip_query);

                $global_query = mysql_query("SELECT COUNT(`id`) AS `global` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $global_result = mysql_fetch_assoc($global_query);

                $result = array();
                $result['date'] = $date;
                $result['hour'] = $chour;
                $result['subid'] = $subid;
                $result['unique_ids'] = $unique_id_result['unique_ids'];
                $result['total_ids'] = $total_id_result['total_ids'];
                $result['unique_ips'] = $unique_ip_result['unique_ips'];
                $result['total_ips'] = $total_ip_result['total_ips'];
                $result['global'] = $global_result['global'];

                $result_array[] = $result;
            }
        }
    }
    //db insert
    print_r($result_array);
}

Having 20 subids, and one day period it takes 40 minutes to execute. Any tips on speeding this up?

mintobit
  • 2,363
  • 2
  • 15
  • 15
  • Can you explain what you're doing in your script? The readability is very difficult for someone to understand. – Flukey Nov 23 '11 at 17:05
  • 5
    What are your indexes like? If you use it in a `WHERE` clause, there should be an index on it. – ceejayoz Nov 23 '11 at 17:05
  • What is it supposed to be doing? – jprofitt Nov 23 '11 at 17:05
  • explain what you want to do, it should exist an other way to do it and speed it up ... It seems a little bit complicated – Guilhem Hoffmann Nov 23 '11 at 17:09
  • This function will be executed by cron to count new unique subid+id or subid+ip values(unique ids | unique ips) for the every next hour(00:00,'01:00' ...). Also it counts distinct ids and ips for every next hour (total_ids, total_ips). – mintobit Nov 23 '11 at 17:11
  • i'm having an index `search` ON tb_stats (`subid`,`date`,`time`) – mintobit Nov 23 '11 at 17:12
  • Stop using `strtotime`, it's a performance killer. – nickb Nov 23 '11 at 17:13
  • Example of unique_ids_query: SELECT (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE subid = '1' AND `date` <= '2011-10-20') - (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE subid = '1' AND `date` <= '2011-10-19') AS `unique_ids`; – mintobit Nov 23 '11 at 17:16
  • 1
    You really need to look at narrowing the scope of this down. Try commenting out some blocks of code and see where the bottleneck(s) are. – hafichuk Nov 23 '11 at 17:21
  • I believe something could be done with the loops... but i doubt that it will give me much perfomance. I don't have and idea how to simplify the mysql query... – mintobit Nov 23 '11 at 17:23
  • hafichuk, the bottlenecks are 400k rows and the need to count not distinct values for each hour, but new distinct values. – mintobit Nov 23 '11 at 17:27
  • I would grab all the values in one query up front, then find the unique ones in PHP. PHP would be faster than mySQL for that kind of thing, I think. – Jemaclus Nov 23 '11 at 17:33

2 Answers2

1

This is my solution. It works 20 times faster.

function cron_hour_counts()
{
    $subids = get_subids();
    //array_push($subids, '');
    $from = '2011-10-20';//last_updated_date('tb_hour_counts');
    $to = '2011-10-20';//last_date();
    $days = days_interval($from, $to);
    $result_array = array();

    for ($i = 0; $i < $days; $i++)
    {
        $hour = '00:00';
        for ($t = 0; $t <= 23; $t++)
        {
            $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
            $currentHour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
            $nextHour = date('H:i', strtotime($currentHour . '+59 minutes'));

            $unique_ids_query = mysql_query("
                SELECT COUNT(id) AS unique_ids,subid
                FROM 
                (
                    SELECT id,subid,date,time
                    FROM tb_stats
                    WHERE date <= '" . mysql_real_escape_string($date) . "'
                    GROUP BY id,subid
                ) AS id_inner
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('unique_ids', $date, $currentHour, $unique_ids_query, $subids, $result_array);            

            $unique_ips_query = mysql_query("
                SELECT COUNT(ip) AS unique_ips,subid
                FROM 
                (
                    SELECT ip,subid,date,time
                    FROM tb_stats
                    WHERE date <= '" . mysql_real_escape_string($date) . "'
                    GROUP BY ip,subid
                ) AS ip_inner
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('unique_ips', $date, $currentHour, $unique_ips_query, $subids, $result_array);

            $total_ids_query = mysql_query("
                SELECT COUNT(DISTINCT id,subid) AS total_ids,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid
                ");
            pull_data('total_ids', $date, $currentHour, $total_ids_query, $subids, $result_array);

            $total_ips_query = mysql_query("
                SELECT COUNT(DISTINCT ip,subid) AS total_ips,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('total_ips', $date, $currentHour, $total_ips_query, $subids, $result_array);

            $global_query = mysql_query("
                SELECT COUNT(id) AS global,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('global', $date, $currentHour, $global_query, $subids, $result_array);
        }
    }

    print_r($result_array);
}
mintobit
  • 2,363
  • 2
  • 15
  • 15
0

Optimize your queries.

Here's an example of a query that can DEFINITELY be optimized:

You posted the following query as an example of a unique_ids_query in your comment above:

SELECT (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20') - (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-19') AS unique_ids;

Essentially the query is getting the change in distinct id, subid combinations between 2011-10-20 and 2011-10-19 where the subid is '1'. You're doing this by first counting ALL of the records below the 2011-10-20 date and then counting ALL of the records below the 2011-10-19 date. You also have THREE SELECT statements in that query.

Unless I'm mistaken thats the same as counting all of the records between 2011-10-19 and 2011-10-20, which you could do with the following:

SELECT COUNT(DISTINCT id,subid) AS unique_ids FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20' AND date >= '2011-10-19';

You should also start using MySQLi or PDO in PHP if possible for store procedures, which could also be a performance booster.

In addition you should run as many queries as possible over a single connection to reduce connection latency (it adds up!)

One final potential boon would be to write MySQL functions. It would be possible to run the query above without using COUNT or DISTINCT with a MySQL function which would be a performance booster beyond the boost given by running it as a function.

William King
  • 1,180
  • 1
  • 8
  • 18
  • The query can not ba optimized that way, because the result will be different. It'll just show distinct values between 2011-10-19 and 2011-10-20. But what i need is to count the amount values which are new (i mean id+subid which were not present in the table before). – mintobit Nov 23 '11 at 19:11
  • The id+subid which were not present in the table before when? Does the 'date' column not hold the time the item was inserted into the table? – William King Nov 23 '11 at 19:36
  • Or do you mean the amount of unique id+subid combinations that were NEVER in the table before at any time before a specific date? – William King Nov 23 '11 at 19:55
  • the second is correct. i've forgotten to mention time in the query example. SELECT (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND (date < '2011-10-20' OR date = '2011-10-20' AND `time` <= '02:00')) - (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND (date < '2011-10-20' OR date = '2011-10-20' AND `time` <= '01:00')) AS unique_ids; – mintobit Nov 23 '11 at 20:13
  • Index is like: CREATE INDEX `search` ON `tb_stats` (`subid`,`date`,`time`); – mintobit Nov 23 '11 at 23:16