0

Hoping someone might be able to assist me with this.

Assume I have the table listed below. Hosts can show up multiple times on the same date, usually with different backupsizes.

+------------------+--------------+
| Field            | Type         | 
+------------------+--------------+
| startdate        | date         |
| host             | varchar(255) | 
| backupsize       | float(6,2)   |  
+------------------+--------------+

How could I find the sum total of backupsize for 7 day increments starting with the earliest date, through the last date? I don't mind if the last few days get cut off because they don't fall into a 7 day increment.

Desired output (prefered):

+------------+----------+----------+----------+-----
|Week of     | system01 | system02 | system03 | ...
+------------+----------+----------+----------+-----
| 2014/07/30 | 2343.23  | 232.34   | 989.34   |
+------------+----------+----------+----------+-----
| 2014/08/06 | 2334.7   | 874.13   | 234.90   |
+------------+----------+----------+----------+-----
| ...        | ...      | ...      | ...      |

OR

+------------+------------+------------+------
|Host        | 2014/07/30 | 2014/08/06 | ...
+------------+------------+------------+------
| system01   | 2343.23    | 2334.7     | ...  
+------------+------------+------------+-------
| system02   | 232.34     | 874.13     | ...
+------------+------------+------------+-------
| system03   | 989.34     | 234.90     | ...
+------------+------------+------------+-------
| ...        | ...        | ...        |       

Date format is not a concern, just as long as it gets identified somehow. Also, the order of the hosts is not a concern either. Thanks!

elmera
  • 21
  • 1
  • 1
    There's really two issues wrapped into one question. The first, to get records "grouped" into 7 day intervals, is addressed by adding a `GROUP BY` clause with an expression that returns a single value for all dates within a given 7 day interval. That's very straightforward, just a matter of figuring out an expression to do that for you. The second issue is to convert the data rows into a crosstab, and that's not as straightforward. The general solution to get that result in SQL is non-trivial, much is usually more efficiently handled in client presentation layer. – spencer7593 Aug 18 '14 at 03:28
  • Thanks -- understood. I'm not a DBA and only have a basic understanding of SQL query structures, though I have a broad shell scripting background which helps a little. That said, I have been able to put together a stored procedure that gets the data I want and pivots it like the 1st example in my original question. The problem is, the row's are for each distinct date in my table, and I just can't seem to figure out how to group them so each row is 7-days of data instead. – elmera Aug 18 '14 at 18:19

3 Answers3

3

The simplest way is to get the earliest date and just count the number of days:

select x.minsd + interval floor(datediff(x.minsd, lb.startdate) / 7) day as `Week of`,
       host,
       sum(backupsize)
from listedbelow lb cross join
     (select min(startdate) as minsd from listedbelow lb) x
group by floor(datediff(x.minsd, lb.startdate) / 7)
order by 1;

This produces a form with week of and host on each row. You can pivot the results as you see fit.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • In MySQL, the expression `lb.startdate - x.minsd` does not (in general) return a difference in days between two `DATE` values. (There are some edge cases where it will appear to work, but it's actually converting a `DATE` into a numeric value value, and doing a numeric subtract, e.g. 20140817 - 20140727 evaluates to 90, not the number of days between the dats. To get the number of days between two dates, we need to use a MySQL function, for example `DATEDIFF`. – spencer7593 Aug 18 '14 at 03:43
  • +------------+--------------+---------------+ | Week of | host | sum(capacity) | +------------+--------------+---------------+ | 2014-06-06 | sunrobo | 6830.07 | | 2014-06-07 | falcon | 96929.82 | | 2014-06-08 | ep1p-apnsr01 | 100036.32 | | 2014-06-09 | ep1p-apnsr01 | 95971.44 | | 2014-06-10 | ep1p-apnsr01 | 111107.31 | | 2014-06-11 | ep1p-wb01 | 14851.89 | +------------+--------------+---------------+ – elmera Aug 18 '14 at 18:24
  • @Gordon Linoff -- Thanks. I tried running your query but it didn't appear to group by 7-days. If I can figure out how to post my output in a comment I will do so. – elmera Aug 18 '14 at 18:32
  • 1
    @elmera . . . The best thing is to put sample data on SQL Fiddle (www.sqlfiddle.com). – Gordon Linoff Aug 18 '14 at 18:46
0

I'll assume that what you want is the sum of bakcupsize grouped by host and that seven-day interval you are talking about.

My solution would be something like this:

  1. You need to define the first date, and then "create" a column with the date you want (the end of the seven-day period)
  2. Then I would group it.

I think temporary tables and little tricks with temp variables are the best way to tackle this, so:

drop table if exists temp_data;
create temporary table temp_data
select a.*
     -- The @d variable will have the date that you'll use later to group the data.
     , @d := case
          -- If the current "host" value is the same as the previous one, then...
           when @host_prev = host then 
               -- ... if @d is not null and is within the seven-day period,
               -- then leave the value of @d intact; in other case, add 7 days to it.
               case
                   when @d is not null or a.startdate <= @d then @d
                   -- The coalesce() function will return the first not null argument
                   -- (just as a precaution)
                   else dateadd(coalesce(@d, a.startdate), interval +7 day)
               end
           -- If the current "host" value is not  the same as the previous one, 
           -- then take the current date (the first date of the "new" host) and add
           -- seven days to it.
           else @d = dateadd(a.startdate, interval +7 day) 
       end as date_group
     -- This is needed to perform the comparisson in the "case" piece above
     , @host_prev := a.host as host2
from
       (select @host_prev = '', @d = null) as init -- Initialize the variables
     , yourtable as a
-- IMPORTANT: This will only work if you order the data properly
order by a.host, a.startdate;
-- Add indexes to the temp table, to make things faster
alter table temp_data
   add index h(host),
   add index dg(date_group)
   -- OPTIONAL: You can drop the "host2" column (it is no longer needed)
   -- , drop column host2
   ;

Now, you can get the grouped data:

select a.host, a.date_group, sum(a.bakcupsize) as backupsize
from temp_data as a
group by a.host, a.date_group;

This will give you the unpivoted data. If you want to build a pivot table with it, I recommend you take a look to this article, and/or read this question and its answers. In short, you'll have to build a "dynamic" sql instruction, prepare a statement with it and execute it.


Of course, if you want to group this by week, there's a simpler approach:

drop table if exists temp_data2;
create temporary table temp_data2
select a.*
     -- The following will give you the end-of-week date
     , dateadd(a.startdate, interval +(6 - weekday(a.startdate)) day) as group_date
from yourtable as a;
alter table temp_data
   add index h(host),
   add index dg(date_group);
select a.host, a.date_group, sum(a.bakcupsize) as backupsize
from temp_data as a
group by a.host, a.date_group;

I leave the pivot part to you.

Community
  • 1
  • 1
Barranka
  • 20,547
  • 13
  • 65
  • 83
  • Thanks Barranka. This looks pretty elaborate, so I'll have to try it out later this evening and report back. – elmera Aug 18 '14 at 18:32
0

So I was able to determine a solution that fit my needs using a procedure I created by putting together concepts from your recommended solutions as well as some other other solutions I found on this site. The procedure SUM's by 7 day increments as well as does a pivot.

DELIMITER $$
CREATE PROCEDURE `weekly_capacity_by_host`()
BEGIN
SELECT MIN(startdate) into @start_date FROM testtable;

SET @SQL = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'SUM(if(host=''',host,''', backupsize, 0)) as ''',host,''''
    )
  ) INTO @SQL
FROM testtable;

SET @SQL = CONCAT('SELECT 1 + DATEDIFF(startdate, ''',@start_date,''') DIV 7 AS week_num
  , ''',@start_date,''' + INTERVAL (DATEDIFF(startdate, ''',@start_date,''') DIV 7) WEEK AS week_start,
  ', @SQL,' 
  FROM testtable group by week_num'
);

PREPARE stmt FROM @SQL;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Output appears as follows:

mysql> call weekly_capacity_by_host;
+----------+------------+----------+----------+----------+----------+
| week_num | week_start | server01 | server02 | server03 | server04 |
+----------+------------+----------+----------+----------+----------+
|        1 | 2014-06-11 |  1231.08 |    37.30 |    12.04 |    68.17 |
|        2 | 2014-06-18 |  1230.98 |    37.30 |    11.76 |    68.13 |
|        3 | 2014-06-25 |  1243.12 |    37.30 |     8.85 |    68.59 |
|        4 | 2014-07-02 |  1234.73 |    37.30 |    11.77 |    67.80 |
|        5 | 2014-07-09 |   341.32 |     0.04 |     0.14 |     4.94 |
+----------+------------+----------+----------+----------+----------+
5 rows in set (0.03 sec)
elmera
  • 21
  • 1