0

I have a website that is mirrored cross two sub-domains. So I have separate analytics datasets for both. I have the following tables:

|------------------------------|
| table_a                      |
|------------------------------|
| url             | mod_date   |
|------------------------------|
| /foo/index.html | 2009-10-24 |
| /bar/index.php  | 2010-01-04 |
| /foo/bar.html   | 2009-01-04 |
|------------------------------|

|-----------------------------------------|
| table_b                                 |
|-----------------------------------------|
| url             | views | access_date   |
|-----------------------------------------|
| /foo/index.html | 35000 | 2009-12-01    |
| /foo/index.html | 20000 | 2010-02-01    |
| /bar/index.php  | 35000 | 2010-01-01    |
| /bar/index.php  | 15000 | 2011-01-01    |
|-----------------------------------------|

|-----------------------------------------|
| table_c                                 |
|-----------------------------------------|
| url             | views | access_date   |
|-----------------------------------------|
| /foo/index.html | 35000 | 2009-10-01    |
| /foo/bar.html   | 10000 | 2011-05-01    |
| /bar/index.php  | 35000 | 2011-08-01    |
| /bar/index.php  | 15000 | 2012-04-01    |
|-----------------------------------------|

I have the following query:

SELECT 
    a.url
    ,DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date'
    ,DATE_FORMAT(MIN(b.access_date), '%d/%m/%Y') AS 'first_date'
    ,DATE_FORMAT(MAX(b.access_date), '%d/%m/%Y') AS 'last_date'
    ,SUM(ifnull(b.pages,0)) + SUM(ifnull(c.pages,0)) AS 'page_views'    
    ,DATEDIFF(MAX(b.access_date),MIN(b.access_date)) AS 'days'
    ,ROUND(SUM(b.pages) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) AS 'b_mean_monthly_hits'
    ,ROUND(SUM(c.pages) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) AS 'a_mean_monthly_hits'
FROM
    tabl_a a
        LEFT JOIN
    table_b b ON b.url = a.url
        LEFT JOIN
    table_c c ON c.url = a.url
GROUP BY a.url
HAVING ROUND(SUM(b.pages) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) < 5
AND ROUND(SUM(c.pages) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) < 5
;

The result I'm looking for is:

|------------------------------------------------------------------------------------------|
| results                                                                                  |
|------------------------------------------------------------------------------------------|
| url             | mod_date   | first_date | last_date  | page_views   | avg_monthly_hits |
|------------------------------------------------------------------------------------------|
| /foo/index.html | 2009-10-24 | 2009-10-01 | 2010-02-01 | 90000        | 22273            |
| /bar/index.php  | 2010-01-04 | 2010-01-01 | 2012-04-01 | 85000        | 3275             |
| /foo/bar.html   | 2009-01-04 | 2011-05-01 | 2011-06-01 | 10000        | 9819             |
|------------------------------------------------------------------------------------------|

Where 'avg_monthly_hits' is the sum of b.views and c.views (as 'page_views') divided by the number of days (don't know how to get the months) between the oldest and newest access_date from table_b or table_c divided by 30.44 (the average number of days in a month).

I hope that I have explained myself fully. :)

StefWill
  • 353
  • 1
  • 3
  • 13

3 Answers3

0

Try this Query. It would be nice to have some date to test it

select
  a.*,
  b.MinDate as `FirstDate`,
  b.MaxDate as `LastDate`,
  (ifnull(b.PSum,0) + ifnull(c.QSum,0)) as `TotalViews`,
  datediff(b.MaxDate,b.MinDate) as `Diff`,
  (((ifnull(b.PSum,0) + ifnull(c.QSum,0))/datediff(b.MaxDate,b.MinDate))/30.44) as `BMonthlyHits`,
  (((ifnull(b.PSum,0) + ifnull(c.QSum,0))/datediff(b.MaxDate,b.MinDate))/30.44) as `CMonthlyHits`
from table_a as a
left join (select url , min(access_date) as MinDate,max(access_date)as MaxDate,sum(pages) as PSum from table_b group by url) as b on a.url = b.url
left join (select url , min(access_date)as MinDate,max(access_date)as MaxDate, sum(pages) as QSum from table_c group by url) as c on a.url = c.url
group by a.url
HAVING BMonthlyHits < 5 and CMonthlyHits < 5
Muhammad Raheel
  • 19,823
  • 7
  • 67
  • 103
0

if table_b and table_c have the same structure, just union them

SELECT
 a.url,
 DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date',
 DATE_FORMAT(MIN(u.access_date), '%d/%m/%Y') AS 'first_date',
 DATE_FORMAT(MAX(u.access_date), '%d/%m/%Y') AS 'last_date',
 SUM(u.views) AS 'page_views',
 DATEDIFF(MAX(u.access_date), MIN(u.access_date)) AS 'days',
 ROUND(SUM(u.views) / (DATEDIFF(MAX(u.access_date),MIN(u.access_date)) / 30.44)) AS 'avg_monthly_hits'
FROM table_a AS a 
LEFT JOIN (
   (SELECT * FROM table_b) 
   UNION 
   (SELECT * FROM table_c)
) AS u USING (url)
GROUP BY a.url
HAVING avg_monthly_hits < 5
Puggan Se
  • 5,738
  • 2
  • 22
  • 48
0

In the end a nested query solved the problem.

SELECT DISTINCT a.url
, q.mod_date
, IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date) AS 'min_date'
, IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date) AS 'max_date'
, (PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m')) + 1) AS 'months'
, q.page_views
, ROUND(q.page_views / ((PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m'))) + 1)) AS 'avg_monthly_hits'
FROM table_a a
INNER JOIN
    (SELECT 
            a.url,
                a.date AS 'mod_date',
                MIN(b.date) AS 'b_min_date',
                MAX(b.date) AS 'b_max_date',
                MIN(c.date) AS 'c_min_date',
                MAX(c.date) AS 'c_max_date',
                SUM(ifnull(b.pages, 0)) + SUM(ifnull(c.pages, 0)) AS 'page_views'
        FROM
            table_a a
                LEFT JOIN
            table_b b ON a.url = b.url
                LEFT JOIN
            table_c c ON a.url = c.url
        GROUP BY a.url
) q
ON a.url = q.url
WHERE ROUND(q.page_views / ((PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m'))) + 1)) < 5
;
StefWill
  • 353
  • 1
  • 3
  • 13