1

I'm writing out a query that takes ad marketing data from Google Ads, Microsoft, and Taboola and merges it into one table.

The table should have 3 rows, one for each ad company with 4 columns: traffic source (ad company), money spent, sales, and cost per conversion. Right now I'm just dealing with the first 2 till I get those right. The whole table's data should be grouped within that a given month's data.

Right now the results I'm getting are multiple rows from each traffic source, some of them merging months of data into the cost column instead of summing up the costs within a given month.

WITH google_ads AS
  ( SELECT 'Google' AS traffic_source,
           date_trunc('month', "day"::date) AS month,
           SUM(cost / 1000000) AS cost
   FROM googleads_campaign AS g
   GROUP BY month
   ORDER BY month DESC),

     taboola AS
  ( SELECT 'Taboola' AS traffic_source,
           date_trunc('month', "date"::date) AS month,
           SUM(spent) AS cost
   FROM taboola_campaign AS t
   GROUP BY month
   ORDER BY month DESC),

     microsoft AS
  ( SELECT 'Microsoft' AS traffic_source,
           date_trunc('month', "TimePeriod"::date) AS month,
           SUM("Spend") AS cost
   FROM microsoft_campaign AS m
   GROUP BY month
   ORDER BY month DESC)

SELECT (CASE
            WHEN M.traffic_source='Microsoft' THEN M.traffic_source
            WHEN T.traffic_source='Taboola' THEN T.traffic_source
            WHEN G.traffic_source='Google' THEN G.traffic_source
        END) AS traffic_source1,
        SUM(CASE
            WHEN G.traffic_source='Google' THEN G.cost
            WHEN T.traffic_source='Taboola' THEN T.cost
            WHEN M.traffic_source='Microsoft' THEN M.cost
        END) AS cost,
        (CASE
            WHEN G.traffic_source='Google' THEN G.month
            WHEN T.traffic_source='Taboola' THEN T.month
            WHEN M.traffic_source='Microsoft' THEN M.month
        END) AS month1
FROM google_ads G
LEFT JOIN taboola T ON G.month = T.month
LEFT JOIN microsoft M ON G.month = M.month
GROUP BY traffic_source1, month1

Here's an example of the results I'm getting. The month column is simply for testing purposes.

| traffic_source1 |    cost    |      month1    |
|:----------------|:-----------|:---------------|
| Google          | 210.00     | 01/09/18 00:00 |
| Google          | 1,213.00   | 01/10/18 00:00 |
| Google          | 2,481.00   | 01/11/18 00:00 |
| Google          | 3,503.00   | 01/12/18 00:00 |
| Google          | 7,492.00   | 01/01/19 00:00 |
| Microsoft       | 22,059.00  | 01/02/19 00:00 |
| Microsoft       | 16,958.00  | 01/03/19 00:00 |
| Microsoft       | 7,582.00   | 01/04/19 00:00 |
| Microsoft       | 76,125.00  | 01/05/19 00:00 |
| Taboola         | 37,205.00  | 01/06/19 00:00 |
| Google          | 45,910.00  | 01/07/19 00:00 |
| Google          | 137,421.00 | 01/08/19 00:00 |
| Google          | 29,501.00  | 01/09/19 00:00 |

Instead, it should look like this (Let's say for the month of July this year, for instance):

| traffic_source |    cost   |
|----------------|-----------|
| Google         | 53,901.00 |
| Microsoft      | 22,059.00 |
| Taboola        | 37,205.00 |

Any help would be greatly appreciated, thanks!

awl19
  • 366
  • 4
  • 10
  • 1
    Your query would be simpler if instead of creating 3 CTEs you used `union` to turn them into one. Then the joins and cases are unnecessary. [dbfiddle](https://dbfiddle.uk/?rdbms=postgres_11&fiddle=c3273ec2ca4b10ba785c4935ba8a49f8) You can also use a view for a more permanent solution. – Schwern Sep 08 '19 at 06:54
  • @Schwern Much appreciated! I tried it, but it ended up returning results very similar to my original results except this time there was duplicates in the month column, but they weren't duplicates between traffic sources, so it's not necessarily a bad thing, but still not quite there. – awl19 Sep 08 '19 at 07:03
  • Can you provide some example data that produces duplicates? If they weren't between the traffic sources, what we're they dups of? Maybe fork the dbfiddle to show us? – Schwern Sep 08 '19 at 19:19

2 Answers2

1

You can try this way:

WITH google_ads AS
  ( SELECT 'Google' AS traffic_source,
           date_trunc('month', "day"::date) AS month,
           SUM(cost / 1000000) AS cost
   FROM googleads_campaign AS g
   GROUP BY month
   ORDER BY month DESC),

     taboola AS
  ( SELECT 'Taboola' AS traffic_source,
           date_trunc('month', "date"::date) AS month,
           SUM(spent) AS cost
   FROM taboola_campaign AS t
   GROUP BY month
   ORDER BY month DESC),

     microsoft AS
  ( SELECT 'Microsoft' AS traffic_source,
           date_trunc('month', "TimePeriod"::date) AS month,
           SUM("Spend") AS cost
   FROM microsoft_campaign AS m
   GROUP BY month
   ORDER BY month DESC)

SELECT (CASE
            WHEN M.traffic_source='Microsoft' THEN M.traffic_source
            WHEN T.traffic_source='Taboola' THEN T.traffic_source
            WHEN G.traffic_source='Google' THEN G.traffic_source
        END) AS traffic_source1,
        SUM(CASE
            WHEN G.traffic_source='Google' THEN G.cost
            WHEN T.traffic_source='Taboola' THEN T.cost
            WHEN M.traffic_source='Microsoft' THEN M.cost
        END) AS cost,
        (CASE
            WHEN G.traffic_source='Google' THEN G.month
            WHEN T.traffic_source='Taboola' THEN T.month
            WHEN M.traffic_source='Microsoft' THEN M.month
        END) AS month1      
FROM google_ads G
LEFT JOIN taboola T ON G.month = T.month
LEFT JOIN microsoft M ON G.month = M.month
GROUP BY traffic_source1, month1
HAVING EXTRACT(month from month1) = ... desired month (July is 7)
  • Hey, thanks for your quick reply! I just tried it and it did reduce the results down to the 3 "traffic sources" as desired, but it summed all the data for each source rather than just the data within a given month. Any more tips? – awl19 Sep 08 '19 at 06:28
  • @awl19 `where T.month = ... and M.month = ... and G.month = ...` – Schwern Sep 08 '19 at 06:39
  • @Schwern Hmm, well I did try that earlier (before posting this question) and just now with both ```WHERE G.month = '2019-06-01' AND T.month = '2019-06-01' AND M.month = '2019-06-01'``` and ```WHERE G.month = '01/06/19' AND T.month = '01/06/19' AND M.month = '01/06/19'``` (Not sure which format is proper, but I expect the first one) and received no results in each test case. – awl19 Sep 08 '19 at 06:49
  • 1
    I've added a HAVING filtering clause for the desired month – Peter Kalef ' DidiSoft Sep 08 '19 at 06:52
  • @PeterKalef'DidiSoft I re-tried and got an error saying month1 column does not exist, which I've gotten before. Makes for a real head scratcher since obviously it's defined just a couple lines above your HAVING filtering and used in the GROUP BY. Any suggestions? – awl19 Sep 08 '19 at 07:01
0

The concept of a different table for each ad source is really a very bad idea. It vastly compounds the complexity of of queries requiring consolidation. It would be much better to have a single table having the source along with the other columns. Consider what happens when marketing decides to use 30-40 or more ad suppliers.
If you cannot create a single table then at least standardize column names and types. Also build a view, a materialized view, or a table function (below) which combines all the traffic sources into a single source.

create or  replace function consolidated_ad_traffic()
 returns table( traffic_source   text
              , ad_month         timestamp with time zone   
              , ad_cost          numeric(11,2)
              , ad_sales         numeric(11,2)
              , conversion_cost  numeric(11,6)
              )
 language sql 
 AS $$
 with ad_sources as
    ( select 'Google' as traffic_source
           , "date"   as ad_date
           , round(cast (cost AS numeric )  / 1000000.0,2)  as cost
           , sales
           , cost_per_conversion   
        from googleads_campaign 
     union all
     select 'Taboola'  
          , "date" 
          , spent  
          , sales
          , cost_per_conversion             
       from taboola_campaign  
     union all 
     select 'Microsoft' 
          , "TimePeriod"
          , "Spend"  
          , sales
          , cost_per_conversion 
      from microsoft_campaign 
    )
select * from ad_sources;
$$;

With a consolidated view of the data you can now write normal selects as though you had a single table. Such as:

select * from consolidated_ad_traffic();

select distinct on( traffic_source, to_char(ad_month, 'mm'))
       traffic_source
     , to_char(ad_month, 'Mon') "For Month" 
     , to_char(sum(ad_cost) over(partition by traffic_source,  to_char(ad_month, 'Mon')), 'FM99,999,999,990.00') monthly_traffic_cost
     , to_char(sum(ad_cost) over(partition by traffic_source), 'FM99,999,999,990.00')   total_traffic_cost 
  from consolidated_ad_traffic();

select traffic_source, sum(ad_cost) ad_cost
  from consolidated_ad_traffic()
 group by traffic_source
 order by traffic_source;

 select traffic_source
      , to_char(ad_month, 'dd-Mon') "For Month"
      , sum(ad_cost) "Monthly Cost"
   from consolidated_ad_traffic()
  where date_trunc('month',ad_month) = date_trunc('month', date '2019-07-01')
    and traffic_source = 'Google'
  group by traffic_source, to_char(ad_month, 'dd-Mon') ;

Now this won't do much for updating but will drastically ease selection.

Belayer
  • 13,578
  • 2
  • 11
  • 22