0

This is for finding top 10 advertiser apps'(based on # of installs) avg. eCPM (calculation below in query). It works accurately if I run it with one publisher ID only.

Issue: when using more than one publishing app ID, the subquery (C2) reads it as if the top 10 advertiser apps are from multiple publishing IDs, thus misleading calculations. How can I run this in a way that it can read per publisher app (& GET ECPM OF TOP 10 ADVERTISERS PER PUBLISHER APP)?

 SELECT

    d.app_name AS publisher_app_name,           

    a.publisher_app AS publisher_app_id,            

    (   SELECT 
            CASE WHEN SUM(b.impressions) > 0
                THEN ROUND((1000*SUM(b.money_spent))/SUM(b.impressions),3)
                    ELSE 0
                        END AS top_10_adv_ecpm
        FROM warehouse.daily_uber_aggr b
        WHERE publisher_app IN ('983234fb17ba470008','2b3ed6ff4d0e48463')
        AND b.dt BETWEEN '2018-12-26' AND '2019-01-25'
        AND b.advertiser_app IN (   SELECT advertiser_app
                                  FROM warehouse.daily_uber_aggr
                                  WHERE publisher_app IN ('983234fb17ba470008','2b3ed6ff4d0e48463')
                                  AND dt BETWEEN '2018-12-26' AND '2019-01-25'
                                  GROUP BY advertiser_app
                                  ORDER BY SUM (converted_installs) DESC
                                  LIMIT 10
    ) AS top_10_adv_ecpm,

    SUM(a.converted_installs) AS installs_publisher

 FROM warehouse.daily_uber_aggr a

LEFT JOIN dimensions.apps d ON d.app_id = a.publisher_app

WHERE a.publisher_app IN ('983234fb17ba470008','2b3ed6ff4d0e48463')

AND a.dt BETWEEN '2018-12-26' AND '2019-01-25'

 GROUP BY 
    d.app_name,
    a.publisher_app
  • 1
    You should illustrate the question with the schema, i.e. the `CREATE TABLE` statements for the involved tables, sample data a `INSERT INTO` statements and the expected result for the sample data. – sticky bit Jan 28 '19 at 23:23
  • 1
    Seems like a CROSS APPLY would be appropriate here. Perhaps a simplified data sample and desired results would help with the visualization. – John Cappelletti Jan 28 '19 at 23:23
  • 1
    LIMIT is not valid SQL Server syntax, seems to be MySQL... – dnoeth Jan 28 '19 at 23:25
  • I'd think you'd need your subquery to be a correlated sub-query so as not to mash up all the results together. – pmbAustin Jan 28 '19 at 23:26

1 Answers1

0

I'm fairly sure you're making this more complex than it needs to be. Putting your aggregation into a sub select in your resultset is really messing up what you're trying to do. Wouldn't this (untested with data, confirmed syntax) work just as well?

SELECT top 10 d.app_name AS publisher_app_name, a.publisher_app AS publisher_app_id, 
    CASE WHEN SUM(a.impressions) > 0
        THEN ROUND((1000*SUM(a.money_spent))/SUM(a.impressions),3)
        ELSE 0
        END AS top_10_adv_ecpm,
    SUM(a.converted_installs) AS installs_publisher
FROM warehouse.daily_uber_aggr a 
LEFT JOIN dimensions.apps d ON d.app_id = a.publisher_app
WHERE a.publisher_app IN ('983234fb17ba470008','2b3ed6ff4d0e48463')
AND a.dt BETWEEN '2018-12-26' AND '2019-01-25'
GROUP BY d.app_name, a.publisher_app
ORDER BY SUM (converted_installs) DESC

Note I changed LIMIT 10 to top 10 because this was tagged as SQL Server, if it's another database brand such as MySQL then use the appropriate means to limit to your top 10 results.

FWIW the issue with your original query is you have no link between your query and the sub-query in your result set, so for every row your main query returns you'd run the sub-query and return its full results.

Ben
  • 1,316
  • 1
  • 7
  • 11