-1

Scenario to Solve: Our ultimate aim is to get Scaled AverageWeeklySales(AWS) with below formulae:

Scaled AWS = (Number of Weeks article has had a record within the store / maximum Number of Weeks in data) * (AWS /  Number of Weeks)

Input 'table_a' Structure:

Store   Article     Date    Week    Year    Sales
xx      xx          xx      xx      xx      xx

Tried Out: Have framed a query till finding AWS[at store level] and got stuck there to proceed furthur. Since I'm new to this, I'm not sure how this Scaled AWS can be achieved through BigQuery SQL. Need Help!

WITH total_weekly_sales AS (
SELECT
  Store,
  Week,
  SUM(Sales) AS TotalWeeklySales
FROM
  table_a
GROUP BY
  Store,
  Week)
SELECT
  Store,
  AVG(TotalWeeklySales) AS AverageWeeklySales
FROM
  total_weekly_sales tws
GROUP BY
  Store

Sample Data:

Store   Article     Date            Week        Year    Sales
11      aa          2019-07-01      202001      2020    4.9
11      bb          2019-07-07      202001      2020    22.5
11      cc          2019-07-08      202002      2020    10.4
12      aa          2019-07-01      202001      2020    5.3
12      bb          2019-07-07      202001      2020    20.2

Comment Queries:

  • AWS - means Average Weekly Sales[to be derived from the given data at store level, as I have mentioned in my 'Tried Out' session in Question, as to find totalweeklysale at store level and getting average of it gives AWS]
  • Number of Weeks - at store level[for each store, how many weeks data present. Eg. For store 11, this variable value will be 2 whereas for Store 12, this variable value will be 1]
  • Maximum Number of Weeks in data - For ex: from our sample data, the value for this would be 2 [Explaination: for each store, have to calculate total number of weeks data present and take the maximum number of weeks across all stores. Here, Store 11 have 2 weeks in total whereas Store 12 has 1 week in total. So, the maximum here would be 2 which is assigned as value for this variable]

Desired Result: Scaled AWS Formula Part_1 Result [Number of Weeks article has had a record within the store / maximum Number of Weeks in data]

Store   part_1_value    
11      1.5 (3/2)
12      1 (2/2)

Scaled AWS Formula Part_2 Result [AWS / Number of Weeks]

Store   part_2_value    
11      18.9 (37.8/2)
12      25.5 (25.5/1)

Final Output to be: [Part_1 * Part_2]

Store   ScaledAverageWeeklySales    
11      28.35 (1.5*18.9)
12      25.5 (1*25.5)

On the whole, our end result should be at Store Level. Thanks in Advance!

user12345
  • 499
  • 1
  • 5
  • 21
  • Sample data and desired results would help explain the formula. Just so you know, it is unclear what "Number of weeks" means or for that matter how "maximum Number of Weeks in data" is calculated. It is also unclear what level the result is at . . . article, store, something else? And nothing is called `AWS` in the data. – Gordon Linoff Jul 29 '20 at 11:59
  • @GordonLinoff Have edited Question by adding details as per your requirement. – user12345 Jul 30 '20 at 04:28
  • I am trying to understand your logic before coding a query to help you. Can you explain why the *number of weeks article has had a record within the store* is 3 for store 11? – Alexandre Moraes Jul 30 '20 at 07:49
  • 1
    @AlexandreMoraes Thank You, looking forward for your Solution! To answer your question, this means number of article present for that particular store. Eg. For Store 11, total number of unique articles present is 3 whereas, for Store 12, the value is 2. – user12345 Aug 02 '20 at 14:19

1 Answers1

1

After your clarification, I was able to create a query which calculates what you are seeking for.

I have used the data your provided together with the WITH statement, together with MAX, COUNT aggregation builtin methods in BigQuery.Below is the query,

WITH unique_articles AS (
SELECT Store, COUNT(DISTINCT Article) as uniq_art FROM `test-proj-261014.bq_load_codelab.sales_week`
GROUP BY Store
),
max_weeks_st AS (
#not possible to use DATE_DIFF cos the format of the date, the number of weeks would be 1 for the sample data
#so WEEK column will be used instead
SELECT Store, COUNT(Distinct Week) as max_weeks_st  FROM `test-proj-261014.bq_load_codelab.sales_week` 
group by Store
), 
#below returns a struct. So in order to access the int64 value name_of_struct.max_weeks_data
max_weeks_data AS(
SELECT MAX(a.max_weeks_st) AS max_weeks_data FROM max_weeks_st a
),
sum_sales AS (
SELECT Store, SUM(sales) as sum_sales  FROM `test-proj-261014.bq_load_codelab.sales_week` 
GROUP BY Store
), 
final_data AS(
select a.Store, a.uniq_art, b.max_weeks_st,c.sum_sales, max_weeks_data  FROM  unique_articles a LEFT JOIN sum_sales c USING(Store)
LEFT JOIN max_weeks_st b USING(Store) CROSS JOIN  max_weeks_data 
)
SELECT Store,((uniq_art/max_weeks_data.max_weeks_data)*(sum_sales/ max_weeks_st)) as AWS_result  FROM final_data a

and the output,

Row Store   AWS_result
1   11      28.35
2   12      25.5

Notice that max_weeks_data is a struct. Thus, to access its value it is necessary to follow the syntax: name_of_struct.max_weeks_data. Moreover, notice that, each variable you described is calculate in a temp table, always with the Store ID, which is used to join the tables.

Alexandre Moraes
  • 3,892
  • 1
  • 6
  • 13