1

I am currently picking up SQL and have come across a problem.

This question pertains to many use cases for me. I have locales in en-uk, en-au, es-latam, es-spain that I want to combine simply as EN or ES for reporting over time.

Below are examples on asking how to count number of tomato page views, and number of carrot page views.

I am hoping once I figure out how to do this, I can apply this to language locales.

The intended output can be seen in this spreadsheet here in column H:L: https://docs.google.com/spreadsheets/d/1CNE__ikiHEQHedH0UiSPmRI1s47e7qEH_aJJVtYSSzU/edit?usp=sharing

Unfamiliar with CASE as I am beginning to start out on this journey, but I need to aggregate/summarize the data already lying in the table even more so that I may build a chart.

Can anyone point out any optimization areas? In addition, a side request: AND url.website like '%tomato%' or '%carrot%' (How do I make this an OR?)

Lastly, can anyone help me figure out how to use a NOT CONTAIN in AND url.website does not contain any of these words case insensitive (?i) potato,Mushroom,Celery

I am coming from a Spreadsheets background and am an advanced user, but seems as if I am having difficulty transferring this knowledge over to SQL.

Many thanks! And please let me know if you have any questions,

After many forum scourings, I understand I will need to insert a Subquery in here in order to get the intended table. I aim to plot the count of these website page visits in a graph over time.

[Most Recent EDIT]

Shows URL with Count number on the right hand side

SELECT
url.website
report.timestamp
count(url.website) as count
FROM
datatable.report
WHERE url.website like '%carrot%' OR url.website like '%tomato%'
Group by url.website

Error code: report.timestamp is not found in Group By - but if I add it, I get microseconds in one column, and the count of appearances of these microseconds.

The main gist is to add the report.timestamp in the select, so that I could plot by aggregated month, but once I do this, the count values are not summed.

[Past Edit 2]

  SELECT
  url.website
  COUNT(url.website) as Count
  (CASE WHEN report.web.url like '%carrot%' then 'carrot website'
  WHEN report.web.url like '%tomato%' then 'tomato website'
  ELSE 'other website'
  END)

  FROM datatable.report
  WHERE (product.tag = 12345)
  AND url.website NOT IN ('Potato','Mushroom','Celery')
  AND url.website like '%tomato%'
  GROUP BY url.website

[Past Edit 3]

 SELECT
 (CASE WHEN url.website like '%carrot%' THEN 'carrot'
 WHEN url.website like '%tomato%' THEN 'tomato'
 ELSE 'other'
 END)
 url.website
 COUNT(carrot) as carrotwebsite
 COUNT(fundamental) As tomatowebsite

*thinking that maybe I needed to case/group them first, then show the 
 count 
 displays.

Please see column H:L in the public spreadsheet: https://docs.google.com/spreadsheets/d/1CNE__ikiHEQHedH0UiSPmRI1s47e7qEH_aJJVtYSSzU/edit?usp=sharing

  • i got a permission denied error when trying to access the google doc – Rodney Ellis May 30 '19 at 05:56
  • AND (url.website LIKE '%tomato%' OR url.website LIKE '%carrot%'), or use regex - use separate question for more details – Felipe Hoffa May 30 '19 at 05:57
  • https://docs.google.com/spreadsheets/d/1CNE__ikiHEQHedH0UiSPmRI1s47e7qEH_aJJVtYSSzU/edit?usp=sharing ooops sorry – Roasted Carrot May 30 '19 at 06:03
  • @FelipeHoffa Thank you! Watching these things work is sooo satisfying! – Roasted Carrot May 30 '19 at 06:04
  • I've found out a simple way to show the count, but as soon as I try to enter in a date field, the report does not count all of the same tomato/carrot URL's and just show distinct values. `SELECT url.website` `count(url.website) as count` `FROM datatable.report` `WHERE url.website like '%carrot%' OR url.website like '%tomato%'` `Group by url.website` The main gist is to add the report.timestamp in the select, so that I could plot by aggregated month. Any insight? Will continue to tinker around. – Roasted Carrot May 30 '19 at 07:53
  • Hello Everyone, I've asked a much simpler question which also answers this question: https://stackoverflow.com/questions/56382922/collate-language-codes-into-one-combined-locale-code-using-case-when-and-count Thanks for all the help! – Roasted Carrot May 30 '19 at 17:49
  • @RoastedCarrot - just answered both your questions! – Mikhail Berlyant May 30 '19 at 17:53

1 Answers1

2

Below is for BigQuery Standard SQL

#standardSQL
SELECT FORMAT_DATE('%b %Y', PARSE_DATE('%m/%d/%Y', dt)) month_year, 
  COUNTIF(url LIKE '%tomato%') tomato_views,
  COUNTIF(url LIKE '%carrot%') carrot_views,
  COUNTIF(NOT url LIKE '%tomato%' AND NOT url LIKE '%carrot%') other_views
FROM `project.dataset.table`
GROUP BY month_year  

You can test, play with above using sample/dummy data as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '1/1/2019' dt, 'www.websiteurl.com/tomato/page1' url UNION ALL
  SELECT '1/10/2019', 'www.websiteurl.com/tomato/page2' UNION ALL
  SELECT '1/3/2019', 'www.websiteurl.com/tomato/page3' UNION ALL
  SELECT '2/4/2019', 'www.websiteurl.com/tomato/page4' UNION ALL
  SELECT '2/21/2019', 'www.websiteurl.com/tomato/page5' UNION ALL
  SELECT '2/7/2019', 'www.websiteurl.com/tomato/page6' UNION ALL
  SELECT '3/7/2019', 'www.websiteurl.com/tomato/page7' UNION ALL
  SELECT '3/15/2019', 'www.websiteurl.com/tomato/page8' UNION ALL
  SELECT '3/29/2019', 'www.websiteurl.com/tomato/page9' UNION ALL
  SELECT '3/16/2019', 'www.websiteurl.com/tomato/page10' UNION ALL
  SELECT '1/11/2019', 'www.websiteurl.com/carrot/page1' UNION ALL
  SELECT '1/12/2019', 'www.websiteurl.com/carrot/page2' UNION ALL
  SELECT '4/10/2019', 'www.websiteurl.com/carrot/page3' UNION ALL
  SELECT '4/10/2019', 'www.websiteurl.com/carrot/page4' UNION ALL
  SELECT '4/18/2019', 'www.websiteurl.com/carrot/page5' UNION ALL
  SELECT '1/16/2019', 'www.websiteurl.com/carrot/page6' UNION ALL
  SELECT '1/17/2019', 'www.websiteurl.com/carrot/page7' UNION ALL
  SELECT '1/18/2019', 'www.websiteurl.com/turnip/home' UNION ALL
  SELECT '1/19/2019', 'www.websiteurl.com/turnip/resources' 
)
SELECT FORMAT_DATE('%b %Y', PARSE_DATE('%m/%d/%Y', dt)) month_year, 
  COUNTIF(url LIKE '%tomato%') tomato_views,
  COUNTIF(url LIKE '%carrot%') carrot_views,
  COUNTIF(NOT url LIKE '%tomato%' AND NOT url LIKE '%carrot%') other_views
FROM `project.dataset.table`
GROUP BY month_year   

with result

Row month_year  tomato_views    carrot_views    other_views  
1   Jan 2019    3               4               2    
2   Feb 2019    3               0               0    
3   Mar 2019    4               0               0    
4   Apr 2019    0               3               0    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230