2

I am trying to filter out certain dates in my query. The line of code I am using in my query is as follows:

(date(date_add(date_time,-1,'year')) < date(date_add(current_date(),-1,'year')) ) OR year(date_time) = 2017)

The idea is to get all data YTD in 2017, as well as the data from 2016 that corresponds to those same days. So if I were to run a proper version of what I want, it would return all data YTD, as well as all 2016 data up until Nov. 16, 2016.

However, whenever I run the query with this in the WHERE clause, my query still returns dates that do not fit within my described parameters. I may be missing something completely obvious but this has been driving me nuts. Is there a particular way to filter dates that I'm missing?

Edit: Here is some of my data

[date]       [today]    [uniques]    
2016-01-01  2017-11-16  363878   
2016-01-02  2017-11-16  383813   
2016-01-03  2017-11-16  392579   
2016-01-04  2017-11-16  709367   
..
..   
2016-11-15  2017-11-16  841850   
2016-11-16  2017-11-16  847831   
2016-11-17  2017-11-16  797610   
2016-11-18  2017-11-16  187158   
2016-11-19  2017-11-16  521100   
..   
2017-11-12  2017-11-16  297604   
2017-11-13  2017-11-16  527858   
2017-11-14  2017-11-16  474051   
2017-11-15  2017-11-16  569686   

I added the today row for myself to make sure there wasn't a difference in date formatting. With this data, the bottom 3 rows in the middle section should not be returned, but they are. Everything else should fit the parameters I've set and get returned.

Appreciate any and all help. Thanks!

kshoe94
  • 389
  • 2
  • 4
  • 11
  • formulate in words what your expected logic is with regard of dates?! – Mikhail Berlyant Nov 16 '17 at 16:12
  • Sorry about that. I am trying to get all data from 2017 and the corresponding dates from 2016. So today I would get YTD data, as well as all 2016 data up to Nov. 16th. – kshoe94 Nov 16 '17 at 16:21
  • ok, would be great to provide some sample of your data - just some few rows and expected result - You can read [How to Ask](http://stackoverflow.com/help/how-to-ask) and show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) – Mikhail Berlyant Nov 16 '17 at 16:24
  • I've edited again. Thanks for your help – kshoe94 Nov 16 '17 at 16:33
  • [date] and [today fields - are they string or date type? – Mikhail Berlyant Nov 16 '17 at 16:35
  • so I artificially created the today column just by adding current_date() to my select statement. date is a string that i cast to a date type. – kshoe94 Nov 16 '17 at 16:39
  • is [date] the same field as [date_time]? the query you presented does not match your data. please clarify – Mikhail Berlyant Nov 16 '17 at 16:42
  • sorry - in my select i write: select date(date_time) as date to convert to date object. date_time is the way it appears in the table i am selecting out of, which is of type string. when it gets returned with the name of 'date', it is of type date – kshoe94 Nov 16 '17 at 16:44
  • see my answer below - try it! – Mikhail Berlyant Nov 16 '17 at 18:25

1 Answers1

2

Below is for BigQuery Standard SQL:

#standardSQL
SELECT date_time, uniques 
FROM `project.dataset.yourTable`
WHERE (PARSE_DATE('%Y-%m-%d', date_time) < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)) 
OR EXTRACT(YEAR FROM PARSE_DATE('%Y-%m-%d', date_time)) = 2017  

You can test / play with it using dummy data from your question as it is below

#standardSQL
WITH `project.dataset.yourTable` AS (
  SELECT '2016-01-01' date_time, 363878 uniques UNION ALL
  SELECT '2016-01-02', 383813 UNION ALL
  SELECT '2016-01-03', 392579 UNION ALL
  SELECT '2016-01-04', 709367 UNION ALL

  SELECT '2016-11-15', 841850 UNION ALL
  SELECT '2016-11-16', 847831 UNION ALL
  SELECT '2016-11-17', 797610 UNION ALL
  SELECT '2016-11-18', 187158 UNION ALL
  SELECT '2016-11-19', 521100 UNION ALL

  SELECT '2017-11-12', 297604 UNION ALL
  SELECT '2017-11-13', 527858 UNION ALL
  SELECT '2017-11-14', 474051 UNION ALL
  SELECT '2017-11-15', 569686 
)
SELECT date_time, uniques 
FROM `project.dataset.yourTable`
WHERE (PARSE_DATE('%Y-%m-%d', date_time) < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)) 
OR EXTRACT(YEAR FROM PARSE_DATE('%Y-%m-%d', date_time)) = 2017  

It is highly recommended to use BigQuery Standard SQL - so if you are still using Legacy SQL - consider Migrating to Standard SQL

Meantime if you have your code in legacy sql (which is look like to me) - you can use below

#legacySQL
SELECT date_time, uniques 
FROM [project:dataset.yourTable]
WHERE (DATE(date_time) < DATE(DATE_ADD(CURRENT_DATE(), -1, 'year'))) 
OR YEAR(date_time) = 2017  

You can test / play with dummy data as below

#legacySQL
SELECT date_time, uniques 
FROM --[project:dataset.yourTable]
  (SELECT '2016-01-01' date_time, 363878 uniques),
  (SELECT '2016-01-02' date_time, 383813 uniques),
  (SELECT '2016-01-03' date_time, 392579 uniques),
  (SELECT '2016-01-04' date_time, 709367 uniques),

  (SELECT '2016-11-15' date_time, 841850 uniques),
  (SELECT '2016-11-16' date_time, 847831 uniques),
  (SELECT '2016-11-17' date_time, 797610 uniques),
  (SELECT '2016-11-18' date_time, 187158 uniques),
  (SELECT '2016-11-19' date_time, 521100 uniques),

  (SELECT '2017-11-12' date_time, 297604 uniques),
  (SELECT '2017-11-13' date_time, 527858 uniques),
  (SELECT '2017-11-14' date_time, 474051 uniques),
  (SELECT '2017-11-15' date_time, 569686 uniques)
WHERE (DATE(date_time) < DATE(DATE_ADD(CURRENT_DATE(), -1, 'year'))) 
OR YEAR(date_time) = 2017
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thank you for your help! turns out my logic wasn't wrong, but the person who's query i was trying to help fix had their filters in CNF and I did not realize that one of his 'OR' statements was outside the parentheses, which caused the other dates to show up where that condition was met. – kshoe94 Nov 16 '17 at 19:27
  • i recommend you to check answer again! based on your question - your initial expression when put in WHERE clause was wrong! Compare it with expression in the answer - in legacy version - so you compare apple to apple :o) – Mikhail Berlyant Nov 16 '17 at 19:31