5

I currently have a query merging two tables to create a new one for analysis. After getting some funny results when trying to chart it for presentation, I learned that some of it is fake data that was never cleaned up. I've been able to identify the data causing the problems and, for the sake of time, would like to exclude it within the query so I can move ahead with the analysis.

This fake data matches ALL these criteria:

  • rate_type = Standard
  • client_net_cleared = 0
  • program is blank (not Null)

I identified these in SELECT with a CASE statement, but realized that to make any use of that I'd have to do another table querying everything in this one minus what's identified as meeting the above criteria based on the CASE statement. There has to be a better solution than that.

I'm currently trying to exclude these as part of the WHERE statement, but read around other question topics and found out WHERE is not very good at managing multiple sub-conditions.

What I have:

SELECT *
, CASE WHEN tad.rate_type = 'Standard'
    AND tad.client_net_cleared = '0'
    AND program= '' THEN 1
    ELSE '0'
    END AS noise

FROM tableau.km_tv_ad_data_import tad
JOIN tableau.km_tv_ad_report ga
    ON ga.session_timestamp >= tad.timestamp - INTERVAL '4 minute'
    AND ga.session_timestamp <= tad.timestamp + INTERVAL '5 minute'
    AND ga.session_timestamp != tad.timestamp

WHERE tad.timestamp >= '2016-09-01'
AND (tad.rate_type != 'Standard'
    AND tad.client_net_cleared != '0'
    AND tad.program != '')

GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21

Sample data set:

 timestamp           | rate_type | program         | client_net_cleared | noise
---------------------|-----------|-----------------|--------------------|-------
 2016-11-01 18:00:00 | Standard  | Diving          |                 50 | 0
 2016-12-01 21:00:00 | Holiday   | Classic Albums  |                100 | 0
 2016-11-01 09:00:00 | FireSale  | Panorama        |                  0 | 0
 2016-10-01 12:00:00 | Standard  |                 |                  0 | 1
 2016-12-01 15:00:00 | Holiday   | MythBusters     |                100 | 0
 2016-10-01 13:00:00 | FireSale  | House           |                200 | 0

What I need:

Exclude rows matching ALL three criteria: rate_type = Standard, client_net_cleared = 0, program is blank (not Null).

Minadorae
  • 301
  • 3
  • 5
  • 13
  • Did u tried the nested table, like the next:- `select * from ( -- Put your query here ) a where rate_type = Standard and client_net_cleared = 0 and program is not Null` – ahmed abdelqader Jan 06 '17 at 19:53
  • You can't group by ordinal position (at least not in sql-server). And if you can in mysql it is a habit you should stop immediately if not sooner. – Sean Lange Jan 06 '17 at 19:56
  • Especially when using `SELECT *`. That makes it dependent on the order of the columns in the `CREATE TABLE` statement. I have a feeling that he's actually grouping by all the columns, so it should just be `SELECT DISTINCT *`. – Barmar Jan 06 '17 at 20:01
  • @SeanLange What would be a better way to group by? I'd love to use a shorter or more flexible way of grouping if there is one! – Minadorae Jan 06 '17 at 20:09
  • A better way would be to name the columns. If you change your table (or heaven forbid actually name the columns instead of using *) and you group by ordinal position your query is broken. And fixing it becomes a nightmare. – Sean Lange Jan 06 '17 at 20:15
  • Good point, if the original table changes that will cause problems. The columns do have names, I can use these instead of their ordinal position going forward. – Minadorae Jan 06 '17 at 20:22

2 Answers2

19

The correct criteria is

AND NOT (tad.rate_type = 'Standard'
        AND tad.client_net_cleared = '0'
        AND tad.program = '')

By deMorgan's Law, this would be equivalent to:

AND (tad.rate_type != 'Standard'
    OR tad.client_net_cleared != '0'
    OR tad.program != '')

This is like your query, except notice that it uses OR, not AND.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • But OP stated where all three criteria were met... your second statement negates this since only one needs to equate true, or am i missing something? – S3S Jan 06 '17 at 19:53
  • @scsimon Notice that I also negated the criteria. That's deMorgan's Law: `NOT (x AND y)` is equivalent to `(NOT x OR NOT y)` – Barmar Jan 06 '17 at 19:57
  • 1
    nevermind, i understand now. thanks for the schooling – S3S Jan 06 '17 at 20:04
  • @Barmar This worked, thank you! And now I've learned about deMorgan's Law, too. – Minadorae Jan 06 '17 at 20:14
0

You can also do SELECTs in the WHERE clause to exclude rows using NOT IN. For example all the qualifications with one vendor and the not in excludes people with qualifications with other vendors:

select * from qualification q
inner join certification c on c.id = q.certificationid
    where  c.vendorid = 3 and 
    employeeid not in 
    (    
        select employeeid from qualification q
        inner join
        certification c on c.id = q.certificationid
        where c.vendorid <> 3
    )  
 
Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321