My goal is to test if the grp's generated by one query, are the same grp's as the output of the same query. However, when I change a single variable name, I get different results.
Below I show an example of the same query where we know the results are the same. However, if you run this group, you will find one query produces different results than another.
SELECT grp
FROM
(
SELECT CONCAT(word, corpus) AS grp, rank1, rank2
FROM (
SELECT
word, corpus,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY test1 DESC) AS rank1,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
FROM
(
SELECT *, (word_count * word_count * corpus_date) AS test1
FROM [bigquery-public-data:samples.shakespeare]
)
)
)
WHERE rank1 <= 3 OR rank2 <= 3
HAVING grp NOT IN
(
SELECT grp FROM (
SELECT CONCAT(word, corpus) AS grp, rank1, rank2
FROM
(
SELECT
word, corpus,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY test2 DESC) AS rank1,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
FROM
(
SELECT *, (word_count * word_count * corpus_date) AS test2
FROM [bigquery-public-data:samples.shakespeare]
)
)
)
WHERE rank1 <= 3 OR rank2 <= 3
)
Far worse... now if you try running the exact same query, but simply change the variable name test1 to test3, you will get completely different results.
SELECT grp
FROM
(
SELECT CONCAT(word, corpus) AS grp, rank1, rank2
FROM (
SELECT
word, corpus,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY test3 DESC) AS rank1,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
FROM
(
SELECT *, (word_count * word_count * corpus_date) AS test3
FROM [bigquery-public-data:samples.shakespeare]
)
)
)
WHERE rank1 <= 3 OR rank2 <= 3
HAVING grp NOT IN
(
SELECT grp FROM (
SELECT CONCAT(word, corpus) AS grp, rank1, rank2
FROM
(
SELECT
word, corpus,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY test2 DESC) AS rank1,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY word_count DESC) AS rank2,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus DESC) AS rank3,
ROW_NUMBER() OVER (PARTITION BY word ORDER BY corpus_date DESC) AS rank4
FROM
(
SELECT *, (word_count * word_count * corpus_date) AS test2
FROM [bigquery-public-data:samples.shakespeare]
)
)
)
WHERE rank1 <= 3 OR rank2 <= 3
)
I can think of no explanation that satisfies both of these bizarre behaviors and this is preventing me from being able to validate my data. Any ideas?
EDIT:
I've updated the BigQuery SQL in the way the responses would suggest, and the same inconsistencies occur.