What's the problem?
I'm trying to select two different date periods at once within a query (to be used in Data Studio), to do some complex period comparison calculations.
This has to all happen in the query, because it's getting used in Data Studio (and other reporting platforms).
However the logic I'm using to prune the partitions, seems to be inconsistent on the BigQuery side.
One of them prunes partitions correctly, the other doesn't. Some times variations seem to use arbitrary amounts of data.
Examples of partition pruning
Here's the first example. It either selects the correct amount ~ 12MB. Or the entire table ~5GB.
SELECT
columns...
FROM
table
WHERE
(
date(_PARTITIONTIME) >= PARSE_DATE('%Y%m%d', "20220509")
AND date(_PARTITIONTIME) <= PARSE_DATE('%Y%m%d', "20220509")
)
OR
(
date(_PARTITIONTIME) >= DATE_SUB(PARSE_DATE('%Y%m%d', "20220509"), INTERVAL 1 DAY)
AND date(_PARTITIONTIME) <= DATE_SUB(PARSE_DATE('%Y%m%d', "20220509"), INTERVAL 1 DAY)
)
AND REGEXP_CONTAINS (path, r"/c/")
I can't consistently recreate the 12MB version, it's more typically returning the 5GB version.
Question 1: I'm assuming this is some sort of weird cache thing, where it's somehow run the date calculation before?
Question 2: I'm also not sure why it's not pruning the tables here as I'm not doing any subqueries, just some date calculations. Why is it pruning?