I have a date partitioned table (call it sample_table
) with 2 columns, one to save dateTime in UTC and other to save timezone offset. I have a view on top of this table (call it sample_view
). The view takes _partitiontime
in from table and exposes that as partitionDate
column and also there is another column customerDateTime
which simply adds dateTime with timeZoneOffset.
When I query the sample_table
directly using only _partitiontime
bigquery scans far less data (131 MB).
select
containerName,
count(*)
from
[sample_project.sample_table]
where
_partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
But when I run same query on the table with dateTime
column to scan according to customer's local date time big query scans more (211MB). I expected less than 131MB or equal to 131MB.
select
containerName,
count(*)
from
[sample_project.sample_table]
where
_partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
and DATE_ADD(dateTime, 3600, 'SECOND' ) between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
When I run similar query against the sample_view
with partitionDate
bigquery scans more (399MB)
select
containerName,
count(*)
from
[sample_project.sample_view]
where
partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
And when I run query against the view with partitionDate
and use customerDateTime
column as well bigquery scans even more (879MB)
select
containerName,
count(*)
from
[sample_project.sample_view]
where
partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19') and customerDateTime between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
and customer = 'X'
and containerName = 'XXX'
group by containerName
;
I'm not too sure whether I'm scanning right partitions from any of the queries above. Why do I see the differences between these queries? Is exposing _partitiontime as a new column partitionDate
a bad strategy? I'm not sure how else to use the partition date within Tableau without writing more queries. Please let me know if you require more details.