We wrote a query that calculates the amount of time it takes to route an email from Gmail, to a third party security service, then return to Gmail. Now we want to graph it in DataStudio, but the way it's written requires specifying the day in two locations in order to reduce the number of partitions that are being queried, however most systems like DataStudio can only work with a single field for time ranges. How else can I write this so a single field is utilized for time range?
SELECT
datetime_diff(timestamp_after, timestamp_before, SECOND) as delay,
timestamp_before,
timestamp_after,
sender_before as sender,
recipient_before as recipient,
message_id_before as message_id,
subject_before as subject,
spf_pass_before,
spf_pass_after,
pt_before,
pt_after
FROM(
SELECT
_TABLE_SUFFIX as pt_after,
DATETIME(timestamp_micros(event_info. timestamp_usec), "America/New_York") as timestamp_after,
message_info.rfc2822_message_id as message_id_after,
message_info.connection_info.spf_pass as spf_pass_after,
message_info.source.address as sender_after,
message_info.subject as subject_after,
dest.address as recipient_after,
rule.rule_name as rule_name_after
FROM
`g-suite-logs.gmail_logs.daily_*`,
UNNEST ( message_info.destination ) as dest,
UNNEST ( message_info.triggered_rule_info ) as rule
WHERE rule.rule_name = "AFTER RETURNING FROM THIRD PARTY SYSTEM"
GROUP BY
pt_after,
message_id_after,
timestamp_after,
spf_pass_after,
sender_after,
recipient_after,
rule_name_after,
subject_after
) rule_after
JOIN(
SELECT
_TABLE_SUFFIX as pt_before,
DATETIME(timestamp_micros(event_info. timestamp_usec), "America/New_York") as timestamp_before,
message_info.rfc2822_message_id as message_id_before,
message_info.connection_info.spf_pass as spf_pass_before,
message_info.source.address as sender_before,
message_info.subject as subject_before,
dest.address as recipient_before,
rule.rule_name as rule_name_before
FROM
`g-suite-logs.gmail_logs.daily_*`,
UNNEST ( message_info.destination ) as dest,
UNNEST ( message_info.triggered_rule_info ) as rule
WHERE rule.rule_name = "BEFORE ROUTING TO THIRD PARTY SYSTEM"
GROUP BY
pt_before,
message_id_before,
timestamp_before,
spf_pass_before,
sender_before,
recipient_before,
rule_name_before,
subject_before
) rule_before
ON
rule_before.message_id_before = rule_after.message_id_after AND recipient_before = recipient_after
I can save this as a view and set 'WHERE pt_before = "20190618" AND pt_after = "20190618"', and it significantly reduces the cost of the query (from 1.5tb to 24gb), but then I can't plug the view into DataStudio easily because there's two date fields that have to be used.
One option is to use parameterized queries, but I don't think DataStudio supports those.
On a separate topic, this is how I had originally written the query, which seems more efficient, however, I was finding a lot of false positives with that query where it matched one rule but not the other and recorded a timediff of 0, skewing the results. So, if anyone has suggestions on a more efficient way to write this, I'm open to suggestions.