
We wrote a query that calculates how long it takes to route an email from Gmail to a third-party security service and back to Gmail. Now we want to graph it in Data Studio. As written, the query requires specifying the day in two places in order to reduce the number of partitions being scanned, but most tools like Data Studio can only work with a single field for time ranges. How else can I write this so that a single field is used for the time range?

SELECT
datetime_diff(timestamp_after, timestamp_before, SECOND) as delay,
timestamp_before,
timestamp_after,
sender_before as sender,
recipient_before as recipient,
message_id_before as message_id,
subject_before as subject,
spf_pass_before,
spf_pass_after,
pt_before,
pt_after
FROM(
  SELECT
    _TABLE_SUFFIX as pt_after,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_after,
    message_info.rfc2822_message_id as message_id_after,
    message_info.connection_info.spf_pass as spf_pass_after,
    message_info.source.address as sender_after,
    message_info.subject as subject_after,
    dest.address as recipient_after,
    rule.rule_name as rule_name_after
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST ( message_info.destination ) as dest,
    UNNEST ( message_info.triggered_rule_info ) as rule
   WHERE rule.rule_name = "AFTER RETURNING FROM THIRD PARTY SYSTEM"
  GROUP BY
    pt_after,
    message_id_after,
    timestamp_after,
    spf_pass_after,
    sender_after,
    recipient_after,
    rule_name_after,
    subject_after
  ) rule_after
JOIN(
  SELECT
    _TABLE_SUFFIX as pt_before,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_before,
    message_info.rfc2822_message_id as message_id_before,
    message_info.connection_info.spf_pass as spf_pass_before,
    message_info.source.address as sender_before,
    message_info.subject as subject_before,
    dest.address as recipient_before,
    rule.rule_name as rule_name_before
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST ( message_info.destination ) as dest,
    UNNEST ( message_info.triggered_rule_info ) as rule
  WHERE rule.rule_name = "BEFORE ROUTING TO THIRD PARTY SYSTEM"
  GROUP BY
    pt_before, 
    message_id_before,
    timestamp_before,
    spf_pass_before,
    sender_before,
    recipient_before,
    rule_name_before,
    subject_before
  ) rule_before
ON
  rule_before.message_id_before = rule_after.message_id_after AND recipient_before = recipient_after

I can save this as a view and add 'WHERE pt_before = "20190618" AND pt_after = "20190618"', which significantly reduces the cost of the query (from 1.5 TB to 24 GB), but then I can't plug the view into Data Studio easily because there are two date fields that have to be used.
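One way to expose a single date field is to move the pt_before = pt_after condition into the view itself, so that only one date column is left for Data Studio to filter on. A rough sketch (the view name and event_date column are made up for illustration; whether a Data Studio filter on event_date actually prunes the daily_* tables depends on BigQuery pushing the predicate down through the view, so the bytes billed should be verified):

```sql
-- Hypothetical view: require both suffixes to match inside the view and
-- expose a single DATE column for Data Studio's time-range control.
CREATE OR REPLACE VIEW `g-suite-logs.gmail_logs.routing_delay` AS
SELECT
  PARSE_DATE('%Y%m%d', pt_before) AS event_date,  -- the single date field
  delay,
  timestamp_before,
  timestamp_after,
  sender,
  recipient,
  message_id,
  subject
FROM (
  -- the joined before/after query from above goes here
  SELECT ...
)
WHERE pt_before = pt_after;  -- collapses the two date fields into one
```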

One option is to use parameterized queries, but I don't think Data Studio supports those.

On a separate topic, this is how I had originally written the query, which seems more efficient; however, I was finding a lot of false positives where it matched one rule but not the other and recorded a time difference of 0, skewing the results. So I'm open to suggestions on a more efficient way to write this.

Michael
  • I have similar date range views which are re-created every time the pipeline runs, dates are passed as parameter to a python script which substitutes the variable. – Fact Jun 19 '19 at 00:30
  • But how do you factor a python script into a DataStudio chart? – Michael Jun 19 '19 at 01:28
  • 1
    If your dates are pretty consistent for example " between current date -1 " and "current '' you may create the view like wise ,, or consider using monthly view(views having a month's data) etc if it suits you. – Fact Jun 20 '19 at 00:30
  • Haven't tested this, but Google recently announced support for parameterized queries in Data Studio. When I get some time to come back to this, I'll give it a shot: https://www.blog.google/products/marketingplatform/analytics/introducing-bigquery-parameters-data-studio/ – Michael Aug 23 '19 at 03:31
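If those Data Studio date parameters work as announced, @DS_START_DATE and @DS_END_DATE arrive as 'YYYYMMDD' strings, which happens to match the daily_* suffix format, so they could be applied directly to _TABLE_SUFFIX in a Data Studio custom query. A sketch of how that could slot into each subquery's WHERE clause (untested, and assuming the parameters behave as documented):

```sql
-- Data Studio custom query fragment: prune partitions with the built-in
-- date-range parameters instead of hard-coded pt_before/pt_after values.
SELECT
  _TABLE_SUFFIX AS pt,
  ...
FROM
  `g-suite-logs.gmail_logs.daily_*`
WHERE
  _TABLE_SUFFIX BETWEEN @DS_START_DATE AND @DS_END_DATE
```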

1 Answer


Maybe you can use an approach like this:

with gmail_logs as (
    select distinct
        _table_suffix as pt,
        datetime(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp,
        message_info.rfc2822_message_id as message_id,
        message_info.connection_info.spf_pass,
        message_info.source.address as sender,
        message_info.subject,
        dest.address as recipient,
        rule.rule_name
    from
        `g-suite-logs.gmail_logs.daily_*` as gl
        cross join unnest(gl.message_info.destination) as dest
        cross join unnest(gl.message_info.triggered_rule_info) as rule
    where
        rule.rule_name in ( 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
                            'BEFORE ROUTING TO THIRD PARTY SYSTEM')
        and _table_suffix = '20190618'
)

select
    message_id,
    recipient,

    datetime_diff(
        max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
                timestamp, null)),
        max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
                timestamp, null)),
        second) as delay,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            timestamp, null)) as timestamp_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            timestamp, null)) as timestamp_after,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            sender, null)) as sender,
    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            subject, null)) as subject,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            spf_pass, null)) as spf_pass_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            spf_pass, null)) as spf_pass_after,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            pt, null)) as pt_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            pt, null)) as pt_after
from
    gmail_logs
group by
    1, 2
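One possible refinement (untested): since the conditional aggregation yields NULL for a rule that never fired, a HAVING clause could drop messages that matched only one of the two rules outright, addressing the false-positive case mentioned in the question:

```sql
-- Optional: keep only messages that triggered both rules, so the delay
-- is never computed from a single event.
having
    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            timestamp, null)) is not null
    and max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            timestamp, null)) is not null
```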

Y.K.