0

I am using a BI tool to view data in BigQuery. The data is spread across multiple tables, so I have several JOINs in the query to get the data in the format I require. Since it takes about a minute to process all this data and I would like a near real-time experience, I have created a scheduled query that runs the join query and saves the output to a table. The data in the source tables is fed from an ETL tool every 30 minutes (incremental load), which creates a BigQuery load job each time.

There is one particular table; after the load job for this table is done, I want to run the scheduled query.

I have disabled the schedule on the scheduled query so that it can only be run via an API call, and I have written a Python script that sends that API request to the scheduled query.
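
For context, triggering the scheduled query from my script boils down to asking the BigQuery Data Transfer Service (which backs scheduled queries) to start a manual run. A rough sketch of what the script does, assuming the google-cloud-bigquery-datatransfer client library; the project, location and config IDs are placeholders:

from google.cloud import bigquery_datatransfer
from google.protobuf import timestamp_pb2

client = bigquery_datatransfer.DataTransferServiceClient()

# Resource name of the scheduled query (a Data Transfer config); placeholder values
transfer_config = "projects/PROJECT_ID/locations/us/transferConfigs/CONFIG_ID"

# Ask for a manual run right now
run_time = timestamp_pb2.Timestamp()
run_time.GetCurrentTime()

response = client.start_manual_transfer_runs(
    request={"parent": transfer_config, "requested_run_time": run_time}
)
print(response.runs)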

Is there any way in Python to monitor the logs in real time for a particular BigQuery table, so that when the job status changes to 'Succeeded' for that table, I can send the API request to run the scheduled query?

I looked at the Stackdriver Logging Python code, and it seems like I have to repeatedly make API requests to simulate real-time monitoring. I also can't seem to filter the results in the API requests for the particular table, so I wrote some scripts that do that for me from the log results.

Is there any library that does this natively?

Jeeva Bharathi
  • 514
  • 4
  • 22
  • 1
  • Let me understand your question better: you're creating a table with a scheduled query, and then you want to make a query on the created table using the API? – rmesteves Jan 30 '20 at 10:37
  • I want to read the 'job completed' log from Stackdriver Logging for the job, as confirmation that the latest data is in BigQuery for a particular table, and then send an API request. – Jeeva Bharathi Jan 30 '20 at 10:48
  • For example, an advanced filter for a particular table would look like: resource.type="bigquery_resource" AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId="tableID" AND log_name="projects/projectID/logs/cloudaudit.googleapis.com%2Fdata_access" AND proto_payload.method_name="jobservice.jobcompleted". – Jeeva Bharathi Jan 30 '20 at 10:48
  • I want to read the logs in real time for the table mentioned in the advanced filter, so that when a newer log entry comes in, I can send an API request to BigQuery Data Transfer. @rmesteves – Jeeva Bharathi Jan 30 '20 at 10:48

2 Answers

0

If the filter you mentioned is right, this should work:

from google.cloud import logging
from google.cloud.logging import DESCENDING

# Client that talks to Cloud Logging (uses application default credentials)
logging_client = logging.Client()

# Filter copied from your comment (note the closing quote after jobservice.jobcompleted)
filter = 'resource.type="bigquery_resource" AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId="tableID" AND log_name="projects/projectID/logs/cloudaudit.googleapis.com%2Fdata_access" AND proto_payload.method_name="jobservice.jobcompleted"'

# Entries come back newest first because of order_by=DESCENDING
for element in logging_client.list_entries(order_by=DESCENDING, filter_=filter):
    ...  # YOUR LOGIC HERE

If you need a real-time solution (code that waits for new logs to arrive), you will have to implement it yourself. The code above returns all the logs matching your filter, ordered by timestamp descending. If you only want the most recent logs, add a timestamp condition to the filter.
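
For example, here is a rough polling sketch that adds a timestamp condition on each iteration. The 60-second interval is arbitrary, and the body of the if block is where you would put your own code that starts the scheduled query:

import time
from datetime import datetime, timedelta, timezone

from google.cloud import logging

logging_client = logging.Client()

# Same filter as above, without a fixed timestamp; the timestamp clause is added on each poll
BASE_FILTER = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId="tableID" '
    'AND log_name="projects/projectID/logs/cloudaudit.googleapis.com%2Fdata_access" '
    'AND proto_payload.method_name="jobservice.jobcompleted"'
)

last_check = datetime.now(timezone.utc) - timedelta(minutes=5)

while True:
    # Only fetch entries newer than the previous poll (RFC 3339 timestamp)
    ts_filter = BASE_FILTER + ' AND timestamp>="' + last_check.isoformat() + '"'
    entries = list(logging_client.list_entries(filter_=ts_filter))
    last_check = datetime.now(timezone.utc)

    if entries:
        # A load job for the table has finished: trigger the scheduled query here
        print("New job-completed log entry found; triggering the scheduled query")

    time.sleep(60)  # poll every minute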

If you have any questions, feel free to ask me. I hope it helps.

rmesteves
  • 3,870
  • 7
  • 23
0

I found a solution, but it uses other Google services.

Cloud Logging has a feature called a sink, which we can use to route the logs to a Cloud Pub/Sub topic.

We can then invoke a Cloud Function from Cloud Pub/Sub.

The Cloud Function holds the Python code that sends the API request to the scheduled query.
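
A minimal sketch of that function, assuming a Pub/Sub-triggered background function and the google-cloud-bigquery-datatransfer client library; the transfer config path is a placeholder:

from google.cloud import bigquery_datatransfer
from google.protobuf import timestamp_pb2

# Placeholder resource name of the scheduled query (a Data Transfer config)
TRANSFER_CONFIG = "projects/PROJECT_ID/locations/us/transferConfigs/CONFIG_ID"

def start_scheduled_query(event, context):
    """Triggered by the Pub/Sub topic that the log sink writes to."""
    client = bigquery_datatransfer.DataTransferServiceClient()

    # Request an immediate manual run of the scheduled query
    run_time = timestamp_pb2.Timestamp()
    run_time.GetCurrentTime()
    client.start_manual_transfer_runs(
        request={"parent": TRANSFER_CONFIG, "requested_run_time": run_time}
    )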

Jeeva Bharathi
  • 514
  • 4
  • 22