I am currently sending StackDriver logs for my app to a BigQuery table. I would like to strip down the dataset and place it into a new BigQuery table that I can query later, rendering the results in a view in my app. I will be using Python as my main language, since I do not know Java, and creating a cron job to run this script every 15 minutes to populate the new log dataset from StackDriver.
Stripping down the dataset involves two steps: 1.) Write only some of the columns from the original BigQuery table to the new one. 2.) Extract a subset of the data in certain columns and write it into new columns in the new BigQuery table. For example:
A row in the original BigQuery table will contain the string:
Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3
I would like to strip out iPad and place this into a devices column, AppleWebKit and place this into a browsers column, etc., in the new BigQuery table.
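To make the extraction above concrete, here is a minimal sketch using only Python's standard `re` module. The function name `parse_user_agent` and both patterns are hypothetical, written to fit the sample string above; real user-agent strings vary widely, so the patterns would likely need adjusting.

```python
import re

# Hypothetical patterns, tuned to the sample string only:
# - device: the first token inside the first parenthesised group
# - browser: the first product name that follows a closing parenthesis
DEVICE_RE = re.compile(r"\(([^;)]+)")
BROWSER_RE = re.compile(r"\)\s*([A-Za-z]+)/")


def parse_user_agent(ua):
    """Return the values destined for the devices and browsers columns."""
    device = DEVICE_RE.search(ua)
    browser = BROWSER_RE.search(ua)
    return {
        "devices": device.group(1).strip() if device else None,
        "browsers": browser.group(1) if browser else None,
    }


ua = (
    "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 "
    "(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3"
)
print(parse_user_agent(ua))  # {'devices': 'iPad', 'browsers': 'AppleWebKit'}
```

Rows that match neither pattern come back with `None` in both fields, so they could still be written to the new table with empty columns.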
I know I can load the BigQuery libraries into Python to query the original BigQuery table, but how do I strip out what I want and write that to a new table? Would this be a good use case for pandas? Is there an easier way to accomplish this task than my current idea?