A similar question has been asked before, but it has no accepted answer: GCE RAM and CPU usage in BigQuery

The question is simple: for every successful query executed in BigQuery, how do I know how much computing resource was consumed? How many CPU cores and how much RAM were used? And how was the query charged?

The example below is from a job execution on GCP:

[screenshot: GCP job execution metrics]

Thank you.

[Update]

Some commenters suggested a dry run. I tried:

bq query "SELECT COUNT(1) as rowcount, COUNTIF(corpus = 'hamlet') as rowcount_hamlet FROM publicdata.samples.shakespeare order by rowcount_hamlet desc" --dry-run=true 

bq query "SELECT COUNT(1) as rowcount, COUNTIF(corpus = 'hamlet') as rowcount_hamlet FROM publicdata.samples.shakespeare order by rowcount_hamlet desc" dry-run=true 

Neither works for me. Can anyone advise the right format, please?

mdivk

2 Answers


That looks like a screenshot of Cloud Dataflow metrics, not BigQuery. BigQuery is a fully managed service, which means all the infrastructure running your queries is pretty much abstracted away from you. What you can analyze, however, is the number of slots used per query and the query execution plan, to get an idea of what kind of resources are being consumed to run your queries. You need to drill in on the slot metric. From memory, a slot is roughly half of a CPU core and about 1 GB of RAM.
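Slot usage shows up in the job statistics after the job runs (for example via `bq show -j JOB_ID`, whose JSON output includes `statistics.query.totalSlotMs`). A minimal sketch of how to turn that figure into an average slot count, assuming you have already pulled `totalSlotMs` and the job's elapsed wall-clock time from the statistics:

```python
def average_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Estimate the average number of slots a query held.

    totalSlotMs (statistics.query.totalSlotMs in the Jobs API) is the
    sum of slot-milliseconds consumed across all workers; dividing by
    the job's wall-clock duration gives the mean parallelism.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return total_slot_ms / elapsed_ms

# Example: a job that consumed 1,200,000 slot-ms over a 10-second run
# averaged about 120 slots.
print(average_slots(1_200_000, 10_000))  # -> 120.0
```

Combined with the rough half-CPU / 1 GB rule of thumb above, this gives a ballpark for compute consumed, though Google does not document slots in those hardware terms.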

Graham Polley
  • Thank you, the Execution details seem to be available only AFTER the query has already executed; correct me if I am wrong. I know I can check the plan in advance in Cloudera Hive/Impala; does BigQuery provide that somewhere in advance as well? Can you post a screenshot if yes? Besides, if I execute a query in BigQuery, may I assume there is actually a job in the back-end? If yes, where can I see the job's ID? I am asking because I might need to retrieve query plan info by the job ID as per this doc: `https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get`, thank you Graham. – mdivk Jan 13 '20 at 02:04
  • _"for every successful query executed in BigQuery"_ - I thought you needed the stats for the queries you've already executed? If you want to check what the query cost would be but without running it, you can use `dry-run=true`. But, I'm not aware of anything that gives you more info on the query before running it (I could be wrong). – Graham Polley Jan 13 '20 at 02:14
  • Thank you Graham, I just tried to set dry-run=true; both `bq query "SELECT COUNT(1) as rowcount, COUNTIF(corpus = 'hamlet') as rowcount_hamlet FROM publicdata.samples.shakespeare order by rowcount_hamlet desc" --dry-run=true` and `bq query "SELECT COUNT(1) as rowcount, COUNTIF(corpus = 'hamlet') as rowcount_hamlet FROM publicdata.samples.shakespeare order by rowcount_hamlet desc" dry-run=true` are not working for me. Can you advise the right format please? Thank you. – mdivk Jan 13 '20 at 15:54

To price your queries before running them, use the `--dry_run` flag in the CLI (or the `dryRun` parameter in the job configuration when you use the API or client libraries), then use the Google Cloud Platform Pricing Calculator to estimate the cost.

Here is an example query from the documentation:

bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
   COUNTRY,
   AIRPORT,
   IATA
 FROM
   `project_id`.dataset.airports
 LIMIT
   1000'
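The dry run reports only the bytes the query would process. A quick sketch of turning that number into an on-demand cost estimate, assuming the $5/TiB on-demand rate that applied at the time (check current pricing, and note BigQuery applies a per-query minimum that is ignored here):

```python
def estimate_on_demand_cost(bytes_processed: int, usd_per_tib: float = 5.0) -> float:
    """Rough on-demand cost estimate from a dry run's bytesProcessed.

    Assumes the historical $5 per TiB (2**40 bytes) on-demand rate;
    this is only a ballpark, not a substitute for the pricing calculator.
    """
    tib = bytes_processed / 2**40
    return tib * usd_per_tib

# Example: a query scanning 500 GiB costs roughly $2.44 at $5/TiB.
print(round(estimate_on_demand_cost(500 * 2**30), 2))  # -> 2.44
```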


I suggest you read through the BigQuery best practices for controlling costs.

Ksign
  • Thank you for the Calculator, that is very nice. However, forgive my silly question here: for BigQuery, it seems everyone can use it without even creating a project if it is just for practicing on the public datasets, meaning it is free in that case? – mdivk Jan 15 '20 at 13:50
  • Yes, the pricing calculator is free and you don't need to create a project to use it. When querying on the [public dataset](https://cloud.google.com/bigquery/public-data/), the first 1TB per month is free, subject to [query pricing details](https://cloud.google.com/bigquery/pricing#queries). – Ksign Jan 15 '20 at 14:32
  • You are welcome. Please remember to accept the answer if it solved your question so it can help the community in future cases. – Ksign Jan 15 '20 at 14:41
  • Thanks for the reminder, I am waiting to see if anyone can share a working query with dry-run; I mentioned that in a comment to Graham, and should have updated it in the OP. – mdivk Jan 15 '20 at 15:35
  • It is `--dry_run` with an underscore. And you just need to add the flag, don't set it to True or False. I will edit my answer to add the example query provided in the documentation. – Ksign Jan 15 '20 at 15:37
  • Thank you very much, can you please add your correction to your answer? btw, dry_run shows only how much data, no others like CPU/RAM? – mdivk Jan 15 '20 at 15:44
  • No, it doesn't show CPU/RAM. To get an idea of that, refer to the answer from @Graham. – Ksign Jan 15 '20 at 15:58