GBQ (Google Big Query) provides views for streaming insert meta data, see STREAMING_TIMELINE_BY_*. I would like to use this data to understand the billing for "Streaming Inserts". However, the numbers don't add ab and I'd like to understand if I made a mistake somewhere.
One of the data points in the streaming insert meta data view is the total_input_bytes
:
total_input_bytes
INTEGER
Total number of bytes from all rows within the 1 minute interval.
In addition, the Pricing for data ingestion says:
Streaming inserts (tabledata.insertAll)
$0.010 per 200 MB
You are charged for rows that are successfully inserted. Individual rows are calculated using a 1 KB minimum size.
So getting the costs for streamining inserts per day should be possible via
0.01/200 * (SUM(total_input_bytes)/1024/1024)
cost per 200 mb -----^
total bytes in mb ---------------------^
This should be the lower boundary as we disregard any rows with less than 1KB that are rounded up to 1KB.
Full query:
SELECT
project_id,
dataset_id,
table_id,
SUM(total_rows) AS num_rows,
round(SUM(total_input_bytes)/1024/1024,2) AS num_bytes_in_mb,
# 0.01$ per 200MB
# @see https://cloud.google.com/bigquery/pricing#data_ingestion_pricing
round(0.01*(SUM(total_input_bytes)/1024/1024)/200, 2) AS cost_in_dollar,
SUM(total_requests) AS num_requests
FROM
`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
where
start_timestamp BETWEEN "2021-04-10" and "2021-04-14"
AND error_code IS NULL
GROUP BY 1, 2, 3
ORDER BY table_id asc
However, the results are not reflected in our actual billing report. The billing shows less than half the costs of what I would expect:
Now I wondering if the costs can even be calculated like this.