I've read almost all the threads about improving BigQuery performance to retrieve data in milliseconds, or at least under a second. I decided to try BI Engine for this purpose because it promises seamless integration without code changes, support for partitioning, smart offloading, real-time data, built-in compression, and low latency.
Unfortunately, for the same query I got a slower response time with BI Engine enabled than with just the query cache.
- BigQuery with cache hit
Average 691ms response time from BigQuery API
https://gist.github.com/bgizdov/b96c6c3d795f5f14e5e9a3e9d7091d85
- BigQuery + BI Engine
Average 1605ms response time from BigQuery API.
finalExecutionDurationMs is about 200-300ms, but the total time to retrieve the data (just 8 rows) is 5-6 times that.
In the BigQuery UI, Elapsed shows 766ms, but the actual time of its call to the REST entity service is 1.50s, which matches the results I'm seeing.
https://gist.github.com/bgizdov/fcabcbce9f96cf7dc618298b2d69575d
I am using Quarkus with the BigQuery integration and measuring the query time with Guava's Stopwatch.
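For reference, the measurement looks roughly like this (a sketch, not my exact code; the query and table names are placeholders, and I disable the query cache explicitly so BI Engine actually serves the request):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import com.google.common.base.Stopwatch;
import java.util.concurrent.TimeUnit;

public class QueryTiming {
    public static void main(String[] args) throws InterruptedException {
        // Uses application default credentials.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        QueryJobConfiguration config = QueryJobConfiguration
                // Placeholder query; the real one aggregates ~300 rows down to 8.
                .newBuilder("SELECT ... FROM `my_project.my_dataset.my_table` GROUP BY ...")
                // Disable the result cache so the timing reflects BI Engine, not a cache hit.
                .setUseQueryCache(false)
                .build();

        // Time the full end-to-end API call, which is what the application sees.
        Stopwatch stopwatch = Stopwatch.createStarted();
        TableResult result = bigquery.query(config);
        stopwatch.stop();

        System.out.printf("rows=%d, elapsed=%dms%n",
                result.getTotalRows(),
                stopwatch.elapsed(TimeUnit.MILLISECONDS));
    }
}
```

The Stopwatch wraps the whole `query()` call, so it captures job submission, execution, and result fetching together, which is why it can be several times larger than `finalExecutionDurationMs`.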
The table is about 350MB and the BI Engine reservation is 1GB. The query returns 8 rows, aggregated from about 300 rows, so this is a very small data size with a simple query. I know BigQuery is not tuned for small data sizes (or that size barely matters here), but I want to get data in under a second; that's why I tried BI Engine, and this latency won't improve with bigger datasets.