0

I have been trying to understand the limitations of BigQuery based on how it works under the hood, but it's hard for me to find an answer to the question in the title: Can BigQuery queries get assigned to a different root nodes?

I currently work with Redshift where there is only one cluster with one leader node. Everything goes through and is put together by the leader node. If the leader node does not have enough compute or storage, that means lower concurrency capabilities. Leader node is a bottleneck.

What I am trying to understand is, do we have a similar problem in BigQuery? I know we have a root node akin to the leader node in Redshift when a query is run. See picture below:

enter image description here

Google claims you can practically never run out of resources, which implies that BigQuery doesn't have the same issue as Redshift, but there is no clear explanation. Can the Dremel node tree depicted above be spun up multiple times based on how much compute you require? Could the root node be a bottleneck?

MattSt
  • 1,024
  • 2
  • 16
  • 35
  • I think, there is limitations. Maybe higher that you know on Redshift, but there is limitation. Nothing is magic! but the most magic thing is that with BigQuery you don't have to worry about that, it's BigQuery/Google purpose to optimise and size the cluster/number of root node as required. You simply use the solution, it works very well and that's all that you need! (Or explain your problem, limits, observations, to help you to go over!) – guillaume blaquiere Jan 24 '23 at 14:31
  • Thanks for the response. However, I know there are some limitations. For example, that the SELECT result can't be more than 10GB. However, I am not asking about whether BigQuery has limitations in general, so if you know a specific limitation related to my question please explain. I am trying to understand BigQuery under the hood. Specifically, does it, or does it not spin up extra trees like those depicted in the picture when new queries arrive? – MattSt Jan 25 '23 at 10:29

1 Answers1

0

BigQuery architecturally works different than Redshift. Whenever a query is executed the work gets parallelly processed in slots by BQ. There is no master node which controls the resources however, it has an internal scheduler which allots the resources automatically. We don't have to worry about the resources being allocated since BQ handles it under the hood. That is why you can never run out of resources.

There are however scenarios where the waiting time (in the job execution) of a query might take long, in those cases the best way is to buy additional BQ slots for your query execution.

@Jordan Tigani explained the dremel architecture and query execution process of BigQuery here which you can refer for more information.

Sakshi Gatyan
  • 1,903
  • 7
  • 13
  • Thank you for your answer! While the video explains a lot about how Dremel querying works and how slots work in BigQuery it is a long video and I did not find it answering the original question. Also I find the statement "there is no master node" quite confusing when there is obviously a "root node". Saying that it is being handled under the hood is quite ambiguous and does not under the question either. Is it one root node executing all queries or do we use a different root node per query? There is obviously a root node which passes the query commands down to intermediate and leaf nodes. – MattSt Feb 02 '23 at 10:36