How does Apache Drill handle big result sets?

Question

Let's say you have Drill connected to two separate databases, and you run a query where you would pull a massive amount of data from each and then do a join.

How does Drill handle this without throwing Out of Memory errors? This is assuming that the data you are requesting exceeds the amount of memory available for Drill to use.

I don't think there is any such mechanism. Some properties help up to a point, but in general Drill throws OutOfMemory errors very frequently when handling large data sets. I'm also curious to know the answer. — anji_rajesh, Sep 28 '16 at 03:34

score 0 · Answer 1 · answered Oct 11 '16 at 15:56

Please see the following excerpt from the Drill documentation:

Drill scales from a single laptop to a 1000-node cluster

Drill is available as a simple download you can run on your laptop. When you're ready to analyze larger datasets, deploy Drill on your Hadoop cluster (up to 1000 commodity servers). Drill leverages the aggregate memory in the cluster to execute queries using an optimistic pipelined model, and automatically spills to disk when the working set doesn't fit in memory.
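To make the "spills to disk when the working set doesn't fit in memory" idea concrete, here is a minimal Python sketch of a spill-to-disk (external) sort. It is only an illustration of the general technique, not Drill's actual implementation. The function names (`external_sort`, `_spill`, `_read_run`) and the `max_rows_in_memory` parameter are made up for this example.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def _spill(sorted_chunk):
    """Write one sorted chunk to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for row in sorted_chunk:
        pickle.dump(row, f)
    f.seek(0)  # rewind so the run can be read back from the start
    return f


def _read_run(f):
    """Stream rows back from a spilled run, one row at a time."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()  # closing a TemporaryFile also deletes it


def external_sort(rows, max_rows_in_memory):
    """Sort an iterable of any size using a bounded in-memory buffer.

    Hypothetical helper for illustration only. While building runs, at most
    max_rows_in_memory rows are held in memory; each full buffer is sorted
    and spilled to disk. The final merge keeps one row per run in memory.
    """
    rows = iter(rows)
    runs = []
    while True:
        chunk = sorted(islice(rows, max_rows_in_memory))
        if not chunk:
            break
        runs.append(_spill(chunk))
    # heapq.merge lazily merges the already-sorted runs.
    return heapq.merge(*(_read_run(f) for f in runs))


if __name__ == "__main__":
    data = [5, 3, 1, 4, 2, 9, 7]
    print(list(external_sort(data, max_rows_in_memory=3)))
```

The trade-off is the usual one: the operation finishes even when the input is larger than the buffer, but it pays for extra disk I/O. Whether a particular Drill operator (sort, join, aggregation) can spill this way depends on the Drill version and its configuration, so check the documentation for the release you are running.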
