To get complete, consistent query results from billions of rows, connect multiple Drillbits to a distributed file system, simulate a distributed file system by copying the files to each node, or use an NFS volume, such as Amazon Elastic File System. Drill queries big data efficiently using a number of techniques, including these:
- Relies on the cluster nodes to handle failures (doesn't spend time on failure-related tasks, such as creating checkpoints).
- Uses an in-memory data model that's hierarchical and columnar (doesn't read from disk the columns that are not involved in an analytic query, and processes the columnar data without row materialization).
- Uses columnar storage optimizations and execution (keeps memory footprint low).
- Uses vectorization to work on arrays of values from different records rather than single values from one record at a time.
For more information, see http://drill.apache.org/docs/performance/.
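
The columnar and vectorized techniques in the list above can be illustrated with a small sketch. This is not Drill's implementation (Drill is written in Java and has its own in-memory vector format); it is a simplified Python illustration, and the sample data and the `to_columns` and `revenue` helpers are hypothetical names made up for this example. It shows records pivoted into one typed array per column, followed by a computation that reads only the columns it needs and never rebuilds a row.

```python
from array import array

# Hypothetical sample data in a row-oriented layout:
# every record carries all of its fields.
rows = [
    {"id": 1, "price": 10.0, "qty": 2},
    {"id": 2, "price": 4.5, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 4},
]


def to_columns(records):
    """Pivot records into one contiguous typed array per column."""
    return {
        "id": array("q", (r["id"] for r in records)),
        "price": array("d", (r["price"] for r in records)),
        "qty": array("q", (r["qty"] for r in records)),
    }


def revenue(columns):
    """Compute price * qty across all records.

    Only the 'price' and 'qty' arrays are read; the 'id' column is
    never touched, and no per-record row object is materialized.
    """
    return array(
        "d",
        (p * q for p, q in zip(columns["price"], columns["qty"])),
    )


columns = to_columns(rows)
totals = revenue(columns)
print(list(totals))
```

In a row-oriented layout, the same calculation would have to load each whole record, including `id`, just to reach two of its fields. A real vectorized engine goes further than this sketch by processing each array in batches with tight loops that suit modern CPUs.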