Questions tagged [apache-drill]

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. It is capable of querying nested data in formats like JSON and Parquet and performing dynamic schema discovery.

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.

644 questions
5
votes
1 answer

apache-drill-1.12.0 "Failure in starting embedded Drillbit" and "no current connection error" (Windows 10)

I am using apache-drill-1.12.0 on Windows 10. I get "no current connection" errors when sending any query. Also, the Drill web console, which should be available at localhost:8047, is not working. I have searched many answers on Stack Overflow that talk about…
Dinesh Sonachalam
  • 1,223
  • 19
  • 33
5
votes
1 answer

Error starting Apache Drill in Embedded Mode on Windows 10

I am trying to start Apache Drill 1.10 in Embedded Mode on Windows 10 x64 (with Oracle JVM 1.8.0_131). When launching the command sqlline.bat -u "jdbc:drill:zk=local" I get the following: Error during udf area creation…
Toon64
  • 51
  • 4
5
votes
2 answers

Apache Drill query HBase table

I am using drill-embedded to execute SQL, and I can see the tables in HBase. Here is the terminal output.. But I am not able to query them; it raises the following error: 0: jdbc:drill:zk=local> SELECT * FROM students; Error:…
5
votes
1 answer

Issue with join on SQL Server tables with same plugin and same datatype

I want to join two tables using the same storage plugin, but one of the columns shows null values. I am using this query: select * from SqlServer.test_mopslive.reports as Reports join SqlServer.test_mopslive.reportsetting as ReportSetting on…
Sanjiv
  • 980
  • 2
  • 11
  • 29
5
votes
2 answers

Use directories for partition pruning in Spark SQL

I have data files (json in this example but could also be avro) written in a directory structure like: dataroot +-- year=2015 +-- month=06 +-- day=01 +-- data1.json +-- data2.json +-- data3.json …
Lundahl
  • 6,434
  • 1
  • 35
  • 37
4
votes
2 answers

Configuring Apache drill for Cassandra

I am trying to configure Cassandra with Drill. I followed the approach described at this link: https://drill.apache.org/docs/starting-the-web-ui/. I used the following code for the New Storage Plugin: { "type": "cassandra", "hosts": [ "127.0.0.1" …
4
votes
1 answer

How to configure Apache Drill options as env variable?

I would like to set drill.exec.hashjoin.fallback.enabled to true at the system level when starting the drillbit. I can set it during my session with alter session set drill.exec.hashjoin.fallback.enabled=TRUE;, and I am also aware of the drill-override.conf file.…
4
votes
2 answers

Java 1.7 or later is required to run Apache Drill

When I type $ drillbit.sh start it shows me this error: ERROR: Java 1.7 or later is required to run Apache Drill. although I have the latest version of Java: $ java -version java version "1.8.0_144" Java(TM) SE Runtime Environment (build…
yakout
  • 782
  • 3
  • 9
  • 24
4
votes
2 answers

Extremely slow Apache Drill Query using Oracle jdbc

I've successfully set up Apache Drill (latest 1.9) with Oracle's JDBC client (latest ojdbc7.jar from Oracle) as a storage plugin: { "type": "jdbc", "driver": "oracle.jdbc.driver.OracleDriver", "url":…
Thomas B.
  • 2,276
  • 15
  • 24
4
votes
1 answer

Why is a Drill join query not fully optimized for MongoDB?

I am working on a proof of concept to optimize the performance of join queries executed through Drill. The underlying storage is a NoSQL database, MongoDB. The time it takes to return the result of the join query is 46 seconds. Upon…
4
votes
1 answer

How does Apache Drill handle big result sets?

Let's say you have Drill connected to two separate databases, and you run a query where you would pull a massive amount of data from each and then do a join. How does Drill handle this without throwing Out of Memory errors? This is assuming that the…
Trant
  • 3,461
  • 6
  • 35
  • 57
4
votes
1 answer

How to write a custom storage plugin for Apache Drill

I have my data in a proprietary format that is not among those supported by Apache Drill. Is there a tutorial on how to write my own storage plugin to handle such data?
sushil
  • 165
  • 1
  • 9
4
votes
2 answers

Unnesting nested JSON structures in Apache Drill

I have the following JSON (roughly) and I'd like to extract the information from the header and defects fields separately: { "file": { "header": { "timeStamp": "2016-03-14T00:20:15.005+04:00", "serialNo": "3456", "sensorId":…
Ian
  • 1,294
  • 3
  • 17
  • 39
4
votes
2 answers

Apache Drill not using max RAM

I'm running Apache Drill 1.0 (and then 1.4) locally on an Ubuntu machine that has 16 GB of RAM. When I work with a very large tab-delimited file (52 million rows, 7 GB) and run SELECT DISTINCT columns[0] FROM `table.tsv`, performance seems to…
user2773013
  • 3,102
  • 8
  • 38
  • 58
4
votes
2 answers

Empty patch created using "git format-patch origin/master --stdout"

I was looking through the documentation for Drill (an open-source GitHub project) to learn how to create a patch. I came across this command: git format-patch origin/master --stdout > DRILL-1234.1.patch.txt I made some changes. I verified my changes with git status. I…
Dev
  • 13,492
  • 19
  • 81
  • 174