Questions tagged [apache-drill]

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. It can query nested data in formats such as JSON and Parquet and perform dynamic schema discovery.

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.
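As a rough illustration of the cross-datastore join described above, the following Python sketch builds a single SQL statement that joins a MongoDB collection with a Parquet file and wraps it in the JSON body expected by Drill's REST endpoint (`POST /query.json`). The collection `mongo.shop.customers`, the file path `/data/orders.parquet`, and all column names are hypothetical placeholders, and the storage plugin names `mongo` and `dfs` are assumed to be enabled under those names.

```python
import json

# Hypothetical tables and columns, used only for illustration:
# a MongoDB collection and a Parquet file reached through the
# `mongo` and `dfs` storage plugins (assumed to be enabled).
QUERY = """
SELECT c.name, SUM(o.amount) AS total
FROM mongo.shop.`customers` AS c
JOIN dfs.`/data/orders.parquet` AS o
  ON c.customer_id = o.customer_id
GROUP BY c.name
"""


def build_rest_payload(sql: str) -> str:
    """Return the JSON request body for Drill's POST /query.json endpoint."""
    return json.dumps({"queryType": "SQL", "query": sql.strip()})


payload = build_rest_payload(QUERY)
print(payload)
```

The payload could then be posted with any HTTP client, using the `Content-Type: application/json` header, to a Drillbit's web port (8047 by default).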

Recommended reference sources:

644 questions
4 votes, 1 answer

Apache Drill: table not found on s3 bucket

I'm a newbie with Apache Drill. The scenario is this: I have an S3 bucket, where I place my CSV file called test.csv. I've installed Apache Drill with instructions from the official website. I followed this tutorial:…
nicos
  • 113
  • 7
4 votes, 2 answers

Can Apache Drill connect to Amazon RedShift?

Can Apache Drill connect to Amazon RedShift? If yes, can anyone help me with the configuration and plugin for Apache Drill to connect to Amazon RedShift?
alok tanna
  • 71
  • 6
3 votes, 3 answers

WHERE filename in Apache Drill does a full scan in all files

select distinct filename from dfs.contoso.`folder/CSVs/`
> 2021-01.csv
> 2021-02.csv
> ...
or
select count(*) as cnt from dfs.contoso.`folder/CSVs/` where filename = '2021-01.csv'
> 4562751239
The problem is both of these queries take AN HOUR.…
rudolfdobias
  • 1,778
  • 3
  • 17
  • 40
3 votes, 1 answer

How to start Apache Drill in Docker Compose

This link explains how to run Apache Drill on Docker.
docker run -i --name drill-1.18.0 -p 8047:8047 -t apache/drill:1.18.0 /bin/bash
I need to run it on Docker Compose, so I set it up:
version: "3.0"
services:
  drill:
    image:…
ps0604
  • 1,227
  • 23
  • 133
  • 330
3 votes, 0 answers

How to incrementally store timeseries in Parquet files for efficient retrieval?

I would like to store the stock price of a large number of companies in a Parquet file in the form of a timeseries. If I gather the data at the end of 1 Jul, I would be writing a file such as:
1 Jul 2020, Company1,35
1 Jul 2020, Company2,46
…
Yash
  • 946
  • 1
  • 13
  • 28
3 votes, 2 answers

Apache-Drill doesn't understand Pandas datetime64[ns]

I'm using Pyarrow, Pyarrow.Parquet as well as Pandas. When I send a Pandas datetime64[ns] series to a Parquet file and load it again via a Drill query, the query shows an integer like 1467331200000000, which seems to be something other than a UNIX…
Christian
  • 515
  • 1
  • 6
  • 17
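One possible reading of the integer quoted in the excerpt above is that it counts microseconds, not seconds or nanoseconds, since the Unix epoch. The sketch below checks that interpretation with the Python standard library only. The unit is an assumption inferred from the magnitude of the value and is not stated in the excerpt.

```python
from datetime import datetime, timedelta, timezone

# Value quoted in the excerpt; the microsecond unit is an assumption.
raw = 1467331200000000

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
as_microseconds = epoch + timedelta(microseconds=raw)

print(as_microseconds.isoformat())  # 2016-07-01T00:00:00+00:00
```

Read as microseconds, the value lands exactly on midnight UTC of 1 July 2016, which suggests a timestamp stored at microsecond precision.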
3 votes, 1 answer

Can Apache Drill read Apache ORC file format?

Can Apache Drill read ORC files?
3 votes, 0 answers

Apache Drill JDBC connectivity through Java code is giving error: Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException

I am trying drill-jdbc connectivity through Java code. The error is: java.sql.SQLException: Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: CONNECTION : java.net.ConnectException: Connection refused: no further information:…
sharda
  • 31
  • 3
3 votes, 1 answer

slf4j-log4j12.jar and log4j-over-slf4j.jar in same path while dependency is getting resolved in Maven POM

I am trying to access Drill using Spark 2.1.0. I have put the POM file below in my project, but while compiling the code I get the error below. When I remove the Drill dependency, everything works fine. I understand Spark already has…
3 votes, 1 answer

Generating parquet files - differences between R and Python

We have generated a Parquet file in Dask (Python) and with Drill (R, using the sergeant package). We have noticed a few issues: the Dask (i.e. fastparquet) output has a _metadata and a _common_metadata file, while the Parquet file in R \…
skibee
  • 1,279
  • 1
  • 17
  • 37
3 votes, 1 answer

Apache Drill unusably slow with S3 data source?

I am trying to use Apache Drill with an S3 bucket, but it is incredibly slow. I have about 20,000 JSON files. I can get results from them locally in a few seconds, e.g.: > select count(*) from dfs.`/path/to/my/files/*.json`; returns after less…
Richard
  • 62,943
  • 126
  • 334
  • 542
3 votes, 1 answer

Apache Metamodel vs Apache Drill

Apache MetaModel is a data access framework that provides a common interface for the discovery, exploration, and querying of different types of data sources. Apache Drill is a schema-free SQL query engine that delivers real-time insights by removing…
Swappy
  • 63
  • 2
  • 7
3 votes, 0 answers

Issue Drill querying S3 directories recursively

I am trying to query a file under folder 't/atms-csv.csv', which I can do successfully. Query file directly with filename: There is another file in that location which has additional data from another day (both files have the same column model). When I try to query…
3 votes, 2 answers

How to start drillbit locally in distributed mode?

I downloaded Apache Drill v1.8, edited the conf/drill-override.conf to have the following changes:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "10.178.23.140:2181,10.178.23.140:2182,10.178.23.140:2183,10.178.23.140:2184"
}
..zookeeper…
Muhammad Gelbana
  • 3,890
  • 3
  • 43
  • 81
3 votes, 0 answers

Apache Drill Query PostgreSQL Json

I am trying to query a jsonb field in PostgreSQL in Drill and read it as if it were coming from a JSON storage type, but am running into trouble. I can convert from text to JSON but cannot seem to query the JSON object. At least I think I can convert to…
Andrew Scott Evans
  • 1,003
  • 12
  • 26