Questions tagged [apache-drill]

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. It is capable of querying nested data in formats like JSON and Parquet and performing dynamic schema discovery.

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.

644 questions

1 answer

Apache Drill: table not found on s3 bucket

I'm a newbie with Apache Drill. The scenario is this: I have an S3 bucket where I placed my CSV file, called test.csv. I installed Apache Drill following the instructions on the official website. I followed this tutorial:…

amazon-s3 apache-drill

asked Jul 24 '15 at 18:06

nicos

2 answers

Can Apache Drill connect to Amazon Redshift?

Can Apache Drill connect to Amazon Redshift? If so, can anyone help me with the configuration and plugin needed for Apache Drill to connect to Amazon Redshift?

amazon-redshift apache-drill

asked Apr 16 '15 at 18:00

alok tanna

3 answers

WHERE filename in Apache Drill does a full scan in all files

select distinct filename from dfs.contoso.`folder/CSVs/`
> 2021-01.csv
> 2021-02.csv
> ...
or
select count(*) as cnt from dfs.contoso.`folder/CSVs/` where filename = '2021-01.csv'
> 4562751239
The problem is both of these queries take AN HOUR.…

apache-drill

asked Aug 19 '21 at 14:07

rudolfdobias

1 answer

How to start Apache Drill in Docker Compose

This link explains how to run Apache Drill on Docker. docker run -i --name drill-1.18.0 -p 8047:8047 -t apache/drill:1.18.0 /bin/bash I need to run it on Docker Compose, so I set it up: version: "3.0" services: drill: image:…

docker docker-compose apache-drill

asked May 16 '21 at 15:23

ps0604

0 answers

How to incrementally store timeseries in Parquet files for efficient retrieval?

I would like to store the stock price of a large number of companies in a parquet file in the form of a timeseries. If I gather the data at the end of 1 Jul, I would be writing a file such as: 1 Jul 2020, Company1,35 1 Jul 2020, Company2,46 …

parquet apache-drill

asked Jul 21 '20 at 07:43

Yash

2 answers

Apache-Drill doesn't understand Pandas datetime64[ns]

I'm using PyArrow (pyarrow.parquet) as well as pandas. When I write a pandas datetime64[ns] series to a Parquet file and load it again via a Drill query, the query shows an integer like 1467331200000000, which seems to be something other than a UNIX…

python parquet apache-drill pyarrow

asked Aug 05 '19 at 09:09

Christian
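
For the datetime64[ns] question above, a minimal sketch of how the quoted integer could be checked by hand. It assumes, without confirmation from the truncated excerpt, that the value is a count of microseconds since the Unix epoch; the helper name `micros_to_datetime` is hypothetical and is not part of Drill, PyArrow, or pandas.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(value: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch.

    The unit is an assumption: the excerpt does not state it.
    """
    # timedelta uses integer arithmetic here, so no float rounding occurs.
    return EPOCH + timedelta(microseconds=value)


if __name__ == "__main__":
    # Value quoted in the question excerpt.
    print(micros_to_datetime(1467331200000000).isoformat())
```

Under that assumption the quoted value corresponds to 2016-07-01T00:00:00+00:00, which would suggest the column holds a timestamp stored at a finer resolution than whole seconds rather than corrupted data.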

1 answer

Can Apache Drill read Apache ORC file format?

Can Apache Drill read ORC files?

apache-drill orc

asked Mar 20 '18 at 11:02

Андрей Смирнов

0 answers

Apache Drill JDBC connectivity through Java code is giving error: Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException

I am trying to connect to Drill over JDBC from Java code. The error is: java.sql.SQLException: Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: CONNECTION : java.net.ConnectException: Connection refused: no further information:…

jdbc apache-drill

asked Mar 08 '18 at 08:39

sharda

1 answer

slf4j-log4j12.jar and log4j-over-slf4j.jar in same path while dependency is getting resolved in Maven POM

I am trying to access Drill using Spark 2.1.0. I have put the POM file below in my project, but when compiling the code I get the error below. When I remove the Drill dependency, everything works fine. I understand Spark already has…

apache-spark slf4j apache-drill log4j

asked Nov 21 '17 at 12:29

Priyaranjan Swain

1 answer

Generating parquet files - differences between R and Python

We have generated a Parquet file with Dask (Python) and with Drill (from R, using the sergeant package). We have noticed a few issues: the Dask (i.e. fastparquet) output has _metadata and _common_metadata files, while the parquet file in R \…

r parquet dask apache-drill fastparquet

asked Jul 31 '17 at 12:21

skibee

1 answer

Apache Drill unusably slow with S3 data source?

I am trying to use Apache Drill with an S3 bucket, but it is incredibly slow. I have about 20,000 JSON files. I can get results from them locally in a few seconds, e.g.: > select count(*) from dfs.`/path/to/my/files/*.json`; returns after less…

amazon-web-services amazon-s3 apache-drill

asked Jul 04 '17 at 14:29

Richard

1 answer

Apache Metamodel vs Apache Drill

Apache MetaModel is a data access framework that provides a common interface for the discovery, exploration, and querying of different types of data sources. Apache Drill is a schema-free SQL query engine that delivers real-time insights by removing…

apache-drill apache-metamodel

asked Feb 11 '17 at 17:46

Swappy

0 answers

Issue Drill querying S3 directories recursively

I am trying to query the file 't/atms-csv.csv', which I can do successfully. Query file directly with filename: There is another file in that location which has additional data from another day (both file columnmodel). When I try to query…

amazon-s3 apache-drill amazon-aurora

asked Jan 13 '17 at 06:24

user2412091

2 answers

How to start drillbit locally in distributed mode?

I downloaded Apache Drill v1.8, edited the conf/drill-override.conf to have the following changes: drill.exec: { cluster-id: "drillbits1", zk.connect: "10.178.23.140:2181,10.178.23.140:2182,10.178.23.140:2183,10.178.23.140:2184" } ..zookeeper…

apache-drill

asked Nov 09 '16 at 11:45

Muhammad Gelbana

0 answers

Apache Drill Query PostgreSQL Json

I am trying to query a jsonb field in PostgreSQL from Drill and read it as if it were coming from a JSON storage type, but I am running into trouble. I can convert from text to JSON but cannot seem to query the JSON object. At least I think I can convert to…

json postgresql apache-drill

asked May 25 '16 at 16:40

Andrew Scott Evans
