Questions tagged [druid]

Druid is a column-oriented open-source distributed data store written in Java.

According to the Apache Druid website:

Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.

Druid is commonly used as the database backend for GUIs of analytical applications, or for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.

597 questions

0 answers

How to hide/mask S3 credentials from druid logs

We have populated the S3 credentials in the job.properties of the Druid ingestion script as shown below. "jobProperties" : { "fs.s3a.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem", "fs.AbstractFileSystem.s3a.impl" :…

druid

asked Aug 19 '20 at 09:33

Kiran

2 answers

Apache Druid Native Query Explain Plan

I am trying to understand the performance of certain aspects of native queries in Apache Druid. Is there a way to get the execution plan for a native query? For SQL in Druid we have... EXPLAIN PLAN FOR SELECT * FROM ds. Is there an equivalent…

sql-execution-plan druid

asked Aug 10 '20 at 10:12

PhaKuDi

1 answer

Apache Druid GroupBy Virtual columns

I am trying to group by a virtual column in a Druid native query, which looks like this... { "queryType": "groupBy", "dataSource": "trace_info", "granularity": "none", "virtualColumns": [ { "type": "expression", "name":…

group-by druid virtual-column

asked Aug 06 '20 at 09:48

PhaKuDi

0 answers

Kafka SSL Not streaming data to SSL Druid

I am new to Druid and am trying to do Kafka (SSL) ingestion into an SSL-enabled Druid. Druid is running on HTTPS. Kafka Version : 2.2.2 Druid Version : 0.18.1 Kafka SSL works, and I can confirm it using the producer and consumer scripts…

apache-kafka druid

asked Aug 04 '20 at 13:00

Amit Mundu

1 answer

Sum(distinct metric) in apache druid

How do we write sum(distinct col) in Druid? If I try to write it in Druid, it says the plan can't be built, but the same is possible in Druid. I tried converting it to a subquery approach, but my inner query returns a lot of item-level data and therefore times out.

druid

asked Jun 29 '20 at 23:12

kirantd

1 answer

No module named 'pydruid'

I'm following this tutorial from Druid on connecting a Jupyter notebook to Druid. When I run it, it keeps giving me ModuleNotFoundError: No module named 'pydruid', even though I have already installed the requirements.

python jupyter-notebook druid pydruid

asked Jun 10 '20 at 04:23

Dzakirin
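
One common cause of this kind of error is that the notebook kernel runs a different Python interpreter than the one the package was installed into; that is an assumption here, not something the question confirms. A minimal sketch for checking this from a notebook cell, using only the standard library (`describe_module` is a hypothetical helper name, not part of pydruid):

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report whether `name` is importable by the interpreter running this code."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return (
            f"{name!r} is not installed for {sys.executable}; "
            f"try: {sys.executable} -m pip install {name}"
        )
    return f"{name!r} found at {spec.origin}"


# sys.executable is the kernel's interpreter, which may differ from the
# interpreter that the shell's `pip` command installs into.
print(describe_module("pydruid"))
```

If the message reports that the module is missing, installing with the printed interpreter path targets the same environment the kernel uses.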

1 answer

Can Superset visualize data returned from a REST API call?

We are trying to use Apache Superset to visualize business data, some of which is stored in SQL based databases, but some of it (think for example of external weather data) we need to access via public APIs (normally REST, but also sometimes push…

apache-superset druid

asked Mar 24 '20 at 06:08

Dunco

1 answer

Why Druid segments become unavailable after data ingestion

The Druid cluster shows certain segments of a data source as unavailable after data ingestion. Ex: 72.4% available (2352 segments, 647 segments unavailable). We have a clustered deployment with 3 nodes: master node (coordinator and overlord), data node…

segment druid

asked Feb 11 '20 at 11:46

Shashank NS

2 answers

Cannot find class 'org.apache.hadoop.hive.druid.DruidStorageHandler'

The jar file for the Druid Hive handler is there. The clients table already exists in Hive with data. The filename in the Hive library folder is hive-druid-handler-3.1.2.jar. I am getting an error when I try to create a table in Hive for Druid: FAILED:…

linux hive druid data-ingestion hiveddl

asked Dec 17 '19 at 10:43

Vishnu

1 answer

What is intermediate persist in Apache Druid?

How does Druid persist real-time ingested data before it hands off to deep storage? In the documentation, Druid has configuration for intermediatePersistPeriod and maxPendingPersists. But it doesn't say much about what an intermediate persist is, how it…

druid data-ingestion

asked Sep 13 '19 at 11:45

Happy

0 answers

Is there any way to connect MongoDB to Druid?

My organisation has MongoDB, which stores application-based time-series data. Now we are trying to create a data pipeline for analytics and visualisation. Because the data is time-series, we plan to use Druid as intermediate storage where we can do the…

mongodb apache-kafka druid

asked Aug 19 '19 at 11:14

hemant A

1 answer

Druid query to get "latest" value from third column

I have a table in Druid, something like Timestamp || UserId || Action, and I need to get the latest Action for each UserId. In MySQL I would do something like SELECT * FROM users u1 INNER JOIN ( SELECT UserId, max(Timestamp) AS maxt FROM users…

druid apache-calcite

asked Jul 10 '19 at 11:45

Matt

0 answers

Druid Timeseries Row Count Aggregation

I am currently calculating the average for a single dimension in a Druid data source using a timeseries query via pydruid. This is based on an example in the documentation (https://github.com/druid-io/pydruid): from pydruid.client import…

time-series druid

asked May 23 '19 at 17:52

Huw

2 answers

Can we change data type of dimension post ingestion in Druid

We are doing a POC on Druid to check whether it fits our use cases. We are able to ingest data, but we are not sure about the following: How Druid supports schemaless input: Let's say the input dimensions are at the end user's discretion. Then there is no defined…

druid

asked Jan 21 '19 at 07:06

KRS

2 answers

How do I limit the size of log file generated by druid while using imply?

I'm using Imply to manage a Druid cluster, but my log files have grown to hundreds of gigabytes. I'm talking about the log files in the imply/var/sv/ directory, which contains these 7 log files: broker.log, historical.log,…

druid

asked Sep 30 '18 at 08:40

Point Networks
