Questions tagged [druid]

Druid is a column-oriented open-source distributed data store written in Java.

According to the Apache Druid website:

Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.

Druid is commonly used as the database backend for GUIs of analytical applications, or for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.

597 questions
3
votes
0 answers

How to hide/mask S3 credentials from druid logs

We have populated S3 credentials in the job.properties of druid ingestion script as mentioned below. "jobProperties" : { "fs.s3a.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem", "fs.AbstractFileSystem.s3a.impl" :…
Kiran
3
votes
2 answers

Apache Druid Native Query Explain Plan

I am trying to understand the performance of certain aspects of native queries in Apache Druid. Is there a way to get the execution plan for a native query? For SQL in Druid we have... EXPLAIN PLAN FOR SELECT * FROM ds. Is there an equivalent…
PhaKuDi
3
votes
1 answer

Apache Druid GroupBy Virtual columns

I am trying to group by a virtual column in a Druid native query, which looks like this... { "queryType": "groupBy", "dataSource": "trace_info", "granularity": "none", "virtualColumns": [ { "type": "expression", "name":…
PhaKuDi
3
votes
0 answers

Kafka SSL Not streaming data to SSL Druid

I am new to Druid and am trying to set up Kafka (SSL) ingestion into an SSL-enabled Druid. Druid is running on HTTPS. Kafka version: 2.2.2. Druid version: 0.18.1. Kafka SSL works, and I can confirm it using the producer and consumer scripts…
Amit Mundu
3
votes
1 answer

Sum(distinct metric) in apache druid

How do we write sum(distinct col) in Druid? If I try to write it in Druid, it says a plan can't be built, but the same is possible in Druid. I tried converting to a subquery approach, but my inner query returns a lot of item-level data and therefore times out.
kirantd
3
votes
1 answer

No module named 'pydruid'

I'm following this tutorial from Druid on connecting a Jupyter notebook to Druid. When I run it, it keeps giving me ModuleNotFoundError: No module named 'pydruid', even though I have already installed the requirement.
Dzakirin
3
votes
1 answer

Can Superset visualize data returned from a REST API call?

We are trying to use Apache Superset to visualize business data, some of which is stored in SQL based databases, but some of it (think for example of external weather data) we need to access via public APIs (normally REST, but also sometimes push…
Dunco
3
votes
1 answer

Why Druid segments become unavailable after data ingestion

The Druid cluster shows certain segments of a data source as unavailable after data ingestion. Ex: 72.4% available (2352 segments, 647 segments unavailable). We have a clustered deployment with 3 nodes: master node (coordinator and overlord), data node…
3
votes
2 answers

Cannot find class 'org.apache.hadoop.hive.druid.DruidStorageHandler'

The jar file for the Druid Hive handler is there. The clients table already exists in Hive with data. The filename in the Hive library folder is hive-druid-handler-3.1.2.jar. I am getting an error when I try to create a table in Hive for Druid: FAILED:…
Vishnu
3
votes
1 answer

What is intermediate persist in Apache Druid?

How does Druid persist real-time ingested data before it hands off to deep storage? In the documentation, Druid has configuration options such as intermediatePersistPeriod and maxPendingPersists. But it doesn't say much about what an intermediate persist is, how it…
Happy
3
votes
0 answers

Is there any way to connect MongoDB to Druid?

My organisation has a MongoDB instance that stores application-based time-series data. Now we are trying to create a data pipeline for analytics and visualisation. Because the data is time-series, we plan to use Druid as intermediate storage where we can do the…
hemant A
3
votes
1 answer

Druid query to get "latest" value from third column

I have a table in Druid, something like Timestamp || UserId || Action, and I need to get the latest Action for each UserId. In MySQL I would do something like: Select * from users u1 inner join ( select UserId, max(Timestamp) as maxt from users…
Matt
3
votes
0 answers

Druid Timeseries Row Count Aggregation

I am currently calculating the average for a single dimension in a Druid data source using a timeseries query via pydruid. This is based on an example in the documentation (https://github.com/druid-io/pydruid): from pydruid.client import…
Huw
3
votes
2 answers

Can we change data type of dimension post ingestion in Druid

We are doing a POC on Druid to check whether it fits our use cases. Although we are able to ingest data, we are not sure about the following: How Druid supports schemaless input: let's say input dimensions are at the end user's discretion. Then there is no defined…
KRS
3
votes
2 answers

How do I limit the size of log file generated by druid while using imply?

I'm using Imply to manage a Druid cluster, but my log files have grown to hundreds of gigabytes. I'm talking about the log files in the imply/var/sv/ directory, which contains these 7 log files: broker.log, historical.log,…
Point Networks