
I am currently using Druid (incubating) 0.16.0. As described in the tutorial at https://druid.apache.org/docs/latest/tutorials/tutorial-update-data.html, we can use a combining firehose to update and merge the data for a datasource.
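For reference, I am following the tutorial's combining-firehose setup. The datasource and interval below are the tutorial's; the file name is illustrative, and the spec is abbreviated to the relevant ioConfig:

```json
"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "combining",
    "delegates": [
      {
        "type": "ingestSegment",
        "dataSource": "updates-tutorial",
        "interval": "2018-01-01/2018-01-03"
      },
      {
        "type": "local",
        "baseDir": "quickstart/tutorial",
        "filter": "updates-data2.json"
      }
    ]
  },
  "appendToExisting": false
}
```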

Step 1: I ingested the same sample data as the tutorial, with this initial structure:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘

Step 2: I updated the data for tiger with {"timestamp":"2018-01-01T01:01:35Z","animal":"tiger", "number":30}, using appendToExisting = false and rollup = true, and got this result:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     2 │    130 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘

Step 3: Now I update giraffe with {"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":30}, again with appendToExisting = false and rollup = true, and get the following result:

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    130 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     2 │  14154 │
└──────────────────────────┴──────────┴───────┴────────┘

My doubt is: in step 3 the count for tiger decreases by 1, but I think it should not change, since step 3 makes no change to tiger, and its number value is unchanged as well.

FYI, count and number are defined in the metricsSpec as count and longSum aggregators, respectively. Please clarify.
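For completeness, the metricsSpec matches the tutorial's:

```json
"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "longSum", "name": "number", "fieldName": "number" }
]
```

(I wonder whether a count aggregator counts the rows it ingests rather than summing the stored count column, so that when the ingestSegment firehose re-reads the already rolled-up tiger row in step 3, it contributes only 1.)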


Update: when using the ingestSegment firehose with initial data like

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ lion     │     2 │    200 │
└──────────────────────────┴──────────┴───────┴────────┘

after adding a new record {"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":30} with appendToExisting = true, I am getting

┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ lion     │     2 │    200 │
│ 2018-01-01T00:00:00.000Z │ aardvark │     1 │   9999 │
│ 2018-01-01T00:00:00.000Z │ bear     │     1 │    111 │
│ 2018-01-01T00:00:00.000Z │ giraffe  │     1 │     30 │
│ 2018-01-01T00:00:00.000Z │ lion     │     1 │    200 │
└──────────────────────────┴──────────┴───────┴────────┘

Is this the correct and expected output? Why didn't the rollup happen?
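The firehose part of the spec for this run is the same combining setup, but with appendToExisting set to true:

```json
"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "combining",
    "delegates": [
      {
        "type": "ingestSegment",
        "dataSource": "updates-tutorial",
        "interval": "2018-01-01/2018-01-03"
      },
      {
        "type": "local",
        "baseDir": "quickstart/tutorial",
        "filter": "updates-data3.json"
      }
    ]
  },
  "appendToExisting": true
}
```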

theNextBigThing

1 Answer


Druid actually has only two modes: overwrite or append.

With appendToExisting = true, your data will be appended to the existing data, which causes the "number" field to increase (and the count as well).

With appendToExisting = false, all the data in the segment is overwritten. I think this is what is happening.
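In a native batch ingestion spec, this is controlled by the appendToExisting flag on the ioConfig; a minimal sketch (file name here is just an example):

```json
"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "local",
    "baseDir": "quickstart/tutorial",
    "filter": "updates-data.json"
  },
  "appendToExisting": false
}
```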

This is different from "normal" databases, where you can update specific rows.

In Druid you can update certain rows, but only by re-indexing your data, which is not a very easy process. Re-indexing is done with an ingestSegment firehose, which reads your data from a segment and then writes it back to a segment (possibly the same one). During this process you can add a transform filter (a transformSpec), which performs a specific action, such as updating certain field values.
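A sketch of the relevant pieces of such a re-index task; the transform expression and values are hypothetical, just to illustrate changing a field during re-indexing using Druid's expression-based transforms:

```json
{
  "dataSchema": {
    "transformSpec": {
      "transforms": [
        {
          "type": "expression",
          "name": "number",
          "expression": "if(animal == 'tiger', 30, number)"
        }
      ]
    }
  },
  "ioConfig": {
    "type": "index",
    "firehose": {
      "type": "ingestSegment",
      "dataSource": "updates-tutorial",
      "interval": "2018-01-01/2018-01-03"
    },
    "appendToExisting": false
  }
}
```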

We have built a PHP library to make these processes easier to work with. See this example of how to re-index a segment and apply a transformation during the re-indexing:

https://github.com/level23/druid-client#reindex

58k723f1
  • please check the updated description with the ingestSegment firehose. Do I have to write the transformSpec as you suggested? – theNextBigThing Dec 03 '19 at 05:29
  • "firehose" : { "type": "combining", "delegates": [ { "type" : "ingestSegment", "dataSource" : "updates-tutorial", "interval" : "2018-01-01/2018-01-03" }, { "type" : "local", "baseDir" : "quickstart/tutorial", "filter" : "updates-data3.json" } ] } – theNextBigThing Dec 03 '19 at 05:38
  • The rollup did not happen because you specified the `appendToExisting=true`. The rollup will be applied to the data which is ingested. These will then be appended to your data, which is exactly what you see in the resultset. So if you add the "giraffe" data to your original dataset, you should set `appendToExisting` to `false`, so that the complete dataset is overwritten with your new data. You can also **only** push the "giraffe" record to your dataset with `appendToExisting=true`. In this case, the "giraffe" record will be added. – 58k723f1 Dec 03 '19 at 09:26
  • Yes. As per my understanding, keeping appendToExisting = true will always append the data to the existing data. Rollup will only happen within the dataset that is about to be ingested. – theNextBigThing Dec 03 '19 at 11:26
  • Correct. So if you add these: `{"timestamp":"2018-01-01T03:01:10Z","animal":"giraffe", "number":30} {"timestamp":"2018-01-01T03:01:20Z","animal":"giraffe", "number":30}` and use the `rollUp=true`, your data will be merged as 1 record with `number: 60` – 58k723f1 Dec 03 '19 at 16:28
  • Yes. I observed the same. – theNextBigThing Dec 04 '19 at 04:15