2

I am evaluating Druid for a use case that ingests CSV data through Tranquility in real time. The server configuration is as follows:

{
  "dataSources" : {
    "audience" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "audience",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "csv",
              "timestampSpec" : {
                "column" : "timestamp"
              },
              "columns" : ["timestamp","partner_id","event_id","product_id","device_id","count"],
              "dimensionsSpec" : {
                "dimensions" : ["partner_id","event_id","product_id","device_id"]
              }
            }
          },
          "metricsSpec" : [{ "type" : "longSum", "name" : "total", "fieldName" : "count" }],
          "granularitySpec" : {
            "segmentGranularity" : "HOUR",
            "queryGranularity" : "HOUR",
            "intervals" : [ "2013-08-31/2013-09-01" ]
          }
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "localhost",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "http.port" : "8200",
    "http.threads" : "8"
  }
}

The data is generated randomly by a Python script and looks like this:

1471336991,1,960,136,3ZLA7,1
1471336991,1,369,367,8MP2B,1
1471336991,2,544,550,C9ZG8,1
1471336991,1,135,394,XFX31,1
1471336991,2,590,552,VXMTL,1
1471336991,1,493,615,0C2HR,1
1471336991,2,435,710,HKYP0,1
1471336991,1,394,483,V2HP9,1
1471336991,2,441,376,J1LYO,1

The following command submits the data and returns {"result":{"received":1000,"sent":0}}, that is, the events are received by Tranquility but none are sent on to Druid:

python createData.py | curl -XPOST -H'Content-Type: text/plain' --data-binary @- http://localhost:8200/v1/post/audience
Mangat Rai Modi

3 Answers

2

I was finally able to solve the problem. I was sending timestamps to Druid in epoch time format, but it expects ISO 8601 format. In Python, one can easily get this with:

datetime.datetime.utcnow().isoformat()
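
As a minimal sketch of how the generator script could emit rows in that format: the original createData.py is not shown, so the function name `make_row` and the value ranges below are assumptions chosen to resemble the sample rows in the question.

```python
import datetime
import random
import string


def make_row(now=None):
    """Build one CSV row: timestamp,partner_id,event_id,product_id,device_id,count."""
    # Assumption: events are stamped with the current UTC time in ISO 8601
    # instead of epoch seconds.
    now = now or datetime.datetime.now(datetime.timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Hypothetical value ranges, picked only to look like the sample data.
    device_id = "".join(random.choices(string.ascii_uppercase + string.digits, k=5))
    fields = [
        timestamp,
        random.randint(1, 2),      # partner_id
        random.randint(100, 999),  # event_id
        random.randint(100, 999),  # product_id
        device_id,
        1,                         # count
    ]
    return ",".join(str(field) for field in fields)


if __name__ == "__main__":
    for _ in range(1000):
        print(make_row())
```

The output can be piped to curl exactly as in the question.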
Mangat Rai Modi
  • This is an irrelevant question for your post, but do you happen to know the difference between `windowPeriod` and `segmentGranularity` in the real-time ingestion spec above? Does it create a segment based on the value you provide in `windowPeriod` or in `segmentGranularity`? –  Aug 24 '16 at 17:23
  • Yup! Window period is about relevance of the data and segment granularity tells when to rollup the data. – Mangat Rai Modi Aug 25 '16 at 06:52
  • @MangatRaiModi Do you mind looking at the question here? https://stackoverflow.com/questions/45206900/tranquility-server-would-not-send-data-to-druid I try to send data to a server with window period of one year. Tranquility receives the data but would not send it. Thanks! – Haonan Chen Jul 20 '17 at 06:33
1

Druid supports multiple time formats which can be specified in the "timestampSpec" property. The Druid documentation lists the following timestamp formats: "iso, millis, posix, auto or any Joda time format."

For example, to send time in milliseconds:

"timestampSpec" : {
    "column" : "timestamp",
    "format" : "millis"
 }
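
To see how the formats relate, here is a small Python sketch that renders one instant in three of the listed formats. It assumes "posix" means seconds since the epoch and "millis" means milliseconds since the epoch; the helper name `timestamp_formats` is made up for illustration. The input value is the timestamp from the sample rows in the question.

```python
import datetime


def timestamp_formats(epoch_seconds):
    """Render one instant as it would be written for "posix", "millis" and "iso"."""
    instant = datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
    return {
        "posix": str(epoch_seconds),            # seconds since the epoch
        "millis": str(epoch_seconds * 1000),    # milliseconds since the epoch
        "iso": instant.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


for name, value in timestamp_formats(1471336991).items():
    print(name, value)
# posix 1471336991
# millis 1471336991000
# iso 2016-08-16T08:43:11Z
```

Under that assumption, the rows in the question are already in the shape that "posix" describes, so setting "format" accordingly would be an alternative to rewriting the data.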
Andy-Delosdos
rohit kochar
0

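A couple of things:

  1. Use the ISO 8601 datetime format.
  2. Make sure each event's timestamp is within +/- 10 minutes of the current time (the windowPeriod of PT10M in the config above); events outside that window are dropped.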
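
A rough client-side check for the second point can help while debugging. This sketch assumes the window is measured against the current wall-clock time and equals the windowPeriod of PT10M from the question's config; the name `within_window` is hypothetical, and Tranquility's exact boundary handling may differ.

```python
import datetime

# Assumption: mirrors "windowPeriod" : "PT10M" from the tuningConfig.
WINDOW_PERIOD = datetime.timedelta(minutes=10)


def within_window(event_time, now=None, window=WINDOW_PERIOD):
    """Return True if event_time is no more than `window` away from `now`."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return abs(now - event_time) <= window


if __name__ == "__main__":
    now = datetime.datetime.now(datetime.timezone.utc)
    print(within_window(now - datetime.timedelta(minutes=5)))   # recent event
    print(within_window(now - datetime.timedelta(hours=2)))     # stale event
```

Rows that fail this check are likely candidates for being counted as received but not sent.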
karthik r