I'm following the instructions here: http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html (scroll to "InputSpec specification" and look for "granularity").

I have this in my indexing-task JSON:

"inputSpec": {
  "type": "granularity",
  "dataGranularity": "DAY",
  "inputPath": "hdfs://hadoop:9000/druid/events/interview",
  "filePattern": ".*",
  "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd"
} 

I already have my files organized in HDFS like this (I did it on purpose, thinking that I would be using "granularity" type in my indexing task):

[image: HDFS directory listing]

I keep getting this error (failure in indexing):

Caused by: java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) ~[?:?]
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) ~[?:?]
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) ~[?:?] 

I googled it and found two pages discussing the same problem. Both suggested setting "filePattern" to ".*". I did that, with no luck.

To confirm that my Druid-Hadoop link works, I tried changing my inputSpec to static:

"inputSpec": {
  "type": "static",
  "paths": "hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=06/event.json,hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=07/event.json"
}

It works. So, no problem with my Druid and Hadoop.

Is this "granularity" inputSpec broken in Druid (I'm using 0.9.2)? Because I don't see anything wrong in my inputSpec (the granularity type one); at least not according to the doc and forum I read.

In the meantime I can use the static one (and build my lengthy paths string), but that "granularity" type would be ideal (if only it worked).
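
As a stopgap, the static paths string can be generated instead of typed by hand. This is a minimal sketch, assuming the y=/m=/d= directory layout and the event.json file name from the static example above; `build_static_paths` is a hypothetical helper name, not something Druid provides.

```python
from datetime import date, timedelta

# Base directory taken from the inputSpec in the question.
BASE = "hdfs://hadoop:9000/druid/events/interview"


def build_static_paths(start, end, base=BASE, filename="event.json"):
    """Return a comma-separated path string covering every day in [start, end]."""
    paths = []
    day = start
    while day <= end:
        # Mirror the y=yyyy/m=MM/d=dd directory layout used in HDFS.
        paths.append(f"{base}/y={day:%Y}/m={day:%m}/d={day:%d}/{filename}")
        day += timedelta(days=1)
    return ",".join(paths)


if __name__ == "__main__":
    print(build_static_paths(date(2016, 11, 6), date(2016, 11, 7)))
```

The returned string can be pasted into the "paths" field of the static inputSpec. Days with no data would still be listed, so the range should only cover dates that actually exist in HDFS.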

Can anyone shed some light here?

Thanks.

Cokorda Raka

1 Answer

Try adding a / to the end of the path pattern: "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd/"
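
To see what directory each variant of the pattern points at for a given day, it can help to expand it by hand. The sketch below is a rough approximation that handles only the tokens used here (single-quoted literals, yyyy, MM, dd), assuming the Joda-style pattern semantics that the Druid docs describe for pathFormat. It is not Druid's own code, and `expand_path_format` is a hypothetical name.

```python
import re
from datetime import date


def expand_path_format(pattern, day):
    """Expand a small subset of Joda-style tokens: 'literal', yyyy, MM, dd."""
    tokens = {
        "yyyy": f"{day.year:04d}",
        "MM": f"{day.month:02d}",
        "dd": f"{day.day:02d}",
    }

    def replace(match):
        if match.group(1) is not None:
            # Text inside single quotes is copied through unchanged.
            return match.group(1)
        return tokens[match.group(2)]

    # Quoted literals are tried first so that 'd' is not read as a day token.
    return re.sub(r"'([^']*)'|(yyyy|MM|dd)", replace, pattern)


if __name__ == "__main__":
    day = date(2016, 11, 6)
    print(expand_path_format("'y'=yyyy/'m'=MM/'d'=dd", day))
    print(expand_path_format("'y'=yyyy/'m'=MM/'d'=dd/", day))
```

Appending either result to the inputPath gives the directory to compare against the actual HDFS layout.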

giwyni