I'm following the instruction written here: http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html (scroll to "InputSpec specification", look for "granularity").
I have in in my indexing-task JSON:
"inputSpec": {
"type": "granularity",
"dataGranularity": "DAY",
"inputPath": "hdfs://hadoop:9000/druid/events/interview",
"filePattern": ".*",
"pathFormat": "'y'=yyyy/'m'=MM/'d'=dd"
}
I already have my files organized in HDFS like this (I did it on purpose, thinking that I would be using "granularity" type in my indexing task):
I keep getting this error (failure in indexing):
Caused by: java.io.IOException: No input paths specified in job
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) ~[?:?]
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510) ~[?:?]
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) ~[?:?]
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) ~[?:?]
Googled it, got two pages talking about the same problem:
- https://groups.google.com/forum/#!topic/druid-user/xKYgGv983ZQ
- https://groups.google.com/forum/#!topic/druid-user/B2YxNQ8UQR4
Both mentioned setting the value of "filePattern" to ".*". Did that, no luck.
To confirm tha my Druid-Hadoop link works, I tried changing my inputSpec to static:
"inputSpec": {
"type": "static",
"paths": "hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=06/event.json,hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=07/event.json"
}
It works. So, no problem with my Druid and Hadoop.
Is this "granularity" inputSpec broken in Druid (I'm using 0.9.2)? Because I don't see anything wrong in my inputSpec (the granularity type one); at least not according to the doc and forum I read.
In the meantime I can use the static one (and build my lengthy paths string), but that "granularity" type would be ideal (if only it worked).
Can anyone shed some light here?
Thanks.