I am curious how a TSV file should look when we are ingesting data from a local TSV file using DRUID.
Should it just be like:
Please note this is just for testing:
quickstart/sample_data.tsv file:
name lastname email time Bob Jones bobj@gmail.com 1468839687 Billy Jones BillyJ@gmail.com 1468839769
Where this part is my dimensions: name lastname email
And this part is my actual data: Bob Jones bobj@gmail.com 1468839687 Billy Jones BillyJ@gmail.com 1468839769
{
"type" : "index_hadoop",
"spec" : {
"ioConfig" : {
"type" : "hadoop",
"inputSpec" : {
"type" : "static",
"paths" : "quickstart/sample_data.tsv"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-18/2016-07-18"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "tsv",
"dimensionsSpec" : {
"dimensions" : [
"name",
"lastname",
"email"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "time"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"name" : "added",
"type" : "longSum",
"fieldName" : "deleted"
}
]
}
}
}
I had some questions about my spec file as well since I was not able to find the answers to them on the doc. I would appreciate it if someone can answer them for me :)!
1)
I noticed in the example spec they added the line "type" : "index_hadoop" at the very top. What would I put for the type if I am ingesting a TSV file from my local computer in the quickstart directory? Also where can I read about the different values I should put for this "type" key in the docs? I didn't get a explanation for that.
2)
Again there is a type variable in the ioConfig: "type" : "hadoop". What would I put for the type if I am ingesting a TSV file from my local computer in the quickstart directory?
3)
For the timestampSpec, the time in my TSV file is in GMT. Is there any way I can use this as the format. Since I read you should convert it to UTC and is there a way to convert to UTC during the process of posting the data to the Overlord? Or will I have to change all of those GMT time formats to UTC similar to this: "time":"2015-09-12T00:46:58.771Z".