I want to load a CSV (plain comma-separated) file into my HBase table. Following some articles I found online, I can currently load each entire line as a single value, i.e. all values in a row are stored in a single column. Instead, I want to split each line on the comma delimiter (,) and store the values in separate columns of the table's column family.
Please help me solve this issue. Any suggestions are appreciated.
Below are the input file, agent configuration file, and HBase output I am currently using.
1) Input file
8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
2) Agent configuration file
agent.sources = spool
agent.channels = fileChannel2
agent.sinks = sink2
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/Desktop/flume
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate
agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = "\"([^\"]+)\""
agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4,col5
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory
3) HBase output
hbase(main):009:0> scan 'sample'
ROW COLUMN+CELL
1431064328720-0LalKGmSf3-1 column=s1:payload, timestamp=1431064335428, value=8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
1431064328720-0LalKGmSf3-2 column=s1:payload, timestamp=1431064335428, value=8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
1431064328720-0LalKGmSf3-3 column=s1:payload, timestamp=1431064335428, value=8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
1431064328721-0LalKGmSf3-4 column=s1:payload, timestamp=1431064335428, value=8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
4 row(s) in 0.0570 seconds
hbase(main):010:0>
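For reference, the split described above might be expressed in the serializer settings roughly as follows. This is only a minimal, untested sketch under some assumptions: every line has exactly five comma-separated fields with no quoted or embedded commas, and the settings are keyed under `sink2`, the sink actually declared in `agent.sinks`. The `s1:payload` column in the scan output matches the serializer's default column name, which would be consistent with the `sink1.serializer.*` lines in the configuration above not being applied to `sink2`.

```properties
# Sketch only: serializer settings keyed under sink2 (the sink declared in agent.sinks).
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# One capture group per comma-separated field.
# Assumption: exactly five fields per line, no quoted or embedded commas.
# The value is written without surrounding quotes, since quotes would become part of the pattern.
agent.sinks.sink2.serializer.regex = ([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
# One column name per capture group, in the same order.
agent.sinks.sink2.serializer.colNames = col1,col2,col3,col4,col5
```

With settings along these lines, each capture group would be expected to land in its own column (`s1:col1` through `s1:col5`) rather than in a single `s1:payload` cell.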