2

I want to load a CSV (just comma separated) file into my Hbase table. I already tried it with help of some googled articles, now just I am able to load entire row (or line) as value into Hbase, i.e. all values in single row are getting stored as single column, but I want to split the row based on delimiter comma (,) and store those vales into different columns in Hbase table's column family.

Please help to solve my issue. Any suggestions are appreciated.

Following are my present using input file, agent configuration file and hbase output files.

1)input file

8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603

2)agent configuration file

agent.sources  = spool
agent.channels = fileChannel2
agent.sinks    = sink2

agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/Desktop/flume
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = "\"([^\"]+)\""
agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4,col5
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory

3)HBase output 

hbase(main):009:0> scan 'sample'
ROW                                         COLUMN+CELL                                                                                                                 
 1431064328720-0LalKGmSf3-1                 column=s1:payload, timestamp=1431064335428, value=8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869                
 1431064328720-0LalKGmSf3-2                 column=s1:payload, timestamp=1431064335428, value=8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423                
 1431064328720-0LalKGmSf3-3                 column=s1:payload, timestamp=1431064335428, value=8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548                
 1431064328721-0LalKGmSf3-4                 column=s1:payload, timestamp=1431064335428, value=8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603                
4 row(s) in 0.0570 seconds

hbase(main):010:0> 
frb
  • 3,738
  • 2
  • 21
  • 51
prasad
  • 339
  • 8
  • 23

0 Answers0