I am using CDH4.4. I have an app currently running that serializes records via Avro into a single column in HBase. I am in the process of moving my current Solr index of this table into SolrCloud, so I'm testing the MapReduceIndexerTool to bulk index the whole table. I have a very simple morphlines file which currently uses "extractHBaseCells" to read records from HBase.
I set this up as a tracer proof-of-concept, only indexing the rowkey => id and stuffing the Avro blob into another field, just to verify that I could get data from HBase over to my collection in SolrCloud, and that works. Now I'd like to parse the Avro and put those values into their own fields on the Solr documents before submitting them to SolrCloud. However, the way "extractHBaseCells" emits its output seems to prevent this. If there were an HBase reader command that emitted more general output that could then flow into the Avro commands in morphlines, I am confident I could solve my own problem.
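For illustration, a minimal sketch of the kind of tracer morphline described above. The column family "cf", the qualifier "record", and the Solr field "avro_blob" are placeholders, and the import package names are assumed from the CDK morphlines release that ships with CDH4.4. The rowkey => id mapping is not shown.

```
morphlines : [
  {
    id : morphline1
    # Assumed package names for CDH4.4 (CDK morphlines plus the HBase indexer commands)
    importCommands : ["com.cloudera.cdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              # Placeholder column: the single column holding the Avro-serialized record
              inputColumn : "cf:record"
              # Placeholder Solr field that receives the raw blob
              outputField : "avro_blob"
              # Pass the cell value through as raw bytes rather than decoding it
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      # Log each output record so the result of the mapping can be inspected
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

With a mapping like this, each Solr document ends up with the whole Avro blob in a single field, which is the behavior I'm trying to get past.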
Are there any known workarounds for parsing Avro that has been stored in HBase, or possibly some other morphlines commands that could address this?