Every day I need to import a file containing yesterday's snapshot of a database. To import it, I use the following command in the shell:
./bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
'-Dimporttsv.separator=|' \
-Dimporttsv.columns=HBASE_ROW_KEY,info:date,info:author,info:text \
tableName \
inputFile.tsv
The problem is that each line contains all the values, not just the updated ones, so every import adds a new version of each column even when the value has not changed.
Is there any other way to import this daily snapshot while ignoring the duplicate values? Or any suggestion for a workaround?
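One possible workaround, as a minimal sketch only: if the previous day's snapshot file is still available on the local filesystem, the two files could be compared before the import so that only new or changed lines reach ImportTsv. The function name `snapshot_delta` and the file names in the usage comment are hypothetical, and the sketch assumes each row sits on exactly one line and that both files use the same separator and column order. Rows deleted from the database are not handled.

```shell
# Hypothetical helper: print the lines of the current snapshot that are
# new or changed compared with the previous snapshot.
# Usage: snapshot_delta PREVIOUS_FILE CURRENT_FILE
snapshot_delta() {
  prev_sorted=$(mktemp) || return 1
  cur_sorted=$(mktemp) || { rm -f "$prev_sorted"; return 1; }

  # comm requires both inputs sorted with the same collation;
  # LC_ALL=C forces a plain byte-wise order for sort and comm alike.
  LC_ALL=C sort "$1" > "$prev_sorted"
  LC_ALL=C sort "$2" > "$cur_sorted"

  # -1 drops lines found only in the previous file,
  # -3 drops lines found in both, leaving lines unique to the current file.
  LC_ALL=C comm -13 "$prev_sorted" "$cur_sorted"

  rm -f "$prev_sorted" "$cur_sorted"
}

# Example (file names are assumptions):
#   snapshot_delta snapshot_yesterday.tsv snapshot_today.tsv > delta.tsv
# delta.tsv would then be passed to ImportTsv in place of inputFile.tsv.
```

A changed row appears in the output with its full new line, so the unchanged columns of that row would still receive a new version. Only completely unchanged rows are skipped.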
Thank you!