I wrote a Spark application that generates HFiles, to be bulk loaded later with the LoadIncrementalHFiles
command. Because the source data pool is very large, the input files are split into iterations that are processed one after another. Each iteration creates its own HFile
directory, so my HDFS structure looks like this:
/user/myuser/map_data/hfiles_0
... /hfiles_1
... /hfiles_2
... /hfiles_3
...
There are about 500 of these subdirectories in the map_data
directory, so I'm looking for a way to call the LoadIncrementalHFiles
function automatically and process the subdirectories iteratively as well.
The corresponding command would be this:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles_0 mytable
I need to turn this into an iterative command, because the command does not work with subdirectories (i.e. when I call it with the /user/myuser/map_data
directory)!
I tried to use a Java Process
instance to execute the command above automatically, but it doesn't seem to do anything (no output to the console and no new rows in my HBase table either).
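For reference, this is roughly what that attempt looked like (a simplified sketch, not my exact code; the class name BulkLoadViaProcess is just for illustration, paths and table name as above). Reading the merged output stream should at least make a silent failure visible:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BulkLoadViaProcess {
    public static void main(String[] args) throws Exception {
        // Run the hbase CLI for a single subdirectory.
        ProcessBuilder pb = new ProcessBuilder(
                "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
                "-Dcreate.table=no", "/user/myuser/map_data/hfiles_0", "mytable");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // if this stream is never drained, the child process can block
            }
        }
        System.out.println("exit code: " + p.waitFor());
    }
}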
Using the org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
Java class from my own code doesn't work either; it also doesn't respond!
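For completeness, this is roughly how I'm trying to invoke the class, looping over the subdirectories (a minimal sketch against the HBase 1.1.2 client API; the class name IterativeBulkLoad is just for illustration, table name and paths as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class IterativeBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("mytable");
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
            FileSystem fs = FileSystem.get(conf);
            // Bulk load each hfiles_* subdirectory in turn.
            for (FileStatus status : fs.listStatus(new Path("/user/myuser/map_data"))) {
                if (status.isDirectory()) {
                    loader.doBulkLoad(status.getPath(), admin, table, locator);
                }
            }
        }
    }
}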
Does anybody have a working example for me? Or is there a parameter that makes it possible to run the above hbase
command on the parent directory? I'm working with HBase 1.1.2 in a Hortonworks Data Platform 2.5 cluster.
EDIT: I tried to run the LoadIncrementalHFiles
command from a Hadoop client Java application, but I'm getting an exception related to Snappy compression; see Run LoadIncrementalHFiles from Java client