I am trying to perform a bulk load into HBase. The input to the MapReduce job is an HDFS file (exported from Hive). In the Tool (job driver) class I set up the bulk-loading process with: HFileOutputFormat.configureIncrementalLoad(job, new HTable(config, TABLE_NAME));
In the Mapper, I write the following as the map output: context.write(new ImmutableBytesWritable(Bytes.toBytes(hbaseTable)), put);
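For context, here is a minimal sketch of such a mapper (the input format, column family, and field layout are hypothetical). Note that HFileOutputFormat.configureIncrementalLoad installs a TotalOrderPartitioner over the map output keys, so the key is normally the row key bytes rather than the table name:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: parses tab-separated lines from the Hive export
// into Puts for the bulk load. Column family "cf" is a placeholder.
public class HFileMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        byte[] rowKey = Bytes.toBytes(fields[0]);
        Put put = new Put(rowKey);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
        // Emit the row key so the TotalOrderPartitioner can produce
        // correctly sorted HFiles per region.
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}
```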
Once the mapper completes, I perform the actual bulk load with:
LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(configuration);
HTable hTable = new HTable(configuration, tableName);
loadFiles.doBulkLoad(new Path(pathToHFile), hTable);
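As a point of comparison, the same step can be run from the command line with the completebulkload tool, typically as the hbase user so the region servers can take ownership of the HFiles without permission changes (the output path and table name below are placeholders):

```shell
# Run the bulk load as the hbase user; paths/table are hypothetical.
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/hbase/bulkload-output my_table
```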
The job runs fine, but once LoadIncrementalHFiles starts it hangs forever, and I have had to kill the job after many attempts. However, after a long wait of maybe 30 minutes, I finally got the error above. After extensive searching I found that HBase tries to access the HFiles placed in the output folder, but that folder does not have write or execute permission for it, which causes the error. So the alternative solution is to add file access permissions in Java code before the bulk load is performed:
FileSystem fileSystem = FileSystem.get(config);
fileSystem.setPermission(new Path(outputPath),FsPermission.valueOf("drwxrwxrwx"));
Is this the correct approach as we move from development to production? Also, once I added the above code, I got a similar error for a folder created inside the output folder, this time the column-family folder, which is created dynamically at runtime.
As a temporary workaround, I did the following and was able to move ahead: fileSystem.setPermission(new Path(outputPath + "/col_fam_folder"), FsPermission.valueOf("drwxrwxrwx"));
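Rather than hard-coding each subfolder, one option I considered is walking the output directory and applying the permission recursively, so the dynamically created column-family directories are covered too. A sketch, assuming the standard Hadoop FileSystem API (class and method names here are my own):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionFixer {
    // Recursively open up permissions under the job output directory so
    // the HBase region-server user can read and move the HFiles,
    // including the column-family subdirectories created at runtime.
    public static void setPermissionRecursive(FileSystem fs, Path dir) throws IOException {
        fs.setPermission(dir, FsPermission.valueOf("drwxrwxrwx"));
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                setPermissionRecursive(fs, status.getPath());
            } else {
                fs.setPermission(status.getPath(), FsPermission.valueOf("-rwxrwxrwx"));
            }
        }
    }
}
```

This would be called once on outputPath just before doBulkLoad, but it still feels like a workaround rather than a fix.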
Both steps seem to be workarounds, and I need a proper solution before moving to production. Thanks in advance.