
I am trying to perform a bulk load into HBase. The input to the map reduce job is an HDFS file (exported from Hive). In the Tool (job) class, I initiate the bulk loading process with:

HFileOutputFormat.configureIncrementalLoad(job, new HTable(config, TABLE_NAME));
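For reference, the surrounding job setup is roughly as follows (a trimmed sketch; inputPath, outputPath, and the driver/mapper class names are placeholders):

Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "hbase-bulk-load");
job.setJarByClass(BulkLoadDriver.class);       // placeholder driver class
job.setMapperClass(BulkLoadMapper.class);      // placeholder mapper class
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
FileInputFormat.addInputPath(job, new Path(inputPath));
FileOutputFormat.setOutputPath(job, new Path(outputPath));
// wires in the reducer, partitioner and sort order needed to produce HFiles
HFileOutputFormat.configureIncrementalLoad(job, new HTable(config, TABLE_NAME));
job.waitForCompletion(true);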

In the Mapper, I use the following as the mapper output:

context.write(new ImmutableBytesWritable(Bytes.toBytes(hbaseTable)), put);
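As an aside, a mapper feeding configureIncrementalLoad is conventionally keyed by the row key rather than the table name, since the row key is what the TotalOrderPartitioner partitions on. A minimal sketch under that assumption, with a made-up input layout:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // hypothetical Hive export layout: rowkey \t col1value
        String[] fields = value.toString().split("\t");
        byte[] rowKey = Bytes.toBytes(fields[0]);
        Put put = new Put(rowKey);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
        // the row key, not the table name, is the map output key
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}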

Once the mapper completes, I perform the actual bulk load with:

LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(configuration);
HTable hTable = new HTable(configuration, tableName);
loadFiles.doBulkLoad(new Path(pathToHFile), hTable);

The job runs fine, but once the LoadIncrementalHFiles step starts, it hangs forever, and I have had to kill it after many attempts. On one run, after waiting about 30 minutes, I finally got the error above. After extensive searching I found that HBase tries to access the HFiles placed in the output folder, and that folder does not have write or execute permission, hence the error. An alternative solution is to add the file access permissions in Java code before the bulk load is performed, as below:

FileSystem fileSystem = FileSystem.get(config);
fileSystem.setPermission(new Path(outputPath), FsPermission.valueOf("drwxrwxrwx"));

Is this the correct approach as we move from development to production? Also, once I added the above code, I got a similar error for a folder created inside the output folder, this time the column family folder, which is created dynamically at runtime.

As a temporary workaround, I did the following and was able to move ahead:

fileSystem.setPermission(new Path(outputPath + "/col_fam_folder"), FsPermission.valueOf("drwxrwxrwx"));
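Since the column family folders only exist after the job finishes, the same workaround can be generalized by walking the output folder instead of hard-coding folder names (a sketch, assuming HFileOutputFormat's layout of one subfolder per column family):

FileSystem fileSystem = FileSystem.get(config);
FsPermission open = FsPermission.valueOf("drwxrwxrwx");
fileSystem.setPermission(new Path(outputPath), open);
// family folders are created by the job, so discover them instead of hard-coding
for (FileStatus status : fileSystem.listStatus(new Path(outputPath))) {
    if (status.isDirectory()) {
        fileSystem.setPermission(status.getPath(), open);
    }
}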

Both steps seem to be workarounds, and I need a correct solution to move to production. Thanks in advance.

  • Do you have both `hadoop` and `hbase` users? Are you executing this code as the `hbase` user instead of the `hadoop` user? If you execute it as the `hadoop` user, you don't have to set permissions on the output path. – Rajesh N Jun 05 '15 at 04:39
  • Hi, how do I identify which user the job is executing as? I suppose it is not the hbase user that I am running with. – Ramzy Jun 05 '15 at 14:03
  • What were the ownership / permissions on the directory before you made it 777? – brandon.bell Jun 05 '15 at 19:33
  • They are created on the fly by the map reduce job. However, in the error they show as "drwxr-x---". – Ramzy Jun 05 '15 at 20:00

2 Answers


Try setting the Hadoop user explicitly before the job runs:

System.setProperty("HADOOP_USER_NAME", "hadoop");
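For example, at the very start of the driver, before any FileSystem or UserGroupInformation call (BulkLoadDriver is a placeholder for your Tool class):

public static void main(String[] args) throws Exception {
    // must run before Hadoop initializes the login user, or it is ignored
    System.setProperty("HADOOP_USER_NAME", "hadoop");
    System.exit(ToolRunner.run(new Configuration(), new BulkLoadDriver(), args));
}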


Secure bulk load seems to be an appropriate answer. This thread explains a sample implementation, and the snippet is copied below.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.coprocessor.SecureBulkLoadClient;
import org.apache.hadoop.hbase.security.UserProvider;
import org.apache.hadoop.hbase.security.token.FsDelegationToken;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.security.UserGroupInformation;

String keyTab = "pathtokeytabfile";
String tableName = "tb_name";
String pathToHFile = "/tmp/tmpfiles/";
Configuration configuration = new Configuration();

configuration.set("hbase.zookeeper.quorum", "ZK_QUORUM");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.master", "MASTER:60000");
configuration.set("hadoop.security.authentication", "kerberos");
configuration.set("hbase.security.authentication", "kerberos");


// obtain Kerberos authentication

UserGroupInformation.setConfiguration(configuration);

UserGroupInformation.loginUserFromKeytab("kerberos principal here", keyTab);

HBaseAdmin.checkHBaseAvailable(configuration);

System.out.println("HBase is running!");

HBaseConfiguration.addHbaseResources(configuration);    

Connection conn = ConnectionFactory.createConnection(configuration);

Table table = conn.getTable(TableName.valueOf(tableName));

HRegionInfo tbInfo = new HRegionInfo(table.getName());


//path to the HFiles that need to be loaded 

Path hfofDir = new Path(pathToHFile);

//acquiring user token for authentication 

UserProvider up = UserProvider.instantiate(configuration);

FsDelegationToken fsDelegationToken = new FsDelegationToken(up, "name of the key tab user");

fsDelegationToken.acquireDelegationToken(hfofDir.getFileSystem(configuration));

//preparing  for the bulk load

SecureBulkLoadClient secureBulkLoadClient = new SecureBulkLoadClient(table);

String bulkToken = secureBulkLoadClient.prepareBulkLoad(table.getName());

System.out.println(bulkToken);

// create the family list: pairs of (family name, path to the HFile for that family)

final List<Pair<byte[], String>> famPaths = new ArrayList<>();

Pair<byte[], String> p = new Pair<>();

// name of the family
p.setFirst("nameofthefamily".getBytes());

// path to the HFile (HFiles are organized in folders named after the family)
p.setSecond("/tmp/tmpfiles/INTRO/nameofthefilehere");

famPaths.add(p);

// bulk load, using the secure bulk load client

secureBulkLoadClient.bulkLoadHFiles(famPaths, fsDelegationToken.getUserToken(), bulkToken, tbInfo.getStartKey());

System.out.println("Bulk Load Completed..");    
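Once the load completes, it is also worth releasing the delegation token and closing the HBase resources; a small addition along these lines:

// release the HDFS delegation token and close HBase resources
fsDelegationToken.releaseDelegationToken();
table.close();
conn.close();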