
I have the following rows with these keys in HBase table "mytable":

user_1
user_2
user_3
...
user_9999999

I want to use the HBase shell to delete rows from:

user_500 to user_900

I know there is no built-in way to delete a range of rows from the shell, but is there a way I could use the "BulkDeleteProcessor" to do this?

I see here:

https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

I want to just paste in imports and then paste this into the shell, but have no idea how to go about this. Does anyone know how I can use this endpoint from the jruby hbase shell?

    // (scan, deleteType, rowBatchSize and timeStamp are defined in the surrounding test)
    Table ht = TEST_UTIL.getConnection().getTable(TableName.valueOf("my_table"));
    long noOfDeletedRows = 0L;
    Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
        new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
          ServerRpcController controller = new ServerRpcController();
          BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
              new BlockingRpcCallback<BulkDeleteResponse>();

          public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
            Builder builder = BulkDeleteRequest.newBuilder();
            builder.setScan(ProtobufUtil.toScan(scan));
            builder.setDeleteType(deleteType);
            builder.setRowBatchSize(rowBatchSize);
            if (timeStamp != null) {
              builder.setTimestamp(timeStamp);
            }
            service.delete(controller, builder.build(), rpcCallback);
            return rpcCallback.get();
          }
        };
    Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class,
        scan.getStartRow(), scan.getStopRow(), callable);
    for (BulkDeleteResponse response : result.values()) {
      noOfDeletedRows += response.getRowsDeleted();
    }
    ht.close();

If there is no way to do this through JRuby, then a Java or other alternative way to quickly delete multiple rows is fine.

Chris Martin
Rolando

2 Answers


Do you really want to do it in the shell? There are better ways. One is to use the native Java API:

  • Construct a list of Delete objects
  • Pass this list to the Table.delete method

Method 1: If you already know the range of keys.

public void massDelete(byte[] tableName) throws IOException {
    HTable table=(HTable)hbasePool.getTable(tableName);

    String tablePrefix = "user_";
    int startRange = 500;
    int endRange = 900;

    List<Delete> listOfBatchDelete = new ArrayList<Delete>();

    for(int i=startRange;i<=endRange;i++){
        String key = tablePrefix+i; 
        Delete d=new Delete(Bytes.toBytes(key));
        listOfBatchDelete.add(d);  
    }

    try {
        table.delete(listOfBatchDelete);
    } finally {
        if (hbasePool != null && table != null) {
            hbasePool.putTable(table);
        }
    }
}

Method 2: If you want to do a batch delete on the basis of a scan result.

public void bulkDelete(final HTable table) throws IOException {
    Scan s = new Scan();
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    // add your filters to the scan, e.g.:
    // s.setFilter(yourFilter);
    ResultScanner scanner = table.getScanner(s);
    for (Result rr : scanner) {
        Delete d = new Delete(rr.getRow());
        listOfBatchDelete.add(d);
    }
    scanner.close();
    try {
        table.delete(listOfBatchDelete);
    } catch (IOException e) {
        LOGGER.log(e);
    }
}

Now, coming down to using a coprocessor: only one piece of advice, don't use a coprocessor unless you are an expert in HBase. Coprocessors have many built-in issues; if you need, I can provide a detailed description. Secondly, when you delete anything from HBase it is never deleted directly. Instead, a tombstone marker gets attached to that record, and the record is physically removed later, during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive.

Modified code to support batched deletes:

int batchSize = 50;
int batchCounter = 0;
for (int i = startRange; i <= endRange; i++) {

    String key = tablePrefix + i;
    Delete d = new Delete(Bytes.toBytes(key));
    listOfBatchDelete.add(d);
    batchCounter++;

    if (batchCounter == batchSize) {
        table.delete(listOfBatchDelete);
        listOfBatchDelete.clear();
        batchCounter = 0;
    }
}
// flush whatever is left over from the last, partial batch
if (!listOfBatchDelete.isEmpty()) {
    table.delete(listOfBatchDelete);
}
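As a rough sketch of the same batching pattern without any HBase dependency (flush here is a hypothetical stand-in for table.delete), including a final flush so the leftover partial batch is not dropped:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Table#delete so the batching logic can be run
// without a cluster: each flush just records how many deletes it received.
public class BatchingSketch {
    static final List<Integer> flushes = new ArrayList<>();

    static void flush(List<String> batch) {
        flushes.add(batch.size()); // real code would call table.delete(batch)
        batch.clear();
    }

    public static void main(String[] args) {
        int batchSize = 50;
        List<String> batch = new ArrayList<>();
        for (int i = 500; i <= 900; i++) {   // user_500 .. user_900 inclusive
            batch.add("user_" + i);
            if (batch.size() == batchSize) flush(batch);
        }
        if (!batch.isEmpty()) flush(batch);  // don't drop the last partial batch
        System.out.println(flushes.size());  // 9 flushes: 8 full batches + 1 of a single key
    }
}
```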

Creating the HBase configuration and getting a table instance:

Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);

HTable hTable = new HTable(hConf, tableName);
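And since the question asked specifically about the JRuby shell: the HBase shell is itself a JRuby REPL, so for a modest key range you can simply loop over the shell's deleteall command (slower than the batched Java API, but nothing to compile). A sketch, assuming the table is named 'mytable':

```ruby
# Inside `hbase shell` (a JRuby REPL), this deletes one row per iteration:
#
#   (500..900).each { |i| deleteall 'mytable', "user_#{i}" }
#
# The plain Ruby below only builds the same key list, to show the range hit.
keys = (500..900).map { |i| "user_#{i}" }
puts keys.first    # user_500
puts keys.last     # user_900
puts keys.length   # 401
```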
britter
  • For both solutions, if you have a very big number of rows to delete, you should consider the heap size, since we may have many Delete objects (maybe executing the deletes in batches). – Afshin Moazami Sep 28 '15 at 14:25
  • Yes definitely that can be done very easily just have one more for loop to create a batch and fire delete. – Vikram Singh Chandel Sep 28 '15 at 16:42
  • Yes, that can be done very easily with the help of one more for loop to create batches of the required size. If you are suggesting using the HBase batch(List) method call, it would be a little faster, but it won't help in minimizing or overcoming heap usage. To do so you have to create one more for loop – Vikram Singh Chandel Sep 28 '15 at 16:46
  • Silly question, but is this NOT a mapreduce job? Can I just paste that code into eclipse .java file, assuming I have the right dependencies/filters/connection info, and just run it? – Rolando Sep 29 '15 at 05:04
  • No it's not a MR job, it's just plain simple java HBase API code. To run this code just create a Maven(for simplicity) project and add dependencies as per your HBase version. Then create Hbase configuration to get the table instance. To create configuration and get the table instance there are many ways see the edited code last section for the simplest one. – Vikram Singh Chandel Sep 29 '15 at 07:20

If you are already aware of the row keys of the records that you want to delete from the HBase table, then you can use the following approach.

1. First, create a List of Delete objects with these row keys:

List<Delete> deleteList = new ArrayList<Delete>();
for (int rowKey = 1; rowKey <= 10; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
}

2. Then get the Table object using an HBase Connection:

Table table = connection.getTable(TableName.valueOf(tableName));

3. Once you have the Table object, call delete(), passing the list:

table.delete(deleteList);

The complete code will look like the one below:

Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

String tableName = "users";

Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf(tableName));

List<Delete> deleteList = new ArrayList<Delete>();

for (int rowKey = 500; rowKey <= 900; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
}

table.delete(deleteList);

table.close();
connection.close();
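One reason enumerating explicit keys, as above, is safer than a scan-based range delete: HBase row keys sort lexicographically as byte arrays, so with unpadded numeric suffixes a scan from user_500 (inclusive) to user_900 (exclusive) would also sweep up keys like user_5000. A small sketch in plain Java (no HBase dependency; String ordering mirrors the byte ordering for these ASCII keys):

```java
// HBase compares row keys as byte arrays; for ASCII keys, String#compareTo
// gives the same ordering, so we can demonstrate the scan-range pitfall here.
public class KeyOrderSketch {
    static boolean inScanRange(String key, String start, String stop) {
        // scan semantics: start row inclusive, stop row exclusive
        return key.compareTo(start) >= 0 && key.compareTo(stop) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inScanRange("user_600", "user_500", "user_900"));  // true
        System.out.println(inScanRange("user_5000", "user_500", "user_900")); // true: beware!
        System.out.println(inScanRange("user_901", "user_500", "user_900"));  // false
    }
}
```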
Prasad Khode