I have some HBase tables with millions of rows but only a few columns. I want to extract the column names of each table and store them in a separate file. What is the best way to do this? Thanks.

Anuranjan
  • check this one http://stackoverflow.com/questions/7724855/how-can-i-dump-hbase-table-in-a-text-file – Nirmal Ram Nov 09 '16 at 13:25
  • See http://stackoverflow.com/questions/33225858/can-we-get-all-the-column-names-from-an-hbase-table – Serg M Ten Nov 09 '16 at 14:00
  • Please check my answer, which does this generically through the Java API and should work for you if you are dealing with multiple tables. – Ram Ghadiyaram Nov 12 '16 at 09:53
  • Possible duplicate of [Can we get all the column names from an HBase table?](https://stackoverflow.com/questions/33225858/can-we-get-all-the-column-names-from-an-hbase-table) – WattsInABox Dec 18 '17 at 18:47

3 Answers

This should save the column family names (the part of each column before the ':') to the file Hbase_table_columns.txt on the local filesystem (not on HDFS):

echo "scan 'table_name'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $1}' > Hbase_table_columns.txt

This should print the same names to the console:

echo "scan 'table_name'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $1}'

This should save the names to Hbase_table_columns.txt and also print them to the console:

echo "scan 'table_name'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $1}' |tee Hbase_table_columns.txt

This should save and print each column as column family:column name (splitting on both '=' and ',' drops the trailing ", timestamp" text):

echo "scan 'table_name'" | $HBASE_HOME/bin/hbase shell | awk -F'[=,]' '{print $2}' | tee Hbase_table_columns.txt
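
Because a scan prints one line per cell, the pipelines above repeat every column once per row, which adds up quickly on a table with millions of rows. A minimal sketch of a de-duplicating variant follows. The helper name `extract_columns` is hypothetical, and it assumes the shell prints cell lines in the form `column=family:qualifier, timestamp=..., value=...` and that row keys do not themselves contain the text `column=`.

```shell
#!/bin/sh
# Hypothetical helper: reads `hbase shell` scan output on stdin and prints
# each distinct family:qualifier exactly once, sorted.
# Assumption: cell lines look like
#   " row1   column=cf:a, timestamp=1421762485768, value=value1"
extract_columns() {
    # Split each line on the literal "column="; lines without it
    # (header, row count, blank lines) have NF == 1 and are skipped.
    # Then keep only the text before the first ", timestamp=".
    awk -F'column=' 'NF > 1 { split($2, parts, ", timestamp="); print parts[1] }' |
        sort -u
}

# Usage sketch (table_name is a placeholder):
# echo "scan 'table_name'" | "$HBASE_HOME/bin/hbase" shell | extract_columns > Hbase_table_columns.txt
```

Running this once per table, with a different output file name each time, would give one column list per table.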
Ronak Patel

I'd suggest the HBase Java client API, using the HBaseAdmin class as shown below.

The client would look like this:

package mytest;
import com.usertest.*;

import java.io.IOException;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;


public class ListHbaseTablesAndColumns {
    public static void main(String[] args) {
        try {
            HbaseMetaData hbaseMetaData = new HbaseMetaData();
            for (String hbaseTable : hbaseMetaData.getTableNames(".*yourtables.*")) {
                for (String column : hbaseMetaData.getColumns(hbaseTable, 10000)) {
                    System.out.println(hbaseTable + "," + column);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Use the class below to get the HBase metadata:

package com.usertest;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PageFilter;

import java.io.IOException;
import java.util.*;
import java.util.regex.Pattern;

public class HbaseMetaData {
    private HBaseAdmin hBaseAdmin;
    private Configuration hBaseConfiguration;

    public HbaseMetaData () throws IOException {
        this.hBaseConfiguration = HBaseConfiguration.create();
        this.hBaseAdmin = new HBaseAdmin(hBaseConfiguration);
    }
/** get all Table names **/
    public List<String> getTableNames(String regex) throws IOException {
        Pattern pattern=Pattern.compile(regex);
        List<String> tableList = new ArrayList<String>();
        TableName[] tableNames=hBaseAdmin.listTableNames();
        for (TableName tableName:tableNames){
            if(pattern.matcher(tableName.toString()).find()){
                tableList.add(tableName.toString());
            }
        }
        return tableList;
    }
/** Get all columns **/
    public Set<String> getColumns(String hbaseTable) throws IOException {
        return getColumns(hbaseTable, 10000);
    }
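/** Get the distinct family:qualifier names seen in a sample of the table **/
    public Set<String> getColumns(String hbaseTable, int limitScan) throws IOException {
        Set<String> columnList = new TreeSet<String>();
        HTable hTable=new HTable(hBaseConfiguration, hbaseTable);
        try {
            Scan scan=new Scan();
            // PageFilter is applied per region server, so limitScan caps the rows
            // read from each server rather than the total for the whole table.
            scan.setFilter(new PageFilter(limitScan));
            ResultScanner results = hTable.getScanner(scan);
            try {
                for(Result result:results){
                    for(KeyValue keyValue:result.list()){
                        columnList.add(
                                new String(keyValue.getFamily()) + ":" +
                                        new String(keyValue.getQualifier())
                        );
                    }
                }
            } finally {
                // Release the scanner even if iteration fails
                results.close();
            }
        } finally {
            // Release the table's resources
            hTable.close();
        }
        return columnList;
    }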
}
Ram Ghadiyaram

The scan below returns each row key together with its columns, leaving out the cell values:

scan 'namespace:tablename',{FILTER=>'KeyOnlyFilter()'}
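
On a table with millions of rows it may be enough to sample only the first rows. A minimal sketch, assuming the standard `LIMIT` scan option is combined with the filter above: the helper name `key_only_scan_cmd` and the default of 1000 rows are hypothetical, and a sample can miss columns that only appear in later rows.

```shell
#!/bin/sh
# Hypothetical helper: prints an HBase shell scan command that uses
# KeyOnlyFilter and stops after a given number of rows.
#   $1 = table name (for example namespace:tablename)
#   $2 = row limit (assumed default: 1000)
key_only_scan_cmd() {
    table=$1
    limit=${2:-1000}
    printf "scan '%s', {FILTER => \"KeyOnlyFilter()\", LIMIT => %s}\n" "$table" "$limit"
}

# Usage sketch: pipe the generated command into the HBase shell.
# key_only_scan_cmd 'namespace:tablename' 500 | "$HBASE_HOME/bin/hbase" shell
```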

Suraj