
Here's a short description of the code I am writing right now:

We are using the Apache HBase API to store data in an HBase database. The schema is made up of attributes of various data types such as date, float, varchar, char, and so on.

Now a requirement has come up where we need access to different versions of a tuple, i.e. the row will be updated at different times over the years and we would like to access all of these versions.

Currently the Phoenix API only has support for:

1) Defining the number of versions a table should maintain, at table-creation time, using DDL

2) Specifying, when creating a new connection, the version of the table that all queries should work against

https://community.hortonworks.com/questions/51846/query-versions-in-phoenix.html

But this is too restrictive: the HBase API supports setting a time range and a maximum number of versions within that range, which is what we need. So I decided to access the data stored through the Phoenix API directly with the HBase API.
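(In shell terms, the HBase capability referred to here looks like the following; TIMERANGE takes epoch milliseconds, and the table name and timestamps are the ones that appear later in this question:)

```
hbase(main):001:0> scan 'HSTP2', {TIMERANGE => [1529578558767, 1529580854059], VERSIONS => 3}
```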


This is the issue I am facing:

1) I want to filter rows based on any attribute of the primary key. My primary key consists of 9 attributes:

Char(10),Char(10),Char(3),Varchar(40),Varchar(8),Varchar(8),Varchar(40),Varchar(256),Date

The Phoenix API concatenates these values to create a row key, which looks something like this:

$qA$F62&81PPEDOID01 PGKBLOOMBRG\x00VENDOR\x00PRCQUOTE\x00BB\x001\x00\x80\x00\x01aD\x5C\xFC\x00

I am using an HBase RowFilter with the EQUAL compare op and a SubstringComparator to filter rows based on their primary key values:

        Filter IDFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("$qA$F62&81"));
        Filter CurrencyCodeFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("PGK"));

        ArrayList<Filter> filters = new ArrayList<>();
        filters.add(IDFilter);
        filters.add(CurrencyCodeFilter);
        FilterList filterList = new FilterList(Operator.MUST_PASS_ALL ,filters);
        scan.setMaxVersions(1);
        scan.setFilter(filterList);

This works fine for primary key attributes that are char, varchar, or numbers. But I just can't filter based on the date, and that is really necessary.

The problem with the date is that I don't understand the encoding it uses. For example, the Phoenix API stores the date "2018-01-30" as:

\x80\x00\x01aD\x5C\xFC\x00

I understand that the Phoenix API places "\x00" after each varchar to act as a delimiter, but I don't understand this date encoding.
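For what it's worth, those eight bytes are consistent with Phoenix storing a DATE as the 8-byte big-endian epoch-millisecond value with the top (sign) bit flipped, so that keys sort correctly as unsigned bytes. Here is a small, self-contained sketch that reproduces the bytes above from "2018-01-30" under that assumption (the class and method names are mine, for illustration only):

```java
import java.nio.ByteBuffer;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class PhoenixDateEncoding {

    // Encode a date the way Phoenix appears to (assumption inferred from the
    // observed row key): 8-byte big-endian epoch millis, sign bit flipped.
    static byte[] encodeDate(long epochMillis) {
        byte[] out = ByteBuffer.allocate(8).putLong(epochMillis).array();
        out[0] ^= (byte) 0x80; // flip the sign bit so keys sort as unsigned
        return out;
    }

    public static void main(String[] args) {
        long millis = LocalDate.of(2018, 1, 30)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        byte[] encoded = encodeDate(millis);
        StringBuilder sb = new StringBuilder();
        for (byte b : encoded) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        System.out.println(sb); // \x80\x00\x01\x61\x44\x5C\xFC\x00
    }
}
```

Note that \x61 and \x44 are the printable characters 'a' and 'D', which is why the shell renders the key fragment as \x80\x00\x01aD\x5C\xFC\x00.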

So I tried running this command in Hbase Shell:

hbase(main):007:0> scan 'HSTP2', {FILTER => "RowFilter(=,'substring:\x80\x00\x01aD\x5C\xFC\x00')"}

I got proper results.

But when I tried the same thing in Java using the HBase API, I got no results:

 Filter DateFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("\\x80\\x00\\x01aD\\x5C\\xFC\\x00"));

And I get this when I sysout the DateFilter:

RowFilter (EQUAL, \x5Cx80\x5Cx00\x5Cx01ad\x5Cx5c\x5Cxfc\x5Cx00)

The conversion of '\' to '\x5C' is the cause of the problem: the Java string contains literal backslashes, so the comparator searches for the text of the escape sequences (lower-cased, as shown) rather than the raw bytes, and nothing matches.
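One way around the escaping problem (a sketch, not tested against the cluster above) is to skip the textual \xNN escapes entirely and build the match string from the raw key bytes. HBase's RegexStringComparator decodes row keys as ISO-8859-1 by default, so it can match binary data; quoting the raw bytes with Pattern.quote then gives an exact binary substring match. The pure-Java part is shown below, with the RowFilter wiring left in a comment since it needs hbase-client on the classpath:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class BinaryRowKeyMatch {

    // Turn raw key bytes into a regex-safe literal, decoded the same way
    // RegexStringComparator decodes row keys (ISO-8859-1).
    static String binaryLiteral(byte[] raw) {
        return Pattern.quote(new String(raw, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // The encoded date bytes from the row key above
        byte[] dateBytes = {(byte) 0x80, 0x00, 0x01, 0x61, 0x44, 0x5C, (byte) 0xFC, 0x00};

        String literal = binaryLiteral(dateBytes);

        // With hbase-client this would become (assumption, untested):
        // Filter dateFilter = new RowFilter(CompareOp.EQUAL,
        //         new RegexStringComparator(literal, Pattern.DOTALL));

        // Demonstrate that the pattern finds the raw bytes inside a longer key
        String rowKeyAsString = "$qA$F62&81PPEDOID01 PGK"
                + new String(dateBytes, StandardCharsets.ISO_8859_1);
        boolean found = Pattern.compile(literal, Pattern.DOTALL)
                .matcher(rowKeyAsString).find();
        System.out.println(found); // true
    }
}
```

The date bytes themselves could be produced with Phoenix's own codec (e.g. PDate.INSTANCE.toBytes(date)) so that the filter always matches whatever Phoenix actually wrote.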


How can I perform row filters based on a date? Will I have to convert the date to the format in which the Phoenix API stores it and then run a row filter, or is there some other way?

This is my code so far testing filtering based on different attributes and decoding the fetched data:

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.phoenix.schema.SortOrder;
import org.apache.phoenix.schema.types.PDate;
import org.apache.phoenix.schema.types.PFloat;
import org.apache.phoenix.schema.types.PInteger;

public class HbaseVersionedPriceFetcher {

    public static void main(String[] args) {

        try {
            Configuration conf = HBaseConfiguration.create(new Configuration());  
            conf.set("hbase.zookeeper.quorum", "hostName");//Private Detail
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            HTable table = new HTable(conf, "HSTP2");

//          Filter filter = new SingleColumnValueFilter("0".getBytes(),"LAST_CHG_USR_ID".getBytes(), CompareOp.EQUAL, "AUTO:GEN:SCRIPT".getBytes());            
//          Filter filter = new SingleColumnValueFilter("ISPH".getBytes(),"MKT_OID".getBytes(), CompareOp.EQUAL, "MARKET".getBytes());
//          Filter filter = new SingleColumnValueFilter("ISPH".getBytes(),"VALIDATED_PRC_TYPE".getBytes(), CompareOp.EQUAL, "MID".getBytes());


            Scan scan = new Scan();

            //Filter List
            Filter IDFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("qA$F62&81"));         
            Filter CurrencyCodeFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("PGK"));


            ArrayList<Filter> filters = new ArrayList<>();
            filters.add(IDFilter);
            filters.add(CurrencyCodeFilter);
            FilterList filterList = new FilterList(Operator.MUST_PASS_ALL ,filters);
            scan.setMaxVersions(1);
            scan.setFilter(filterList);

            //REGEX
            //Filter filter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(".*PGK.*VENDOR.*"))


            //scan.addColumn("ISPH".getBytes(), "ADJST_TMS".getBytes());

//          long start = new Long("1529578558767");
//              long end  = new Long("1529580854059");
//           
//           try {
//              scan.setTimeRange(start,end);
//          } catch (IOException e) {
//              // TODO Auto-generated catch block
//              e.printStackTrace();
//          }

            ResultScanner scanner = table.getScanner(scan);

            int count = 0;

            for (Result rr : scanner) {

                count += 1;
                System.out.println("Instrument " + count);
                System.out.println(rr);

                for (KeyValue value : rr.raw()) {

                    String qualifier = new String(value.getQualifier());
                    System.out.print(qualifier + " : ");

                    byte[] valByteArray = value.getValue();

                    // Decode each column with the matching Phoenix type
                    if (qualifier.equals("ASK_CPRC") || qualifier.equals("BID_CPRC")
                            || qualifier.equals("MID_CPRC") || qualifier.equals("VALIDATED_CPRC")) {

                        float decoded = PFloat.INSTANCE.getCodec().decodeFloat(valByteArray, 0, SortOrder.getDefault());
                        System.out.println(decoded);

                    } else if (qualifier.equals("LAST_CHG_TMS") || qualifier.equals("ADJST_TMS")) {

                        System.out.println(PDate.INSTANCE.toObject(valByteArray, SortOrder.getDefault()));

                    } else if (qualifier.equals("HST_PRC_DTE_OF_NUM")) {

                        int decoded = PInteger.INSTANCE.getCodec().decodeInt(valByteArray, 0, SortOrder.getDefault());
                        System.out.println(decoded);

                    } else {
                        System.out.println(new String(valByteArray));
                    }
                }
            }

            scanner.close();
            table.close();

            } catch (IOException e1) {
                e1.printStackTrace();
            } 
    }

    static byte[] getBytes(String string) {

        return string.getBytes();
    }
}
  • You need to specify how you serialized the data in the table, and you have to use the same serializer for reads. Phoenix has custom serialization, and since you wrote the data directly with the HBase API I recommend you use the HBase API for reads as well. If you follow Phoenix's custom serialization when writing via the HBase API (it's not recommended), then you can probably create a Phoenix view on that table and access the data. – Anil Gupta Jun 23 '18 at 14:48
  • The data is stored using the Phoenix API and I want to read it using the HBase API; creating a Phoenix view won't solve my use case of getting the latest version in a certain time range! Do you know how I can serialize the date the way the Phoenix API does? – Parth Jun 25 '18 at 04:08

0 Answers