Here's a short description of the code I am writing right now:
We are using the Apache HBase API to store data in an HBase database. The schema is made up of attributes of various data types like date, float, varchar, char, etc.
Now a requirement has come up where we need access to different versions of a tuple, i.e. a row will be updated at different times over the years and we would like to access all of those versions.
Currently the Phoenix API only has support for:
1) Defining the number of versions that a table should maintain, in the DDL at table-creation time
2) Specifying, when creating a new connection, the point in time (version) of the table that all queries on that connection should work against (both options are sketched in code after the link below)
https://community.hortonworks.com/questions/51846/query-versions-in-phoenix.html
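For reference, this is roughly what those two options look like through the Phoenix JDBC driver. This is a minimal, untested sketch: the table and column names are invented, and the timestamp is just one from my test code further down.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PhoenixVersionOptions {
    public static void main(String[] args) throws Exception {
        // 1) VERSIONS is fixed once, in the DDL, when the table is created
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:hostName:2181")) {
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS EXAMPLE (PK CHAR(10) PRIMARY KEY, VAL FLOAT) VERSIONS=5");
        }

        // 2) CurrentSCN pins every query on this connection to a single point
        //    in time; there is no way to ask for a range of versions at once
        Properties props = new Properties();
        props.setProperty("CurrentSCN", Long.toString(1529578558767L));
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:hostName:2181", props)) {
            // queries here see the table as of that timestamp only
        }
    }
}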
But this is too restrictive: the HBase API supports scanning over a time range and setting the maximum number of versions returned within that range, which is exactly what we need. So I decided to access the data stored through Phoenix using the HBase API directly.
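That is, on the HBase side the scan can be configured like this (a minimal sketch; the timestamps are the ones from my test code below):
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;

public class VersionedScanSketch {
    static Scan buildVersionedScan() throws IOException {
        Scan scan = new Scan();
        // time range is [start, end) in epoch milliseconds
        scan.setTimeRange(1529578558767L, 1529580854059L);
        // return up to 10 versions of each cell within that range
        scan.setMaxVersions(10);
        return scan;
    }
}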
This is the issue I am facing:
1) I want to filter rows based on any attribute of the primary key. My primary key consists of 9 attributes:
Char(10), Char(10), Char(3), Varchar(40), Varchar(8), Varchar(8), Varchar(40), Varchar(256), Date
The Phoenix API concatenates these values to create the row key, which looks something like this (broken down field by field below):
$qA$F62&81PPEDOID01 PGKBLOOMBRG\x00VENDOR\x00PRCQUOTE\x00BB\x001\x00\x80\x00\x01aD\x5C\xFC\x00
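Breaking that key down against the primary key definition (my own annotation from eyeballing the bytes, so treat it as a guess):
$qA$F62&81                    -> CHAR(10), fixed width, no delimiter
PPEDOID01 (space-padded)      -> CHAR(10)
PGK                           -> CHAR(3)
BLOOMBRG\x00                  -> VARCHAR(40), \x00 terminates variable-length values
VENDOR\x00                    -> VARCHAR(8)
PRCQUOTE\x00                  -> VARCHAR(8)
BB\x00                        -> VARCHAR(40)
1\x00                         -> VARCHAR(256)
\x80\x00\x01aD\x5C\xFC\x00    -> DATE, 8 bytes, fixed width, no delimiter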
I am using an HBase RowFilter with the EQUAL compare op and a SubstringComparator to filter rows based on their primary key values:
Filter idFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("$qA$F62&81"));
Filter currencyCodeFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("PGK"));

ArrayList<Filter> filters = new ArrayList<>();
filters.add(idFilter);
filters.add(currencyCodeFilter);
FilterList filterList = new FilterList(Operator.MUST_PASS_ALL, filters);

scan.setMaxVersions(1);
scan.setFilter(filterList);
This works fine for primary key attributes that are CHAR, VARCHAR, or numeric. But I just can't filter based on the DATE attribute, and it's really necessary.
The problem with the date is that I don't understand the encoding it uses: e.g. Phoenix stores the date "2018-01-30" as \x80\x00\x01aD\x5C\xFC\x00.
I understand that Phoenix places "\x00" after a VARCHAR to act as a delimiter, but I don't understand this date encoding.
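As far as I can tell from eyeballing those bytes, it is an 8-byte big-endian epoch-milliseconds value with the most significant (sign) bit flipped so that keys still sort correctly as unsigned bytes: dropping the leading \x80 leaves 0x0161445CFC00 = 1517270400000 ms = 2018-01-30 00:00 UTC (the \x61 and \x44 bytes happen to print as 'a' and 'D'). If that's right, PDate should reproduce the stored bytes exactly; a quick check I want to run (untested fragment, using PDate and HBase's Bytes.toStringBinary):
// quick check: does PDate reproduce the stored bytes?
byte[] dateBytes = PDate.INSTANCE.toBytes(java.sql.Date.valueOf("2018-01-30"));
System.out.println(Bytes.toStringBinary(dateBytes));
// expecting: \x80\x00\x01aD\x5C\xFC\x00
// (java.sql.Date.valueOf uses the local time zone, so the millis may be
// shifted if the stored dates are UTC midnights)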
So I tried running this command in the HBase shell:
hbase(main):007:0> scan 'HSTP2', {FILTER => "RowFilter(=,'substring:\x80\x00\x01aD\x5C\xFC\x00')"}
and I got proper results.
But when I tried the same thing in Java using the HBase API, I get no results:
Filter dateFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("\\x80\\x00\\x01aD\\x5C\\xFC\\x00"));
And I get this when I sysout the dateFilter:
RowFilter (EQUAL, \x5Cx80\x5Cx00\x5Cx01ad\x5Cx5c\x5Cxfc\x5Cx00)
The shell interprets the \xNN escapes, but a Java string literal does not: the comparator receives the literal characters backslash, 'x', '8', '0', and so on (the filter's toString prints each literal backslash as \x5C), not the raw bytes they stand for, which is why nothing matches.
How can I perform row filters based on a date? Will I have to convert the date to the format that Phoenix stores it in and then run a row filter on those raw bytes (as in the sketch below)? Or is there some other way?
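Something like this is what I have in mind. It is a minimal, untested sketch: it assumes RegexStringComparator lets me switch the charset to ISO-8859-1 (which maps bytes to characters one-to-one, so the raw Phoenix bytes survive the string round-trip), and Pattern.quote is there so that key bytes that happen to be regex metacharacters are matched literally. SubstringComparator itself won't do, since it only accepts a String and lowercases it.
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.phoenix.schema.types.PDate;

public class DateRowFilterSketch {
    static Filter dateRowFilter(java.sql.Date date) {
        // encode the date exactly as Phoenix does when it builds the row key
        byte[] dateBytes = PDate.INSTANCE.toBytes(date);
        // match the raw bytes with a literal regex in a byte-safe charset
        RegexStringComparator cmp = new RegexStringComparator(
                Pattern.quote(new String(dateBytes, StandardCharsets.ISO_8859_1)));
        cmp.setCharset(StandardCharsets.ISO_8859_1);
        return new RowFilter(CompareOp.EQUAL, cmp);
    }
}
If this works, the resulting filter could go straight into the FilterList alongside the other RowFilters in my code below.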
This is my code so far, testing filters on different attributes and decoding the fetched data:
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.schema.SortOrder;
import org.apache.phoenix.schema.types.PDate;
import org.apache.phoenix.schema.types.PFloat;
import org.apache.phoenix.schema.types.PInteger;
public class HbaseVersionedPriceFetcher {

    public static void main(String[] args) {
        try {
            Configuration conf = HBaseConfiguration.create(new Configuration());
            conf.set("hbase.zookeeper.quorum", "hostName"); // redacted
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            HTable table = new HTable(conf, "HSTP2");

            // earlier experiments: filtering on column values instead of the row key
            // Filter filter = new SingleColumnValueFilter("0".getBytes(), "LAST_CHG_USR_ID".getBytes(), CompareOp.EQUAL, "AUTO:GEN:SCRIPT".getBytes());
            // Filter filter = new SingleColumnValueFilter("ISPH".getBytes(), "MKT_OID".getBytes(), CompareOp.EQUAL, "MARKET".getBytes());
            // Filter filter = new SingleColumnValueFilter("ISPH".getBytes(), "VALIDATED_PRC_TYPE".getBytes(), CompareOp.EQUAL, "MID".getBytes());

            Scan scan = new Scan();

            // filter on two primary key attributes via substring matches on the row key
            Filter idFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("qA$F62&81"));
            Filter currencyCodeFilter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("PGK"));

            ArrayList<Filter> filters = new ArrayList<>();
            filters.add(idFilter);
            filters.add(currencyCodeFilter);
            FilterList filterList = new FilterList(Operator.MUST_PASS_ALL, filters);

            scan.setMaxVersions(1);
            scan.setFilter(filterList);
            // regex experiment on the row key
            // Filter filter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(".*PGK.*VENDOR.*"));
            // scan.addColumn("ISPH".getBytes(), "ADJST_TMS".getBytes());

            // time-range experiment
            // long start = 1529578558767L;
            // long end = 1529580854059L;
            // try {
            //     scan.setTimeRange(start, end);
            // } catch (IOException e) {
            //     e.printStackTrace();
            // }
            ResultScanner scanner = table.getScanner(scan);

            int count = 0;
            for (Result rr : scanner) {
                count += 1;
                System.out.println("Instrument " + count);
                System.out.println(rr);

                // decode each cell according to the Phoenix type of its column
                for (KeyValue value : rr.raw()) {
                    String qualifier = new String(value.getQualifier());
                    System.out.print(qualifier + " : ");
                    byte[] valByteArray = value.getValue();
                    if (qualifier.equals("ASK_CPRC") || qualifier.equals("BID_CPRC")
                            || qualifier.equals("MID_CPRC") || qualifier.equals("VALIDATED_CPRC")) {
                        float decoded = PFloat.INSTANCE.getCodec().decodeFloat(valByteArray, 0, SortOrder.getDefault());
                        System.out.println(decoded);
                    } else if (qualifier.equals("LAST_CHG_TMS") || qualifier.equals("ADJST_TMS")) {
                        System.out.println(PDate.INSTANCE.toObject(valByteArray, SortOrder.getDefault()));
                    } else if (qualifier.equals("HST_PRC_DTE_OF_NUM")) {
                        int decoded = PInteger.INSTANCE.getCodec().decodeInt(valByteArray, 0, SortOrder.getDefault());
                        System.out.println(decoded);
                    } else {
                        System.out.println(new String(valByteArray));
                    }
                }
            }
            scanner.close();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
    static byte[] getBytes(String string) {
        return string.getBytes();
    }
}