
We are using Phoenix and hbase-indexer with our HBase cluster, and we have found a curious phenomenon involving Phoenix secondary indexes:

We import CSV data with psql into one table (C_PICRECORD) that has two global mutable index tables (C_PICRECORD_IDX1 and C_PICRECORD_IDX2) in Phoenix, and we use hbase-indexer to replicate the data into Solr. After the import finished, we found that Solr's numFound differed from the HBase table row count. When we dropped the index tables, cleared the data, and imported it again, Solr's numFound matched the HBase table row count.
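For anyone trying to reproduce the comparison, here is a minimal Python sketch of how the Solr side of the count can be read. It relies only on Solr's standard select handler with `q=*:*` and `rows=0`, which returns `response.numFound` in the JSON response. The base URL and the collection name `picrecord` are assumptions for illustration, not values from our setup, and the HBase row count is expected to come from elsewhere (for example a `SELECT COUNT(*)` in Phoenix).

```python
import json
from urllib.parse import urlencode


def solr_count_url(base_url, collection):
    """Build a select URL that matches every document but returns no rows."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{query}"


def parse_num_found(response_text):
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_gap(hbase_row_count, solr_num_found):
    """Positive: rows missing from Solr. Negative: extra documents in Solr."""
    return hbase_row_count - solr_num_found


# Hypothetical host and collection name, for illustration only.
url = solr_count_url("http://localhost:8983/solr", "picrecord")
```

Fetching `url` (for example with `urllib.request.urlopen`) and passing the body to `parse_num_found` gives the number to compare against the table row count.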

Recently we found out why Solr's numFound differs from the HBase table row count: 1) something goes wrong and documents end up being deleted, even though all we do is import data (psql) into a table with two global index tables in Phoenix!

2) The inconsistent row count between Solr and HBase only occurs when the data table has secondary indexes.

This has troubled us for a long time, and it seems that something behaves differently when Phoenix secondary indexes are in use.

So my question is: do Phoenix secondary indexes handle the WAL specially?

Our environment:

cdh5.4.2, hbase-1.0.0-cdh5.4.2, phoenix-4.6, hbase-solr-1.5-cdh5.4.2 (hbase-indexer)

Cluster: 3 HBase RegionServers and 3 hbase-indexer instances

[Tips]

hbase-indexer writes data into Solr based on HBase replication: it reads the WAL and indexes the data it is configured to capture into Solr. hbase-indexer on GitHub: https://github.com/NGDATA/hbase-indexer
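To make the concern concrete, here is a toy Python model of a replication-driven indexer. It is not hbase-indexer's actual implementation, and the event tuples are a made-up format rather than the real WAL layout. It only illustrates why any unexpected delete event reaching the indexer during a pure import would leave fewer documents in the index than rows in the table.

```python
def apply_wal_events(events, table_of_interest):
    """Replay (op, table, row_key, doc) events into an in-memory index.

    Toy model: events for other tables are skipped, a "put" upserts a
    document, and a "delete" removes it if present.
    """
    index = {}
    for op, table, row_key, doc in events:
        if table != table_of_interest:
            continue  # e.g. edits that belong to an index table
        if op == "put":
            index[row_key] = doc
        elif op == "delete":
            index.pop(row_key, None)
    return index


# Hypothetical event stream: two imported rows, one index-table edit,
# and one unexpected delete on the data table.
events = [
    ("put", "C_PICRECORD", "r1", {"id": "r1"}),
    ("put", "C_PICRECORD", "r2", {"id": "r2"}),
    ("put", "C_PICRECORD_IDX1", "x1", {"id": "x1"}),
    ("delete", "C_PICRECORD", "r2", None),
]
index = apply_wal_events(events, "C_PICRECORD")
```

In this stream the table holds two imported rows, but the index keeps only `r1`, which is the kind of gap we see between HBase and Solr.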

CrazyPig

1 Answer


We solved the problem recently. Please see this other Stack Overflow question:

hbase-indexer solr numFound different from hbase table rows size

As for the question of whether Phoenix secondary indexes handle the WAL specially, please see:

http://www.slideshare.net/jesse_yates/phoenix-secondary-indexing-la-hug-sept-9th-2013

for more details about Phoenix secondary indexes.

CrazyPig