I'm trying to perform searches by multiple prefixes at Google Cloud Bigtable with the Python SDK. I'm using read_rows
, and I can't see a good way to search by prefix explicitly.
My first option is RowSet
+ RowRange
. I'm testing three queries, and the times that I'm getting are ~1.5s, ~3.5s and ~4.2s, which are an order of magnitude slower than the searches with the Node SDK (which has a filter option) ~0.19, ~0.13, ~0.46.
The second option is using RowFilterChain
+ RowKeyRegexFilter
. Performance is terrible for two of the queries: ~3.1s, ~70s, ~75s ~0.124s, ~72s, ~69s. It looks like it's doing a full scan. This is the code section:
regex = f'^{prefix}.*'.encode()
filters.append(RowKeyRegexFilter(regex))
My third option is using the alternative Happybase-based SDK, which has prefix filtering. With that, I'm getting ~36s, ~3s, ~1s ~0.4, ~0.1, ~0.17. The first query involves multiple prefixes, and it doesn't seem to have support for multiple filtering in the same request, so I'm performing as many requests as prefixes and then concatenating the iterators. The other two seem to leverage the prefix filter.
UPDATE: I deleted the first times because there was a mistake with the environment. After doing it properly, times are not bad for range query, but it seems to be room for improvement, as Happybase tests are still faster when they leverage prefix search.
Would appreciate help about using multiple prefix searches in Happybase, or actual prefix search in the main Python SDK.