Is there a way to create and incrementally update a BigQuery ANN index like HNSW?

I'm interested in using BigQuery for vector storage, but I'm trying to avoid full scans. I know BigQuery was originally "full scan, all the time," but search indexes have since been introduced, and I'm wondering whether people have gotten creative with them and leveraged ANNs somehow.

  • If you want this feature to be implemented, you can open a new [feature request](https://cloud.google.com/support/docs/issue-trackers) on the issue tracker describing your requirement. – Sakshi Gatyan Apr 13 '23 at 10:53
  • BigQuery is a data lake that lets you perform full table scans quickly and cheaply, if done right. Assume a large BigQuery table with a partition column `colP` and a column `colA` holding an array of numbers, which is your vector. Finding the row whose `colA` has the lowest dot product with a vector `x` is then a full-table-scan query over `colA`. However, a single query could perform the operation for several given `x` vectors and thus run several searches at once. It could also help to first obtain the relevant `colP` value and then query only the needed partition of the table (`EXECUTE IMMEDIATE`). – Samuel Apr 13 '23 at 20:26
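
The brute-force approach described in the last comment can be sketched in plain Python to show what such a query would compute. This is only an illustration, not BigQuery code: the in-memory `ROWS` list stands in for the table, and `row_id`, `batched_scan`, and the `partitions` argument are hypothetical names; only `colP`, `colA`, and `x` come from the comment.

```python
# Hypothetical in-memory stand-in for the table: (colP, row_id, colA).
ROWS = [
    ("p1", "a", [1.0, 0.0]),
    ("p1", "b", [0.0, 1.0]),
    ("p2", "c", [-1.0, 2.0]),
    ("p2", "d", [2.0, -1.0]),
]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def batched_scan(rows, queries, partitions=None):
    """Scan the rows once and score every query vector against each row.

    Returns, per query name, the (score, row_id, colP) of the row with the
    lowest dot product. If `partitions` is given, rows in other partitions
    are skipped, mimicking a query restricted to the needed partitions.
    """
    best = {}
    for col_p, row_id, col_a in rows:
        if partitions is not None and col_p not in partitions:
            continue  # partition pruning: this row is never scored
        for name, x in queries.items():
            score = dot(x, col_a)
            if name not in best or score < best[name][0]:
                best[name] = (score, row_id, col_p)
    return best


queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
print(batched_scan(ROWS, queries))                     # scans every row
print(batched_scan(ROWS, queries, partitions={"p1"}))  # scans only partition p1
```

The point of the sketch is that the cost of the scan is shared across all query vectors, and that restricting to a partition shrinks the scan. Neither makes the search approximate or sublinear in the way an HNSW index would.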

0 Answers