I have a large number of objects, currently around 1 million, stored in a GCP Cloud Storage Bucket. Objects are added at a rate of 1-2 thousand per day. I would like to efficiently run queries to look up objects in the bucket based on the metadata for those objects, including file name infix/suffix, date created, storage class, and so forth.
The Cloud Storage API allows searching by filename prefix (docs), but the call takes several seconds to complete. I can do infix queries with gsutil, e.g. gsutil ls gs://my-bucket/foo-*-bar.txt, but this is even slower. Additionally, these list operations are Class A operations, which incur costs.
Rather than dealing with the Cloud Storage API for searching my bucket, I was thinking I could add a listing of all objects in my bucket to a database such as Bigtable or Cloud SQL. The database should stay in sync with all changes to the bucket, at least when objects are created or deleted, and ideally also when an object is modified, its storage class changes, etc.
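To make the idea concrete, here is a rough sketch of the kind of handler I'm imagining: a Cloud Function subscribed to the bucket's object-change notifications, which maps each event payload to a flat row for the database. The field names come from the Cloud Storage object resource; the function name and the upsert step are hypothetical placeholders, not working code:

```python
def event_to_row(event: dict) -> dict:
    """Map a Cloud Storage object event payload to a flat metadata row.

    The keys read here (bucket, name, size, storageClass, timeCreated,
    updated) are fields of the Cloud Storage object resource JSON.
    """
    name = event["name"]
    return {
        "bucket": event["bucket"],
        "name": name,
        # File suffix, to support suffix queries without scanning names.
        "suffix": name.rsplit(".", 1)[-1] if "." in name else "",
        "size": int(event.get("size", 0)),  # size arrives as a string
        "storage_class": event.get("storageClass", ""),
        "created": event.get("timeCreated", ""),
        "updated": event.get("updated", ""),
    }


def on_object_change(event: dict, context) -> None:
    """Hypothetical Cloud Function entry point for finalize/delete events."""
    row = event_to_row(event)
    # Upsert `row` into the metadata table here (e.g. a Cloud SQL client);
    # for delete events (context.event_type), remove the row instead.
```

Infix and suffix queries would then become ordinary indexed database lookups instead of bucket listings.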
What is the best way to achieve this?