I have 100,000 images; each of them has 500 ORB vectors and a unique tag.
My general issue is: when I insert a new image (i.e. 500 new vectors), how can I know whether the image's tag is already in the database?
What I do is attach a "tag" metadata property to each vector. I can retrieve the inserted tags with:
result = client.query.get('orb_vector', ['tag'])\
.with_limit(200)\
.do()
This returns roughly 200 of the 100,000 existing tags.
According to the documentation, this approach is not scalable.
How should I do this?
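A minimal sketch of turning such a response into a set of distinct tags. It assumes the result is a dict shaped like a GraphQL `Get` response (`data` → `Get` → class name → list of objects); the helper name `unique_tags`, the `class_name` key, and the sample data are hypothetical and only there for illustration.

```python
def unique_tags(result, class_name="orb_vector"):
    """Collect the distinct 'tag' values from a Get-style response dict.

    Assumption: objects are listed under result["data"]["Get"][class_name].
    The key may differ from the name used in the query, so it is a parameter.
    """
    objects = result.get("data", {}).get("Get", {}).get(class_name) or []
    return {obj["tag"] for obj in objects if obj.get("tag") is not None}


# Hypothetical response: the same tag appears once per vector,
# so duplicates are expected and collapse in the set.
sample = {
    "data": {
        "Get": {
            "orb_vector": [
                {"tag": "img_001"},
                {"tag": "img_001"},
                {"tag": "img_002"},
            ]
        }
    }
}

print(sorted(unique_tags(sample)))
```

Because each tag is stored once per vector, a limited page of objects can contain far fewer distinct tags than objects.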
Context:
My database is not very dynamic: apart from the initial bulk insertion (100,000+ images), there will be only a few insertions each day. So I'm okay with a request that takes 5 minutes and with keeping the result in memory as a static snapshot. A plain Python list is fine.
Clarification: each image has one tag, but 500 vectors. So each tag is present 500 times in the database.
I'm using Python.
What I can do:
Write the list of tags to a JSON file/MongoDB/other store and read/update it each time I insert new images. I would prefer to avoid this solution, since keeping the Weaviate database and the JSON file in sync will be a nightmare.
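For reference, a rough sketch of what that side-file workaround could look like, using only the standard library. The file name `tags.json` and the helpers `load_tags`, `save_tags` and `register_tag` are hypothetical; the actual insertion into Weaviate is left out, which is exactly where the two stores can drift apart.

```python
import json
import os
import tempfile
from pathlib import Path


def load_tags(path):
    """Read the set of known tags from the JSON file (empty set if missing)."""
    p = Path(path)
    if not p.exists():
        return set()
    with p.open(encoding="utf-8") as f:
        return set(json.load(f))


def save_tags(path, tags):
    """Write the tags back as a sorted JSON list."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(sorted(tags), f)


def register_tag(path, tag):
    """Return True and record the tag if it is new, False if already known."""
    tags = load_tags(path)
    if tag in tags:
        return False
    tags.add(tag)
    save_tags(path, tags)
    return True


# Demo in a temporary directory so nothing is left behind in the project.
tags_file = os.path.join(tempfile.mkdtemp(), "tags.json")
print(register_tag(tags_file, "img_001"))  # first time: tag is new
print(register_tag(tags_file, "img_001"))  # second time: already recorded
```

If the vector insertion fails after `register_tag` has returned True (or the other way round), the file and the database disagree, which is the synchronization problem described above.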