
Here's a minimal example:

import weaviate

CLASS = "Superhero"
PROP = "superhero_name"


client = weaviate.Client("http://localhost:8080")

class_obj = {
    "class": CLASS,
    "properties": [
        {
            "name": PROP,
            "dataType": ["string"],
            "moduleConfig": {
                "text2vec-transformers": {
                    "vectorizePropertyName": False,
                }
            },
        }
    ],
    "moduleConfig": {
        "text2vec-transformers": {
            "vectorizeClassName": False
        }
    }
}
client.schema.delete_all()
client.schema.create_class(class_obj)

batman_id = client.data_object.create({PROP: "Batman"}, CLASS)

by_text = (
    client.query.get(CLASS, [PROP])
    .with_additional(["distance", "id"])
    .with_near_text({"concepts": ["Batman"]})
    .do()
)
print(by_text)

batman_vector = client.data_object.get(
    uuid=batman_id, with_vector=True, class_name=CLASS
)["vector"]

by_vector = (
    client.query.get(CLASS, [PROP])
    .with_additional(["distance", "id"])
    .with_near_vector({"vector": batman_vector})
    .do()
)
print(by_vector)

Please note that I specified both "vectorizePropertyName": False and "vectorizeClassName": False, so only the property value itself should be vectorized.

The code above returns:

{'data': {'Get': {'Superhero': [{'_additional': {'distance': 0.08034378, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 1.1920929e-07, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}

If I look up the exact vector I get 'distance': 1.1920929e-07, which I guess is effectively 0 (due to some floating-point evil magic), as expected. But if I use near_text to search for the exact property value, I get a distance > 0.

This is leading me to believe that, when using near_text, the embedding is somehow different.

My question is:

  • Why does this happen?

With two corollaries:

  • Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
  • Is there a way to check the embedding created during the near_text search?

1 Answer

Here is some information that may help:

Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?

Yes, this value 1.1920929e-07 should be interpreted as 0. I think there are some unfortunate float32/64 conversions going on that need to be rooted out.
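
As a rough check, the reported value coincides with float32 machine epsilon (2^-23, the gap between 1.0 and the next representable float32). That is consistent with a rounding step in single precision rather than a real difference between the vectors. The sketch below uses only the standard library; the helper name `to_float32` is made up for illustration, and where exactly the rounding happens inside Weaviate is an assumption, not something confirmed here.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Gap between 1.0 and the next representable float32 (machine epsilon).
FLOAT32_EPSILON = 2.0 ** -23

# Distance reported by the near_vector query in the question.
reported_distance = 1.1920929e-07

print(FLOAT32_EPSILON)
# True if the reported distance is float32 epsilon once rounded to float32.
print(to_float32(reported_distance) == FLOAT32_EPSILON)
```

If the second line prints True, the distance is one float32 step away from 0 and can be treated as 0 for practical purposes.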

Is there a way to check the embedding created during the near_text search?

The embeddings are either imported or generated during object creation, not at search-time. So performing multiple queries on an unchanged object will utilize the same search vector.

We are looking into both of these issues.

  • Maybe I'm misunderstanding how `near_text` works: doesn't it create an embedding ("search vector") for whatever is in the `concepts` and then look up similar vectors? – Nicola Blago Jul 25 '22 at 14:41
  • @NicolaBlago Yes, to search in Weaviate, the text2vec module is used to convert the concepts into vectors. You can set a breakpoint in the transformer to observe this process. – silverfox Mar 15 '23 at 13:11
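
If the query vector can be captured (for example with a breakpoint in the vectorizer, as suggested above), one way to check it is to recompute the distance to the stored vector by hand. The sketch below assumes the class uses cosine distance, i.e. 1 minus the cosine similarity; the function name `cosine_distance` and the two short vectors are placeholders for illustration, not real embeddings.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Placeholder values: in practice, use the vector returned with
# with_vector=True and the vector captured from the vectorizer.
stored_vector = [0.12, -0.40, 0.33]
query_vector = [0.10, -0.42, 0.35]

print(cosine_distance(stored_vector, stored_vector))  # effectively 0
print(cosine_distance(stored_vector, query_vector))   # small positive value
```

A hand-computed distance that matches the one returned by near_text would suggest the captured vector is the one actually used for the search.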