
I'm having trouble saving my data to Redis from Python, using just redis and r.ft().

The data to upload looks like this: id is the data index, and the embeddings are flattened to the same shape for all rows (e.g. 1024):

id  embeddings
0   [3.1515, 4.5562, ..., ]
1   [3, 8.62, ..., ]

I also want to be able to refresh the embeddings to different values under the same ids.

After uploading to Redis, I want to search it with a batch of embeddings: if the input batch has shape [3, 1024], the search should iterate over the batch and return [3, top-k] ids of the most similar embeddings stored in Redis.

It is really hard for me to get this right, so I'm hoping for some help.

ddrong

1 Answer


A few helpful links first: This notebook has some helpful examples, here are the RediSearch docs for using vector similarity, and lastly, here's an example app where it all comes together.

To store a numpy array as a vector field in Redis, you need to first create a search index with a VectorField in the schema:

import numpy as np
import redis

from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    VectorField
)

# connect
r = redis.Redis(...)

# define vector field
fields = [VectorField("vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 1024,  # 1024 dimensions
        "DISTANCE_METRIC": "COSINE",
        "INITIAL_CAP": 10000, # approx initial count of docs in the index
    }
)]

# create search index (INDEX_NAME is any name you choose)
INDEX_NAME = "embeddings"
r.ft(INDEX_NAME).create_index(
    fields=fields,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)

After you have an index, you can write data to Redis using hset and a pipeline. Vectors in Redis are stored as byte strings (see tobytes() below):

# random vectors
vectors = np.random.rand(10000, 1024).astype(np.float32)

pipe = r.pipeline(transaction=False)
for id_, vector in enumerate(vectors):
    # hset takes the key as its first positional argument
    pipe.hset(f"doc:{id_}", mapping={"id": id_, "vector": vector.tobytes()})
    if id_ % 100 == 0:
        pipe.execute()  # write batch
pipe.execute()  # flush the remainder
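To refresh the embedding for an existing id with new values, you don't need anything extra: HSET overwrites the previous field value for the same key, and the search index picks up the change. A minimal sketch, where the refresh_embedding helper is my own name for it, not part of redis-py:

```python
import numpy as np

def refresh_embedding(client, doc_id, vector):
    """Overwrite the stored embedding for an existing id.

    HSET replaces the field's previous value, so writing the same
    key ("doc:{id}") again with a new vector refreshes it in place.
    """
    client.hset(
        f"doc:{doc_id}",
        mapping={"id": doc_id, "vector": vector.astype(np.float32).tobytes()},
    )
```

Calling this with the same doc_id and a new vector updates the document without recreating the index.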

Out of the box, you can use a pipeline call to query Redis multiple times with one API call:

base_query = "*=>[KNN 5 @vector $vector AS vector_score]"
query = (
    Query(base_query)
    .sort_by("vector_score")
    .paging(0, 5)
    .dialect(2)
)
query_vectors = np.random.rand(3, 1024).astype(np.float32)

# pipeline calls to redis
pipe = r.pipeline(transaction=False)
for vector in query_vectors:
    pipe.ft(INDEX_NAME).search(query, {"vector": vector.tobytes()})
res = pipe.execute()

Then you will need to unpack the res object, which contains the raw responses for all three queries. Hope this helps.